Schedule › Tue 24 › Keynote Sessions
Keynote Sessions Reviewed

Keynote: From Inference to Agents: Where Open Source AI Is Headed

Keynote · Tuesday — Main Conference Day 1

08:56–09:01 UTC
Hall 12
Jonathan Bryce, CNCF · Brian Stevens, Red Hat · Mark Collier, PyTorch Foundation · Lin Sun, Solo.io
★★★★★

Pivotal keynote on the shift from inference to agentic AI on Kubernetes

Overview

This panel was one of those sessions that tie the whole conference together. The panelists laid out a clear roadmap for where open-source AI infrastructure is headed and, more importantly, where the gaps still are.

The core argument: the AI market is splitting into three distinct tracks, and the cloud-native community needs to own the infrastructure layer before proprietary stacks lock everyone in.

The Three Tracks

Three areas the community should focus on:

  1. Robust open-source models — the reasoning and capabilities layer
  2. Agentic loops — moving from laptop-scale coding agents to secure, governed deployments
  3. The open-source inference stack — how we actually operationalize and serve AI at scale

The third track is where the Kubernetes community has the most to contribute, and it is where the panel spent most of its time. The stack they described is already taking shape:

At the base sits vLLM as the single-node inference engine, then LLMD, KServe, and KGateway handle cluster-scale serving and routing on top of Kubernetes. The key point: this is a community-built stack from the ground up, deliberately independent from hardware vendors or big model providers dictating the infrastructure.
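To make the division of labor concrete, here is a minimal sketch of the kind of model-aware routing that the cluster-scale layer (KGateway/LLMD) performs above per-node vLLM servers: a request names a model, and the gateway dispatches it to whichever in-cluster backend serves that model. The service URLs and registry shape are hypothetical illustrations, not code from the panel or from any of these projects.

```python
# Hypothetical sketch of model-aware request routing, the job the panel
# assigned to the cluster-scale layer above vLLM. Service URLs and the
# registry shape are invented for illustration only.

# Model name -> in-cluster vLLM backend (OpenAI-compatible base URL).
MODEL_BACKENDS = {
    "llama-3-8b": "http://vllm-llama3.ai-serving.svc:8000/v1",
    "mistral-7b": "http://vllm-mistral.ai-serving.svc:8000/v1",
}

def route(request: dict) -> str:
    """Return the backend base URL for the model named in the request."""
    model = request.get("model")
    if model not in MODEL_BACKENDS:
        raise ValueError(f"no backend registered for model {model!r}")
    return MODEL_BACKENDS[model]

print(route({"model": "llama-3-8b", "prompt": "Hello"}))
# → http://vllm-llama3.ai-serving.svc:8000/v1
```

The real projects add much more on top of this (load-aware scheduling, KV-cache-aware placement, autoscaling), but the basic contract is the same: clients talk to one endpoint, and the platform decides which engine instance answers.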

Key impressions

Mark Collier made a strong case that the pace of AI innovation is challenging how open-source communities traditionally collaborate. The CNCF and PyTorch Foundation need to work together and break down silos — not just coexist.

Lin Sun raised what I think is the most underestimated challenge right now: we’re all running AI coding agents on our laptops (Cursor, Copilot, Claude Code, Codex), but nobody has figured out how to deploy these agents at enterprise scale with proper security and governance. She hinted at Istio and service meshes as potential building blocks for agentic networking, which makes a lot of sense.
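The governance gap she described can be illustrated with a toy policy check: before an agent's tool call leaves the cluster, something (in her framing, plausibly a service-mesh layer) decides whether the destination is approved. The allowlist and function below are my own invention to make the problem concrete, not anything Istio ships today.

```python
# Hypothetical illustration of agent egress governance: an agent's
# outbound tool call is checked against an allowlist before it leaves
# the cluster. The hosts and policy shape are invented for this sketch.

from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.github.com", "internal-docs.corp.example"}

def egress_allowed(tool_call_url: str) -> bool:
    """Permit an agent's outbound call only to pre-approved hosts."""
    return urlparse(tool_call_url).hostname in ALLOWED_HOSTS

print(egress_allowed("https://api.github.com/repos"))  # → True
print(egress_allowed("https://evil.example/exfil"))    # → False
```

A mesh sidecar is an attractive place to enforce this kind of rule precisely because it sits outside the agent process, so a prompt-injected agent cannot simply skip the check.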

What I Would Have Liked More Of

The panel painted a compelling vision but stayed at a high level, which is understandable given the keynote format. Deeper dives into the individual projects would have been welcome; for reference, the stack they name-checked:

  • vLLM — the most widely used open-source inference engine, hosted by the PyTorch Foundation
  • LLMD — newly announced CNCF project for cluster-scale inference, tightly integrated with vLLM
  • KServe — model serving on Kubernetes with autoscaling and multi-framework support
  • KGateway — Kubernetes-native API gateway evolving alongside the inference stack
  • Ray — distributed compute framework used alongside vLLM by companies like Uber to train and serve thousands of models
  • Istio — service mesh that could provide security and governance layers for agentic networking