GPU
43 sessions
13:43 Project Lightning Talk: Hami: Dynamic, Smart, Stable GPU-Sharing Middleware In Kubernetes · Elicium 2 · Mengxuan Li
13:50 Project Lightning Talk: Still Burning GPUs On Debugging? Scale AI In One Line · Elicium 2 · Anna Kramar
15:34 Project Lightning Talk: From Idle to Ideal: Cross-Cluster GPU Sharing with CoHDI · Elicium 2 · Takao Indoh

08:37 Keynote: Rules of the Road for Shared GPUs: AI Inference Scheduling at Wayve · Hall 12 · Mukund Muralikrishnan
08:49 Keynote: Making Kubernetes for AI Optimized and Reproducible · Hall 12 · Nathan Taber, Mark Chmarny
10:15 No Pain No Drain: Lessons From Node Drains at Scale · Forum · Ryan Hallisey, Natalie Bandel
11:00 Breaking the Monolith: Decomposing and Governing Giant LLM Jobs Across Clusters · Elicium 2 · Kevin Wang
11:00 Intelligent Routing for Optimized Inference · Hall 7 | Room B · Antonio Berben, Felipe Vicens
13:30 Cloud Native Theater | Istio Day: Running State of the Art Inference with Istio and LLM-D · Hall 1-5 | Tram Zone | Cloud Native Theater · Jackie Maertens
14:15 Amplifying End User Voices: Platform Architects on the Future of Kubernetes · Hall 7 | Room C · Rajas Kakodkar, Zach Shepherd, Kevin Klues, Elias Tarn, Dawn Chen
14:15 To Swap or Not To Swap: Memory Management Design Patterns for AI Workloads in Kubernetes 1.34+ · Amtrium 1+2 · Nic Vermande
15:15 Lessons Learned Orchestrating Multi-Tenant GPUs on OpenShift AI with NVIDIA KAI (G/H200) · Hall 8 | Room F · Luca Berton
15:15 SIG API Machinery: SIG Updates and Deep Dive in the AI/ML Era · E103-105 · Stefan Schimanski
15:15 Towards Building an Open Source AI Reference Stack for EU Sovereign Cloud · Hall 7 | Room C · Madhav Bhargava, Sanjay Chatterjee
15:15 🚨 Contribfest: Getting Started in the Tinkerbell Playground · G107 · Jacob Weinstock
16:00 SIG Network: The State of Networking for AI on Kubernetes · E103-105 · David Martin, Haiyan Meng, Bowei Du, Kellen Swain, Nadia Pinaeva
16:15 Gold Sponsor In-Booth Demos · Hall 1-5 | Solutions Showcase

10:00 GPUs on Kubernetes: What Actually Happens When You Request Nvidia.com/gpu: 1 · Hall 8 | Room G · Gulcan Topcu, Daniele Polencic
10:00 Hacking GPU Observability: eBPF & Ephemeral Containers in Action on Kubernetes · Hall 8 | Room F · Brandon Kang
10:00 Serverless GPUs in Production: How Cerebrium Built a Globally Efficient Low-Latency AI Platform with Knative · F002-005 · Dave Protasowski, Elijah Roussos
12:15 Sponsored Demo: Building cross-cloud AI inference on Kubernetes with OSS · Hall 1-5 | Tram Zone | Demo Theater
12:15 🪧 Poster Session: Kubernetes as the Universal GPU Control Plane for AI Workloads · Hall 1-5 | Gouda Zone | Poster Pavilion · Satyam Soni, Rudraksh Karpe
13:15 Improving Pod Disruption and Node Lifecycle · F002-005 · Filip Křepinský, Lucy Sweet, Ryan Hallisey
13:15 📚 Tutorial: DRA-matically Simple: On-Demand GPUs for MLOps · Elicium 1 · Doug Smith, Miguel Duarte Barroso
14:00 Peeking Into the GPU Black Box: Continuous Profiling on Kubernetes With eBPF · Auditorium · Zahari Dichev
15:00 Optimizing LLM Inference for the Rest of Us · F002-005 · Abdel Sghiouar
15:45 Explore TAG Workloads Foundation: Advancing Cloud Native Execution From Core Runtime To Applications · F002-005 · Stephen Rust, Yuan Tang, Marlow Warnicke, Kante Yin
16:30 Let Your Network Speak! · G102-103 · Nadia Pinaeva, Joel Takvorian
16:30 Pay Less for More: A Practitioner's Playbook for Kubernetes Autoscaling · Elicium 2 · Malgorzata Widelicka, Lukasz Ogrodowczyk
16:30 The Latest in GPU, TPU, NIC and Other Device Support - WG Device Management · E106-108 · John Belamaric, Patrick Ohly
16:34 ⚡ Lightning Talk: The $100K GPU Mystery: Why Your AI Training Dies at 99% · Auditorium · Michael Ifeanyi

10:00 Achieving Resilient Multi-Cluster AI Inference on Kubernetes With Karmada and KubeRay · Auditorium · Wei-Cheng Lai, Han-Ju Chen
10:45 From Idle to Savings: Building a Global Scheduler for Cost-Efficient Data Processing on K8s · Hall 7 | Room A · Rainie Li, Ang Zhang
10:45 Cloud Native Theater | KubeVirt Summit: KubeVirt on GB200: Virtualizing a Rack-Scale Supercomputer · Hall 1-5 | Tram Zone | Cloud Native Theater · Fan Zhang
11:55 Cloud Native Theater | KubeVirt Summit: Measuring KubeVirt Performance and Scale with KWOK · Hall 1-5 | Tram Zone | Cloud Native Theater · Sreeja Varnam
12:45 Cloud Native at the Far(m) Edge: Running Kubernetes and AI on Tractors · Auditorium · Mauro Morales, Jordan Karapanagiotis
12:45 KubeVirt's Evolution: Governance, Features, and Community Growth · E103-105 · Sreeja Varnam, Luboslav Pivarc
13:30 BoF: Infrastructure Optimization for GPUs / Inference / Training / Networking · G106
13:30 Making Topology-Aware Scheduling Practical for AI Workloads: From Discovery to Simulation at Scale · Hall 8 | Room D · Weizhou Lan
13:30 Optimizing Error Recovery for Cost-Efficient Distributed AI Model Training with Kubernetes · Elicium 2 · Radostin Stoyanov, Andrey Velichkevich, Viktória Spišáková
13:30 Virtualizing Large Scale GPU Cluster for Sovereign AI: Petasus AI Cloud Journey with Kubernetes · Auditorium · Jian Li
14:15 Collisions in the Dark: Illuminating the 95% of Kubeflow You Can't See · Hall 7 | Room A · Amine Lahouel, Laura Llinares
14:15 K8s-sigs NFD × SYLVA: Declarative Image-to-Node Compatibility for Telco Clouds · Hall 8 | Room D · Eduardo Arango Gutierrez, Chaoyi Huang