NVIDIA Democratizes AI Infrastructure with Open-Source Resource Allocation Driver
Artificial intelligence has rapidly become a critical workload in modern computing, with Kubernetes emerging as the dominant platform for deploying, scaling, and managing it. To make AI infrastructure management more efficient and transparent for developers worldwide, NVIDIA is donating the NVIDIA Dynamic Resource Allocation (DRA) Driver for GPUs to the Cloud Native Computing Foundation (CNCF). The move, announced at KubeCon + CloudNativeCon North America in Atlanta, transitions the driver from vendor governance to full community ownership under the Kubernetes project, fostering broader collaboration and innovation.
“NVIDIA’s deep collaboration with the Kubernetes and CNCF community to upstream the NVIDIA DRA Driver for GPUs marks a major milestone for open source Kubernetes and AI infrastructure,” said Chris Aniszczyk, chief technology officer of CNCF. “By aligning its hardware innovations with upstream Kubernetes and AI conformance efforts, NVIDIA is making high-performance GPU orchestration seamless and accessible to all.”
Simplifying AI Infrastructure Management
Managing GPUs, the engines powering AI, has historically been a complex undertaking. The NVIDIA DRA Driver aims to simplify this process, offering several key benefits:
- Improved Efficiency: The driver enables smarter sharing of GPU resources to maximize use of available computing power, and it supports technologies such as NVIDIA Multi-Process Service (MPS) and NVIDIA Multi-Instance GPU (MIG).
- Massive Scale: It natively supports interconnected systems, including those using NVIDIA Multi-Node NVLink technology, which is essential for training large AI models on NVIDIA Grace Blackwell systems and next-generation infrastructure.
- Flexibility: Developers can dynamically reconfigure hardware to meet evolving needs, adjusting resource allocation on the fly.
- Precision: The software supports fine-grained requests, letting users specify the exact compute capacity, memory configuration, and interconnect arrangement their applications require (see the sketch after this list).
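To make the precision point concrete, below is a minimal sketch of such a request using the Kubernetes DRA API (resource.k8s.io/v1beta1, available in recent Kubernetes releases) with the gpu.nvidia.com device class published by the NVIDIA driver. The capacity attribute name and the 40Gi threshold are illustrative assumptions here, not guaranteed to match the driver's published schema:

```yaml
# Hypothetical ResourceClaim: asks the DRA driver for one GPU with at
# least 40 GiB of memory, selected via a CEL expression over the
# device attributes the driver advertises.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-large-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com  # device class from the NVIDIA DRA driver
      selectors:
      - cel:
          expression: device.capacity['gpu.nvidia.com'].memory.compareTo(quantity('40Gi')) >= 0
---
# A pod that consumes the claim instead of using the classic
# nvidia.com/gpu extended-resource request.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-job
spec:
  containers:
  - name: main
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: single-large-gpu
```

Unlike a plain extended-resource count, the claim describes what kind of device the workload needs, and the scheduler and DRA driver together resolve it to a concrete GPU.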
Expanding Confidential Computing and Open-Source Initiatives
Beyond the DRA Driver, NVIDIA is expanding support for confidential computing by introducing GPU support for Kata Containers, which wrap workloads in lightweight virtual machines for stronger isolation and security. This lets AI workloads run with increased protection, making it easier to apply confidential computing to safeguard sensitive data.
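From the user's side, this is a small change: a pod opts into a Kata runtime through a Kubernetes RuntimeClass while still requesting a GPU. A rough sketch follows; the runtime class name kata-qemu-nvidia-gpu is an assumption that depends on how Kata Containers is deployed in a given cluster:

```yaml
# Illustrative only: run a GPU workload inside a Kata Containers
# lightweight VM by selecting a Kata RuntimeClass for the pod.
apiVersion: v1
kind: Pod
metadata:
  name: isolated-gpu-job
spec:
  runtimeClassName: kata-qemu-nvidia-gpu  # assumed name; defined by the cluster's Kata deployment
  containers:
  - name: main
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1  # GPU made available inside the Kata VM
```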
NVIDIA’s commitment to open source extends to several other projects, including NVSentinel, a GPU fault remediation system, and AI Cluster Runtime, an agentic AI framework, both unveiled at GTC last week. The NVIDIA NemoClaw reference stack and NVIDIA OpenShell runtime for secure autonomous agent execution were also recently announced.
The KAI Scheduler, NVIDIA’s AI workload scheduler, has also been onboarded as a CNCF Sandbox project, encouraging broader collaboration and ensuring its evolution aligns with the cloud-native ecosystem. Developers can use and contribute to the KAI Scheduler today.
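For developers who want to try it, opting a workload into KAI typically means pointing the pod at the scheduler and attaching it to a queue. The schedulerName and queue label below follow the project's public documentation but should be verified against the current release:

```yaml
# Illustrative pod that asks to be placed by KAI Scheduler rather
# than the default kube-scheduler, and assigns itself to a queue.
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
  labels:
    kai.scheduler/queue: team-a   # queue label per the KAI Scheduler docs (assumed here)
spec:
  schedulerName: kai-scheduler    # hands scheduling decisions to KAI
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.08-py3
    resources:
      limits:
        nvidia.com/gpu: 1
```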
Industry Collaboration Drives Innovation
NVIDIA is collaborating with industry leaders – including Amazon Web Services, Broadcom, Canonical, Google Cloud, Microsoft, Nutanix, Red Hat, and SUSE – to advance these features for the benefit of the cloud-native community.
“Open source will be at the core of every successful enterprise AI strategy, bringing standardization to the high-performance infrastructure components that fuel production AI workloads,” said Chris Wright, chief technology officer and senior vice president of global engineering at Red Hat. “NVIDIA’s donation of the NVIDIA DRA Driver for GPUs helps to cement the role of open source in AI’s evolution, and we look forward to collaborating with NVIDIA and the broader community within the Kubernetes ecosystem.”
“Open source software and the communities that sustain it are a cornerstone of the infrastructure used for scientific computing and research,” said Ricardo Rocha, lead of platforms infrastructure at CERN. “For organizations like CERN, where efficiently analyzing petabytes of data is essential to discovery, community-driven innovation helps accelerate the pace of science. NVIDIA’s donation of the DRA Driver strengthens the ecosystem researchers rely on to process data across both traditional scientific computing and emerging machine learning workloads.”
Dynamo Ecosystem Expansion
Following the release of NVIDIA Dynamo 1.0, NVIDIA is expanding the ecosystem with Grove, an open-source Kubernetes API for orchestrating AI workloads on GPU clusters. Grove, which enables developers to define complex inference systems in a single declarative resource, is being integrated with the llm-d inference stack for wider adoption within the Kubernetes community.
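To give a sense of what a single declarative resource for an inference system can look like, here is a schematic sketch of a Grove-style definition with separate prefill and decode roles that are scheduled and scaled as a unit. The kind, field names, and images are illustrative assumptions, not the exact Grove schema:

```yaml
# Schematic sketch, not the exact Grove API: one resource describing
# a disaggregated inference system whose roles are gang-scheduled together.
apiVersion: grove.io/v1alpha1   # assumed group/version
kind: PodGangSet                # assumed kind
metadata:
  name: llm-serving
spec:
  replicas: 1
  template:
    cliques:
    - name: prefill             # compute-heavy prompt processing
      replicas: 2
      podSpec:
        containers:
        - name: worker
          image: example.com/llm-prefill:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1
    - name: decode              # latency-sensitive token generation
      replicas: 4
      podSpec:
        containers:
        - name: worker
          image: example.com/llm-decode:latest    # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1
```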
Developers and organizations can start using and contributing to the NVIDIA DRA Driver today. Visit the NVIDIA booth at KubeCon for live demonstrations of the technology.