GreenBoost: Expand NVIDIA GPU Memory with System RAM & NVMe SSDs

by Anika Shah - Technology

A new open-source Linux kernel module, GreenBoost, aims to tackle a growing challenge in artificial intelligence: the limited video memory (VRAM) of even high-end graphics cards. Developed by Ferran Duarri, GreenBoost allows NVIDIA GPUs to use system RAM and NVMe storage as additional memory tiers, making it possible to run larger AI models than the card's VRAM alone would allow.

The Problem: VRAM Constraints and Large Language Models

Large Language Models (LLMs) are rapidly growing in size, demanding ever-greater amounts of VRAM. Many users cannot run these models locally because of their GPU's memory limits. Techniques such as model quantization and offloading layers to the CPU exist, but they come with trade-offs: reduced model quality or lower performance. GreenBoost takes a different approach, creating a CUDA caching layer that transparently expands the memory available to the GPU.

How GreenBoost Works

GreenBoost operates as a complementary driver to NVIDIA’s official Linux kernel drivers, not a replacement. It consists of two main components:

  • Kernel Module (greenboost.ko): This module allocates pinned DDR4 pages in system RAM and makes them accessible to the GPU as CUDA external memory. The PCIe 4.0 x16 interface facilitates data transfer at speeds of approximately 32 GB/s. A sysfs interface (/sys/class/greenboost/greenboost/pool_info) provides real-time usage monitoring. A watchdog thread monitors RAM and NVMe pressure, alerting the user before potential issues arise.
  • CUDA Shim (libgreenboost_cuda.so): This library, injected via LD_PRELOAD, intercepts CUDA memory allocation functions (cudaMalloc, cudaMallocAsync, etc.). Small allocations pass through to the standard CUDA runtime, while larger ones, such as KV caches and model weights that overflow VRAM, are redirected to the kernel module. The shim also intercepts dlsym so that applications like Ollama, which resolve GPU symbols internally, remain compatible.

Benefits of GreenBoost

  • Increased Memory Capacity: Enables running larger AI models that exceed the GPU’s VRAM.
  • Transparency: Doesn’t require modifications to existing CUDA software.
  • Multi-Tiered Approach: Leverages both system RAM and NVMe storage for flexible memory expansion.
  • Performance Considerations: Aims to mitigate performance drops associated with traditional offloading methods by maintaining CUDA coherence.

Availability and Licensing

GreenBoost is an open-source project licensed under the GPLv2. The experimental code is available on GitLab. The project was announced on March 5, 2026, in the NVIDIA Developer Forums and detailed by Phoronix on March 14, 2026.

Looking Ahead

GreenBoost represents a promising step towards making large AI models more accessible to a wider range of users. As LLMs continue to grow in complexity, solutions like GreenBoost will be crucial for enabling local AI inference and reducing reliance on cloud-based services.
