Open Consortium Aims to Accelerate AI Infrastructure with Optical Interconnects
A new industry consortium, the Optical Compute Interconnect (OCI) Multi-Source Agreement (MSA) group, has formed with founding members including AMD, Broadcom, Meta, Microsoft, NVIDIA, and OpenAI. This collaboration signals a significant move towards an open ecosystem designed to foster the development of multi-vendor supply chains for advanced optical interconnects crucial for scaling AI infrastructure.
The Need for Optical Interconnects
As large language models (LLMs) advance, traditional copper-based connectivity is reaching the limits of its reach, constraining the architecture of AI clusters. The OCI aims to facilitate a transition from copper to optical interconnects, overcoming these bottlenecks and enabling greater scalability. This shift is driven by the increasing bandwidth and performance demands of modern AI systems.
OCI Specifications: Power, Latency, and Cost Optimization
The OCI specifications, available at www.oci-msa.org, are engineered for low power consumption, low latency, and cost-effectiveness. The technology combines non-return-to-zero (NRZ) modulation with wavelength division multiplexing (WDM) optics, moving towards a silicon-centric connectivity model. This tighter integration of optics with computing and networking silicon promises substantial improvements in bandwidth density and system scalability while maintaining power efficiency.
Interoperability and a Plug-and-Play Ecosystem
By establishing interoperable optical interface protocols, the OCI MSA aims to create a “plug-and-play” ecosystem. This open specification allows hyperscalers to connect multiple high-end processor engines (XPUs) and high-end switches via a shared optical physical layer (PHY), combining leading-edge computing with advanced optical technologies. A standardized roadmap is intended to reduce integration risks and shorten development cycles for the entire AI rack supply chain.
Scalability and Future Roadmap
The OCI MSA provides a scalable roadmap for optical interconnects, supporting multi-vendor deployment across multiple generations of hardware. Key features include:
- Standardized Interface Density: Supporting OCI GEN1 4λ × 50Gbps NRZ (200Gbps per direction) and OCI GEN2 400Gbps per direction bidirectional (BiDi) technology, reaching up to 800Gbps per fiber.
- Massive Scalability: Plans to increase wavelengths and data rates up to 3.2Tbps per fiber and beyond, accommodating larger numbers of GPUs and increased bandwidth per GPU.
- Interoperable Form Factor: Supporting pluggable optics, on-board optics, and co-packaged optics (CPO).
- Efficiency at Scale: Delivering optical solutions that meet performance, power, and cost targets comparable to copper connectivity, with extended reach.
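The per-fiber bandwidth figures in the roadmap above follow from multiplying wavelength count by per-wavelength line rate, and doubling when both directions share one fiber (BiDi). A minimal sketch of that arithmetic; note the GEN2 wavelength/rate split used here (8λ × 50Gbps) is an assumption for illustration, since the list only states the aggregate figures:

```python
def fiber_bandwidth_gbps(wavelengths: int, rate_gbps: int, bidi: bool) -> int:
    """Aggregate bandwidth per fiber: wavelength count x line rate,
    doubled when both directions share one fiber (BiDi)."""
    per_direction = wavelengths * rate_gbps
    return per_direction * 2 if bidi else per_direction

# OCI GEN1: 4 wavelengths x 50 Gbps NRZ = 200 Gbps per direction
gen1 = fiber_bandwidth_gbps(wavelengths=4, rate_gbps=50, bidi=False)

# OCI GEN2: 800 Gbps per fiber (400 Gbps per direction, BiDi);
# the 8 x 50 Gbps breakdown is assumed, not taken from the spec
gen2 = fiber_bandwidth_gbps(wavelengths=8, rate_gbps=50, bidi=True)

print(gen1, gen2)  # 200 800
```

Scaling to the stated 3.2Tbps-per-fiber target would come from some combination of more wavelengths and higher per-wavelength rates under the same formula.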
Industry Support and Vision
Founding members have expressed strong support for the OCI MSA:
- AMD: Brian Amick, Senior Vice President Technology & Engineering, highlighted the increasing need for scale-up optical interconnects to support large AI systems.
- Broadcom: Near Margalit, Vice President & General Manager, Optical Systems Division, emphasized Broadcom’s commitment to pushing the OCI specification forward, leveraging its CPO platform.
- Meta: Dan Rabinovitsj, Vice President of Hardware Systems, stated the urgent need for technology to overcome power and cost constraints in AI cluster design.
- Microsoft: Saurabh Dighe, Corporate Vice President, Azure Systems and Architecture, noted that scale-up focused optical technologies are foundational for building scalable, high-performance AI computing domains.
- NVIDIA: Gilad Shainer, Senior Vice President of Networking, stated NVIDIA’s commitment to building common optical standards for global AI infrastructure.
- OpenAI: Richard Ho, Head of Hardware, added that continued AI improvements depend on scaling AI supercomputers with more petaflops, greater memory bandwidth, and higher network bandwidth, requiring longer reach, which the OCI MSA will help enable.
The OCI MSA represents a collaborative effort to address the growing demands of AI infrastructure, paving the way for more powerful, efficient, and scalable AI systems.