Microsoft, Meta & Nvidia Back New Optical Interconnect for AI Scale-Up

by Anika Shah - Technology
Hyperscalers Collaborate on Optical Compute Interconnect for AI Clusters

As artificial intelligence (AI) systems grow in complexity and scale, the demand for faster and more efficient interconnects is increasing. To address this need, Microsoft, Meta, and OpenAI have joined forces with hardware designers AMD, Broadcom, and Nvidia to develop a protocol-agnostic scale-up interconnect technology for AI clusters, centered on optical connectivity.

The Optical Compute Interconnect (OCI) Multi-Source Agreement (MSA)

The companies have established the Optical Compute Interconnect (OCI) Multi-Source Agreement (MSA) group to define an open optical connectivity specification for scale-up interconnections within large AI systems and racks. This initiative aims to enable hyperscalers to utilize optical cables instead of copper to connect more accelerators at higher speeds and with predictable power consumption. Microsoft and OpenAI have a long-standing partnership focused on advancing AI responsibly.

Key Features of the OCI Technology

  • Protocol-Agnostic Design: The OCI technology will support various protocols, including UALink (AMD and Broadcom) and NVLink (Nvidia), allowing for flexibility and interoperability.
  • Common Optical Physical Layer (PHY): The consortium will develop a common optical PHY based on Non-Return-to-Zero (NRZ) signaling and wavelength-division multiplexing (WDM).
  • Scalable Bandwidth: The initial specification targets 200 Gb/s per direction (four wavelengths × 50 Gb/s), with a roadmap to scale to 800 Gb/s per fiber and beyond, potentially reaching 3.2 Tb/s per fiber.
  • Support for Various Optical Modules: The technology will accommodate pluggable optical modules, on-board optics, and co-packaged optics (CPO) integrated directly with compute silicon.
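The bandwidth figures in the list above come down to simple multiplication, which a minimal sketch can make explicit. Only the four wavelengths, the 50 Gb/s per-wavelength rate, and the 800 Gb/s and 3.2 Tb/s roadmap figures come from the article; the function name and the ratios are illustrative assumptions. The ratios treat the per-direction and per-fiber figures as directly comparable, which the article does not confirm.

```python
# Illustrative arithmetic only; aggregate_gbps is a hypothetical helper,
# not part of any OCI MSA specification or tooling.
def aggregate_gbps(wavelengths: int, gbps_per_wavelength: float) -> float:
    """Bandwidth of a WDM link: number of wavelengths times the per-wavelength rate."""
    return wavelengths * gbps_per_wavelength

# Initial target stated in the article: four wavelengths at 50 Gb/s each.
initial = aggregate_gbps(4, 50)

# Roadmap figures stated in the article, expressed in Gb/s.
roadmap_gbps = {"800 Gb/s": 800, "3.2 Tb/s": 3200}

# Assumption: figures are compared like-for-like, ignoring the
# per-direction versus per-fiber distinction in the article's wording.
scale_factors = {name: target / initial for name, target in roadmap_gbps.items()}

print(f"initial: {initial} Gb/s")
for name, factor in scale_factors.items():
    print(f"{name}: {factor:g}x the initial figure")
```

Under that assumption, reaching the roadmap figures would take some combination of more wavelengths and a higher rate per wavelength; the article does not say which combination the group intends.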

Driving Forces Behind the Collaboration

The growing need for optical scale-up interconnects to support large AI systems is a key driver behind this collaboration. Tom’s Hardware reports that this approach will enable different processors and interconnect protocols to operate over the same fiber infrastructure, offering hyperscalers greater flexibility.

Hyperscaler-Driven Approach

Unlike many industry consortia led by independent hardware vendors, the OCI MSA is driven by hyperscalers – Microsoft, Meta, and OpenAI. This hyperscaler-focused approach is a defining characteristic of the group. The OCI MSA focuses specifically on short-reach links connecting accelerators and switches within a scale-up domain, rather than attempting to standardize a broader set of technologies.

Faster Development Through MSA Structure

As a Multi-Source Agreement (MSA) group, the OCI MSA is designed for rapid development and alignment on interfaces, enabling quicker deployment of interoperable products compared to traditional standards bodies like JEDEC or the Ultra Ethernet Consortium. OpenAI and Microsoft continue to collaborate closely on research, engineering, and product development.

Industry Support

“The growing need for optical scale-up interconnect to support large AI systems later this decade is clear,” said Brian Amick, Senior Vice President, Technology & Engineering at AMD. Broadcom and Nvidia are also founding members, underscoring the importance of a common optical standard for global AI infrastructure.

Looking Ahead

The OCI MSA represents a significant step towards enabling the next generation of AI systems. By fostering an open and collaborative ecosystem for optical interconnects, the group aims to simplify system integration, reduce development risk, and accelerate the deployment of innovative AI hardware. The partnership between Microsoft and OpenAI remains strong, with their commercial and revenue share relationship unchanged. Recent reports suggest a shifting AI landscape, but the core partnership remains intact.
