OpenAI’s GPT-5.3-Codex-Spark: A Leap Towards Real-Time AI Coding with Cerebras
In a significant move diversifying its hardware strategy, OpenAI has launched GPT-5.3-Codex-Spark, its first production AI model deployed on Cerebras Systems’ wafer-scale chips instead of traditional Nvidia GPUs. The new model is designed to deliver higher throughput and lower latency, enabling a real-time, interactive coding experience, according to OpenAI.
A New Era of Interactive Coding
Codex-Spark runs at over 1,000 tokens per second, representing a roughly 15x speed increase compared to earlier versions, making live coding assistance and rapid iteration significantly more responsive. OpenAI designed the model “specifically for working with Codex in real-time—making targeted edits, reshaping logic, or refining interfaces and seeing results immediately,” as stated in a Cerebras blog post.
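To put those figures in perspective, a short back-of-the-envelope calculation shows what a ~15x speedup means for a typical streamed response. The baseline rate below is inferred from the 15x claim, not an official OpenAI figure, and the 500-token edit size is an illustrative assumption:

```python
# Illustrative arithmetic only: the baseline decode rate is inferred
# from the ~15x speedup claim, not an official OpenAI number.
SPARK_TOKENS_PER_SEC = 1_000   # reported Codex-Spark throughput
SPEEDUP = 15                   # reported speedup vs. earlier versions
baseline_tokens_per_sec = SPARK_TOKENS_PER_SEC / SPEEDUP  # ~67 tokens/s

def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a constant decode rate."""
    return num_tokens / tokens_per_sec

# A hypothetical 500-token code edit:
spark_time = generation_seconds(500, SPARK_TOKENS_PER_SEC)        # 0.5 s
baseline_time = generation_seconds(500, baseline_tokens_per_sec)  # 7.5 s
print(f"Spark: {spark_time:.1f}s, baseline: {baseline_time:.1f}s")
```

At these rates, an edit that previously took several seconds streams back in well under a second, which is the difference between a blocking wait and an interactive loop.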
Optimized for Speed and Responsiveness
To enable real-time coding, OpenAI optimized Codex-Spark for low latency and interactive coding workflows rather than deep reasoning or general-purpose tasks. Despite this focus on speed, the model retains the ability to handle long-running processes, operating for “hours, days, and weeks without intervention.”
Performance Benchmarks
GPT-5.3-Codex-Spark demonstrated its performance on SWE-Bench Pro and Terminal-Bench 2.0, benchmarks tailored for software engineering tasks. It achieved results comparable to GPT-5.1-Codex-mini and GPT-5.3-Codex, but in a fraction of the time. OpenAI also notes that the end-to-end improvements implemented to reduce latency across the full request-response pipeline will benefit all of its models.
Under the Hood: Technical Enhancements
OpenAI streamlined the process of streaming responses between client and server, rewrote key parts of its inference stack, and reworked session initialization so that the first token appears faster and responsiveness is sustained during iteration. These enhancements included the introduction of a persistent WebSocket connection and optimizations in the Responses API. Together, the changes reduced per-roundtrip client/server overhead by 80%, per-token processing time by 30%, and time-to-first-token by 50%.
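A simple latency model makes the combined effect of those three reductions concrete. The absolute millisecond figures below are assumptions chosen for illustration; only the percentage reductions (80% roundtrip overhead, 30% per-token processing, 50% time-to-first-token) come from the announcement:

```python
# Hypothetical latency model: the baseline millisecond values are
# illustrative assumptions, not measured OpenAI figures.

def response_latency_ms(ttft_ms: float, per_token_ms: float,
                        roundtrip_ms: float, n_tokens: int,
                        n_roundtrips: int) -> float:
    """End-to-end latency: first token + decoding + client/server roundtrips."""
    return ttft_ms + per_token_ms * n_tokens + roundtrip_ms * n_roundtrips

# Assumed baseline: 400 ms TTFT, 10 ms/token, 50 ms/roundtrip,
# for a 200-token response over 4 roundtrips.
before = response_latency_ms(400, 10, 50, n_tokens=200, n_roundtrips=4)

# Apply the reported reductions: TTFT -50%, per-token -30%, roundtrip -80%.
after = response_latency_ms(400 * 0.5, 10 * 0.7, 50 * 0.2,
                            n_tokens=200, n_roundtrips=4)

print(f"before: {before:.0f} ms, after: {after:.0f} ms")  # 2600 ms -> 1640 ms
```

Under these (assumed) baseline numbers, the stated reductions cut end-to-end latency by roughly a third; decode time dominates, which is why the per-token and time-to-first-token improvements matter most for interactive use.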
The Cerebras Partnership and Wafer Scale Engine
Codex-Spark runs on Cerebras’ Wafer Scale Engine 3 (WSE-3) accelerators, which are particularly suited to low-latency, high-speed inference. This marks OpenAI’s first production deployment on Cerebras hardware, signaling a strategic diversification from its long-standing reliance on Nvidia. However, OpenAI clarified that this does not represent a departure from GPUs as the core of its training and inference pipeline, and that Cerebras accelerators can be combined with GPUs to leverage the strengths of both architectures.
Community Response and Considerations
The announcement sparked discussion online. Some users emphasized a preference for accuracy over speed, noting that waiting for more reliable results can be preferable. Others pointed out that the cumulative cost of faster iterations could offset the benefits of speed. One user on X.com, Nicholas Van Landschoot, observed that in practical benchmarks the speed improvements may not be as dramatic as claimed.
Future Developments
Codex-Spark currently provides a 128k-token context window and supports text-only input. OpenAI plans to introduce faster models with larger context windows, informed by usage insights gathered from the developer community.
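For developers planning around that 128k limit, a rough token budget check is often enough at the prototyping stage. The sketch below uses the common 4-characters-per-token rule of thumb, which is an approximation, not the model's actual tokenizer; the reserved-output figure is likewise an assumption:

```python
# Rough token budgeting for a 128k-context model. The 4-characters-per-token
# heuristic is a crude estimate; exact counts require the model's tokenizer.
CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # rule-of-thumb approximation

def fits_in_context(prompt: str, reserved_output_tokens: int = 4_000) -> bool:
    """Estimate whether a text prompt leaves room in the window for the reply."""
    estimated_prompt_tokens = len(prompt) / CHARS_PER_TOKEN
    return estimated_prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW

print(fits_in_context("x" * 100_000))  # ~25k tokens: fits
print(fits_in_context("x" * 600_000))  # ~150k tokens: does not fit
```

A check like this helps decide when a large codebase needs to be chunked or summarized before being sent to a fixed-context model.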