Xiaomi has officially launched MiMo-V2.5-Pro-UltraSpeed, an artificial intelligence model with a 1-trillion-parameter architecture that breaks the 1,000 tokens-per-second decode speed threshold. The model, developed in collaboration with TileRT, is available to developers and enterprises through an application-based trial that runs from June 9 to June 23, 2026.
How the MiMo-V2.5-Pro-UltraSpeed Architecture Functions
The MiMo-V2.5-Pro-UltraSpeed model reaches inference speeds of approximately 1,200 tokens per second, according to Xiaomi’s technical documentation. The milestone marks a shift in large-scale model deployment, from systems where users wait for generation to finish to interaction in real time. Xiaomi notes that this speed is achieved on a standard 8-GPU commodity node, which challenges the assumption that trillion-parameter models need specialized, high-cost infrastructure to run at competitive speeds.
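To put the reported decode rate in concrete terms, the short Python sketch below converts tokens per second into per-token latency and total generation time. It is illustrative arithmetic only: the 3,000-token response length is a hypothetical value, the function names are made up for this example, and the calculation assumes a steady decode rate while ignoring prefill and network overhead.

```python
def decode_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to decode num_tokens at a steady rate.

    Ignores prompt prefill, queuing, and network overhead.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average gap between consecutive tokens, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


REPORTED_TPS = 1200.0    # approximate figure cited from Xiaomi's documentation
RESPONSE_TOKENS = 3000   # hypothetical response length, chosen for illustration

print(decode_time_seconds(RESPONSE_TOKENS, REPORTED_TPS))   # 2.5
print(f"{per_token_latency_ms(REPORTED_TPS):.2f} ms")       # 0.83 ms
```

At the reported rate, a response of that length would finish decoding in about two and a half seconds, with less than a millisecond between tokens.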
Accessing the Trial Period
Xiaomi has opened a limited-time window for developers and businesses to test the model. The trial period runs from June 9, 2026, until 23:59 Beijing Time (UTC+8) on June 23, 2026.
* Application Process: Interested users must submit an application at platform.xiaomimimo.com/ultraspeed.
* Approval Criteria: Xiaomi states that it will prioritize enterprises and professional developers with documented business needs. Because high-speed inference resources are limited, submitting an application does not guarantee access.
* Trial Benefits: Approved users receive free chat access for the duration of the two-week window.
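Because the trial deadline is stated in Beijing Time, developers in other time zones may want to check it against their own clocks. The following Python sketch tests whether a timezone-aware timestamp falls inside the trial window. The article gives only the start date, so the start of day on June 9 (00:00 Beijing Time) is an assumption, as is treating 23:59:00 as the inclusive cutoff. The names `TRIAL_START`, `TRIAL_END`, and `in_trial_window` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Beijing Time is a fixed UTC+8 offset with no daylight saving time.
BEIJING = timezone(timedelta(hours=8))

# Assumption: the trial opens at the start of June 9 in Beijing Time.
TRIAL_START = datetime(2026, 6, 9, 0, 0, tzinfo=BEIJING)
# Stated deadline: 23:59 Beijing Time on June 23 (treated here as inclusive).
TRIAL_END = datetime(2026, 6, 23, 23, 59, tzinfo=BEIJING)


def in_trial_window(moment: datetime) -> bool:
    """Return True if a timezone-aware moment lies within the trial window."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return TRIAL_START <= moment <= TRIAL_END


# 15:00 UTC on June 23 is 23:00 in Beijing, still inside the window.
print(in_trial_window(datetime(2026, 6, 23, 15, 0, tzinfo=timezone.utc)))   # True
# 16:30 UTC on June 23 is already 00:30 on June 24 in Beijing.
print(in_trial_window(datetime(2026, 6, 23, 16, 30, tzinfo=timezone.utc)))  # False
```

Comparing timezone-aware datetimes lets Python handle the offset conversion, so callers can pass timestamps in any zone.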
Pricing and Model Strategy
While the standard MiMo-V2.5 model series remains available for general use, the UltraSpeed variant is positioned as a premium offering. According to Xiaomi, the UltraSpeed API is priced at three times the cost of the standard MiMo-V2.5-Pro model. The company justifies the premium by citing a roughly 10-fold increase in generation speed compared to its predecessor. The promotional pricing applies only to the API, and Xiaomi has clarified that its standard Token Plan does not cover the UltraSpeed model.
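The trade-off between the price and speed multipliers can be expressed as simple ratios. The Python sketch below is a rough illustration, not Xiaomi's pricing formula: it assumes the same number of tokens is generated on both models, that the 3x price applies token for token, and that "roughly 10-fold" is exactly 10. All names are hypothetical.

```python
PRICE_MULTIPLIER = 3.0    # UltraSpeed API price relative to the standard Pro model
SPEED_MULTIPLIER = 10.0   # "roughly 10-fold" speedup, taken as exactly 10 here


def relative_tradeoff(price_multiplier: float, speed_multiplier: float) -> dict:
    """Compare a faster, pricier model to a baseline normalized to 1.0."""
    if price_multiplier <= 0 or speed_multiplier <= 0:
        raise ValueError("multipliers must be positive")
    return {
        # Cost of generating the same tokens, relative to the baseline.
        "relative_cost": price_multiplier,
        # Generation time for the same tokens, relative to the baseline.
        "relative_time": 1.0 / speed_multiplier,
        # Price paid per unit of generation speed, relative to the baseline.
        "cost_per_unit_speed": price_multiplier / speed_multiplier,
    }


result = relative_tradeoff(PRICE_MULTIPLIER, SPEED_MULTIPLIER)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the same output costs three times as much but arrives in about a tenth of the time.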
Why Real-Time AI Inference Matters
The push for higher tokens-per-second (TPS) speeds in large-scale models is driven by the demand for responsive AI agents. As models grow to the 1-trillion-parameter scale, latency often becomes a barrier to fluid human-AI collaboration. By exceeding 1,000 TPS, Xiaomi aims to move its AI from a tool that requires asynchronous waiting to an extension of the user’s thought process. This development aligns with a broader industry trend in which hardware optimization, specifically through partnerships like the one with TileRT, is becoming as critical as the model training process itself.
For those seeking long-term integration, Xiaomi has directed professional inquiries about business partnerships to its dedicated business-mimo email address. Users who do not qualify for the limited-time trial are encouraged to continue using the established MiMo-V2.5 model series for their ongoing projects.