Nvidia Is Preparing a New Generation of GPUs to Support Millions of Contexts
13:36, 10.09.2025
Nvidia has unveiled the Rubin CPX graphics processor, designed specifically for language and multimodal models that need to store and analyze huge amounts of data. The chip is optimized to process contexts of over 1 million tokens, a figure that far exceeds the capabilities of current systems.
Disaggregated Inference Architecture
The key innovation of Rubin CPX is its use of a disaggregated inference architecture. Under this approach, multiple GPUs process different parts of a task in parallel and then combine their results into a single answer. This increases throughput, reduces latency, and uses hardware more efficiently. It is especially useful for document analysis, multimedia content generation, and working with large code projects.
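The split-process-combine pattern described above can be illustrated with a minimal sketch. This is a toy illustration, not Nvidia's implementation: the workers are threads standing in for GPUs, and the per-chunk "work" is a simple token count rather than model inference; all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_context(tokens, num_workers):
    """Split a long token sequence into near-equal chunks, one per worker."""
    chunk = -(-len(tokens) // num_workers)  # ceiling division
    return [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]

def process_chunk(chunk):
    """Stand-in for per-GPU work: here, count token occurrences."""
    counts = {}
    for tok in chunk:
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def combine(partials):
    """Merge per-worker partial results into a single answer."""
    merged = {}
    for partial in partials:
        for tok, n in partial.items():
            merged[tok] = merged.get(tok, 0) + n
    return merged

def disaggregated_inference(tokens, num_workers=4):
    """Split the context, process chunks in parallel, combine the results."""
    chunks = split_context(tokens, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return combine(partials)
```

The point of the pattern is that no single worker ever holds the full million-token context; each sees only its slice, and only the compact partial results are merged at the end.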
A Breakthrough for Business and Science
Nvidia notes that Rubin CPX opens up new horizons for lawyers, doctors, and developers. In law, it will help process statutes and case files running to hundreds of pages; in medicine, it will help compare large arrays of patient data; and in IT, it will help analyze entire projects instead of individual files. In the creative field, the GPU will enable the generation of long videos and complex multimedia projects.
Focus on Inference, not Training
Unlike traditional solutions, Rubin CPX is aimed primarily at optimizing inference, accelerating the performance of existing models. This makes it attractive to companies that want to deploy AI in real-world business processes faster while reducing costs.
Market Launch
Rubin CPX is expected to hit the market in late 2026. Experts suggest that this processor could set a new standard for the industry, where working with long contexts will no longer be a rarity but the norm.