Nvidia Is Preparing a New Generation of GPUs to Support Millions of Contexts
13:36, 10.09.2025
Nvidia has unveiled the Rubin CPX graphics processor, designed specifically for language and multimodal models that need to store and analyze huge amounts of data. The chip is optimized to process contexts of over 1 million tokens, a figure that far exceeds the capabilities of current systems.
Disaggregated Inference Architecture
The key innovation of Rubin CPX is its use of a disaggregated inference architecture. Under this approach, multiple GPUs process different parts of a task in parallel and then combine their results into a single answer. This increases throughput, reduces latency, and uses hardware more efficiently. It is especially useful for document analysis, multimedia content generation, and working with large code projects.
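The split-process-combine pattern described above can be illustrated with a minimal sketch. This is a toy illustration, not Nvidia's implementation: the workers are threads standing in for GPUs, and the per-chunk "work" is a simple token count rather than model inference; all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_context(tokens, num_workers):
    """Split a long token sequence into near-equal chunks, one per worker."""
    chunk = -(-len(tokens) // num_workers)  # ceiling division
    return [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]

def process_chunk(chunk):
    """Stand-in for per-GPU work: here, count token occurrences."""
    counts = {}
    for tok in chunk:
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def combine(partials):
    """Merge per-worker partial results into a single answer."""
    merged = {}
    for partial in partials:
        for tok, n in partial.items():
            merged[tok] = merged.get(tok, 0) + n
    return merged

def disaggregated_inference(tokens, num_workers=4):
    """Split the context, process chunks in parallel, combine the results."""
    chunks = split_context(tokens, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return combine(partials)
```

The point of the pattern is that no single worker ever holds the full million-token context; each sees only its slice, and only the compact partial results are merged at the end.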
A Breakthrough for Business and Science
Nvidia notes that Rubin CPX opens up new horizons for lawyers, doctors, and developers. In law, it will help process statutes and case files running to hundreds of pages; in medicine, it will help compare large arrays of patient data; and in IT, it will help analyze entire projects instead of individual files. In the creative field, the GPU will enable the generation of long videos and complex multimedia projects.
Focus on Inference, not Training
Unlike traditional solutions, Rubin CPX is aimed primarily at optimizing inference, accelerating the performance of existing models. This makes it attractive to companies that want to deploy AI in real-world business processes faster while reducing costs.
Market Launch
Rubin CPX is expected to hit the market in late 2026. Experts suggest that this processor could set a new standard for the industry, where working with long contexts will no longer be a rarity but the norm.