Nvidia GB200 NVL72 Is Not Yet Ready For Training Advanced AI Models


13:07, 26.08.2025


The research firm SemiAnalysis has published an analysis of server platforms for AI training and concluded that Nvidia's H100 and H200 accelerators, as well as Google's TPUs, are currently better suited to training advanced models. GB200 NVL72 racks, built on Nvidia's newest GPUs, suffer from problems with the copper NVLink interconnect and from immature diagnostic and debugging tools, both of which cause downtime.

Why Training Is Not Yet Possible

In theory, the failure of a single chip is not critical: the recommended NVL72 configuration trains on 64 of the rack's 72 GB200 GPUs and keeps the remaining 8 in reserve. Swapping in a spare, however, requires locating the fault quickly, which is currently difficult with the limited diagnostic tools available. As a result, training stops, jobs roll back to the last checkpoint, and repairs are delayed.

SemiAnalysis notes that there are currently no known cases of an advanced model being fully trained on GB200 NVL72.

Analyst Recommendations and Nvidia's Focus

For now, the analysts advise using GB200 NVL72 primarily for inference, that is, for running already trained models. Nvidia's latest materials also emphasize inference, although early announcements positioned the system for both training and inference.

Future Outlook and Economic Considerations

SemiAnalysis expects Nvidia to resolve the NVLink and software issues by the end of the year. Even so, the total cost of ownership of a single GB200 GPU is 1.6–1.7 times that of an H100. To justify the investment, the new accelerators must therefore deliver at least 1.6 times the performance of the H100 with comparable downtime.
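The break-even condition above is simple arithmetic: cost per unit of useful training work scales with the TCO ratio divided by the product of the performance ratio and the relative uptime. The Python sketch below makes this explicit. The function names and the `uptime_ratio` parameter are hypothetical illustrations, and the assumption that useful work is proportional to performance times availability is a simplification, not SemiAnalysis's published model.

```python
# Hypothetical break-even sketch comparing GB200 against an H100 baseline.
# Assumption: useful work scales with raw performance multiplied by uptime.


def relative_cost_per_work(tco_ratio: float, perf_ratio: float, uptime_ratio: float = 1.0) -> float:
    """Cost per unit of useful work on GB200 relative to H100 (H100 = 1.0).

    tco_ratio:    GB200 total cost of ownership divided by H100 TCO.
    perf_ratio:   GB200 performance divided by H100 performance.
    uptime_ratio: GB200 availability divided by H100 availability
                  (1.0 means "similar downtime").
    Values below 1.0 mean GB200 is cheaper per unit of work.
    """
    if perf_ratio <= 0 or uptime_ratio <= 0:
        raise ValueError("performance and uptime ratios must be positive")
    return tco_ratio / (perf_ratio * uptime_ratio)


def break_even_perf(tco_ratio: float, uptime_ratio: float = 1.0) -> float:
    """Minimum performance ratio at which GB200 matches H100 cost per unit of work."""
    if uptime_ratio <= 0:
        raise ValueError("uptime ratio must be positive")
    return tco_ratio / uptime_ratio


if __name__ == "__main__":
    for tco in (1.6, 1.7):
        # Similar downtime first, then a hypothetical 10% availability penalty.
        print(f"TCO x{tco}: break-even performance x{break_even_perf(tco):.2f} (same uptime), "
              f"x{break_even_perf(tco, 0.9):.2f} (90% relative uptime)")
```

With similar downtime (`uptime_ratio=1.0`), the break-even performance ratio equals the TCO ratio, which matches the 1.6x figure cited in the article; any additional downtime on GB200 pushes the required speedup higher.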
