Nvidia GB200 NVL72 Is Not Yet Ready For Training Advanced AI Models

13:07, 26.08.2025

Analyst firm SemiAnalysis has published an analysis of server systems for AI training and concluded that Nvidia H100 and H200 accelerators, as well as Google's TPUs, are currently better suited to training advanced models. GB200 NVL72 racks, built on Nvidia's latest GPUs, suffer downtime caused by problems with the copper NVLink backplane and by immature diagnostic and debugging tools.

Why Training Is Not Yet Possible

In theory, the failure of a single chip is not critical: the recommended practice on an NVL72 rack is to train on 64 of its 72 GPUs and keep the remaining 8 in reserve. However, bringing a spare into service requires quickly locating the fault, which is currently difficult because diagnostic tools are limited. As a result, training stops, the run rolls back to an earlier checkpoint, and repairs are delayed.
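The spare-capacity reasoning above can be illustrated with a minimal Python sketch. The 72, 64, and 8 figures come from the article; the `RackState` class, the `lost_hours` helper, and its three time parameters are hypothetical names introduced here for illustration and do not correspond to any Nvidia tool or API.

```python
from dataclasses import dataclass

RACK_GPUS = 72    # GPUs in one NVL72 rack
ACTIVE_GPUS = 64  # GPUs used by the training job, per the article


@dataclass
class RackState:
    """Hypothetical bookkeeping for one rack; not an Nvidia API."""
    failed: int = 0  # GPUs currently out of service

    @property
    def spares_left(self) -> int:
        # Healthy GPUs beyond the 64 the job needs.
        return max(RACK_GPUS - self.failed - ACTIVE_GPUS, 0)

    def can_train(self) -> bool:
        # The job can run only while 64 healthy GPUs remain.
        return RACK_GPUS - self.failed >= ACTIVE_GPUS


def lost_hours(hours_since_checkpoint: float,
               hours_to_locate: float,
               hours_to_swap: float) -> float:
    """Wall-clock time lost to one failure (illustrative model only):
    work redone after the rollback, plus diagnosis, plus the swap."""
    return hours_since_checkpoint + hours_to_locate + hours_to_swap
```

Under this simple model the rack tolerates up to 8 failed GPUs, but each failure still costs the time needed to locate it, which is the term the article says is currently hard to keep small.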

SemiAnalysis notes that there are currently no known examples of advanced model training completed on GB200 NVL72.

Analyst Recommendations and Nvidia's Focus

At the moment, the analysts advise using GB200 NVL72 primarily for inference, that is, for running already trained models. Nvidia also emphasizes inference in its latest materials, although its early announcements positioned the system for both training and inference.

Future Outlook and Economic Considerations

SemiAnalysis predicts that Nvidia will resolve the NVLink and software issues by the end of the year. However, the cost of ownership per GB200 GPU is 1.6 to 1.7 times that of the H100. To justify the investment, the new accelerators must therefore deliver at least 1.6 times the performance of the H100 at comparable downtime.
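The break-even condition can be written out as a short calculation. The sketch below assumes that cost per unit of useful work is cost of ownership divided by (performance × fraction of time spent on useful work); the function name `required_speedup` and the `goodput_*` parameters are assumptions made here for illustration, and only the 1.6 to 1.7 cost ratio comes from the article.

```python
def required_speedup(tco_ratio: float,
                     goodput_new: float = 1.0,
                     goodput_old: float = 1.0) -> float:
    """Per-GPU speedup the new part needs to match the old part's
    cost per unit of useful work.

    tco_ratio   -- cost of ownership of the new GPU relative to the old one
    goodput_*   -- fraction of wall-clock time spent on useful training
                   (hypothetical inputs; the article gives no such figures)
    """
    if not (0 < goodput_new <= 1 and 0 < goodput_old <= 1):
        raise ValueError("goodput must be in (0, 1]")
    if tco_ratio <= 0:
        raise ValueError("tco_ratio must be positive")
    # cost_new / (perf_new * g_new) <= cost_old / (perf_old * g_old)
    # rearranges to perf_new / perf_old >= tco_ratio * g_old / g_new
    return tco_ratio * goodput_old / goodput_new


if __name__ == "__main__":
    # Equal downtime: the required speedup equals the cost ratio.
    print(required_speedup(1.6))
    print(required_speedup(1.7))
    # Illustrative only: extra downtime on the new part raises the bar.
    print(required_speedup(1.6, goodput_new=0.8, goodput_old=1.0))
```

With equal downtime the required speedup is simply the cost ratio, which matches the 1.6 times threshold in the text; any extra downtime on the new hardware raises that threshold proportionally.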
