A new Blackwell architecture by Nvidia – a new milestone in the evolution of GPUs

21.03.2024
Author: HostZealot Team
2 min.

At the GTC event in March 2024, NVIDIA presented its new Blackwell chip architecture, the B200 GPU based on it, and the Grace Blackwell GB200 superchip, which combines a Grace CPU with Blackwell GPUs.

The B200 GPU packs 208 billion transistors, compared to the 80 billion of the H100/H200 previously used in data centers, and offers 20 petaflops of AI performance per GPU (vs. 4 petaflops for the H100). The chip features 192 GB of HBM3e memory with up to 8 TBps of bandwidth.

Unlike more conventional GPUs, the Blackwell B200 is effectively a dual processor: it is composed of two dies that work together as a single CUDA GPU, connected by NV-HBI (NVIDIA High Bandwidth Interface) at 10 TBps. The B200 is manufactured on TSMC's 4NP process. Each die carries HBM3e stacks of 24 GB, with 1 TBps of bandwidth per stack.
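As a quick sanity check, those per-stack figures line up with the 192 GB and 8 TBps totals quoted above. The Python sketch below rebuilds them; note that the per-die stack count is inferred from the totals, not taken from a spec sheet:

# Back-of-the-envelope check of the B200 memory figures quoted above.
# The per-die stack count is inferred from the totals (an assumption),
# not taken from an official NVIDIA spec sheet.
DIES_PER_GPU = 2
STACK_CAPACITY_GB = 24      # per HBM3e stack (quoted above)
STACK_BANDWIDTH_TBPS = 1    # per HBM3e stack (quoted above)
TOTAL_CAPACITY_GB = 192     # quoted B200 total
TOTAL_BANDWIDTH_TBPS = 8    # quoted B200 total

stacks_per_gpu = TOTAL_CAPACITY_GB // STACK_CAPACITY_GB   # 8 stacks
stacks_per_die = stacks_per_gpu // DIES_PER_GPU           # 4 per die
assert stacks_per_gpu * STACK_BANDWIDTH_TBPS == TOTAL_BANDWIDTH_TBPS
print(f"{stacks_per_gpu} stacks total, {stacks_per_die} per die")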

For now, the most powerful solution announced is the GB200 superchip, which pairs two B200 GPUs with a Grace CPU.

For connecting multiple nodes, NVIDIA presents the fifth-generation NVLink chip with 1.8 TBps of bidirectional bandwidth, built from 50 billion transistors on the same TSMC 4NP process.

Every Blackwell GPU features 18 NVLink links, twice the total NVLink bandwidth of the H100. Since each link provides 50 GBps in each direction, i.e. 100 GBps of bidirectional bandwidth per link, large groups of GPU nodes can function almost as one huge GPU.
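The per-GPU figure follows directly from those per-link numbers; here is the same arithmetic as a minimal Python sketch:

# Per-GPU NVLink bandwidth from the per-link figures quoted above.
LINKS_PER_GPU = 18
GBPS_PER_DIRECTION = 50
GBPS_PER_LINK = 2 * GBPS_PER_DIRECTION   # 100 GBps bidirectional

total_gbps = LINKS_PER_GPU * GBPS_PER_LINK
print(f"{total_gbps / 1000} TBps per GPU")   # 1.8 TBps, matching fifth-gen NVLink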

Furthermore, the chips and the new interconnect make up the NVIDIA GB200 NVL72, a full-fledged rack solution with 18 1U compute nodes. Each node houses two GB200 superchips, i.e. two Grace CPUs and four B200 GPUs, and delivers 80 petaflops of FP4 and 40 petaflops of FP8 AI performance.

A full GB200 NVL72 rack therefore has 36 Grace CPUs and 72 Blackwell GPUs, delivering 720 petaflops of FP8 and 1,440 petaflops of FP4 performance. With 130 TBps of multinode bandwidth, the system can run AI language models with up to 27 trillion parameters.
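Those rack-level totals are consistent with the per-node figures above. The short Python sketch below rebuilds them from the described topology; the 10 petaflops of FP8 per GPU is inferred as half of the quoted FP4 figure, not an official number:

# Rebuild the GB200 NVL72 totals from the topology described above.
NODES_PER_RACK = 18
SUPERCHIPS_PER_NODE = 2
CPUS_PER_SUPERCHIP = 1    # one Grace CPU per GB200
GPUS_PER_SUPERCHIP = 2    # two B200 GPUs per GB200
FP4_PFLOPS_PER_GPU = 20   # quoted B200 figure
FP8_PFLOPS_PER_GPU = 10   # assumed half of FP4, consistent with the node figures

superchips = NODES_PER_RACK * SUPERCHIPS_PER_NODE   # 36
cpus = superchips * CPUS_PER_SUPERCHIP              # 36 Grace CPUs
gpus = superchips * GPUS_PER_SUPERCHIP              # 72 B200 GPUs
print(f"{cpus} CPUs, {gpus} GPUs")
print(f"{gpus * FP4_PFLOPS_PER_GPU} PFLOPS FP4, {gpus * FP8_PFLOPS_PER_GPU} PFLOPS FP8")
# -> 36 CPUs, 72 GPUs; 1440 PFLOPS FP4, 720 PFLOPS FP8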
