A real competitor to Nvidia, and one chip instead of thousands of servers
12:27, 09.12.2025
Cerebras Systems has developed the Wafer Scale Engine, the largest processor ever built, roughly the size of a dinner plate. The engineering goal was to minimise the time data spends travelling between chips and servers: the new system is a monolith that combines compute and memory on a single wafer.
Geopolitical restrictions
At the end of this year, instead of an obvious success story in the semiconductor sector, Cerebras Systems ran into serious problems with its public offering. Back in October, the company had closed a private financing round worth $1.1 billion.
Cerebras had shown impressive results, with revenue reaching $500 million and Meta, AWS, and IBM among its customers. The situation changed dramatically, however, when the Committee on Foreign Investment in the United States (CFIUS) blocked its IPO over the possible leakage of technology to China.
More than 80 percent of the company's revenue comes from the Abu Dhabi-based holding company G42. Regulators view this partnership as a strategic risk, both because of its scale and because of the origin of the capital.
As a result, a company whose solutions outperform Nvidia's in key respects found itself hostage to the political situation.
Features of the flagship Cerebras WSE-3 chip
The Cerebras WSE-3 is a single wafer that functions as one superprocessor, whereas Nvidia's architecture relies on huge clusters of individual GPUs. Nvidia Blackwell also uses the HBM memory standard, and that memory sits next to the compute die rather than on it. This separation introduces delays whenever data is accessed.
Cerebras has taken the opposite approach: all memory is integrated into the chip itself as on-chip SRAM. This not only minimises latency but also insulates the company from HBM shortages, since Nvidia has effectively locked up HBM supplies.
Ultra-fast SRAM is integrated directly into the compute cores, delivering an aggregate bandwidth of 21 petabytes per second, a figure far beyond Nvidia's top-of-the-line solutions.
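To put the bandwidth figures into perspective, here is a rough back-of-envelope sketch. LLM text generation is largely memory-bandwidth-bound: every generated token requires streaming the model's weights from memory, so bandwidth divided by model size gives a hard ceiling on single-stream speed. The bandwidth numbers below are vendor-published figures used as assumptions, and the 8 GB model is hypothetical (chosen so it would fit in the WSE-3's roughly 44 GB of on-chip SRAM); the calculation is purely illustrative.

```python
# Back-of-envelope: how memory bandwidth caps LLM decode speed.
# Per generated token, the chip streams every model weight once, so:
#   max tokens/sec ≈ memory bandwidth / model size in bytes
# All figures below are assumptions taken from vendor marketing material.

PB = 1e15
TB = 1e12
GB = 1e9

wse3_bandwidth = 21 * PB   # Cerebras WSE-3 aggregate on-chip SRAM bandwidth
b200_bandwidth = 8 * TB    # approx. HBM3e bandwidth of one Nvidia B200 GPU

model_bytes = 8 * GB       # hypothetical 8B-parameter model at 1 byte/weight

def max_tokens_per_sec(bandwidth: float, model_size: float) -> float:
    """Upper bound on single-stream decode speed for a bandwidth-bound model."""
    return bandwidth / model_size

wse3_cap = max_tokens_per_sec(wse3_bandwidth, model_bytes)
b200_cap = max_tokens_per_sec(b200_bandwidth, model_bytes)

print(f"WSE-3 bandwidth ceiling: {wse3_cap:,.0f} tokens/s")
print(f"B200  bandwidth ceiling: {b200_cap:,.0f} tokens/s")
print(f"Ratio: {wse3_cap / b200_cap:.0f}x")
```

Real chips never reach these ceilings (compute, scheduling, and batching all intervene), but the ratio between the two bounds illustrates why on-chip SRAM changes the picture.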
Scaling to achieve modern capabilities
Modern LLMs run on clusters of GPUs to achieve the expected results. Nvidia spent years and enormous resources creating the NVLink interconnect and ensuring that thousands of chips operate in sync. Even so, clusters still burn extra energy transferring intermediate results and synchronising.
The WSE-3 does not have this problem, because it places 900,000 cores on a single piece of silicon. With no optical transceivers, network cables, or switches in the data path, inter-chip communication delays are eliminated entirely.
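The cluster overhead the article describes can be sketched with simple arithmetic. In tensor-parallel GPU inference, each transformer layer typically performs a couple of all-reduce operations across chips, and their latencies add up on every token; a single-wafer design incurs no such off-chip hops. Every parameter below (layer count, all-reduces per layer, per-operation latency) is an illustrative assumption, not a measured figure.

```python
# Rough sketch: per-token communication overhead in tensor-parallel
# inference on a GPU cluster vs. a single wafer.
# All numbers are illustrative assumptions, not benchmarks.

layers = 80                  # hypothetical deep transformer
allreduces_per_layer = 2     # common pattern: one after attention, one after MLP
link_latency_s = 5e-6        # assumed latency of one all-reduce over NVLink-class links

# On a cluster, every layer's all-reduces sit on the critical path of each token.
cluster_overhead = layers * allreduces_per_layer * link_latency_s

print(f"Per-token comm overhead on a cluster: {cluster_overhead * 1e6:.0f} µs")
print("Per-token comm overhead on one wafer: ~0 µs (no off-chip hops)")
```

Even at a few microseconds per hop, the overhead accumulates to a meaningful fraction of per-token latency, which is the cost the single-wafer design avoids.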