Record performance: 17,000 tokens per second with a new solution from startup Taalas
14:03, 20.02.2026
Canadian startup Taalas recently announced its first product, the HC1 chip, which runs Llama 3.1 8B. The company's approach is unusual: the model is not loaded into memory but hard-wired into the silicon at the manufacturing stage. As a result, the chip reaches a record 17,000 tokens per second per user. That is almost 10 times faster than GPU solutions. The approach also delivers significant energy savings and lowers production costs.
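To put the headline figure in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the 17,000 tokens per second and the "almost 10 times" comparison quoted above. The exact 10x factor and the 1,000-token response length are illustrative assumptions, not figures from Taalas.

```python
# Back-of-the-envelope arithmetic for the throughput figure quoted above.
# Only TAALAS_TOKENS_PER_SECOND comes from the announcement; the rest is assumed.
TAALAS_TOKENS_PER_SECOND = 17_000  # per-user figure from the article
SPEEDUP_OVER_GPU = 10              # "almost 10 times", rounded to exactly 10 (assumption)
RESPONSE_TOKENS = 1_000            # hypothetical response length (assumption)


def seconds_per_token(tokens_per_second: float) -> float:
    """Average time spent producing one token at a given rate."""
    return 1.0 / tokens_per_second


def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Time in seconds to generate `tokens` tokens at a steady rate."""
    return tokens / tokens_per_second


# GPU rate implied by the assumed 10x speedup, not a measured benchmark.
implied_gpu_rate = TAALAS_TOKENS_PER_SECOND / SPEEDUP_OVER_GPU

if __name__ == "__main__":
    per_token_us = seconds_per_token(TAALAS_TOKENS_PER_SECOND) * 1e6
    print(f"Time per token on HC1: {per_token_us:.1f} microseconds")
    print(f"Implied GPU rate: {implied_gpu_rate:.0f} tokens/s")
    hc1_ms = generation_time(RESPONSE_TOKENS, TAALAS_TOKENS_PER_SECOND) * 1e3
    gpu_ms = generation_time(RESPONSE_TOKENS, implied_gpu_rate) * 1e3
    print(f"{RESPONSE_TOKENS} tokens: {hc1_ms:.0f} ms on HC1 vs {gpu_ms:.0f} ms on GPU")
```

Under these assumptions a single token takes roughly 59 microseconds, and a 1,000-token answer arrives in about 59 milliseconds instead of more than half a second.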
About Taalas
The startup was founded by Ljubiša Bajić, former director of integrated circuit design at AMD, his wife Leila Bajić (former technology manager and engineer at AMD, ATI, Altera), and Drago Ignjatović (former director of ASIC design at AMD).
The company's main approach can be described as total specialization: it plans to produce a separate chip for each model. Each chip will consist of approximately 100 layers, and only the top two, which contain the embedded mask ROM recall fabric, will be customized for a given model. This makes it possible to produce a chip in two months instead of six. Compute and memory are also combined on a single die.
The model on the chip is aggressively quantized, and at this stage that reduces quality compared with GPU benchmarks. The startup acknowledges this, which is why it positions the product as a beta service. The chip keeps a minimum of flexibility: the model can be fine-tuned via LoRA adapters, and there is a context window.
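The idea behind low-rank adapters (LoRA) helps explain how a model with fixed weights can still be adjusted. The sketch below shows the generic LoRA arithmetic in plain Python: the base weight matrix stays frozen, and a small trainable pair of matrices adds a correction on top. How Taalas implements adapters in hardware is not described in the announcement, so every name and number here is a hypothetical illustration.

```python
# Generic LoRA arithmetic: output = W x + scale * B (A x).
# W is frozen (think: hard-wired weights); only the small A and B are trainable.
# Illustrative sketch only, not Taalas's implementation.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(frozen_w, lora_a, lora_b, x, scale=1.0):
    """Apply the frozen weights plus a low-rank correction."""
    base = matvec(frozen_w, x)                      # fixed path, never changes
    correction = matvec(lora_b, matvec(lora_a, x))  # low-rank path B(Ax)
    return [b + scale * c for b, c in zip(base, correction)]


# Toy example: 2x2 frozen weights with a rank-1 adapter.
frozen_w = [[1.0, 0.0],
            [0.0, 1.0]]
lora_a = [[1.0, 1.0]]    # shape 1x2, projects the input down to rank 1
lora_b = [[0.5],
          [-0.5]]        # shape 2x1, projects back up to the output size
x = [2.0, 4.0]

if __name__ == "__main__":
    print(lora_forward(frozen_w, lora_a, lora_b, x))  # [5.0, 1.0]
```

With an all-zero `lora_b` the output equals that of the frozen weights alone, which is why an adapter can be attached to an otherwise unchangeable model without altering its base behavior.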
The company has raised $200 million in investment and plans to release a new mid-sized chip soon. An advanced LLM on the HC2 platform may launch towards the end of the year.