Alibaba Cloud Reduces GPU Load for AI Services Nearly Fivefold
13:16, 22.10.2025
Alibaba Cloud has concluded that AI services often use resources inefficiently. Many AI models consume computing power unevenly, which increases the load on GPUs. This inefficient allocation hinders scaling and raises the cost of AI infrastructure.
Aegaeon: Resource Redistribution
To address this, Alibaba Cloud introduced Aegaeon, a system that dynamically redistributes GPU resources. With it, a GPU can switch between models in real time, even in the middle of generating a response.
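The idea of switching models partway through a response can be illustrated with a toy simulation. The sketch below is not Alibaba Cloud's implementation: the `Request` class, the `run_shared_gpu` function, the round-robin policy, and the model names are all hypothetical, chosen only to show how one simulated GPU might hand out generation turns to several models a token at a time.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    model: str          # hypothetical model name
    tokens_needed: int  # tokens still to be generated
    output: list = field(default_factory=list)


def run_shared_gpu(requests, tokens_per_turn=1):
    """Interleave generation for several models on one simulated GPU.

    Assumption: a plain round-robin policy stands in for the real scheduler.
    Returns the order in which models held the GPU and the number of switches.
    """
    queue = deque(requests)
    timeline = []
    switches = 0
    active_model = None
    while queue:
        req = queue.popleft()
        # A switch happens whenever the GPU moves to a different model.
        if req.model != active_model:
            if active_model is not None:
                switches += 1
            active_model = req.model
        # Generate a few placeholder tokens, then yield the GPU.
        step = min(tokens_per_turn, req.tokens_needed)
        for _ in range(step):
            req.output.append(f"{req.model}-tok{len(req.output)}")
        req.tokens_needed -= step
        timeline.append(req.model)
        # Unfinished requests go to the back of the queue mid-response.
        if req.tokens_needed > 0:
            queue.append(req)
    return timeline, switches


if __name__ == "__main__":
    demo = [Request("model-a", 3), Request("model-b", 1), Request("model-c", 2)]
    order, count = run_shared_gpu(demo)
    print(order, count)
```

In this toy version every switch is free. On real hardware each switch costs time, which is why the reduction in switching delay matters for sharing one GPU among many models.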
Why Aegaeon Is Cost-Effective
The new system allows a single GPU to serve up to seven models simultaneously, up from two or three previously. With Aegaeon, task-switching delays fell by 97%. The system is already in use on Alibaba Cloud's Bailian marketplace. It marks an important step toward cheaper and more sustainable AI infrastructure.