Red Hat Launches llm-d, a Kubernetes-Based Platform for Scalable AI Inference
13:10, 22.05.2025
Red Hat has introduced llm-d, a new open source project for high-performance distributed inference of large language models (LLMs). The platform is built on Kubernetes and aims to simplify the scaling of generative AI workloads. The source code is available on GitHub under the Apache 2.0 license.
Key Features of llm-d
The main features of the platform include:
- Optimized Inference Scheduler for vLLM;
- Disaggregated serving architecture;
- Reuse of prefix caches (see the sketch after this list);
- Flexible scaling depending on traffic, tasks, and available resources.
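To make prefix-cache reuse concrete, here is a minimal sketch using vLLM's offline Python API. The model name and prompts are illustrative assumptions, not details from the announcement; `enable_prefix_caching` is vLLM's own option for this feature.

```python
# Minimal sketch of prefix-cache reuse with vLLM (illustrative, not llm-d itself).
from vllm import LLM, SamplingParams

# enable_prefix_caching lets vLLM reuse KV-cache blocks for a shared prompt
# prefix, so a repeated system prompt is not recomputed on every request.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

shared_prefix = "You are a concise assistant. Answer in one sentence.\n\n"
prompts = [
    shared_prefix + "Question: What is Kubernetes?",
    shared_prefix + "Question: What is vLLM?",
]

params = SamplingParams(temperature=0.0, max_tokens=64)
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

Because both prompts begin with the same prefix, the second request can skip recomputing attention over that shared portion.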
Collaboration with Leading AI Industry Players
The project is being developed in partnership with companies such as Nvidia, AMD, Intel, IBM Research, Google Cloud, CoreWeave, Hugging Face, and others. This collaboration underscores the seriousness of the llm-d effort and the platform's potential to become an industry standard.
Technology and Architecture
The project builds on the vLLM library for distributed inference and adds components such as LMCache for KV-cache offloading, AI-aware intelligent traffic routing, high-efficiency communication APIs, and automatic scaling based on load and available infrastructure.
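To illustrate the routing idea, here is a hypothetical sketch (not llm-d's actual scheduler) of cache-aware request routing: requests that share a prompt prefix are hashed to the same vLLM replica, so that replica's prefix cache is more likely to be hit. The replica endpoints are assumptions.

```python
# Illustrative sketch of cache-aware routing (not llm-d's real scheduler):
# requests sharing a prompt prefix land on the same replica.
import hashlib

# Hypothetical vLLM replica endpoints behind a router.
REPLICAS = ["http://vllm-0:8000", "http://vllm-1:8000", "http://vllm-2:8000"]

def pick_replica(prompt: str, prefix_len: int = 256) -> str:
    """Route by a stable hash of the prompt's leading characters."""
    prefix = prompt[:prefix_len]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLICAS)
    return REPLICAS[index]

if __name__ == "__main__":
    system = "You are a concise assistant.\n\n"
    print(pick_replica(system + "Question: What is Kubernetes?"))
    # Same shared prefix, so this resolves to the same replica as above.
    print(pick_replica(system + "Question: What is vLLM?"))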
Together, these components let the system adapt to different usage scenarios and performance requirements. The launch of llm-d could be a significant step toward democratizing powerful AI systems, making them accessible to a broad audience of developers and researchers.