Intel's Lunar Lake architecture is introduced

Author: HostZealot Team

During the Intel Tech Tour 2024, Lunar Lake was presented and its main changes were revealed. These processors were designed primarily for laptops, but some of the improvements can carry over to Arrow Lake and be used in desktop PCs.

To reach the desired balance of performance and power, the new Lunar Lake architecture has undergone important optimizations. The most significant changes were made to the Skymont E-cores. The integrated Xe2 graphics also considerably improve the performance of the video chip.

Lunar Lake features the latest neural processing unit (NPU), which delivers 48 TOPS for AI workloads. Taking the whole platform into account, overall performance reaches 120 TOPS, which considerably expands the chip's AI capabilities.
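
The two figures above imply how much of the platform total comes from outside the NPU. Below is a minimal Python sketch of that arithmetic, using only the 48 and 120 TOPS numbers quoted here. The function and variable names are illustrative, and the article does not say how the remainder is split between CPU and GPU.

```python
# Figures quoted in the article (TOPS = trillions of operations per second).
NPU_TOPS = 48
PLATFORM_TOPS = 120


def non_npu_tops(platform: int, npu: int) -> int:
    """TOPS contributed by everything other than the NPU."""
    return platform - npu


def npu_share(platform: int, npu: int) -> float:
    """Fraction of the platform total delivered by the NPU."""
    return npu / platform


if __name__ == "__main__":
    print(f"Non-NPU contribution: {non_npu_tops(PLATFORM_TOPS, NPU_TOPS)} TOPS")
    print(f"NPU share of total: {npu_share(PLATFORM_TOPS, NPU_TOPS):.0%}")
```

In other words, the NPU accounts for well under half of the quoted platform figure, so the other compute blocks matter for AI workloads too.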

One of the goals of this series of mobile processors was energy efficiency. Intel also plans to use this architecture in its upcoming projects (Panther Lake and Arrow Lake).

Intel built the compute tile on TSMC's N3B process, a choice that particularly affects the computing cores, NPU, and integrated graphics. The controller tile uses the TSMC N6 process. Only the Foveros base tile, built on the 22FFL process, is made by Intel.

Lunar Lake SoC Structure

The Lunar Lake SoC combines 4 P-cores and 4 E-cores. The TSMC N3B compute tile, the N6 controller tile, and a stiffener sit on the base tile. Two memory stacks are placed on the package, in 16 GB or 32 GB configurations. The memory runs at up to 8.5 GT/s per chip.
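
A transfer rate in GT/s turns into a bandwidth figure only once a bus width is fixed. The sketch below shows the conversion. The 64-bit width is an assumed example value, not a number from this article, and the function name is illustrative.

```python
def peak_bandwidth_gb_s(transfer_rate_gt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return transfer_rate_gt_s * bus_width_bits / 8


if __name__ == "__main__":
    # 8.5 GT/s is the rate quoted above; 64 bits is an assumed bus width.
    print(f"{peak_bandwidth_gb_s(8.5, 64):.1f} GB/s")
```

Doubling the assumed bus width doubles the result, which is why the transfer rate alone does not describe the memory subsystem.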

The compute tile contains the NPU 4.0, the Xe2 graphics, and the CPU cores. To improve the cache hit rate, the tile has 8 MB of memory-side cache, which is shared by the compute blocks.

Energy-efficient cores

Significant progress has been made with the Skymont E-cores, alongside the improvements to the Lion Cove P-cores. More specifically, Skymont brings a 68 percent increase in IPC for floating-point work and a 38 percent increase for integer workloads. As a result, performance increases by almost 4 times in multi-threaded tasks and by up to 2 times in single-threaded tasks.

Vector performance also improves, thanks to a switch from two 128-bit vector pipes to four. Further changes reduce latency.
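
To see what doubling the number of 128-bit vector units means in practice, the sketch below counts how many elements can be processed per cycle. It assumes each unit completes one full-width operation per cycle, which is a simplification and not a figure from Intel. The function name is illustrative.

```python
def lanes_per_cycle(pipes: int, pipe_width_bits: int, element_bits: int) -> int:
    """Elements processed per cycle, assuming every pipe issues one full-width op."""
    if pipe_width_bits % element_bits != 0:
        raise ValueError("element size must divide the pipe width")
    return pipes * (pipe_width_bits // element_bits)


if __name__ == "__main__":
    for name, bits in (("FP32", 32), ("FP64", 64)):
        before = lanes_per_cycle(2, 128, bits)
        after = lanes_per_cycle(4, 128, bits)
        print(f"{name}: {before} -> {after} elements per cycle")
```

Under this assumption the peak element count per cycle doubles for every element size. Real-world gains depend on how well the workload keeps all four units busy.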

Previous energy-efficient cores had 2 MB of L2 cache; this has now been doubled to 4 MB.

Performance cores

An unexpected move was the removal of Hyper-Threading from the P-cores, which nonetheless deliver an average IPC increase of 14 percent. Intel found that Hyper-Threading, which raises throughput in multi-threaded workloads, is not very useful in a hybrid design. Intel reports that, depending on the chip's power level, the overall performance gain is 10-18%.

Removing Hyper-Threading made the core smaller, which leaves room for other changes, such as adding GPU cores or E-cores. Thanks to this step, the company was able to increase efficiency by 15 percent and performance by 10 percent.

Compared to the previous architecture, the branch prediction unit is 8 times larger. In addition, the cache-to-L2 bandwidth was tripled and the instruction fetch bandwidth was doubled. The micro-operation queue has been enlarged to 192 entries.

Intel Xe2 graphics

The Xe2 graphics processor offers significantly higher performance for AI workloads and an overall performance increase of 1.5 times. In addition to the new Lunar Lake processors, Xe2 will also be available in discrete gaming graphics cards.

The second-generation Xe core is characterized by a larger cache, support for additional data types, and reworked vector engines. The graphics processor also includes fixed-function and rendering blocks that handle textures and geometry.

The vector engine supports the BF16, INT4, INT2, and FP16 data types, which are needed for AI operations. The rendering unit has also received a significant number of improvements and speedups.
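
One reason low-precision types matter for AI is storage: the fewer bits per value, the more values fit in the same memory. Here is a small sketch of that arithmetic, assuming densely packed values. The one-billion-weight model is hypothetical and the names are illustrative.

```python
# Storage width in bits for the data types listed above.
BITS_PER_VALUE = {"FP16": 16, "BF16": 16, "INT4": 4, "INT2": 2}


def weights_size_bytes(num_values: int, dtype: str) -> int:
    """Bytes needed to store num_values densely packed values of the given type."""
    total_bits = num_values * BITS_PER_VALUE[dtype]
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical model with one billion weights
    for dtype in BITS_PER_VALUE:
        print(f"{dtype}: {weights_size_bytes(n, dtype) / 1e9:.2f} GB")
```

Going from FP16 to INT4 cuts the footprint to a quarter, at the cost of reduced numeric precision.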

The Lunar Lake video chip has 8 Xe cores, 64 vector engines, 8 ray tracing units, and many other components.

Controller and NPU 4.0

The NPU significantly outperforms competing parts. A dedicated unit is needed primarily to run AI tasks and to save battery life. The overall AI performance of Lunar Lake is 120 TOPS.

Overall, the architecture has received many improvements, including upgraded DMA and MAC engines, 6 neural compute engines, and much more. Compared to the previous NPU generation, throughput has increased significantly.

The controller tile contains all the I/O functions as well as the memory controllers. The company says that Lunar Lake laptops will have at least 2 ports for connectivity.
