October 1, 2022

Robotic Notes

The next generation of Arm server CPU cores

Just under four years ago, Arm announced its Neoverse family of infrastructure processors. By designing CPU cores specifically for the server and edge computing markets, rather than just recycling consumer-focused Cortex-A designs, Arm set out to tackle the infrastructure market in a much more aggressive way. Those efforts have paid off handsomely for Arm and its partners, who, thanks to products like Amazon’s Graviton and Ampere’s Altra, have finally managed to capture a significant portion of the server processor market.

But with Arm processors finally achieving the market penetration that had eluded them for the previous decade, Arm needs to make sure it doesn’t rest on its laurels. Of the company’s three lines of Neoverse core designs (the efficient E, the flexible N, and the high-performance V), the company is already on its second generation of N cores, aptly named N2. Now the company is preparing to update the rest of the Neoverse lineup with the next generation of V and E cores, announcing the Neoverse V2 and Neoverse E2 cores today. Both designs are slated to bring the Armv9 architecture to HPC and other server customers, along with significant performance improvements.

Arm Neoverse V2: Armv9 graces high-performance computing

Leading the charge for Arm’s new CPU core IP is the company’s second-generation V-series design, the Neoverse V2. The full V2 platform, codenamed Demeter, marks Arm’s first revision of its high-performance V-series cores, as well as the transition of this core lineup from the Armv8.4 ISA to Armv9. And while this is only Arm’s second attempt at a dedicated high-performance server core, make no mistake: Arm is aiming high. The company claims that Neoverse V2 processors will offer the highest single-threaded performance available on the market, eclipsing the next-generation designs from both AMD and Intel.

While this week’s announcement from Arm isn’t a full deep dive into the new architecture (and, more annoyingly, the company didn’t talk about specific PPA metrics), Arm does offer a high-level look at some of the changes and features that will come with the V2 platform. Of course, the V2 IP is already complete and shipping to customers today (most notably NVIDIA), but Arm is being somewhat coy about what it says about V2 before the first chips based on the IP ship in 2023.

First and foremost, moving to Armv9 brings with it the full set of features that come with the latest Arm architecture. This includes the security improvements that are a cornerstone of the architecture (and especially useful in cloud environments), along with Arm’s newer SVE2 vector extensions.

On the latter, Arm makes an interesting change by reconfiguring the width of its vector engines: while V1 implements SVE(1) using 2-pipe 256-bit SIMD, V2 switches to 4-pipe 128-bit SIMD. The end result is that the cumulative SIMD width of V2 is no wider than V1’s, but the execution flow has changed to handle a larger number of smaller vectors in parallel. This change makes the SIMD pipeline width identical to Arm’s Cortex parts (which are 128-bit, the minimum size for SVE2), but it means that Arm no longer takes full advantage of the scalable part of SVE by using larger SIMDs. I expect we’ll find out why Arm is taking this route once they do a full V2 deep dive, as I’m curious whether it’s purely a performance play or something closer to design homogenization within the Arm ecosystem.
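To make the width arithmetic concrete, here is a minimal Python sketch. It uses only the pipe counts and vector widths quoted above; the `SimdConfig` class and its method names are hypothetical helpers for illustration, not anything from Arm, and the sketch says nothing about real-world throughput, which depends on microarchitectural details Arm has not disclosed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdConfig:
    """Hypothetical description of a core's SIMD back end (illustration only)."""
    name: str
    pipes: int        # number of SIMD pipelines
    width_bits: int   # vector width handled by each pipeline

    @property
    def total_bits(self) -> int:
        # Cumulative SIMD width across all pipelines.
        return self.pipes * self.width_bits

    def lanes(self, element_bits: int) -> int:
        # Number of elements of the given size that fit across all pipelines.
        return self.total_bits // element_bits


# Figures as quoted in the article: V1 is 2 x 256-bit, V2 is 4 x 128-bit.
V1 = SimdConfig("Neoverse V1", pipes=2, width_bits=256)
V2 = SimdConfig("Neoverse V2", pipes=4, width_bits=128)

if __name__ == "__main__":
    for cfg in (V1, V2):
        print(f"{cfg.name}: {cfg.pipes} x {cfg.width_bits}-bit = "
              f"{cfg.total_bits} bits, {cfg.lanes(32)} FP32 lanes")
```

Both configurations come out to the same 512 bits in total; what differs is how that width is split across pipelines, which is the trade-off described above.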

Also, it’s probably worth noting that while Arm’s presentation slides list bfloat16 and int8 matmul as features, these are not new features. Still, Arm promises that V2’s SIMD processing will provide microarchitectural performance improvements over V1.

More generally, V2 will also introduce larger L2 cache sizes. The V2 design supports up to 2MB of private L2 cache per core, double the V1’s maximum size. V2 will also bring further improvements to Arm’s integer processing performance, although the company isn’t going into more detail at this stage. Architecturally, the V1 borrowed quite a bit from the Cortex-X1 processor design, and it wouldn’t be too surprising if that’s the case again for the V2, borrowing from the X2. In that case, consumer chips like the Snapdragon 8 Gen 1 and Dimensity 9000 should provide a loose reference for what to expect.

The Demeter platform, for its part, will once again use Arm’s CMN-700 mesh fabric, which was first introduced for the V1 generation. The CMN-700 is still a modern mesh design, with support for up to 144 nodes in a 12×12 configuration, and it is suitable for interfacing with DDR5 memory as well as PCIe 5/CXL 2 for I/O. As a result, strictly speaking, V2 does not bring anything new at the fabric level (even a 512MB SLC can be done with a V1 + CMN-700 setup), but it does mean that the CMN-700 and its features are now the baseline going forward with V2.
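For a rough sense of what a 12×12 mesh implies, the Python sketch below counts the nodes and estimates hop distances between them. It assumes a plain 2D grid where a message moves one node per hop along rows and columns (Manhattan distance). That routing model is an assumption made for illustration only; Arm has not described the CMN-700’s routing here, and the function names are hypothetical.

```python
from itertools import product


def mesh_nodes(cols: int, rows: int) -> int:
    """Total number of grid positions in a cols x rows mesh."""
    return cols * rows


def hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Hop count between two nodes, ASSUMING Manhattan (row/column) routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def worst_case_hops(cols: int, rows: int) -> int:
    """Corner-to-corner distance, the longest path under the same assumption."""
    return hops((0, 0), (cols - 1, rows - 1))


def average_hops(cols: int, rows: int) -> float:
    """Mean hop count over all ordered node pairs (self-pairs included)."""
    nodes = list(product(range(cols), range(rows)))
    total = sum(hops(a, b) for a in nodes for b in nodes)
    return total / (len(nodes) ** 2)


if __name__ == "__main__":
    cols, rows = 12, 12  # maximum configuration quoted in the article
    print("nodes:", mesh_nodes(cols, rows))
    print("worst-case hops:", worst_case_hops(cols, rows))
    print("average hops:", round(average_hops(cols, rows), 2))
```

Under these assumptions the maximum configuration has 144 nodes and a corner-to-corner path of 22 hops; the actual latency picture depends on fabric details that are outside the scope of this announcement.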

The Neoverse V2 core, in turn, will be the cornerstone of the upcoming generation of high-performance Arm server processors. The de facto flagship here will be NVIDIA’s Grace processor, which will be one of the first (if not the first) V2 designs to ship in 2023. NVIDIA had previously announced that Grace would be based on a Neoverse design, so this week’s announcement from Arm finally confirms the long-held suspicion that Grace will be based on the next-gen Neoverse V core.

NVIDIA, for its part, has its fall GTC event scheduled to take place in just a few days. So we’re likely to hear a bit more about Grace and its Neoverse V2 underpinnings as NVIDIA looks to promote the chip ahead of its release next year.

Neoverse E2: Cortex-A510 for use with N2

Along with the Neoverse V2 announcement, Arm also used a briefing this week to announce the Neoverse E2 platform. Unlike the V2 reveal, this is a much smaller-scale announcement, and Arm offers only a few technical details. After all, E2’s day in the sun will come a little later.

For now, the E2 platform is being delivered to partners with an eye toward interoperability with the existing N2 platform. To this end, Arm has taken the Cortex-A510, its small, high-efficiency Cortex CPU core, and paired it with the CMN-700 mesh. This is intended to give additional flexibility to server operators and vendors by providing an alternative CPU core to the N2, while still offering the advanced I/O and memory features of Arm’s mesh. Highlighting this, the E2 system backplane is even compatible with the N2 backplane.

Neoverse Next: Poseidon, N-Next and E-Next

Finally, Arm’s announcement this week gives a glimpse into the company’s future roadmap for all three Neoverse platforms, where, unsurprisingly, Arm is working on updated versions of each of the platforms.

It should be noted that all three platforms are slated to add PCIe 6 support as well as CXL 3.0 support. This will come from the next iteration of Arm’s CMN mesh network, which, as is already the case today, is shared between the three platforms.

In the meantime, it’s interesting to see the Poseidon name pop up again on Arm’s roadmaps. Going back to Arm’s first Neoverse roadmap, Poseidon was the name attached to Arm’s 5nm/2021 platform, a spot since occupied by N2 and V1/V2 in various forms. Since V2 doesn’t land in hardware until 2023, Poseidon/V3 is still years away, but it probably makes some sense for Arm to keep the codename (as a new microarchitecture).

But first the N-Next platform will be released: presumably the Neoverse N3. With the Neoverse N platform a generation ahead of the rest (N2 was first announced in 2020), it will be the next platform to be updated. The N3 should be available to partners in 2023, with Arm broadly touting performance and efficiency improvements for the generation.
