Meet Xe HPG, the beating heart inside Intel's first graphics cards
It’s the start of a new era of competition. Today, Intel revealed its debut Arc GPUs, heralding its long-teased entry into discrete consumer graphics cards. Watch out, Nvidia and AMD. Chipzilla’s in the fray now, fueled by its new Xe HPG (High-performance gaming) GPU architecture.
Update: In the months since Intel revealed Xe HPG and Arc on March 30, 2022, Arc GPUs have begun slowly hitting the streets, beginning in Asia. While we don’t have any formal Intel Arc reviews yet, as the GPUs have yet to land in the U.S., check out our exclusive Arc A370M laptop benchmarks, along with Gamers Nexus’s desktop Arc A380 review of a Chinese model, and Intel’s performance estimates for the higher-end Arc A750 desktop card for real-world context of what this architecture is capable of. Our original Xe HPG deep-dive continues below.
Intel took an unusual (but strategically smart) approach to Arc’s debut, rolling out Arc 3 graphics for modestly priced portable laptops. It lets the company leverage its substantial strengths in notebooks and software support rather than going blow-for-blow in gaming frame rates on the desktop, where Nvidia and AMD stand firm. We’ve covered the Arc 3 laptop GPU reveal and Intel’s killer features in a separate piece that explains what everyday folks should expect from this new breed of laptop. There’s some pretty compelling stuff, including key “Deep Link” features that add eye-opening capabilities when you pair an Intel Arc GPU with an Intel Core processor.
That’s not the point of this article, though. As part of the reveal, Intel Fellow Tom Petersen also provided the press with a high-level overview of the Xe HPG architecture underpinning these Arc “Alchemist” graphics cards. It’s our first glimpse at the nuts and bolts powering Intel’s discrete graphics ambitions.
So, as we did with Nvidia’s Ampere and AMD’s RDNA 2 architectures, here’s a brief technical explainer on the innards of Intel Arc’s Xe HPG chips. Much the way Nvidia and AMD use different technologies and terminologies for their designs, Intel’s Arc chips rely on some proprietary concepts (including a new take on clock speeds that needs some explaining). That makes it difficult to compare Arc against rival GPU architectures—Intel doesn’t even use common terms like ROPs and TMUs—but by the time we’re done here, you’ll have a solid high-level understanding of what makes Xe HPG tick. Let’s dig in.
Meet Xe HPG
Intel
For Intel, Xe HPG “render slices” comprise the backbone of every Arc GPU. Intel’s laptop and desktop Arc offerings can be scaled up or down as needed to fit different market needs, but these render slices are at their heart, containing dedicated ray tracing units, rasterizers, geometry blocks, and the fundamental building block for Arc, the Xe Cores themselves. Xe HPG can scale all the way up to eight render slices in Arc mobile GPUs, represented by the flagship Arc A770M GPU in laptop form.
Each render slice contains four Xe cores and four ray tracing units, along with all the other bits necessary for running a modern GPU. These render slices are fully DirectX 12 Ultimate compliant, meaning Intel’s Arc GPUs can handle ray tracing, Variable Rate Shading, Mesh Shading, and all the other features associated with that standard.
Let’s go deeper and take a peek at the Xe cores themselves. Each Xe core (again, there are four per render slice) comprises three key bits: sixteen 256-bit “XVE” vector engines that handle more traditional rasterization tasks, sixteen 1024-bit “XMX” matrix engines that handle machine learning tasks (like the tensor cores in Nvidia’s rival RTX GPUs), and 192KB of shared L1/SLM cache. That cache can be used to hold data during compute workloads, or shaders and textures while gaming.
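To make the scaling concrete, here is a minimal Python sketch that multiplies out the per-slice and per-core figures quoted above. The function name `arc_totals` and the dictionary keys are hypothetical, chosen only for illustration; the constants come straight from the figures in this article.

```python
# Per-slice and per-core figures as quoted in the article.
XE_CORES_PER_SLICE = 4
RT_UNITS_PER_SLICE = 4
XVE_PER_XE_CORE = 16        # 256-bit vector engines
XMX_PER_XE_CORE = 16        # 1024-bit matrix engines
L1_SLM_KB_PER_XE_CORE = 192


def arc_totals(render_slices: int) -> dict:
    """Multiply out chip-wide unit counts for a given number of render slices.

    `arc_totals` is a hypothetical helper for illustration, not an Intel API.
    """
    if render_slices < 1:
        raise ValueError("a GPU needs at least one render slice")
    xe_cores = render_slices * XE_CORES_PER_SLICE
    return {
        "xe_cores": xe_cores,
        "rt_units": render_slices * RT_UNITS_PER_SLICE,
        "vector_engines": xe_cores * XVE_PER_XE_CORE,
        "matrix_engines": xe_cores * XMX_PER_XE_CORE,
        "l1_slm_kb": xe_cores * L1_SLM_KB_PER_XE_CORE,
    }


if __name__ == "__main__":
    # Eight render slices is the maximum the article cites for mobile Arc.
    print(arc_totals(8))
```

By this arithmetic, an eight-slice part works out to 32 Xe cores, 32 ray tracing units, and 512 each of the vector and matrix engines.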
The biggest companies in PC gaming may be betting big on ray tracing being the future of graphics, but traditional rendering remains king for now. Each Xe Vector Engine includes a dedicated floating point (FP) execution port to handle traditional shading tasks, along with a shared INT/EM port that can tackle integer-based tasks at the same time.
Nvidia introduced concurrent FP/INT pipelines with its RTX 20-series “Turing” architecture to keep integer tasks from clogging up the FP32 pipeline, and it’s become the norm since. “When Nvidia examined how real-world games behaved, it found that for every 100 floating point instructions performed, an average of 36 and as many as 50 non-floating point instructions were also processed, jamming things up,” we wrote in 2018. “The new integer pipeline handles those extra instructions separately from and concurrently with the FP32 pipeline. Executing the two tasks at the same time results in a big speed boost.”
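The benefit of a separate integer port can be sketched with a deliberately idealized model. The Python below assumes each port issues one instruction per cycle and that the two instruction streams never stall on each other; both are simplifying assumptions, not how any real scheduler behaves. The function names are hypothetical.

```python
def cycles_shared_pipeline(fp_instr: int, other_instr: int) -> int:
    """All instructions queue through a single port, one per cycle."""
    return fp_instr + other_instr


def cycles_concurrent_ports(fp_instr: int, other_instr: int) -> int:
    """FP and non-FP instructions issue side by side on separate ports.

    Idealized: the longer of the two streams sets the total time.
    """
    return max(fp_instr, other_instr)


def ideal_speedup(fp_instr: int, other_instr: int) -> float:
    """Upper-bound speedup of concurrent issue over a shared pipeline."""
    return (cycles_shared_pipeline(fp_instr, other_instr)
            / cycles_concurrent_ports(fp_instr, other_instr))


if __name__ == "__main__":
    # Instruction mixes quoted above: 36 (average) and 50 (worst case)
    # non-floating-point instructions per 100 floating point instructions.
    for other in (36, 50):
        print(other, round(ideal_speedup(100, other), 2))
```

Under those assumptions, the quoted mixes give upper bounds of roughly 1.36x and 1.5x. Real gains will be smaller, since dependencies and memory stalls are ignored here.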
Intel’s dedicated “XMX” matrix engines hook into the vector engines in each Xe Core. They’re broadly similar to Nvidia’s RTX tensor cores, designed to greatly accelerate machine learning tasks. These are the bits that unlock the potential of XeSS, Intel’s rival to Nvidia’s vaunted DLSS upsampling, as well as other special sauce features like Hyper Compute and the virtual camera feature in Intel’s new Arc Control command center. (Again, read our Arc laptop GPU reveal coverage for deeper insight into those consumer-level features.)
When tapped by compatible software (such as a game with XeSS or an app that supports Hyper Compute), the XMX core’s 4-deep systolic array can calculate up to 256 multiply accumulate (MAC) operations per clock for INT8 inferencing, a massive increase over the 64 ops/clock offered by modern GPUs with DP4a hardware on board, and the 16 ops/clock supported by older GPUs.
Intel’s XeSS supports a fallback mode to run on rival Nvidia and AMD graphics cards that lack XMX cores, defaulting to DP4a hardware instead. This picture illustrates very well why Intel expects XeSS to run much, much faster on Arc GPUs with XMX hardware inside.
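For readers unfamiliar with DP4a, it is a four-element dot product of 8-bit integers accumulated into a wider integer. The Python sketch below models that behavior for signed inputs and then works out the ratios between the ops-per-clock figures quoted above. It is an illustration of the arithmetic only: Python integers do not wrap at 32 bits as hardware accumulators do, and the names `dp4a`, `OPS_PER_CLOCK`, and `xmx_advantage` are hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127

# INT8 ops-per-clock figures as quoted in the article.
OPS_PER_CLOCK = {"xmx": 256, "dp4a": 64, "older_gpu": 16}


def dp4a(a, b, acc=0):
    """Model a DP4a step: acc + sum of four signed 8-bit products.

    Assumption: signed 8-bit inputs. Unlike hardware, the Python
    accumulator will not overflow at 32 bits.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("DP4a operates on exactly four elements per operand")
    for value in (*a, *b):
        if not INT8_MIN <= value <= INT8_MAX:
            raise ValueError(f"{value} does not fit in a signed 8-bit integer")
    return acc + sum(x * y for x, y in zip(a, b))


def xmx_advantage(baseline: str) -> float:
    """Ratio of quoted XMX ops/clock to a baseline's quoted ops/clock."""
    return OPS_PER_CLOCK["xmx"] / OPS_PER_CLOCK[baseline]


if __name__ == "__main__":
    print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], acc=10))
    print(xmx_advantage("dp4a"), xmx_advantage("older_gpu"))
```

On the quoted numbers, the XMX path has a 4x per-clock edge over DP4a hardware and a 16x edge over older GPUs, which is the gap Intel is pointing to for XeSS.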
Each Xe Core features 16 total Vector and Matrix engines, with pairs of each running in lockstep, able to run FP, INT, and XMX tasks all at the same time. Arc GPUs can be kept very, very busy indeed.
Intel has always been proud of its media engines, spearheaded by the lightning-fast QuickSync technology, and Xe HPG’s media engine is no different. It includes all the modern capabilities you’d expect in a graphics chip (various 8K HDR encode and decode support, HEVC, VP9, you name it) but also one big inclusion that no other chip (CPU or GPU) offers: hardware-accelerated AV1 encoding.
The highly efficient next-generation video standard was created by a consortium of industry giants and is rapidly becoming the norm. Modern desktop GPUs support AV1 decoding, which can help you watch 8K videos without your system setting itself on fire, but until now you needed to rely on software alone to actually create AV1 videos. Intel says the hardware-accelerated AV1 encoding unlocked by Arc is 50 times faster than software encodes, and that it can deliver much clearer streaming visuals at the same bitrate as other encoders.
Paired with the Hyper Encode feature offered in all-Intel laptops as part of the company’s Deep Link suite, which leverages the media engines in both the CPU and GPU rather than one or the other, Arc-based systems could prove terribly compelling for video creators (if gaming performance is up to snuff, of course).
Xe HPG display engine
The Xe HPG display engine remains consistent across the Arc GPU stack, meaning every Arc graphics card offers the same video output capabilities (though the exact port configuration will vary by model). Don’t expect good frame rates if you actually try gaming on a pair of 8K screens, but it’s good to know Arc will support it if you want all the pixels for your productivity tasks!
Real-world Arc A-series laptop GPUs
Let’s take a moment to bring all this technical talk back to the practical realm. Intel cobbled together a bunch of Xe cores and render slices into a pair of dedicated Arc “Alchemist” GPUs for the mobile market: the higher-end ACM-G10, and the more modest ACM-G11, which will appear in the debut Arc 3 laptops launching today.
From there, those GPUs can be sliced and diced to meet different market needs. Here’s how the first generation of Arc graphics for laptops shakes out: Arc 3 laptops launch today, with Arc 5 and 7 laptops expected to launch sometime early this summer.
Xe HPG graphics clock speeds
Something might have jumped out at you in those laptop GPU spec charts: their ultra-low clock speeds. In an era where Nvidia’s GPUs push 2GHz and some AMD GPUs clear 2.5GHz, seeing Intel’s Arc topping out at 1650MHz and going as low as 900MHz is a tad eye-raising. Clock speeds between rival graphics brands aren’t as clear cut as they seem, however.
AMD’s “Game Clock” for Radeon GPUs isn’t the same as Nvidia’s “Boost Clock,” as I’ve explained before. Intel is using yet another metric for its Arc GPUs, dubbed “Graphics Clock.” Petersen defined Intel’s Graphics Clock as the average clock speed for a typical workload that particular GPU was intended for (so gaming for Xe HPG and likely compute tasks for workstation cards, for example). If you look at the laptop GPU charts above, you’ll also see a range of TDPs defined for each; the Graphics Clock is based on the lowest available TDP. In other words, Intel’s Graphics Clock essentially represents almost a worst-case scenario for Arc GPUs.
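As a rough illustration of why an average-based figure reads lower than a peak-style figure, the Python sketch below averages a set of clock samples. The sample values are entirely made up, and the assumption that samples are evenly spaced in time is mine; Intel has not published how it measures Graphics Clock beyond the description above.

```python
from statistics import mean

# Hypothetical clock samples (MHz), taken at even intervals while a GPU
# runs a typical game at its lowest configured TDP. Not measured data.
SAMPLES_MHZ = [1500, 1650, 1700, 1750]


def average_clock(samples_mhz):
    """Average clock over evenly spaced samples, per the description above."""
    if not samples_mhz:
        raise ValueError("need at least one sample")
    return mean(samples_mhz)


def peak_clock(samples_mhz):
    """Highest observed clock, shown only for contrast with the average."""
    return max(samples_mhz)


if __name__ == "__main__":
    print(average_clock(SAMPLES_MHZ), peak_clock(SAMPLES_MHZ))
```

With these invented samples the average lands at 1650MHz while the peak is 1750MHz, so the same chip can carry two quite different headline numbers depending on which metric a vendor chooses.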
All that said, graphics cores can run at different speeds depending on how hard they’re being pushed. They’ll hit much higher speeds in 2D retro games and much lower speeds in complex modern games that hit every part of the Xe Core and Render Slice, for example. And wattage can make a massive difference to performance as well; as we’ve seen with Nvidia’s mobile GeForce offerings, pumping more juice into a GPU can help propel a lower-tier GPU past a low-watt version of an ostensibly more potent sibling.
It’s also worth noting that clock speed isn’t everything. In the same company’s architecture, faster is generally better—a 2GHz GeForce GPU will be faster than a 1.5GHz one, say. But AMD’s desktop Radeon RX 6500 XT lags behind its siblings despite packing a ludicrously fast 2.8GHz clock speed. Raw clock speed gains are far from the only way to drive faster performance, as AMD’s Robert Hallock recently explained on our Full Nerd podcast. That company’s Ryzen 7 5800X3D processor actually saw big gaming performance gains by dropping clock speeds and plopping a huge slab of cache atop the chip.
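A back-of-the-envelope Python sketch shows why width can outweigh frequency. It counts 32-bit lanes (a 256-bit vector engine holds 256 / 32 = 8 of them) and multiplies by clock speed. The assumption that every lane retires one operation per clock is mine, the second GPU is purely hypothetical, and the model ignores memory bandwidth, cache, and drivers, so treat the output as a toy comparison rather than a performance estimate.

```python
XVE_PER_XE_CORE = 16      # from the article
XVE_WIDTH_BITS = 256      # from the article
FP32_BITS = 32


def fp32_lanes(xe_cores: int) -> int:
    """Number of 32-bit lanes across all vector engines."""
    lanes_per_xve = XVE_WIDTH_BITS // FP32_BITS   # 256 / 32 = 8
    return xe_cores * XVE_PER_XE_CORE * lanes_per_xve


def peak_lane_ops_per_us(xe_cores: int, clock_mhz: int) -> int:
    """Lane-operations per microsecond, assuming one op per lane per clock.

    MHz is cycles per microsecond, so lanes * MHz gives ops per microsecond.
    """
    return fp32_lanes(xe_cores) * clock_mhz


if __name__ == "__main__":
    # 32 Xe cores (eight render slices) at the 1650MHz top Graphics Clock
    # cited in the article.
    wide_and_slow = peak_lane_ops_per_us(32, 1650)
    # Hypothetical narrow chip: 8 cores of the same design at 2800MHz.
    narrow_and_fast = peak_lane_ops_per_us(8, 2800)
    print(wide_and_slow, narrow_and_fast)
```

In this toy model the wide chip comes out well ahead despite its much lower clock, which is the same point the Radeon RX 6500 XT example makes from the other direction.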
It’s complicated, is what I’m saying. Don’t look too deeply into the clock speeds for Intel’s Arc GPUs until laptops and desktop graphics cards wind up in the hands of reviewers.
But wait, there’s more!
And that about does it for our tour of Intel’s Xe HPG architecture. The company kept things pretty high level for today’s mobile-centric reveal, but we’d expect to see a whitepaper with more details released the closer we get to the arrival of Arc 5 and 7 laptops in early summer, and Arc desktop graphics cards sometime in the second quarter.
If all this talk about matrix engines and media encoders got you hot and bothered, be sure to check out our separate coverage of the Arc 3 laptop GPU launch for a more practical look at what Intel is actually doing with all these hardware features. Those Deep Link capabilities could be some mighty delicious special sauce indeed.
Now, all that’s left to do is wait for reviews.