DX12U will bring DirectX Raytracing, Mesh Shaders, and more to Windows PCs.
Today, Microsoft is announcing a new version of its gaming and multimedia API platform, DirectX. The new version, DirectX 12 Ultimate, largely unifies Windows PCs with the upcoming Xbox Series X platform, offering the platform’s new precision rendering features to Windows gamers with supporting video cards.
Many of the new features have more to do with the software side of development than the hardware. The new DirectX 12 Ultimate API calls aren't just enabling access to new hardware features; they also offer deeper, lower-level, and potentially more efficient access to hardware features and resources that are already present.
For now, the new features are slated largely for Nvidia cards only, with “full support on GeForce RTX”—the presentation you’re seeing slides from actually came from Nvidia itself, not Microsoft. Meanwhile, AMD has announced that its upcoming slate of RDNA 2 GPUs will have “full support” for the DirectX 12 Ultimate API—but not any prior generations of AMD cards. (AMD takes the opportunity to remind gamers that the same RDNA 2 architecture is powering both Microsoft’s Xbox Series X and Sony’s PlayStation 5 consoles.)
Some of the new calls are reminiscent of work that AMD has done independently in Radeon drivers. For example, variable rate shading strikes us as similar to AMD’s Radeon Boost system, which dynamically lowers frame resolution during rapid panning. While these features certainly aren’t the same thing, they’re similar enough in concept that we know AMD has at least been thinking along similar lines.
DirectX ray tracing
DirectX Ray Tracing, aka DXR, isn't brand new—DXR1.0 was introduced two years ago. However, DirectX 12 Ultimate introduces several new features under a DXR1.1 versioning scheme. None of DXR1.1's features require new hardware—existing ray-tracing-capable GPUs need only driver support to enable them.
For the moment, only Nvidia offers consumer PC graphics cards with hardware ray tracing. However, the Xbox Series X offers ray tracing on its custom Radeon GPU hardware—and at CES 2020 in January, AMD CEO Lisa Su said to expect discrete Radeon graphics cards with ray tracing support "as we go through 2020."
Inline ray tracing
Inline ray tracing is an alternate API that gives developers lower-level access to the ray tracing pipeline than DXR1.0's dynamic-shader-based ray tracing. Rather than replacing dynamic-shader ray tracing, inline ray tracing exists as an alternative model, one that lets developers make inexpensive ray tracing calls that don't carry the full weight of a dynamic shader call. Examples include constrained shadow calculation, queries from shaders that don't support dynamic-shader rays, and simple recursive rays.
There’s no simple answer as to when inline ray tracing is more appropriate than dynamic; developers will need to experiment to figure out the best balance between use of both toolsets.
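To make the distinction concrete, here is a minimal HLSL sketch of an inline ray-traced shadow test using DXR1.1's RayQuery object. The acceleration structure binding and the calling pixel shader are assumptions for illustration; no dynamic shaders are invoked anywhere in this path.

```hlsl
// Assumed binding: the scene's top-level acceleration structure at t0.
RaytracingAccelerationStructure Scene : register(t0);

// Returns 0 if worldPos is shadowed along lightDir, 1 otherwise.
// Callable from an ordinary pixel or compute shader.
float ShadowFactor(float3 worldPos, float3 lightDir)
{
    RayDesc ray;
    ray.Origin    = worldPos;
    ray.Direction = lightDir;
    ray.TMin      = 0.01;
    ray.TMax      = 1e4;

    // RayQuery is the DXR1.1 inline-tracing object; any-hit/closest-hit
    // shaders are never dispatched. For a shadow test, the first hit ends
    // the search.
    RayQuery<RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> q;
    q.TraceRayInline(Scene, RAY_FLAG_NONE, 0xFF, ray);
    q.Proceed();  // fixed-function traversal; loop only if handling
                  // non-opaque candidates

    return (q.CommittedStatus() == COMMITTED_TRIANGLE_HIT) ? 0.0 : 1.0;
}
```

A dynamic-shader TraceRay() call routing through a shader table would be overkill for a boolean visibility query like this one.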
DispatchRays() via ExecuteIndirect()
Shaders running on the GPU can now generate a list of DispatchRays() calls, including their individual parameters. This can significantly reduce latency for scenarios that prepare and immediately spawn ray tracing work on the GPU, since it eliminates a round-trip to the CPU and back.
Growing state objects via AddToStateObject()
Under DXR1.0, if developers wanted to add a new shader to an existing ray tracing pipeline, they needed to instantiate an entirely new pipeline with an extra shader, copying the existing shaders to the new pipeline along with the new one. This required the system to parse and validate the existing shaders as well as the new one every time a new pipeline was instantiated.
AddToStateObject() eliminates this waste by doing just what it sounds like: allowing developers to expand an existing ray tracing pipeline in place, requiring parsing and validation of only the new shader. The efficiency bump here should be obvious: a 1,000-shader pipeline that needs to add a single new shader now only needs to validate one shader, rather than 1,001.
GeometryIndex() in ray tracing shaders
GeometryIndex() allows shaders to distinguish geometries within bottom-level acceleration structures, without needing to change data in the shader records for each geometry. In other words, all the geometries in a bottom-level acceleration structure can now share the same shader record. When needed, shaders can use GeometryIndex() to index into the app’s own data structures.
Skipping primitive instantiation with config tweaks
Developers can optimize ray tracing pipelines by skipping unnecessary primitives. For example, DXR1.0 offers per-ray flags that cull geometry by opacity or by front/back facing. DXR1.1 adds additional options for skipping triangles or procedural primitives entirely—either on individual rays, via the new RAY_FLAG_SKIP_TRIANGLES and RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES flags, or declared once for an entire pipeline through its configuration flags.
Variable rate shading
Variable rate shading (VRS) bills itself as “a scalpel in the world of sledgehammers.” VRS allows developers to select the shading rate on portions of frames independently, focusing the majority of the detail—and the rendering workload—on the portions that actually need it and leaving background or otherwise visually unimportant elements to render more rapidly.
There are two hardware tiers for VRS support. Tier 1 hardware can implement per-draw shading rates, which would allow developers to draw large, far away, or obscured assets with lower shading detail, then draw detailed assets with higher shading detail.
Tier 2 hardware adds finer-grained control within a single draw, using a screenspace image. If you know that a first-person shooter gamer will be paying more attention to their crosshairs than anywhere else, you can have maximum shading detail in that area, falling off gradually to the lowest shading detail in their peripheral vision.
A real-time strategy or roleplaying game developer, on the other hand, might instead choose to focus extra shading detail on edge boundaries, where aliasing artifacts are more likely to be visually obnoxious.
Per-primitive VRS takes things a step further by allowing developers to specify shading rate on a per-triangle basis. One obvious use case is for games with motion blur effects—why bother rendering detailed shadows on faraway objects if you know you’re going to blur them anyway?
Screenspace and per-primitive variable rate shading can be mixed and matched within the same scene, using VRS combiners.
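A hedged C++ sketch of that mixing, on a Tier 2 command list (variable names assumed): a base per-draw rate, a per-primitive rate exported from the vertex pipeline, and a screenspace rate image are merged by two combiners.

```cpp
// Two combiners: [0] merges the per-draw base rate with the per-primitive
// rate; [1] merges that result with the screenspace rate image. MAX keeps
// the coarser of the two rates at each step.
D3D12_SHADING_RATE_COMBINER combiners[2] = {
    D3D12_SHADING_RATE_COMBINER_MAX,
    D3D12_SHADING_RATE_COMBINER_MAX,
};
// ID3D12GraphicsCommandList5 or later.
commandList5->RSSetShadingRate(D3D12_SHADING_RATE_1X1, combiners);
commandList5->RSSetShadingRateImage(shadingRateImage);  // coarse tiles for
                                                        // peripheral regions
```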
Mesh and amplification shaders
Mesh shaders parallelize mesh processing using a compute programming model. Chunks of the overall mesh are separated into "meshlets," each typically consisting of 200 or fewer vertices. The individual meshlets can then be processed simultaneously rather than sequentially.
Mesh shaders dispatch a set of threadgroups, each of which processes a different meshlet. Each threadgroup can access groupshared memory but can output vertices and primitives that don’t need to correlate with a specific thread in the group.
This greatly reduces rendering latency, particularly for geometries with linear bottlenecks. It also allows developers much more granular control over separate pieces of the overall mesh rather than needing to treat the entire geometry as a whole.
Amplification shaders are essentially collections of mesh shaders managed and instantiated as one. An amplification shader dispatches threadgroups of mesh shaders, each of which has access to the amplification shader’s data.
Sampler feedback
Sampler feedback essentially makes it simpler for developers to figure out at what level of detail to render textures on the fly. With this feature, a shader can query what part of a texture would be needed to satisfy a sampling request without actually having to perform the sample operation. This allows games to render larger, more detailed textures while using less video memory.
Texture Space Shading expands on the sampler feedback technique by allowing the game to apply shading effects to a texture independent of the object the texture is wrapped around. For example, a cube with only three faces visible doesn't need lighting effects applied to the back three faces.
Using TSS, lighting effects can be applied to only the visible portions of the texture. In our cube example, this might mean only lighting the portion wrapping the three visible faces in compute space. This can be done prior to and independent from rasterization, reducing aliasing and minimizing the computation expense of the lighting effects.
Listing image by Nvidia