I was lucky enough to be able to visit GDC this year. As always, there were a lot of interesting talks, ideas and announcements. I wasn’t able to attend all of them, but I wanted to do a quick info dump of the ones I did attend. A quick disclaimer: this post isn’t meant to be comprehensive (I’m only covering the rendering/GPU/engine talks I was able to attend), and some of my comments might be somewhat subjective (I might also be wrong, in which case please correct me :)). Also, Krzysztof Narkowicz is collating a list of presentation slides, which you can find on his blog here. I’ll update this article with links to the presentations as they become available.
Day 1 (Feb 27)
The first day of the conference was the low-level API day, with D3D12 and Vulkan as the main focus. The talks included:
- “D3D12 and Vulkan Done Right” & “Wave Programming in D3D12 and Vulkan” – Holger Gruen (NVIDIA), David Lively (AMD), Gareth Thomas (AMD)
- How to Thrive on the Bleeding Edge Whilst Avoiding Death by 1,000 Paper Cuts – Jurjen Katsman (Nixxes)
- “Async Compute: Deep Dive” & “Raster Ordered Views and Conservative Rasterization” – Alex Dunn (NVIDIA), Stephan Hodes (AMD), Evgeny Makarov (NVIDIA), Rahul Sathe (NVIDIA)
- Moving to DirectX 12: Lessons Learned – Tiago Rodrigues (Ubisoft Montreal)
- “Cinematic Depth of Field” & “Advanced Particle Simulation in Compute” – Richard Tonge (NVIDIA), Hammad Mazhar (NVIDIA), Karl Hillesland (AMD)
The common themes were how to do resource barriers correctly, how to inspect and avoid memory overcommitment, and how best to deal with pipeline state objects (PSOs). Yuriy O’Donnell gave an excellent talk on how to perform resource transitions and manage residency programmatically in Frostbite using FrameGraph.
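To make “doing resource barriers correctly” a bit more concrete, here is a minimal sketch, in Python rather than D3D12, of the state-tracking idea that systems like FrameGraph automate: each resource remembers its current state, and a transition is recorded only when a pass needs a different one. The state strings mirror D3D12 names, but the `ResourceTracker` class and its methods are hypothetical and are not Frostbite’s API.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceTracker:
    """Hypothetical CPU-side tracker of per-resource GPU states."""
    states: dict = field(default_factory=dict)    # resource name -> current state
    pending: list = field(default_factory=list)   # recorded (resource, before, after)

    def register(self, name: str, initial_state: str) -> None:
        self.states[name] = initial_state

    def require(self, name: str, state: str) -> None:
        """Ensure `name` is in `state`, recording a transition only if it isn't already."""
        before = self.states[name]
        if before != state:
            self.pending.append((name, before, state))
            self.states[name] = state

    def flush(self) -> list:
        """Return the pending transitions as one batch and clear the list."""
        batch, self.pending = self.pending, []
        return batch
```

Collecting transitions and submitting them as a batch, instead of issuing one barrier call per resource, is the usual recommendation; redundant requests for the state a resource is already in simply produce no barrier.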
Wave programming in Shader Model 6 (SM6) is meant to “enable the elimination of barrier constructs when the scope of synchronization is within the width of the SIMD processor, or some other set of threads that are known to be atomic relative to each other.” This allows for fast parallel reduction. Wave programming also lets you query helper lanes (used for pixel-quad gradients), which were previously abstracted away. Wave voting intrinsics such as WaveAllTrue() let you choose the optimal code path for divergent vs. non-divergent control flow.
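The barrier-free reduction and the voting idea can be modeled on the CPU. The sketch below simulates one wave as a Python list: a butterfly (XOR-shuffle) reduction needs only log2(wave size) steps and no group-shared memory or barriers, because all lanes of a wave execute in lockstep, and a simple vote stands in for WaveAllTrue(). The 64-lane wave size is an assumption (GCN-style; NVIDIA warps are 32 wide), and `wave_reduce_sum`, `wave_all_true` and `choose_path` are hypothetical stand-ins, not HLSL intrinsics.

```python
WAVE_SIZE = 64  # assumption: GCN-style 64-wide wave


def wave_reduce_sum(lanes):
    """Butterfly reduction: after log2(n) XOR-shuffle steps every lane holds the total."""
    n = len(lanes)
    assert (n & (n - 1)) == 0, "wave size must be a power of two"
    values = list(lanes)
    offset = 1
    while offset < n:
        # Each lane reads its partner's value (lane ^ offset), like a cross-lane shuffle.
        values = [values[i] + values[i ^ offset] for i in range(n)]
        offset *= 2
    return values


def wave_all_true(predicates):
    """Stand-in for a wave vote: True only if the condition holds in every lane."""
    return all(predicates)


def choose_path(lane_conditions):
    """Take the fast path only if every lane agrees; otherwise use the general path."""
    return "fast" if wave_all_true(lane_conditions) else "general"
```

Because every lane ends up with the full sum, no lane has to broadcast its result afterwards; on real hardware the shuffle steps map to cross-lane operations rather than LDS round trips.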
Early adopters of DX12 explained some of the growing pains with it, although the situation seems to be improving with Microsoft releasing PIX support for DX12 (beta) and post-TDR analysis tools such as NVIDIA Aftermath. It was also suggested that the driver (and its complexity) cannot be overlooked, even when working with low-level APIs. At any rate, it will be hard for your first DX12 implementation to beat years of DX11 driver development and optimization. The biggest benefits of DX12 come from multi-GPU and async compute. As such, it was suggested that it might be best to lose the DX11 training wheels and develop directly on DX12, so as not to constantly have to justify the DX12 time investment compared to DX11.
For async compute, it is important to ensure proper work pairing (using fences). One needs to be aware of the async tax when running workloads in this mode. It is best to maintain non-async code paths and profile, rather than just fire-and-forget. Since async compute is like CPU HyperThreading (multiple threads sharing the same hardware resources), it is important to monitor resource contention. Besides register pressure, one also needs to monitor cache thrashing. One option suggested was to use a dummy LDS allocation to reduce occupancy and thereby reduce cache thrashing ¯\_(ツ)_/¯ Personally, I would love to see that information exposed to profiling tools.
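Why would allocating unused LDS help? On GCN, the number of waves a compute unit can keep resident is capped by whichever resource runs out first, so padding a workgroup’s LDS allocation lowers occupancy (fewer waves competing for the caches) without touching the shader’s logic. The sketch below estimates waves per SIMD from VGPR and LDS usage. The limits (10 waves per SIMD, 256 VGPRs per lane per SIMD, 4 SIMDs and 64 KiB of LDS per CU) are GCN figures as I understand them and should be treated as assumptions; SGPR limits and allocation granularity are ignored, and the function name is hypothetical.

```python
import math

# Assumed GCN-like limits (simplified: SGPRs and allocation granularity ignored).
MAX_WAVES_PER_SIMD = 10
VGPRS_PER_SIMD_LANE = 256
SIMDS_PER_CU = 4
LDS_BYTES_PER_CU = 64 * 1024
WAVE_SIZE = 64


def waves_per_simd(vgprs_per_thread, lds_bytes_per_group, threads_per_group):
    """Estimate how many waves each SIMD can keep resident for a compute shader."""
    by_vgpr = VGPRS_PER_SIMD_LANE // vgprs_per_thread
    waves_per_group = math.ceil(threads_per_group / WAVE_SIZE)
    if lds_bytes_per_group > 0:
        # LDS is shared per CU, so it limits whole workgroups, spread over 4 SIMDs.
        groups_per_cu = LDS_BYTES_PER_CU // lds_bytes_per_group
        by_lds = (groups_per_cu * waves_per_group) // SIMDS_PER_CU
    else:
        by_lds = MAX_WAVES_PER_SIMD
    return min(MAX_WAVES_PER_SIMD, by_vgpr, by_lds)
```

For example, a 256-thread group using 24 VGPRs and no LDS can reach the 10-wave cap, while reserving a dummy 16 KiB of LDS per group drops it to 4 waves per SIMD.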
Rahul Sathe (NVIDIA) talked about using programmable sample locations and conservative rasterization. Conservative rasterization violates the uniqueness guarantee of the top-left rule of rasterization, which “ensures that adjacent triangles are drawn once”. This can cause flickering artifacts, because writes from overlapping pixel shader invocations are unordered (“overlapping” is defined as invocations that are generated by the same draw call and share the same pixel coordinate in pixel-frequency execution mode, or the same pixel and sample coordinate in sample-frequency mode). This flickering can also come from a single triangle that gets clipped outside the guard band and produces multiple adjacent triangles. Raster Ordered Views (ROVs) are like UAVs, but they ensure that memory accesses from overlapping pixel shader invocations happen in the order in which the geometry was submitted. Pairing ROVs with conservative rasterization fixes the temporal instability.
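The difference between standard and conservative coverage is easy to see with edge functions. In the simplified model below (not any API’s exact rules: no sub-pixel snapping, no guard band), standard rasterization samples each pixel at its center and breaks ties on shared edges with the top-left rule, so a pixel whose center lies exactly on the diagonal shared by two triangles is claimed by exactly one of them. The overestimating conservative test claims any pixel whose square overlaps the triangle, so the same pixel is covered by both, and it is the order of those two invocations that ROVs pin down. Coordinates assume y pointing down and clockwise on-screen winding; all names are hypothetical.

```python
from itertools import product


def edge(a, b, p):
    """Edge function: > 0 when p is on the interior side of a->b (clockwise, y-down)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def is_top_left(a, b):
    """For this winding, a top edge runs horizontally rightward; a left edge runs upward."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return (dy == 0 and dx > 0) or dy < 0


def edges(tri):
    return ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0]))


def covers_center(tri, px, py):
    """Standard rasterization: sample the pixel center, break ties with the top-left rule."""
    p = (px + 0.5, py + 0.5)
    for a, b in edges(tri):
        e = edge(a, b, p)
        if e < 0 or (e == 0 and not is_top_left(a, b)):
            return False
    return True


def covers_conservative(tri, px, py):
    """Overestimating conservative rasterization: any overlap with the pixel square counts."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    if px > max(xs) or px + 1 < min(xs) or py > max(ys) or py + 1 < min(ys):
        return False  # bounding boxes are disjoint
    corners = [(px + dx, py + dy) for dx, dy in product((0, 1), repeat=2)]
    for a, b in edges(tri):
        if max(edge(a, b, c) for c in corners) < 0:
            return False  # the whole pixel square lies outside this edge
    return True


# Two clockwise triangles (y down) sharing the diagonal from (4, 0) to (0, 4).
T1 = ((0, 0), (4, 0), (0, 4))
T2 = ((4, 0), (4, 4), (0, 4))
```

Pixel (1, 2) has its center exactly on the shared diagonal: `covers_center` assigns it only to `T2` (for which the diagonal is a left edge), while `covers_conservative` reports it for both triangles.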
The Cinematic Depth of Field talk proposed a scatter-based approach for constant-time, variable-width blur called Fast Filter Spreading, which is, in spirit, a reversed Summed Area Table (SAT). Unlike SATs, it can be extended to perform a Bartlett (tent) filter rather than a simple box filter. Also, unlike SATs, which suffer a precision meltdown at higher resolutions, this technique is resolution independent and has a fixed cost, although that cost is quite high. The SPH-based (smoothed-particle hydrodynamics) particle simulation system was also discussed.
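The core trick of filter spreading is easiest to see in 1D. Instead of gathering, each source pixel scatters a few signed deltas at the ends of its footprint into an accumulation buffer, and a single prefix sum afterwards reconstructs every blurred pixel at once, so the cost per source pixel does not depend on its blur radius. A tent footprint works the same way with three second-order deltas and two prefix sums. This is a minimal 1D sketch under my own conventions (integer radius per pixel, zero padding, weights normalized to sum to one), not the talk’s GPU implementation; the function names are hypothetical.

```python
from itertools import accumulate


def spread_box(values, radii):
    """Scatter each value uniformly over [x - r, x + r]: two deltas + one prefix sum."""
    n, pad = len(values), max(radii) + 2
    buf = [0.0] * (n + 2 * pad)
    for x, (v, r) in enumerate(zip(values, radii)):
        w = v / (2 * r + 1)              # box weights sum to one
        buf[pad + x - r] += w            # footprint starts
        buf[pad + x + r + 1] -= w        # footprint ends
    return list(accumulate(buf))[pad:pad + n]   # integrate once


def spread_tent(values, radii):
    """Scatter each value as a tent of radius r: three deltas + two prefix sums."""
    n, pad = len(values), max(radii) + 3
    buf = [0.0] * (n + 2 * pad)
    for x, (v, r) in enumerate(zip(values, radii)):
        w = v / (r + 1) ** 2             # tent weights (r + 1 - |k|) sum to (r + 1)^2
        buf[pad + x - r] += w
        buf[pad + x + 1] -= 2 * w
        buf[pad + x + r + 2] += w
    return list(accumulate(accumulate(buf)))[pad:pad + n]   # integrate twice


def naive_spread(values, radii, kernel):
    """Reference O(n * r) scatter, used only to check the fast versions."""
    out = [0.0] * len(values)
    for x, (v, r) in enumerate(zip(values, radii)):
        weights = [kernel(k, r) for k in range(-r, r + 1)]
        total = sum(weights)
        for k, wk in zip(range(-r, r + 1), weights):
            if 0 <= x + k < len(values):
                out[x + k] += v * wk / total
    return out


def box(k, r):
    return 1.0


def tent(k, r):
    return r + 1 - abs(k)
```

The padding absorbs footprints that fall off either end of the row, so the fast and naive versions agree at the borders. In 2D the same idea needs deltas at footprint corners and prefix sums along both axes.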
Day 2 (Feb 28)
- The Future of Rendering in Unity – Tim Cooper (Unity)
- Toward Film-Like Pixel Quality in Real-Time Games – Hao Chen (Amazon)
- From DCC to Pixels in Seconds: Rapid and Continuous Iteration in Lumberyard – Nicholas Lawson
- AMD Capsaicin & Cream
- NVIDIA Event
- Unity Keynote
The new C#-based Scriptable Render Pipeline in Unity was demonstrated. Aras Pranckevičius gave a similar talk at the AMD Capsaicin & Cream event; the slides can be found here.
Amazon Lumberyard had a big presence on this day. Hao Chen demonstrated Lumberyard’s commitment to image quality. His talk was an in-depth primer on the different types of aliasing artifacts that make games fall short of cinematics, such as shader aliasing, specular aliasing and temporal aliasing.
Nicholas Lawson’s talk described a content workflow in Lumberyard that is best described as continuous integration for content. An AssetProcessor service runs in the background and monitors your game content directories for changes. When it detects a change, it triggers whatever import/crunch steps are needed to bring the asset into the game, and even live-updates any instances of the asset while the game is running. Very cool stuff!
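The change-detection half of such a service can be sketched with a simple polling scanner. This is not Lumberyard’s AssetProcessor: it is a hypothetical Python sketch that snapshots each file’s modification time and size, diffs consecutive snapshots, and hands changed paths to a caller-supplied callback standing in for the import/crunch step. A production tool would more likely rely on OS file-change notifications than on polling.

```python
import os


def snapshot(root):
    """Map each file path under `root` to its (modification time in ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return (added, modified, removed) paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, modified, removed


def poll_once(root, previous, on_change):
    """Take a new snapshot, pass new or changed files to `on_change`, return the snapshot."""
    current = snapshot(root)
    added, modified, _removed = diff(previous, current)
    for path in added + modified:
        on_change(path)  # hypothetical hook: run the import/crunch step, then hot-reload
    return current
```

Calling `poll_once` in a background loop gives the “edit, save, see it in game” behavior; the callback is where a real pipeline would queue the asset for processing and notify the running game.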
AMD’s Capsaicin & Cream event announced the Radeon RX Vega GPU, based on the Vega architecture. The new NCUs in Vega will support Rapid Packed Math (RPM), which can be used to speed up FP16 calculations. The event also hosted talks from industry professionals, such as:
- Scriptable Render Pipeline, Future of Rendering in Unity – Aras Pranckevičius (Unity)
- Improving Texture Compression in Games – Stephanie Hurlburt (Binomial)
The NVIDIA event announced the GeForce GTX 1080 Ti and the availability of GameWorks for DX12. This was also the first time NVIDIA acknowledged the use of tiled caching on Pascal, a technique that started with Maxwell.
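NVIDIA hasn’t published how its tiled caching works in detail, so the following is only a generic illustration of the binning idea behind tile-based rasterization: each triangle’s screen-space bounding box is used to append it to the list of every tile it may touch, so that a tile’s triangles can later be rasterized together while that tile’s render-target data stays in on-chip cache. The 16x16-pixel tile size and the function name are assumptions.

```python
import math

TILE = 16  # assumed tile size in pixels


def bin_triangles(triangles, width, height, tile=TILE):
    """Return {(tile_x, tile_y): [triangle indices]} using bounding-box overlap."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Convert the bounding box to tile coordinates, clamped to the screen.
        x0 = max(0, math.floor(min(xs)) // tile)
        x1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        y0 = max(0, math.floor(min(ys)) // tile)
        y1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        for ty in range(y0, y1 + 1):      # empty range if fully off-screen
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins
```

The bounding-box test is conservative: a thin diagonal triangle is binned into tiles it never actually covers, which an exact triangle/tile overlap test would trim. Triangle indices are appended in submission order, so per-tile lists preserve draw order.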
Day 3 (March 1)
- Cold, Hard Cache: Insomniac’s Cache Simulator – Andreas Fredriksson (Insomniac Games)
- Advanced Shader Programming on GCN – Timothy Lottes (AMD)
- Real-Time Rendering for Feature Film: Rogue One, a Case Study – John Knoll (Industrial Light & Magic), Naty Hoffman (ILMxLAB), Roger Cordes (ILMxLAB)
- State of Unreal
The CacheSim talk by Andreas Fredriksson was clearly a hit! He showed how he used the Trap Flag of the EFLAGS register to single-step the program and simulate and report cache misses for the AMD Jaguar CPU. The source code is available on GitHub.
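The trap-flag part is x86- and OS-specific, but the simulation it feeds is simple: once each memory access has been captured, its address can be run through a model of a set-associative cache with LRU replacement. The sketch below is such a model, not CacheSim’s code. The default geometry (32 KiB, 8-way, 64-byte lines) is meant to resemble a Jaguar-class L1 data cache but should be treated as an assumption.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Set-associative cache with per-set LRU replacement, fed one address at a time."""

    def __init__(self, size_bytes=32 * 1024, ways=8, line_bytes=64):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (ways * line_bytes)
        # One ordered dict per set: keys are tags, least recently used first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Simulate one access; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used line
        lines[tag] = None
        self.misses += 1
        return False
```

With 64 sets of 64-byte lines, addresses 4 KiB apart land in the same set, so touching nine such addresses evicts the first one: exactly the kind of conflict-miss pattern a tool like this is meant to expose.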
Timothy Lottes’ GCN talk demonstrated, among other things, the use of packed math to get performance speedups.
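Packed math operates on two 16-bit floats stored in the two halves of one 32-bit register, so a single instruction performs two FP16 operations. The Python sketch below only emulates that data layout and the FP16 rounding, using the standard `struct` module’s half-precision `'e'` format; it says nothing about how a GPU executes the instruction, and the helper names are hypothetical. It also shows the precision trade-off that comes with the speedup: with an 11-bit significand, 4096 + 1 rounds back to 4096.

```python
import struct


def pack_half2(lo, hi):
    """Pack two floats as IEEE binary16 into one 32-bit integer (lo in bits 0-15)."""
    (lo_bits,) = struct.unpack("<H", struct.pack("<e", lo))
    (hi_bits,) = struct.unpack("<H", struct.pack("<e", hi))
    return (hi_bits << 16) | lo_bits


def unpack_half2(word):
    """Split a 32-bit integer back into its two binary16 halves as Python floats."""
    (lo,) = struct.unpack("<e", struct.pack("<H", word & 0xFFFF))
    (hi,) = struct.unpack("<e", struct.pack("<H", word >> 16))
    return lo, hi


def packed_add(a, b):
    """Emulate a packed FP16 add: both halves are added, then rounded back to FP16."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)
```

The sum of two FP16 values is exact in double precision, so rounding it once when repacking matches a correctly rounded FP16 add.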
One of the coolest things for me was the use of Unreal’s real-time rendering in Blackbird. Also, check out the awesome short film “The Human Race”, created by the same team.
It was also encouraging to see the Rogue One shots that were composed using Unreal Engine 4. It goes to show that real-time rendering is coming of age. Exciting times! Also, congrats to Tim Sweeney on his Lifetime Achievement Award.
Day 4 (March 2)
- NVIDIA Aftermath: A New Way of Debugging Crashes on the GPU – Alex Dunn (NVIDIA)
- FrameGraph: Extensible Rendering Architecture in Frostbite – Yuriy O’Donnell (Frostbite)
- PBR Diffuse Lighting for GGX+Smith Microsurfaces – Earl Hammon, Jr. (Respawn Entertainment)
NVIDIA Aftermath is a post-mortem GPU crash analysis tool. It is lightweight enough to ship with the game, so GPU crashes can be examined in the wild. Its unobtrusive nature can help catch Heisenbugs that validation and debug layers may miss.
Yuriy O’Donnell demonstrated the use of FrameGraph in the Frostbite engine. Render passes are annotated to build a high-level representation of the frame as a directed acyclic graph (DAG), which is used to simplify resource management, automate and improve async compute pairing, and simplify barriers. Combining FrameGraph with memory aliasing yields ~50% savings in render target memory and keeps the ESRAM on Xbox One fully utilized.
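The memory-aliasing part follows naturally from the graph: once passes are ordered, each transient resource’s lifetime (first to last pass that touches it) is known, and resources whose lifetimes don’t overlap can share the same memory. The sketch below shows that idea with a hypothetical `Pass` record and a greedy first-fit allocator over passes in submission order; Frostbite’s actual FrameGraph API is different and does much more (barriers, async compute scheduling, real memory heaps).

```python
from dataclasses import dataclass, field


@dataclass
class Pass:
    """Hypothetical render pass declaration: which transient resources it reads/writes."""
    name: str
    reads: list = field(default_factory=list)
    writes: list = field(default_factory=list)


def lifetimes(passes):
    """Map each resource to (first pass index, last pass index) that touches it."""
    spans = {}
    for i, p in enumerate(passes):
        for res in p.reads + p.writes:
            first, _ = spans.get(res, (i, i))
            spans[res] = (first, i)
    return spans


def alias(passes, sizes):
    """Greedily assign resources to memory slots, reusing slots whose owner is dead."""
    spans = lifetimes(passes)
    slot_sizes, free, assignment = [], [], {}
    for i in range(len(passes)):
        # Allocate resources that come alive in this pass (first-fit over free slots).
        for res in sorted(r for r, (first, _) in spans.items() if first == i):
            fit = next((s for s in free if slot_sizes[s] >= sizes[res]), None)
            if fit is None:
                slot_sizes.append(sizes[res])
                fit = len(slot_sizes) - 1
            else:
                free.remove(fit)
            assignment[res] = fit
        # Only after this pass's allocations, release resources whose last use is this pass.
        for res, (_, last) in spans.items():
            if last == i:
                free.append(assignment[res])
    return assignment, sum(slot_sizes)
```

For a G-buffer → lighting → bloom → tonemap chain with hypothetical sizes of 32, 16, 16 and 8 MiB, the bloom target reuses the G-buffer’s memory (dead after lighting), so 56 MiB is allocated instead of 72 MiB.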
The award for the most math crammed into a presentation probably goes to Earl Hammon, Jr. and his talk on PBR Diffuse Lighting for GGX+Smith Microsurfaces, but it was easily one of my favorite talks at GDC this year. He went over the microfacet BRDF normalization derivation, and then worked out his solution for energy-conserving diffuse for GGX+Smith specular. He demonstrated how to interpret BRDF slices. He also showed how to use trigonometric identities to compute NdotH and LdotH without ever computing the half-vector H (thereby saving cycles).
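One way to derive these identities: for unit vectors L and V, |L + V|^2 = 2 + 2·LdotV, so LdotH = (1 + LdotV) / |L + V| = sqrt(0.5 + 0.5·LdotV), and NdotH = (NdotL + NdotV) / |L + V| = (NdotL + NdotV) / (2·LdotH). Given dot products a shader typically has already, H never needs to be formed or normalized. The Python check below compares both formulas against the explicit half-vector for random directions; it illustrates the algebra and is not the talk’s shader code (the function names are my own).

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def random_unit(rng):
    """Random direction on the unit sphere (normalized Gaussian sample)."""
    return normalize(tuple(rng.gauss(0.0, 1.0) for _ in range(3)))


def half_vector_terms(n_dot_l, n_dot_v, l_dot_v):
    """NdotH and LdotH from dot products only, without forming H.

    Undefined when L == -V (l_dot_v == -1), where the half-vector itself is undefined.
    """
    l_dot_h = math.sqrt(max(0.0, 0.5 + 0.5 * l_dot_v))  # clamp tiny negative round-off
    n_dot_h = (n_dot_l + n_dot_v) / (2.0 * l_dot_h)
    return n_dot_h, l_dot_h
```

In a shader the same idea is usually written with a reciprocal square root of 2 + 2·LdotV, replacing the normalize of L + V and two dot products with a few scalar operations.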
And that’s it. I wasn’t there on Day 5 and missed some interesting talks, including the 4K Checkerboard talk by Graham Wihlidal. I’ll add links to other talks of interest as slides become available. Once again, if you are looking for a more comprehensive list of talks, you can find one here.