So, I haven't yet watched all of these, but when the program came out this was my hit-list of highlights for this year's free NVIDIA GPU Technology Conference (GTC), which ran April 12-16, 2021. I may have simplified the titles a touch...
Content is only available on-demand until May 11, 2021.
GPU and Dev Tools
- How GPU Computing Works (Modern-Day)
- Stephen Jones - CUDA Architect, NVIDIA
- We have lots of FLOPS - latency in accessing memory is now the common bottleneck
- Increasing the number of threads is a great way to hide this (and is how GPUs have been architected to work around latency)
- For operations where all data interacts with all other data (eg. matrix multiplication), Tensor Cores are designed specifically for large (> 400x400) matrix multiplications, raising the upper limit - not sensible for other operations, but one way to handle (some?) all-to-all operations.
- CUDA - New Features and Beyond
- Stephen Jones - CUDA Architect, NVIDIA
- Concurrent Algorithms on CUDA
- CUDA - Latest Developer Tools
- GPU Profiling with NVIDIA Nsight
- Debugging & Analyzing CUDA Correctness
- NVIDIA C++ Standard Library
- Bryce Lelbach - HPC Programming Models Architect, NVIDIA
- Currently, NVCC ships libcu++ (libcudacxx), using the cuda:: and cuda::std:: namespaces. Requires marking __host__ and __device__. Does not cover the whole standard library - the focus is on concurrency, clocks, syscalls, and items commonly re-implemented (type_traits, tuples)
- Adding NVC++ (libnv++) - to be released soon. Implements std:: (not ABI-compatible with gcc)
- Good support for atomic (avoid using 'volatile') and memcpy_async (asynchronous memory copying), and prepares for future C++20 arrive/wait functionality
- For examples of parallelized data structures (eg. hash maps), see cuCollections
- Future work covers c++2* features, including parallel algorithms for ranges, executors, and parallel linear algebra algorithms (based on BLAS)
- Future of Standard C++ and CUDA C++
- Bryce Lelbach - HPC Programming Models Architect, NVIDIA
- c++20 last year: modules, coroutines, concepts, ranges, scalable synchronization
- c++2* looking at: executors (managing and orchestrating execution), static reflection (eg. compile-time knowledge of parameters, member functions), and pattern matching (regex?); also extended float types - float16_t, etc.
- The goal is a language that easily harnesses parallelization (code that looks serial but is parallelized under the hood); eg. c++17 introduced std::execution::par for flagging parallel execution
- Panel FAQ covering a variety of specific questions on c++2* and nv*
Ray Tracing
- Vulkan Ray-Tracing Tutorial
- NVIDIA's Ray Tracing Developer Tools for eg. DirectX Raytracing (DXR) and Vulkan Ray Tracing
- Future of GPU Ray Tracing
Cloud
- Intro to Cloud XR and XR Streaming
- Cloud for VR Experiences
- Global Cloud Streaming for AR
- Azure Remote Rendering
Math
- New Developments in NVIDIA Math Libraries
- FFT on the GPU
- GPU Acceleration for Linear Algebra in Finite Element Software
- Tensor Core Accelerated Math Libraries for Dense and Sparse Linear Algebra in AI and HPC
- Graph Colouring on GPUs
- Fluid Dynamics on the GPU in C++; Parallelized