NVIDIA's GTC 2021 Highlights

April 20, 2021, at 05:42 PM, by erika

So, I haven't yet watched all of these, but when the program came out, this was my hit-list of highlights for this year's free NVIDIA GPU Technology Conference (GTC), which ran April 12-16, 2021. I may have simplified the titles a touch...

Content is available on-demand only until May 11, 2021.

GPU and Dev Tools

  • How GPU Computing Works (Modern-Day)
    • Stephen Jones - CUDA Architect, NVIDIA
  • We have plenty of FLOPS; latency (the delay in accessing memory) is now the common bottleneck
  • Increasing the number of threads is a great way to fix this (and is how GPUs have been architected to work around latency)
  • For operations that require all data interacting with all other data (e.g. matrix multiplication), Tensor Cores are designed specifically for large (> 400x400) matrix multiplications, raising the throughput ceiling - not sensible for other operations, but one way to handle (some?) all-to-all operations.
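To see why matrix multiplication is "all-to-all", here's a minimal host-side sketch (my own illustration, not from the talk): every output element consumes a full row of A and a full column of B, so n^3 multiply-adds touch only n^2 values per matrix. That high arithmetic intensity is exactly what Tensor Cores are built to exploit.

```cpp
#include <cstddef>
#include <vector>

// Naive dense matrix multiply: C = A * B, all matrices n x n, row-major.
// Illustrative sketch only. Each C[i][j] reads row i of A and column j of B,
// so the arithmetic (n^3 FLOPs) grows faster than the data (n^2 values),
// which is why specialized hardware like Tensor Cores pays off here.
std::vector<double> matmul(const std::vector<double>& A,
                           const std::vector<double>& B, std::size_t n) {
    std::vector<double> C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    return C;
}
```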
  • CUDA - New Features and Beyond
    • Stephen Jones - CUDA Architect, NVIDIA
  • Concurrent Algorithms on CUDA
  • CUDA - Latest Developer Tools
  • GPU Profiling with NVIDIA Nsight
  • Debugging & Analyzing CUDA Correctness
  • NVIDIA C++ Standard Library
    • Bryce Lelbach - HPC Programming Models Architect, NVIDIA
    • Currently, NVCC (libcu++ - libcudacxx) - uses the cuda and cuda::std namespaces. Requires marking functions as __host__ and __device__. Does not cover the whole std library; the focus is on concurrency, clocks, and syscalls, plus items commonly re-implemented (type_traits, tuples)
    • Adding NVC++ (libnv++) - to be released soon. Implements std. (Not ABI-compatible with gcc.)
    • Good support for atomic (avoid using 'volatile') and memcpy_async (asynchronous memory copies), plus groundwork for C++20 arrive/wait functionality
    • For examples of parallelized data structures (e.g. hash maps), see cuCollections
    • Future work covers upcoming C++2* features, including parallel algorithms for ranges, executors, and parallel linear-algebra algorithms (based on BLAS)
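The "use atomic, avoid 'volatile'" advice boils down to the pattern below - a host-side C++ sketch of my own, not code from the talk. libcu++ exposes the same interface to device code as cuda::std::atomic (in the cuda::std namespace noted above); here I use plain std::atomic so it runs anywhere.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Cross-thread counting with a real atomic. 'volatile' would give neither
// atomicity nor ordering guarantees here; std::atomic gives both. In CUDA
// device code, libcu++ provides the equivalent cuda::std::atomic.
int parallel_count(int n_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers) w.join();
    return counter.load();
}
```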
  • Future of Standard and CUDA C++
    • Bryce Lelbach - HPC Programming Models Architect, NVIDIA
    • C++20 (last year): modules, coroutines, concepts, ranges, scalable synchronization
    • C++2* is looking at: executors (managing and orchestrating execution), static reflection (e.g. compile-time knowledge of parameters and member functions), and pattern matching (regex?); also extended float types - float16_t, etc.
    • The goal is a language that can easily harness parallelization (code generally looks serial but is parallelized under the hood); e.g. C++17 introduced std::execution::par for flagging parallel execution
    • A panel FAQ covered a variety of specific questions on C++2* and nv*

Ray Tracing





Blix theme adapted by David Gilbert, powered by BlogIt