Offloading Communication Control Logic in GPU Accelerated Applications


Disclaimer: The code links provided for this paper are external links. Science Nest takes no responsibility for the accuracy, legality, or content of these links. By downloading this code, you agree to comply with the terms of use set out by the author(s) of the code.

Please contact us if any of these links are broken.

Authors: Elena Agostini, Davide Rossetti, Sreeram Potluri
Journal/Conference Name: 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Paper Category:
Paper Abstract: NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or between GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition that allows direct synchronization between the GPU and third-party devices. For example, Async allows an NVIDIA GPU to directly trigger and poll for completion of communication operations queued to an InfiniBand Connect-IB network adapter, removing CPU involvement from the critical path in GPU-accelerated applications. In this paper, we present the building blocks of GPUDirect Async and explain the supported usage models of this new technology. We also present a performance evaluation using a micro-benchmark and a synthetic stencil benchmark. Finally, we demonstrate the use of Async in a few multi-GPU MPI applications: HPGMG-FV (geometric multi-grid), achieving up to 25% improvement in total execution time; CoMD-CUDA (classical molecular dynamics), reducing communication times by up to 30%; and LULESH2-CUDA, achieving an average performance improvement of 13% in total execution time.
Date of publication: 2017
Code Programming Language: C++
Comment:
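
The abstract describes a usage model in which communication is prepared by the CPU but triggered and polled for completion from a CUDA stream, so the GPU rather than the CPU drives the critical path. The following is a minimal, hedged sketch of that idea only; the comm_* functions and the stencil_kernel are hypothetical stand-ins (empty stubs here), not the actual GPUDirect Async or paper API.

```cpp
// Minimal sketch of stream-triggered communication as described in the
// abstract. The comm_* functions are HYPOTHETICAL placeholders, not the
// real GPUDirect Async / library API; they are stubbed so the file compiles.
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical stand-ins for stream-triggered communication primitives.
static void comm_prepare_send(const void* /*buf*/, std::size_t /*bytes*/, int /*peer*/) {}
static void comm_trigger_send_on_stream(int /*peer*/, cudaStream_t /*stream*/) {}
static void comm_wait_recv_on_stream(int /*peer*/, cudaStream_t /*stream*/) {}

// Placeholder compute kernel standing in for a stencil update.
__global__ void stencil_kernel(float* field, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        field[i] += 1.0f;
}

void exchange_then_compute(float* d_field, std::size_t halo_bytes, int n, int peer)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // CPU-side preparation of the communication descriptor happens once,
    // off the critical path.
    comm_prepare_send(d_field, halo_bytes, peer);

    // The trigger and the completion wait are enqueued on the stream; in
    // GPUDirect Async these steps would be executed by the GPU against the
    // network adapter, without CPU involvement on the critical path.
    comm_trigger_send_on_stream(peer, stream);
    comm_wait_recv_on_stream(peer, stream);

    // Compute is ordered after the (stubbed) halo exchange on the same stream.
    stencil_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_field, n);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}
```

The design point illustrated is stream ordering: because trigger, wait, and compute are queued on one CUDA stream, the CPU only enqueues work and is free to do other tasks while the GPU and NIC progress the exchange.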
