A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling



Authors Elias Konstantinidis, Yiannis Cotronis
Journal/Conference Name Journal of Parallel and Distributed Computing
Paper Abstract The execution time of a kernel on a GPU is typically difficult to predict, as it depends on a wide range of factors. Performance can be limited by memory transfer, compute throughput or other latencies. In this paper, we improve on the roofline model following a quantitative approach and present a completely automated GPU performance prediction technique. The model uses micro-benchmarking and profiling in a “black box” fashion, since no inspection of source or binary code is required. The proposed model combines the resulting parameters in order to characterize the performance-limiting factor and to estimate execution time. In addition, we propose the quadrant-split visual representation, which captures the characteristics of multiple processors in relation to a particular kernel. We performed experiments on stencil computation (red/black SOR), SGEMM and a total of 28 kernels of the Rodinia benchmark suite, using six CUDA GPUs, and observed an average absolute prediction error of 27.66%. Furthermore, the performance model was also examined on an AMD GPU through the HIP programming environment. Prediction errors were comparable despite the significant architectural differences between the two vendors' GPUs.
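To make the idea behind the abstract concrete, the following is a minimal sketch of the classic roofline bound that the paper builds on. It is not the authors' extended model, which also draws on micro-benchmark results and profiler metrics. The function names, the hardware peaks and the kernel counts are all hypothetical values chosen for illustration.

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Classic roofline lower bound on kernel time, in seconds.

    flops and bytes_moved describe the kernel; peak_flops (FLOP/s) and
    peak_bandwidth (bytes/s) describe the device. In practice the peaks
    would come from micro-benchmarks and the counts from a profiler.
    """
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    # The slower of the two resources bounds the execution time.
    limiter = "compute" if compute_time >= memory_time else "memory"
    return max(compute_time, memory_time), limiter


def mean_abs_pct_error(predicted, measured):
    """Average absolute prediction error, as a percentage of measured time."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical kernel: 2 GFLOP of work and 4 GB of memory traffic on a
# device with assumed peaks of 4 TFLOP/s and 200 GB/s.
estimate, limiter = roofline_time(
    flops=2e9, bytes_moved=4e9, peak_flops=4e12, peak_bandwidth=2e11
)
print(f"estimated time: {estimate * 1e3:.1f} ms, limited by {limiter}")
```

With these assumed numbers the memory term dominates, so the kernel would be classified as memory bound. Comparing such estimates against measured times with `mean_abs_pct_error` gives the kind of average error figure the abstract reports.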
Date of publication 2017
Code Programming Language CUDA
