#cuda

Publications

Movement and Placement of Non-Contiguous Data In Distributed GPU Computing
Carl Pearson
Ph.D. Dissertation
04/21
Machine Learning for CUDA+MPI Design Rules
Carl Pearson, Aurya Javeed, Karen Devine
in
23rd IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC)
03/22
Adaptive Cache Bypass and Insertion for Many-Core Accelerators
Xuhao Chen, Shengzhao Wu, Li-Wen Chang, Wei-Sheng Huang, Carl Pearson, Wen-mei Hwu
in
Proceedings of International Workshop on Manycore Embedded Systems, 2016
06/14
Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition
Vikram S. Mailthody, Ketan Date, Zaid Qureshi, Carl Pearson, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu
in
2018 IEEE High Performance Extreme Computing Conference
09/18

Posts

Improving MPI_Pack performance in CUDA-aware MPI
CUDA Releases: Component Versions and Sizes
Using nvtx-connector and Nsight Systems to Understand your Kokkos Application

Talks

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects
at
ACM/SPEC International Conference on Performance Engineering
04/10/19
Benchmarking CUDA Communication Primitives on High-Bandwidth Interconnects
at
ADA Liaison Meeting
06/05/19
Latency and Bandwidth Microbenchmarks of Six US Department of Energy Systems in the Top500
at
Cluster 2023
11/02/23
Latency and Bandwidth Microbenchmarks of US Department of Energy Systems in the June 2023 Top500 List
at
Supercomputing 2023
11/13/23
Kokkos Kernels: State on Exascale Architectures
at
Kokkos User Group Meeting 2023
12/12/23