#cuda

Publications

Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition
Vikram S. Mailthody, Ketan Date, Zaid Qureshi, Carl Pearson, Rakesh Nagi, Jinjun Xiong, Wen-mei Hwu
in
2018 IEEE High Performance Extreme Computing Conference
09/18
Movement and Placement of Non-Contiguous Data In Distributed GPU Computing
Carl Pearson
Ph.D. Dissertation
04/21
Machine Learning for CUDA+MPI Design Rules
Carl Pearson, Aurya Javeed, Karen Devine
in
23rd IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC)
03/22
Adaptive Cache Bypass and Insertion for Many-Core Accelerators
Xuhao Chen, Shengzhao Wu, Li-Wen Chang, Wei-Sheng Huang, Carl Pearson, Wen-mei Hwu
in
Proceedings of International Workshop on Manycore Embedded Systems, 2016
06/14

Posts

Improving MPI_Pack performance in CUDA-aware MPI
CUDA Releases: Component Versions and Sizes
Using Kokkos Tools and Nsight Systems to Understand your Kokkos Application

Talks

Latency and Bandwidth Microbenchmarks of US Department of Energy Systems in the June 2023 Top500 List
at
Supercomputing 2023
11/13/23
Kokkos Kernels: State on Exascale Architectures
at
Kokkos User Group Meeting 2023
12/12/23
Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects
at
ACM/SPEC International Conference on Performance Engineering
04/10/19
Latency and Bandwidth Microbenchmarks of Six US Department of Energy Systems in the Top500
at
Cluster 2023
11/02/23
Benchmarking CUDA Communication Primitives on High-Bandwidth Interconnects
at
ADA Liaison Meeting
06/05/19