#cuda

Publications

Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition

Vikram S. Mailthody, Ketan Date, Zaid Qureshi, Carl Pearson, Rakesh Nagi, Jinjun Xiong, Wen-mei Hwu

2018 IEEE High Performance Extreme Computing Conference

Movement and Placement of Non-Contiguous Data In Distributed GPU Computing

Carl Pearson

Ph.D. Dissertation

Adaptive Cache Bypass and Insertion for Many-Core Accelerators

Xuhao Chen, Shengzhao Wu, Li-Wen Chang, Wei-Sheng Huang, Carl Pearson, Wen-mei Hwu

Proceedings of International Workshop on Manycore Embedded Systems, 2016

Machine Learning for CUDA+MPI Design Rules

Carl Pearson, Aurya Javeed, Karen Devine

23rd IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC)

Posts

Using Kokkos Tools and Nsight Systems to Understand your Kokkos Application

CUDA Releases: Component Versions and Sizes

Improving MPI_Pack performance in CUDA-aware MPI

Talks

Benchmarking CUDA Communication Primitives on High-Bandwidth Interconnects

ADA Liaison Meeting

Latency and Bandwidth Microbenchmarks of US Department of Energy Systems in the June 2023 Top500 List

Supercomputing 2023

Latency and Bandwidth Microbenchmarks of Six US Department of Energy Systems in the Top500

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects

ACM/SPEC International Conference on Performance Engineering

Kokkos Kernels: State on Exascale Architectures

Kokkos User Group Meeting 2023