AI & Machine Learning
GPU Fabric Bench
Personal / Open Source
Ongoing
Creator / Maintainer
AI Infrastructure & LLM
Tech Stack
NCCL
EFA
RDMA
MPI
Terraform
Ansible
Summary
RDMA/EFA fabric benchmark for multi-node GPU training that measures NCCL collective communication throughput, reaching near-peak EFA bandwidth.
What I Built
- Benchmarked RDMA/EFA fabric performance for NCCL GPU collective communications.
- Ran AllReduce sweeps (1K→4G) across 2×p4d.24xlarge instances (16×A100 GPUs).
- Measured 55 GB/s bus bandwidth, roughly 90% of the 400 Gbps EFA peak.
- Provisioned the benchmarking infrastructure with Terraform and Ansible.
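
For context on how a bus bandwidth figure is typically derived: nccl-tests computes AllReduce bus bandwidth as the algorithm bandwidth (message bytes divided by elapsed time) scaled by 2(n-1)/n, where n is the number of ranks. The Python sketch below illustrates that calculation together with a doubling message-size sweep. The function names, the doubling factor, and the timing value in the usage note are illustrative assumptions, not taken from this project's code or results.

```python
def allreduce_bus_bandwidth(size_bytes: int, elapsed_s: float, n_ranks: int) -> float:
    """Bus bandwidth in GB/s (1 GB = 1e9 bytes) for a single AllReduce.

    Uses the nccl-tests convention: busbw = algbw * 2 * (n - 1) / n.
    """
    if elapsed_s <= 0 or n_ranks < 2:
        raise ValueError("elapsed_s must be positive and n_ranks at least 2")
    alg_bw = size_bytes / elapsed_s / 1e9  # algorithm bandwidth in GB/s
    return alg_bw * 2 * (n_ranks - 1) / n_ranks


def sweep_sizes(min_bytes: int = 1024, max_bytes: int = 4 * 1024**3, factor: int = 2) -> list[int]:
    """Message sizes from min_bytes to max_bytes inclusive, growing by `factor`.

    The doubling factor is an assumption; the sweep step used in the
    project is not stated.
    """
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= factor
    return sizes
```

As a usage example with a made-up timing, `allreduce_bus_bandwidth(10**9, 0.1, 16)` gives an algorithm bandwidth of 10 GB/s and a bus bandwidth of 18.75 GB/s for 16 ranks. Because the 2(n-1)/n factor normalizes for rank count, bus bandwidth can be compared against hardware link speed more directly than algorithm bandwidth can.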
