Hitesh Sahu


AI & Machine Learning

GPU Fabric Bench

Personal / Open Source

Ongoing

Creator / Maintainer

AI Infrastructure & LLM

Tech Stack

NCCL, EFA, RDMA, MPI, Terraform, Ansible

Summary

RDMA/EFA fabric benchmarking for multi-node GPU training, measuring NCCL collective communication throughput at near-peak EFA bandwidth.

What I Built

Benchmarked RDMA/EFA fabric performance for NCCL GPU collective communications.
Ran AllReduce sweeps over message sizes from 1 KB to 4 GB across two p4d.24xlarge instances (16 A100 GPUs in total).
Measured 55 GB/s bus bandwidth — roughly 90% of the 400 Gbps EFA peak.
Provisioned the benchmarking infrastructure with Terraform and Ansible.
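The sweep and the bandwidth figure above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual tooling. It assumes the sweep doubles the message size at each step (the source does not state the step factor), that suffixes such as `1K` and `4G` are binary multiples, and that bus bandwidth follows the convention used by the nccl-tests benchmarks, where AllReduce bus bandwidth is the algorithm bandwidth scaled by `2 * (n - 1) / n` for `n` ranks. All function names are hypothetical.

```python
def parse_size(text: str) -> int:
    """Parse a size such as '1K' or '4G' into bytes (binary multiples assumed)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    suffix = text[-1].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def sweep_sizes(start: str, end: str, factor: int = 2) -> list[int]:
    """Message sizes from start to end inclusive, multiplied by factor each step.

    The doubling factor is an assumption; the source only gives the endpoints.
    """
    sizes = []
    size = parse_size(start)
    limit = parse_size(end)
    while size <= limit:
        sizes.append(size)
        size *= factor
    return sizes


def allreduce_bus_bandwidth(size_bytes: float, seconds: float, ranks: int) -> float:
    """AllReduce bus bandwidth in bytes per second.

    Algorithm bandwidth is size / time; the 2 * (n - 1) / n factor accounts
    for the data each rank must send and receive in an AllReduce.
    """
    algorithm_bandwidth = size_bytes / seconds
    return algorithm_bandwidth * 2 * (ranks - 1) / ranks


def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits per second to gigabytes per second."""
    return gbps / 8


def fraction_of_peak(measured_gb_per_s: float, peak_gb_per_s: float) -> float:
    """Measured bandwidth as a fraction of a chosen peak, both in GB/s."""
    return measured_gb_per_s / peak_gb_per_s


if __name__ == "__main__":
    sizes = sweep_sizes("1K", "4G")
    print(f"{len(sizes)} message sizes, {sizes[0]} to {sizes[-1]} bytes")
    print(f"400 Gbps = {gbps_to_gigabytes_per_s(400)} GB/s")
```

The peak is passed in as a parameter because the source does not say which baseline its percentage is computed against; 400 Gbps converts to 50 GB/s per direction, and a different baseline (for example, one that counts both directions) would give a different ratio.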


© 2026 All rights reserved.