Hitesh Sahu


AI & Machine Learning

GPU Fabric Bench

Personal / Open Source

Ongoing

Creator / Maintainer

AI Infrastructure & LLM

Tech Stack
NCCL
EFA
RDMA
MPI
Terraform
Ansible

Summary

RDMA/EFA fabric benchmarking for multi-node GPU training, measuring NCCL collective communication throughput at near-peak EFA bandwidth.
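As a rough illustration of the throughput metric involved, the sketch below computes AllReduce bus bandwidth from a message size and a measured time, using the convention documented by nccl-tests (algorithm bandwidth scaled by 2(n-1)/n). The function name and the decimal GB/s unit are assumptions made for this example, not something the project states.

```python
def allreduce_bus_bandwidth(message_bytes: int, elapsed_s: float, n_ranks: int) -> float:
    """Return AllReduce bus bandwidth in GB/s (assumed decimal: 1 GB = 1e9 bytes).

    Follows the nccl-tests convention:
        algbw = message_bytes / elapsed_s
        busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    """
    if n_ranks < 2:
        raise ValueError("AllReduce bus bandwidth needs at least 2 ranks")
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")

    # Algorithm bandwidth: payload size divided by wall-clock time.
    alg_bw_gb_s = message_bytes / elapsed_s / 1e9

    # The correction factor makes the figure comparable to link speed
    # regardless of how many ranks take part.
    return alg_bw_gb_s * 2 * (n_ranks - 1) / n_ranks


if __name__ == "__main__":
    # Hypothetical timing, for illustration only: 1 GB reduced in 0.5 s on 16 ranks.
    print(allreduce_bus_bandwidth(1_000_000_000, 0.5, 16))
```

The scaling factor is why bus bandwidth, rather than raw algorithm bandwidth, is the number usually compared against the fabric's line rate.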


What I Built
  • Benchmarked RDMA/EFA fabric performance for NCCL GPU collective communications.

  • Ran AllReduce sweeps over message sizes from 1K to 4G bytes across 2×p4d.24xlarge instances (16×A100 GPUs in total).

  • Measured 55 GB/s NCCL bus bandwidth, reported as roughly 90% of peak on the 400 Gbps EFA fabric.

  • Provisioned the benchmarking infrastructure with Terraform and Ansible.
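The sweep range and the peak comparison in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the doubling step, the binary reading of "1K" and "4G" (1024 and 4 × 1024³ bytes), and all function names are hypothetical, since the source does not specify them.

```python
def sweep_sizes(start: int = 1024, end: int = 4 * 1024**3, factor: int = 2) -> list[int]:
    """Message sizes in bytes from start to end inclusive, multiplying by factor.

    Assumption: binary units and a doubling step; the actual sweep step
    used in the project is not stated.
    """
    if start <= 0 or factor < 2:
        raise ValueError("start must be positive and factor at least 2")
    sizes = []
    size = start
    while size <= end:
        sizes.append(size)
        size *= factor
    return sizes


def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def fraction_of_peak(measured_gb_s: float, peak_gbps: float) -> float:
    """Ratio of a measured bandwidth in GB/s to a line rate given in Gbps."""
    return measured_gb_s / gbps_to_gb_per_s(peak_gbps)


if __name__ == "__main__":
    sizes = sweep_sizes()
    print(len(sizes), sizes[0], sizes[-1])
    print(gbps_to_gb_per_s(400))
```

Under this decimal convention a 400 Gbps line rate corresponds to 50 GB/s; the source does not state which baseline its 90% figure is measured against, so `fraction_of_peak` should be read as one possible way to form that ratio.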

© 2026 Hitesh Sahu. All rights reserved.