NVIDIA Super POD

Ongoing

Creator / Maintainer

AI Infrastructure & LLM

Kubernetes

NVIDIA GPU Operator

DCGM

Triton Inference Server

SLURM

Terraform

Self-provisioned GPU cluster on AWS with full observability and HPC-style job scheduling for multi-model inference serving.

Provisioned the cluster on AWS with Terraform, using g4dn Spot instances to keep compute costs low.
Deployed Kubernetes with the NVIDIA GPU Operator; the DCGM exporter exposes GPU metrics that Prometheus scrapes for Grafana dashboards.
Configured Triton Inference Server to serve multiple models concurrently.
Set up SLURM with enroot for HPC-style scheduling of containerized jobs on the cluster.
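
As an illustration of the multi-model serving step, here is a minimal sketch of how a client might address two different models on one Triton server through its HTTP/REST (v2 inference protocol) endpoint. It only builds the request URL and JSON body and does not contact a server. The model names, tensor names, shapes, and the localhost URL are hypothetical placeholders, not details of this project; port 8000 is Triton's default HTTP port.

```python
import json
from urllib.parse import quote


def build_infer_request(base_url, model_name, input_name, shape, data, datatype="FP32"):
    """Return (url, body) for a Triton v2 HTTP inference request.

    `data` is the flattened (row-major) tensor content; its length must
    match the product of the dimensions in `shape`.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(data):
        raise ValueError(
            f"shape {list(shape)} needs {expected} elements, got {len(data)}"
        )

    # Each model is addressed by name under /v2/models/<name>/infer,
    # so one server can route requests to many models.
    url = f"{base_url.rstrip('/')}/v2/models/{quote(model_name, safe='')}/infer"
    body = json.dumps(
        {
            "inputs": [
                {
                    "name": input_name,
                    "shape": list(shape),
                    "datatype": datatype,
                    "data": list(data),
                }
            ]
        }
    )
    return url, body


if __name__ == "__main__":
    # Hypothetical models and tensor names, for illustration only.
    requests_to_send = [
        ("image_classifier", "input__0", [1, 4], [0.1, 0.2, 0.3, 0.4]),
        ("text_embedder", "input_ids", [1, 3], [101, 2023, 102]),
    ]
    for model, tensor, shape, data in requests_to_send:
        dtype = "INT64" if tensor == "input_ids" else "FP32"
        url, body = build_infer_request(
            "http://localhost:8000", model, tensor, shape, data, dtype
        )
        print(url)
        print(body)
```

Each `(url, body)` pair could be sent with any HTTP client as a POST with a JSON content type; because the model name is part of the path, requests for different models can be issued concurrently against the same server.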

RAG Factory

GPU Fabric Bench
