AI & Machine Learning
NVIDIA Super POD
Personal / Open Source
Ongoing
Creator / Maintainer
AI Infrastructure & LLM
Tech Stack
Kubernetes
NVIDIA GPU Operator
DCGM
Triton Inference Server
SLURM
Terraform
Summary
A self-provisioned GPU cluster on AWS for multi-model inference serving, with GPU observability through DCGM, Prometheus, and Grafana, and HPC-style job scheduling through SLURM.
What I Built
- Provisioned a GPU cluster on AWS using Terraform (g4dn Spot instances) for cost-efficient compute.
- Deployed Kubernetes with the NVIDIA GPU Operator and DCGM exporter feeding Prometheus/Grafana dashboards.
- Configured Triton Inference Server for multi-model concurrent serving.
- Set up SLURM/enroot for HPC-style job scheduling on the cluster.
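As a rough illustration of the observability piece, the sketch below parses Prometheus text-format output of the kind the DCGM exporter serves and flags GPUs with low utilization. It is not taken from the project itself: the sample payload, hostnames, values, the reduced label set, the helper names (`parse_metrics`, `gpu_utilization`, `idle_gpus`), and the 20% threshold are all assumptions made for the example. The parser is also simplified and skips lines that carry a timestamp.

```python
import re

# Illustrative sample in Prometheus text exposition format.
# Metric names follow the DCGM exporter's naming; hosts and values are made up.
SAMPLE_METRICS = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-b"} 12
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a"} 10240
"""

# Matches lines of the form: name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # simplified parser: ignore anything it cannot read
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def gpu_utilization(samples, metric="DCGM_FI_DEV_GPU_UTIL"):
    """Map (hostname, gpu_index) to the utilization percentage."""
    util = {}
    for name, labels, value in samples:
        if name == metric:
            key = (labels.get("Hostname", "?"), labels.get("gpu", "?"))
            util[key] = value
    return util


def idle_gpus(util, threshold=20.0):
    """Return the sorted (hostname, gpu_index) keys below the threshold."""
    return sorted(key for key, value in util.items() if value < threshold)


if __name__ == "__main__":
    utilization = gpu_utilization(parse_metrics(SAMPLE_METRICS))
    for host, gpu in idle_gpus(utilization):
        print(f"{host} gpu{gpu} looks idle")
```

In a live setup the same question would more likely be answered with a PromQL query in Grafana; the script only shows what the exported data looks like and how a utilization check could be expressed.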
