Why we rebuilt the agent
The original Kubeadapt agent required Prometheus, OpenCost, node-exporter, and kube-state-metrics before it could collect a single metric. Every cluster needed a full monitoring stack installed up front. Setup was slow, RBAC was complex, and upgrades broke things.
kubeadapt-agent is a ground-up rewrite with a single dependency: metrics-server.
What changed
No more Prometheus, OpenCost, node-exporter, kube-state-metrics
All four dependencies are gone. kubeadapt-agent reads resource usage directly from the Kubernetes Metrics API. The only prerequisite is metrics-server.
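As a rough illustration of what reading from the Metrics API involves, the sketch below parses a `PodMetricsList` payload of the kind served under `metrics.k8s.io/v1beta1` and sums container usage per pod. The sample payload, `parse_quantity`, and `pod_usage` are made up for this example; this is not the agent's actual code.

```python
import json

# Suffixes used by Kubernetes resource quantities (decimal and binary).
_SUFFIXES = {
    "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_quantity(q: str) -> float:
    """Convert a quantity string such as '250m' or '64Mi' to a plain number."""
    # Check two-character suffixes first so 'Mi' is never mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return float(q)


# Hypothetical sample response, trimmed to the fields used here.
SAMPLE = """
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {
      "metadata": {"name": "web-0", "namespace": "default"},
      "containers": [
        {"name": "app", "usage": {"cpu": "250m", "memory": "64Mi"}},
        {"name": "sidecar", "usage": {"cpu": "12500000n", "memory": "16384Ki"}}
      ]
    }
  ]
}
"""


def pod_usage(pod_metrics_list: dict) -> dict:
    """Sum container usage per pod: {(namespace, name): (cpu_cores, mem_bytes)}."""
    out = {}
    for item in pod_metrics_list["items"]:
        key = (item["metadata"]["namespace"], item["metadata"]["name"])
        cpu = sum(parse_quantity(c["usage"]["cpu"]) for c in item["containers"])
        mem = sum(parse_quantity(c["usage"]["memory"]) for c in item["containers"])
        out[key] = (cpu, int(mem))
    return out


usage = pod_usage(json.loads(SAMPLE))
```

CPU is reported in cores (often with a nanocore `n` or millicore `m` suffix) and memory in bytes, so both need normalizing before they can be summed.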
GPU metrics, automatically
kubeadapt-agent detects NVIDIA GPUs at startup. If dcgm-exporter is running anywhere in your cluster, the agent finds it and starts collecting GPU utilization, memory, temperature, and power draw per device. See the GPU monitoring guide for details.
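For a sense of what per-device collection looks like, here is a minimal sketch that parses dcgm-exporter's Prometheus text output into one record per GPU. The four `DCGM_FI_DEV_*` metric names are standard dcgm-exporter fields. The sample text, the output field names, and `parse_dcgm` are assumptions for illustration and do not describe the agent's internals.

```python
import re
from collections import defaultdict

# dcgm-exporter metric name -> field name used in this sketch (hypothetical).
WANTED = {
    "DCGM_FI_DEV_GPU_UTIL": "util_pct",
    "DCGM_FI_DEV_FB_USED": "mem_used_mib",
    "DCGM_FI_DEV_GPU_TEMP": "temp_c",
    "DCGM_FI_DEV_POWER_USAGE": "power_w",
}

LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

# Made-up scrape output for two devices on one node.
SAMPLE = """
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa",Hostname="node-1"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-bbbb",Hostname="node-1"} 3
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-aaaa",Hostname="node-1"} 10240
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-aaaa",Hostname="node-1"} 64
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-aaaa",Hostname="node-1"} 212.5
"""


def parse_dcgm(text: str) -> dict:
    """Group the wanted metrics by GPU UUID: {uuid: {field: value}}."""
    devices = defaultdict(dict)
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        # Comment lines and metrics outside WANTED are skipped.
        if not match or match.group(1) not in WANTED:
            continue
        name, label_text, value = match.groups()
        labels = dict(LABEL.findall(label_text))
        device = labels.get("UUID", labels.get("gpu", "unknown"))
        devices[device][WANTED[name]] = float(value)
    return dict(devices)


gpus = parse_dcgm(SAMPLE)
```

Keying on the `UUID` label rather than the `gpu` index keeps devices distinct when several nodes each report a GPU `0`.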
<!-- callout:warning -->GPU Sharing Limitations
GPU time-slicing, MPS, and MIG configurations have limited per-workload attribution due to DCGM Exporter constraints. In shared GPU setups, GPU right-sizing works at the node level only. See the GPU monitoring guide for specifics.
Real-time collection
The old agent re-fetched every resource on a fixed interval, pulling the same data over and over. kubeadapt-agent uses Kubernetes watch connections: it syncs once at startup, then receives only change events in real time. When a deployment scales or a node joins, the agent receives the event as soon as the API server publishes it.
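The sync-then-watch pattern can be sketched as a local cache that is filled once from a list call and then kept current by applying change events. The event types `ADDED`, `MODIFIED`, and `DELETED` are the ones the Kubernetes watch API emits. `ResourceCache`, the `pod` helper, and the sample events are hypothetical, and a real informer also handles reconnects and resyncs that this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceCache:
    """In-memory view of one resource type, keyed by (namespace, name)."""
    objects: dict = field(default_factory=dict)
    resource_version: str = ""

    @staticmethod
    def _key(obj: dict) -> tuple:
        meta = obj["metadata"]
        return (meta.get("namespace", ""), meta["name"])

    def sync(self, listed: list, resource_version: str) -> None:
        """Initial full list: replace the cache contents once at startup."""
        self.objects = {self._key(o): o for o in listed}
        self.resource_version = resource_version

    def apply(self, event: dict) -> None:
        """Apply a single watch event instead of re-listing everything."""
        obj = event["object"]
        if event["type"] in ("ADDED", "MODIFIED"):
            self.objects[self._key(obj)] = obj
        elif event["type"] == "DELETED":
            self.objects.pop(self._key(obj), None)
        # Remember where to resume the watch after a disconnect.
        self.resource_version = obj["metadata"]["resourceVersion"]


def pod(name: str, rv: str, phase: str = "Running") -> dict:
    """Build a trimmed-down pod object for the example."""
    return {
        "metadata": {"namespace": "default", "name": name, "resourceVersion": rv},
        "status": {"phase": phase},
    }


cache = ResourceCache()
cache.sync([pod("web-0", "100")], resource_version="100")
for ev in [
    {"type": "ADDED", "object": pod("web-1", "101")},
    {"type": "MODIFIED", "object": pod("web-0", "102", phase="Succeeded")},
    {"type": "DELETED", "object": pod("web-1", "103")},
]:
    cache.apply(ev)
```

After the three events the cache holds only `web-0` in its updated state, without a second full list call.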
It tracks 20+ resource types by default: nodes, pods, deployments, statefulsets, daemonsets, jobs, HPAs, VPAs, persistent volumes, and more.
Auto-discovery
On startup the agent probes the cluster and adapts to what's available:
- Cloud provider (AWS, GCP, Azure)
- VPA and Karpenter if installed
- GPU nodes and NVIDIA metrics exporters
- metrics-server availability
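One plausible way to implement probes like these is to look at the API groups the cluster serves and at a few node fields. The sketch below assumes the usual conventions: `metrics.k8s.io` for metrics-server, `autoscaling.k8s.io` for VPA, `karpenter.sh` for Karpenter, the `providerID` prefix for the cloud provider, and the `nvidia.com/gpu` allocatable resource for GPU nodes. The agent's real detection logic may differ, and `Capabilities` and `detect` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    cloud: str
    metrics_server: bool
    vpa: bool
    karpenter: bool
    gpu_nodes: int


# providerID prefixes set by the cloud providers' node integrations.
CLOUD_PREFIXES = {"aws://": "AWS", "gce://": "GCP", "azure://": "Azure"}


def detect(api_groups: list, nodes: list) -> Capabilities:
    """Derive capabilities from served API groups and simplified node records."""
    cloud = "unknown"
    gpu_nodes = 0
    for node in nodes:
        provider_id = node.get("providerID", "")
        for prefix, name in CLOUD_PREFIXES.items():
            if provider_id.startswith(prefix):
                cloud = name
        # GPU counts are whole numbers; a missing key means no GPUs.
        if int(node.get("allocatable", {}).get("nvidia.com/gpu", "0")) > 0:
            gpu_nodes += 1
    groups = set(api_groups)
    return Capabilities(
        cloud=cloud,
        metrics_server="metrics.k8s.io" in groups,
        vpa="autoscaling.k8s.io" in groups,
        karpenter="karpenter.sh" in groups,
        gpu_nodes=gpu_nodes,
    )


# Hypothetical cluster: AWS, VPA installed, no Karpenter, one GPU node.
caps = detect(
    api_groups=["apps", "batch", "metrics.k8s.io", "autoscaling.k8s.io"],
    nodes=[
        {"providerID": "aws:///us-east-1a/i-0123456789abcdef0",
         "allocatable": {"cpu": "4", "nvidia.com/gpu": "1"}},
        {"providerID": "aws:///us-east-1b/i-0fedcba9876543210",
         "allocatable": {"cpu": "8"}},
    ],
)
```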
Pre-computed enrichment
Before each snapshot is sent, the agent enriches raw metrics automatically:
- Workload mapping: Every pod is traced back to its parent deployment, statefulset, or daemonset. No orphaned metrics.
- Resource totals: Cluster-wide CPU, memory, GPU, and storage usage are pre-calculated and ready for the dashboard.
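Workload mapping of this kind typically follows `ownerReferences`: a Deployment's pod is owned by a ReplicaSet, which is in turn owned by the Deployment. The sketch below resolves that chain and then rolls usage up per workload and cluster-wide. The object shapes are flattened for brevity, and `resolve_workload`, `totals_by_workload`, and the sample data are illustrative rather than the agent's own.

```python
from collections import defaultdict


def resolve_workload(pod: dict, replicasets: dict) -> tuple:
    """Return (kind, name) of the top-level workload that owns a pod."""
    owners = pod.get("ownerReferences", [])
    if not owners:
        return ("Pod", pod["name"])  # standalone pod, no controller
    # Simplification: take the first owner rather than the controller=true one.
    owner = owners[0]
    if owner["kind"] == "ReplicaSet":
        rs = replicasets.get((pod["namespace"], owner["name"]), {})
        for parent in rs.get("ownerReferences", []):
            if parent["kind"] == "Deployment":
                return ("Deployment", parent["name"])
    return (owner["kind"], owner["name"])


def totals_by_workload(pods: list, replicasets: dict) -> dict:
    """Sum CPU millicores per workload: {(kind, name): millicores}."""
    totals = defaultdict(int)
    for p in pods:
        totals[resolve_workload(p, replicasets)] += p["cpu_millicores"]
    return dict(totals)


# Hypothetical inputs: one Deployment with two pods, one StatefulSet pod.
replicasets = {
    ("default", "web-5d9c"): {
        "ownerReferences": [{"kind": "Deployment", "name": "web"}],
    },
}
pods = [
    {"namespace": "default", "name": "web-5d9c-a", "cpu_millicores": 120,
     "ownerReferences": [{"kind": "ReplicaSet", "name": "web-5d9c"}]},
    {"namespace": "default", "name": "web-5d9c-b", "cpu_millicores": 80,
     "ownerReferences": [{"kind": "ReplicaSet", "name": "web-5d9c"}]},
    {"namespace": "default", "name": "db-0", "cpu_millicores": 300,
     "ownerReferences": [{"kind": "StatefulSet", "name": "db"}]},
]

workload_cpu = totals_by_workload(pods, replicasets)
cluster_cpu = sum(workload_cpu.values())
```

Integer millicores are used so the sums stay exact, which matters when totals are compared across snapshots.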
Before and after
| | Old agent | kubeadapt-agent |
|---|---|---|
| External dependencies | 4 (Prometheus, OpenCost, node-exporter, kube-state-metrics) | 1 (metrics-server) |
| GPU support | Manual config | Auto-discovered |
| Data collection | Periodic full-list API calls | Informer-based (real-time) |
| Cloud/VPA/Karpenter | Manual or N/A | Auto-detected |
| Health reporting | Basic | Full diagnostics per snapshot |
Migration
<!-- callout:warning -->Breaking Change
This is a 2.0 release on the same Helm chart. The old agent binary is fully replaced. Run `helm upgrade` to migrate. The old agent's Prometheus/OpenCost dependencies can be removed after upgrade if nothing else uses them.
Follow the quick-start guide for installation steps.

