Spot migration
How Kubeadapt scores workloads for spot eligibility, what blocks a migration, and what a low-risk recommendation means in practice.
Say you have a cluster with a six-figure annual bill and one node pool of on-demand m5.xlarge instances. Spot pricing for the same instance type typically runs 60–80% cheaper, but only some of your workloads can safely move. The Spot Migration page tells you which ones, with a score, a risk level, and a list of blockers for each workload.
What a spot migration is
Spot Instances (AWS), Spot and preemptible VMs (GCP), and Spot VMs (Azure) are spare cloud capacity sold at a steep discount. The trade-off is preemption: the provider can reclaim the instance on short notice (two minutes on AWS, roughly 30 seconds on GCP and Azure). Stateless, replicated, restart-tolerant workloads handle that gracefully. Stateful or singleton workloads do not.
Kubeadapt does not move workloads for you. It scores each workload's spot suitability, surfaces the cost delta, and flags every reason a workload might not be safe to move. You make the decision and apply the node selector, toleration, or scheduling change through your usual deployment pipeline.
How the eligibility score works
The score is rule-based, not machine-learned. For each workload, Kubeadapt inspects the controller, the pod spec, attached storage, the PodDisruptionBudget, replica count, and the rolling update strategy, then assigns an eligibility score from 0 to 100.
The score maps to a risk level via a fixed cutoff:
| Risk level | Eligibility score | Counts toward Available Savings |
|---|---|---|
| Low | Above 70 | Yes |
| Medium | 40–70 | No |
| High | Below 40 | No |
A workload scoring exactly 70 falls into Medium, not Low. The cutoff is intentionally conservative — it keeps single-replica services and other borderline workloads out of the low-risk shortlist.
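As a minimal Python sketch, the cutoffs in the table reduce to two comparisons. The names `RiskLevel`, `risk_level`, and `counts_toward_available_savings` are hypothetical, chosen for illustration; the code mirrors the documented thresholds and is not Kubeadapt's implementation.

```python
from enum import Enum


class RiskLevel(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


def risk_level(score: float) -> RiskLevel:
    """Map a 0-100 eligibility score to a risk level (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score > 70:  # strictly above 70, so exactly 70 is not Low
        return RiskLevel.LOW
    if score >= 40:  # 40 through 70 inclusive
        return RiskLevel.MEDIUM
    return RiskLevel.HIGH  # below 40


def counts_toward_available_savings(score: float) -> bool:
    """Only Low-risk workloads count toward Available Savings."""
    return risk_level(score) is RiskLevel.LOW


if __name__ == "__main__":
    for s in (71, 70, 40, 39):
        print(s, risk_level(s).value)
    # 71 Low
    # 70 Medium
    # 40 Medium
    # 39 High
```

The strict `>` on the first comparison is what keeps a score of exactly 70 out of the low-risk shortlist.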
What blocks a workload
Five conditions, in order of how often they appear in real clusters:
- Persistent storage attached. If the workload binds a PV with a non-spot-friendly StorageClass — anything that can't be reattached to a new node in a different AZ — the migration is unsafe. The page lists the offending storage classes so you can match them against your cluster.
- No PodDisruptionBudget. A spot reclaim itself is an involuntary disruption, but when a termination handler drains the node in response to the preemption notice, pods are evicted through the Eviction API, which honors PDBs. Without a PDB, there's no contract on minimum availability during that drain. Workloads without a PDB get a risk penalty.
- Single replica. A workload with `replicas: 1` has no redundancy. When the node is reclaimed, the workload is unavailable until its pod is rescheduled. Kubeadapt flags this even though the controller (Deployment or StatefulSet) will recreate the pod, because the real cost is how long rescheduling takes while spot capacity is churning.
- StatefulSet semantics. StatefulSets with `podManagementPolicy: OrderedReady` (the default) recreate pods one at a time, so they take longer to recover and may stall mid-recreation if spot capacity is constrained.
- Recreate update strategy. A Deployment using `strategy.type: Recreate` instead of `RollingUpdate` accepts downtime during updates by design. On spot, that planned downtime stacks on top of the unplanned downtime from reclaims.
A workload can fail one of these and still score in the low-risk band if every other dimension is clean. The score is a weighted assessment, not a single-blocker veto. Open any workload row to see the per-dimension breakdown under Migration Assessment.
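To see why one blocker does not automatically veto a workload, here is a sketch of a penalty-based score in Python. The penalty weights, the `WorkloadSpec` fields, and the helper names are invented for illustration. This page does not publish Kubeadapt's actual weights, so treat the numbers as placeholders.

```python
from dataclasses import dataclass

# Hypothetical penalty per blocker. Placeholders for illustration only;
# these are NOT Kubeadapt's real weights.
PENALTIES = {
    "persistent_storage": 40,
    "no_pdb": 15,
    "single_replica": 25,
    "ordered_ready_statefulset": 10,
    "recreate_strategy": 10,
}


@dataclass
class WorkloadSpec:
    non_spot_friendly_pv: bool  # binds a PV that can't be reattached across AZs
    has_pdb: bool               # a PodDisruptionBudget selects the pods
    replicas: int
    is_statefulset: bool = False
    pod_management_policy: str = "OrderedReady"  # StatefulSet default
    recreate_strategy: bool = False              # Deployment strategy.type: Recreate


def blockers(w: WorkloadSpec) -> list[str]:
    """Return the names of the blockers that apply to this workload."""
    found = []
    if w.non_spot_friendly_pv:
        found.append("persistent_storage")
    if not w.has_pdb:
        found.append("no_pdb")
    if w.replicas < 2:
        found.append("single_replica")
    if w.is_statefulset and w.pod_management_policy == "OrderedReady":
        found.append("ordered_ready_statefulset")
    if w.recreate_strategy:
        found.append("recreate_strategy")
    return found


def eligibility_score(w: WorkloadSpec) -> int:
    """Start from 100 and subtract each blocker's penalty, floored at 0."""
    return max(0, 100 - sum(PENALTIES[b] for b in blockers(w)))


if __name__ == "__main__":
    # Three replicas, no PDB: one blocker, still above the 70 cutoff.
    print(eligibility_score(WorkloadSpec(non_spot_friendly_pv=False, has_pdb=False, replicas=3)))  # 85
    # One replica and no PDB: two blockers, lands in Medium.
    print(eligibility_score(WorkloadSpec(non_spot_friendly_pv=False, has_pdb=False, replicas=1)))  # 60
```

With these placeholder weights, a three-replica workload that only lacks a PDB still clears the cutoff, while a second blocker pushes it into Medium. The real weights may differ; the per-dimension breakdown under Migration Assessment shows how your workloads actually score.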
What "low risk" means in practice
Low risk means the migration will not violate availability contracts the workload has declared. It does not mean the workload is always safe to move:
- The score reflects spec, not behavior. A pod that holds a 2-hour TCP connection open will migrate fine on paper, but when its node is reclaimed, the client on the other end of that connection will notice.
- The score reflects current spec. A team that adds a singleton sidecar tomorrow may drop the workload below the cutoff. Re-check the score before applying.
- Cluster-level constraints (e.g., the node group has no spot capacity in your region) are evaluated separately. Open Capacity Planning to confirm spot supply.
The safe pattern: filter the Spot Migration page to low-risk workloads, sort by monthly savings descending, and start with the top three. Cut over one workload, let it run for a week, then move the next. Cluster-wide spot moves on day one tend to surface preemption-recovery edge cases that take a day to debug.
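The filter, sort, and pick steps can be sketched in Python over rows exported from the page. The `Recommendation` type and `first_wave` helper are hypothetical, and the workload names and savings figures in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    name: str
    namespace: str
    risk: str               # "Low", "Medium", or "High"
    monthly_savings: float  # current monthly cost minus projected spot cost


def first_wave(recs: list[Recommendation], n: int = 3) -> list[Recommendation]:
    """Keep Low-risk workloads, sort by monthly savings (largest first), take n."""
    low_risk = [r for r in recs if r.risk == "Low"]
    return sorted(low_risk, key=lambda r: r.monthly_savings, reverse=True)[:n]


if __name__ == "__main__":
    # Made-up rows for illustration.
    rows = [
        Recommendation("api", "web", "Low", 420.0),
        Recommendation("worker", "jobs", "Low", 910.0),
        Recommendation("postgres", "data", "High", 1500.0),
        Recommendation("frontend", "web", "Low", 180.0),
        Recommendation("cache", "web", "Medium", 300.0),
        Recommendation("reporter", "jobs", "Low", 650.0),
    ]
    for r in first_wave(rows):
        print(r.namespace, r.name, r.monthly_savings)
    # jobs worker 910.0
    # jobs reporter 650.0
    # web api 420.0
```

Note that `postgres` has the largest savings but never makes the list: the risk filter runs before the sort. After each cutover week, rerun the selection on fresh data rather than working down a stale list.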
What the page shows
The page is per cluster, at Clusters → [cluster] → Spot Migration. The top of the page shows four summary metrics:
- Total Monthly Savings — sum of monthly savings across every spot recommendation.
- Total Workloads — every workload Kubeadapt evaluated (eligible and not eligible).
- Eligible Workloads — workloads that meet the spot-eligibility criteria.
- Avg Risk — average risk level across the evaluated workloads.
The table below lists every workload with name, namespace, controller type, replicas, risk level, current monthly cost, projected spot cost, and monthly savings. Filter by namespace, eligibility, or risk level; search by workload name. Click a row to see the assessment, storage config, PDB config, rolling-update strategy, and per-dimension cost analysis.
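The savings column and the header figures relate through simple arithmetic. This Python sketch assumes monthly savings is current monthly cost minus projected spot cost, and that ineligible workloads carry no spot recommendation, so they add nothing to Total Monthly Savings. Both are readings of the descriptions above, not confirmed formulas, and `Row` and `summarize` are hypothetical names. Avg Risk is left out because the page does not say how risk levels are averaged.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    eligible: bool
    current_monthly_cost: float  # USD per month on the current nodes
    projected_spot_cost: float   # USD per month if moved to spot

    @property
    def monthly_savings(self) -> float:
        # Assumed definition: what you pay now minus what you would pay on spot.
        return self.current_monthly_cost - self.projected_spot_cost


def summarize(rows: list[Row]) -> dict[str, float]:
    """Recompute header metrics from table rows (assumed definitions)."""
    eligible = [r for r in rows if r.eligible]
    return {
        "total_monthly_savings": round(sum(r.monthly_savings for r in eligible), 2),
        "total_workloads": len(rows),
        "eligible_workloads": len(eligible),
    }


if __name__ == "__main__":
    # Made-up costs for illustration.
    rows = [
        Row("api", True, 600.0, 180.0),
        Row("worker", True, 1200.0, 300.0),
        Row("postgres", False, 2000.0, 500.0),
    ]
    print(summarize(rows))
    # {'total_monthly_savings': 1320.0, 'total_workloads': 3, 'eligible_workloads': 2}
```

In the example, `api` drops from $600 to $180 a month, a 70% reduction, inside the 60–80% range quoted at the top of the page.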
Acting on a recommendation
Kubeadapt is read-only — the change to move a workload onto spot is yours. The minimal apply path:
- Add a node selector or affinity rule pinning the workload to spot-capable node groups.
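- Add a toleration for your spot taint. The key depends on how your node groups are provisioned: `node.kubernetes.io/lifecycle=spot:NoSchedule` is one convention, and AKS spot node pools apply `kubernetes.azure.com/scalesetpriority=spot:NoSchedule` automatically.
- If the workload has fewer than 2 replicas, scale it up first. Spot is not a replacement for replication.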
- Apply the change through your existing CI/CD path. Kubeadapt picks it up on the next snapshot, within about 60 seconds.
Once the workload is running on spot, the recommendation closes out automatically.
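The node selector, toleration, and replica steps only touch the pod template and replica count, so they can be scripted before the manifest enters your pipeline. This is a minimal Python sketch over a Deployment manifest already parsed into a dict (for example by your YAML loader). It assumes spot nodes carry the label `node.kubernetes.io/lifecycle=spot` and the matching `NoSchedule` taint; substitute the key and value your node groups actually use. `prepare_for_spot` is a hypothetical helper, not part of Kubeadapt.

```python
import copy
import json

# Assumed spot label and taint. Replace with whatever your node groups use.
SPOT_NODE_LABEL = {"node.kubernetes.io/lifecycle": "spot"}
SPOT_TOLERATION = {
    "key": "node.kubernetes.io/lifecycle",
    "operator": "Equal",
    "value": "spot",
    "effect": "NoSchedule",
}


def prepare_for_spot(deployment: dict, min_replicas: int = 2) -> dict:
    """Return a copy of a Deployment manifest pinned to spot nodes.

    Adds a nodeSelector and a toleration to the pod template and raises
    replicas to at least min_replicas. The input dict is not modified.
    """
    out = copy.deepcopy(deployment)
    spec = out.setdefault("spec", {})
    pod_spec = spec.setdefault("template", {}).setdefault("spec", {})

    # Pin pods to nodes carrying the (assumed) spot label.
    pod_spec.setdefault("nodeSelector", {}).update(SPOT_NODE_LABEL)

    # Tolerate the (assumed) spot taint, without adding a duplicate.
    tolerations = pod_spec.setdefault("tolerations", [])
    if SPOT_TOLERATION not in tolerations:
        tolerations.append(dict(SPOT_TOLERATION))

    # Kubernetes defaults a Deployment's replicas to 1 when the field is omitted.
    if spec.get("replicas", 1) < min_replicas:
        spec["replicas"] = min_replicas
    return out


if __name__ == "__main__":
    # Trimmed manifest; a real Deployment also needs a selector and pod labels.
    manifest = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "api"},
        "spec": {"template": {"spec": {"containers": [{"name": "api", "image": "api:1.0"}]}}},
    }
    print(json.dumps(prepare_for_spot(manifest)["spec"], indent=2))
```

A `nodeSelector` is the simplest way to pin a workload; a `nodeAffinity` rule does the same job and can also express a preference for spot rather than a hard requirement.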
What's not here
- Automated apply. Kubeadapt does not patch your manifests. The Shift-Left cost gates feature that opens PRs is alpha and not yet in v1 docs.
- Cross-cluster ranking. Recommendations are per-cluster; there is no "best spot opportunity across your fleet" view today.
- Diversification advice. The analyzer does not tell you which instance families to spread across to reduce simultaneous-reclaim risk. Use your cluster autoscaler's diversification config for that.
Next steps
- Right-sizing — pair spot migration with right-sized requests for compounding savings.
- Best practices — the page also flags missing PDBs and single-replica workloads, both blockers above.