Right-sizing
Overview
Right-sizing is the continuous process of adjusting resource requests to match actual usage, eliminating waste while maintaining performance and reliability. It's not a one-time activity, but an ongoing practice that adapts to changing workload patterns, traffic fluctuations, and application evolution.
Kubeadapt's Goal: Reliable Cost Optimization
The primary objective is not aggressive cost cutting, but reliable cost savings that maintain application performance and availability. Recommendations balance:
- Cost efficiency - Reducing waste and unnecessary resource allocation
- Performance reliability - Avoiding degradation and maintaining SLAs
- Environment-specific requirements - Production vs. non-production needs
The challenge:
- Set requests too high → pay for unused resources
- Set requests too low → pods crash or get evicted
- Workloads change over time → yesterday's optimal settings become wasteful
Kubeadapt's continuous approach:
- Analyze actual resource usage over time
- Identify optimal requests based on evolving patterns
- Assess risk levels dynamically
- Generate ready-to-apply recommendations automatically
- Re-evaluate as workloads change and grow
Kubeadapt's Core Philosophy: Request-Centric Rightsizing
The fundamental principle: Rightsizing is achieved by optimizing requests, not limits.
Why Requests Are the Primary Target
1. Requests Determine Allocated Cost
- Cloud providers charge for requested resources (whether utilized or not)
- Over-provisioned requests = wasted money
- Under-provisioned requests = scheduling failures
2. Limits Are a Secondary Concern
- Limits should protect against anomalies, not restrict normal operation
- Tight limits cause throttling (CPU) and OOM kills (memory)
- Properly sized requests reduce the need for aggressive limits
3. CPU vs Memory Behavior
CPU (Compressible Resource):
- Throttling is non-fatal → performance degradation only
- Can be shared and stretched across processes
- Kubeadapt uses P95 percentile for more aggressive optimization
Memory (Non-compressible Resource):
- OOM kill is fatal → pod restart
- Cannot be shared, must be available when needed
- Kubeadapt uses P99 percentile for conservative approach
The Kubeadapt Method
Step 1: Rightsize Requests
- Use P99 (memory) or P95 (CPU) usage + scaling trend coefficient
- Ensures requests match actual utilization
- Primary cost optimization target
Step 2: Configure Limits for Protection
- Production: 110-130% of requests (anomaly tolerance)
- Non-production: 300-600% of requests (developer flexibility)
- Secondary configuration for safety
Result:
- Optimized costs (right-sized requests)
- Reliable performance (controlled limits)
- Eviction prevention (proper QoS class)
The Right-sizing Problem
Why Manual Right-sizing Fails
Without tooling, manual right-sizing requires:
- Monitoring every workload manually
  ```bash
  kubectl top pods --namespace=production
  ```
  - Tedious for 100+ deployments
  - Only shows current usage, not historical patterns
- Analyzing patterns over time
  - What was peak usage last week?
  - Are there daily/weekly spikes?
  - Is usage growing or stable?
- Calculating safe requests
  - How much headroom is safe?
  - What if traffic spikes?
  - Will this cause OOMKills?
- Updating YAML files
  - Find the right deployment file
  - Update requests
  - Apply and monitor
Result: Most teams set high requests "to be safe" and leave them forever.
Typical waste: 60-80% of requested resources go unused.
How Kubeadapt Right-sizing Works
Background: Vertical Right-sizing
Kubeadapt provides vertical (per-pod resource) right-sizing recommendations:
Current Kubernetes Limitations:
- Applying VPA recommendations requires pod restarts (rolling update)
- Rolling updates take time and may cause brief service disruptions
Kubeadapt's Current Approach:
- Provide vertical right-sizing recommendations (requests/limits) for manual or GitOps application
- Recommendations applied via deployment YAML updates
Future: In-place Resource Adjustment
Kubernetes 1.33 (Beta):
- In-place resource adjustment allows changing pod requests/limits without restart
- Requires the kubelet feature gate: `--feature-gates=InPlacePodVerticalScaling=true`
- Kubeadapt can recommend in-place adjustments where supported
Kubernetes 1.35 (Expected Stable):
- Feature expected to reach GA (General Availability)
- Kubeadapt plans to offer automated in-place right-sizing
- No pod restarts = faster optimization with zero downtime
Current Recommendation: GitOps-Friendly Workflow (Alpha)
For clusters without in-place adjustment, the workflow is:
- Kubeadapt generates recommendations
- Recommendations exported to Git repository (audit trail)
- CI/CD pipeline applies changes with tracking
- Rolling update occurs (pod restarts)
- Periodic application (daily/weekly recommended)
Alpha Feature: GitOps integration for automated recommendation application with full audit/rollback capabilities.
The 4-Step Process
```
Step 1: Data Collection (Continuous)
        ↓
Step 2: Pattern Analysis (Daily)
        ↓
Step 3: Recommendation Generation
        ↓
Step 4: Risk Assessment (Per recommendation)
```
Each step is explained in detail below.
Step 1: Data Collection
Data Collection Process
For every pod, every 60 seconds:
1. CPU Usage
- Current CPU usage (millicores)
- CPU throttling events
- Requested CPU
- Limit (if set)
2. Memory Usage
- Current memory usage (bytes)
- Memory limit (if set)
- Requested memory
- OOMKill events
3. Context
- Pod phase (Running, Pending, Failed)
- Restart count
- Node placement
- HPA status (if applicable)
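To make the collected samples concrete, here is a minimal sketch of what one per-pod data point could look like; the field names are illustrative assumptions, not Kubeadapt's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PodSample:
    """One 60-second usage sample for a single pod (illustrative fields only)."""
    pod_name: str
    phase: str                              # Running, Pending, Failed
    restart_count: int
    node: str
    cpu_usage_millicores: float
    cpu_request_millicores: float
    cpu_limit_millicores: Optional[float]   # None if no limit is set
    cpu_throttled: bool
    memory_usage_bytes: int
    memory_request_bytes: int
    memory_limit_bytes: Optional[int]       # None if no limit is set
    oom_killed: bool

# Example of a sample as it might be recorded every 60 seconds
sample = PodSample(
    pod_name="api-gateway-7d9f", phase="Running", restart_count=0, node="node-1",
    cpu_usage_millicores=132, cpu_request_millicores=500, cpu_limit_millicores=1000,
    cpu_throttled=False, memory_usage_bytes=850 * 1024**2,
    memory_request_bytes=1 * 1024**3, memory_limit_bytes=2 * 1024**3, oom_killed=False,
)
```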
Step 2: Pattern Analysis
Overview: Kubeadapt analyzes workload usage patterns across two dimensions to generate accurate, context-aware recommendations.
| Dimension | Scope | Purpose | Metric |
|---|---|---|---|
| Daily Growth Trend | Day-to-day | Future growth buffer | Daily % change (0%-1%+) |
| Intra-Hour Volatility | Within 1 hour | Spike safety buffer | Coefficient of Variation |
Why both are needed:
- Daily Trend: Detects if workload is growing (needs future headroom)
- Intra-Hour CV: Detects spike unpredictability (needs safety margin)
Both dimensions are independent and can vary separately for the same workload.
Statistical Analysis
For each workload, the following metrics are calculated:
1. Percentile Distribution
```
P50 (median):           250m CPU
P90 (90th percentile):  380m CPU
P95 (95th percentile):  420m CPU
P99 (99th percentile):  480m CPU
Max (100th percentile): 650m CPU
```
Why percentiles matter:
- P50 (median) = typical usage
- P95 = most spikes covered
- P99 = almost all spikes covered
- Max = absolute peak (might be anomaly)
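For illustration, this kind of percentile summary can be reproduced from raw CPU samples with NumPy; the sample values below are made up:

```python
import numpy as np

# Illustrative CPU samples (millicores) collected over the lookback period
cpu_samples_m = np.array([250, 260, 240, 380, 300, 420, 270, 480, 255, 650, 245, 310])

for label, p in [("P50", 50), ("P90", 90), ("P95", 95), ("P99", 99)]:
    print(f"{label}: {np.percentile(cpu_samples_m, p):.0f}m")
print(f"Max: {cpu_samples_m.max()}m")
```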
2. Daily Growth Trend (Macro Growth Analysis)
Purpose: Detect if workload is growing or shrinking over time to add future growth buffer.
Method: Linear regression on daily average usage across 7-30 days.
Trend calculation (Scale-Agnostic Percentage-Based):
```
Daily % Change = (slope / mean_value) × 100

Where:
- slope      = Linear regression slope of daily averages
- mean_value = Mean of daily averages over lookback period

Example:
Day 1 average: 300m CPU
Day 7 average: 500m CPU
Slope: ~28.6m/day
Mean: ~400m
Daily % Change: (28.6 / 400) × 100 ≈ 7.15%
```
Why percentage-based (not angle-based):
- Angle-based (`arctan(slope)`) is scale-dependent: the same growth rate yields different angles for CPU (millicores) vs. Memory (bytes)
- Percentage-based is scale-agnostic: 10% growth is 10% whether it's CPU or Memory
Trend classifications:
| Daily % Change | Classification | Growth Buffer | Use Case |
|---|---|---|---|
| ≤ 0% | Stable/Decline | 1.0x (no buffer) | Mature workload, no growth |
| 0% - 0.5% | Light growth | 1.0x - 1.10x | Slow, steady growth |
| 0.5% - 1.0% | Steady growth | 1.10x - 1.20x | Consistent expansion |
| ≥ 1% | Rapid growth | 1.20x (+20%) | Fast-growing service |
Buffer is calculated via linear interpolation between 1.0x and 1.20x based on daily % change (capped at 1%).
Why this matters:
- Growing workloads need extra headroom for future usage
- Stable workloads can be sized more aggressively
- Declining workloads may need downsize recommendations
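A minimal sketch of the growth-buffer calculation described above, assuming daily averages have already been aggregated (function names are illustrative, not Kubeadapt's API):

```python
import numpy as np

def daily_pct_change(daily_averages):
    """Linear regression slope of daily averages, expressed as a % of the mean."""
    days = np.arange(len(daily_averages))
    slope, _intercept = np.polyfit(days, daily_averages, deg=1)
    return (slope / np.mean(daily_averages)) * 100

def growth_buffer(pct_change):
    """1.0x for stable/declining workloads, up to 1.20x at >= 1% daily growth."""
    clamped = min(max(pct_change, 0.0), 1.0)   # negative growth gets no buffer; cap at 1%
    return 1.0 + clamped * 0.20

daily_avgs_m = [300, 330, 360, 400, 440, 470, 500]   # illustrative daily CPU averages
pct = daily_pct_change(daily_avgs_m)
print(f"Daily % change: {pct:.2f}%  ->  growth buffer: {growth_buffer(pct):.2f}x")
```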
3. Intra-Hour Volatility (Spike Analysis)
Purpose: Detect unpredictable spikes within the same time period to add safety margin.
Method: Calculate Coefficient of Variation (CV) from usage metrics over the lookback period.
Coefficient of Variation formula:
```
CV = Standard Deviation / Mean
```
Example - CPU usage across lookback period:
```
Day 1: 100m, 150m, 120m, 180m, 110m → Mean: 132m, StdDev: 31m
Day 2: 105m, 145m, 125m, 175m, 115m → Mean: 133m, StdDev: 28m
...
Day 7: 110m, 155m, 130m, 185m, 120m → Mean: 140m, StdDev: 30m

Aggregate CV = 30 / 135 = 0.22 (medium volatility)
```
Volatility classifications (linear interpolation):
| CV Value | Spike Buffer | Behavior |
|---|---|---|
| 0.0 | 1.0x | Perfectly predictable |
| 0.2 | 1.07x | Stable, minor variation |
| 0.4 | 1.14x | Moderate variation |
| 0.6 | 1.21x | Frequent spikes |
| 0.8 | 1.28x | High unpredictability |
| 1.0+ | 1.35x (max) | Extremely variable |
Formula: `buffer = 1.0 + min(cv, 1.0) × 0.35`
Why this matters:
- Low CV: Usage is predictable → Can size closer to P95
- High CV: Usage is spiky → Need larger buffer above P95 for safety
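And a matching sketch for the spike buffer, computing the coefficient of variation directly from raw samples (values are illustrative):

```python
import numpy as np

def spike_buffer(samples):
    """Return (CV, buffer) where buffer = 1.0 + min(CV, 1.0) * 0.35."""
    arr = np.asarray(samples, dtype=float)
    cv = arr.std() / arr.mean()
    return cv, 1.0 + min(cv, 1.0) * 0.35

cpu_samples_m = [100, 150, 120, 180, 110, 105, 145, 125, 175, 115]   # illustrative
cv, buf = spike_buffer(cpu_samples_m)
print(f"CV: {cv:.2f}  ->  spike buffer: {buf:.2f}x")
```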
Key difference from Daily Growth Trend:
- Daily Trend: Measures direction over days (macro)
- Intra-Hour CV: Measures predictability within same hour (micro)
- Both can be high independently (smooth growth vs spiky but stable average)
Visual Comparison: Data Granularity
```
      SPIKE BUFFER (CV)                     GROWTH BUFFER (Trend)

    ^                                     ^
    |    *       *                        |                       o  (Day 7 avg)
    |  *   * *     *                      |                   o
    | *  *     * *   *                    |               o
CPU |*  *   *    *  * *              CPU  |           o
    |   *        *                        |       o
    |      *                              |   o
    +---------------------------->        +---------------------------->
       ~10,080 raw data points               7 daily aggregated points
       (stddev/avg -> CV)                    (linear regression -> slope)
```
- Spike Buffer: Calculates CV from ALL individual metrics (e.g., 7 days × 24h × 60min = ~10k points)
- Growth Buffer: First aggregates to daily averages, then fits trend line to 7 points
4. QoS-Based Eviction Priority
Understanding how Kubernetes evicts pods is critical for rightsizing:
Kubelet Eviction Order (During Node Pressure):
Group 1 (Evicted First):
- BestEffort pods (no requests/limits set)
- Burstable pods where usage > requests
Group 2 (Evicted Second):
- Burstable pods where usage < requests
- Guaranteed pods (only if system services need resources)
Key Takeaways:
- Very low requests → pod likely in Group 1 → higher eviction risk
- Properly sized requests → pod in Group 2 → lower eviction risk
- Guaranteed pods (requests = limits) are most protected but prevent burst capacity
Kubeadapt's Strategy:
- Production: Burstable QoS with limits 110-130% of requests
- Behaves nearly identically to Guaranteed for eviction purposes
- Provides critical anomaly tolerance without full resource reservation
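For reference, the QoS class follows mechanically from how requests and limits are set; below is a simplified single-container sketch (Kubernetes itself evaluates every container in the pod):

```python
def qos_class(cpu_request, cpu_limit, mem_request, mem_limit):
    """Simplified QoS classification for a single-container pod."""
    values = [cpu_request, cpu_limit, mem_request, mem_limit]
    if all(v is None for v in values):
        return "BestEffort"
    if None not in values and cpu_request == cpu_limit and mem_request == mem_limit:
        return "Guaranteed"
    return "Burstable"

# Kubeadapt's production recommendation (limits at 110-130% of requests) -> Burstable
print(qos_class(1000, 1200, 2048, 2458))    # Burstable
print(qos_class(1000, 1000, 2048, 2048))    # Guaranteed
print(qos_class(None, None, None, None))    # BestEffort
```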
Step 3: Recommendation Generation
The Core Algorithm
Two-tier recommendation formula:
```
Final Request = Base Request × Growth Buffer × Spike Buffer

Where:
  Base Request = Percentile from policy configuration
    - CPU: P95 (throttling is non-fatal)
    - Memory: P99 (OOMKill is fatal)

  Growth Buffer = Daily % change coefficient (scale-agnostic)
    Formula: dailyPctChange = (slope / meanValue) × 100
    - Stable/Decline (≤0%): 1.0x (no buffer)
    - Growing (0%-1%): 1.0x - 1.20x (linear interpolation)
    - Rapid (≥1%): 1.20x (+20%)

  Spike Buffer = CV coefficient (linear interpolation)
    Formula: buffer = 1.0 + min(cv, 1.0) × 0.35
    - CV 0.0 - 1.0: 1.0x - 1.35x (linear)
    - CV ≥ 1.0: 1.35x (max)
```
Example calculation:
```
Workload: api-gateway

Step 1: Base Request (from lookback period metrics)
  CPU P95: 120m
  Memory P99: 850Mi

Step 2: Apply Growth Buffer
  Daily % change: 0.75% (growing)
  Growth buffer: 1.0 + 0.75 × 0.20 = 1.15x

  CPU with growth: 120m × 1.15 = 138m
  Memory with growth: 850Mi × 1.15 = 978Mi

Step 3: Apply Spike Buffer
  CV: 0.28
  Spike buffer: 1.0 + 0.28 × 0.35 = 1.10x

  Final CPU request: 138m × 1.10 = 152m
  Final Memory request: 978Mi × 1.10 = 1076Mi (≈1.05Gi)

Result:
  CPU Request: 155m (rounded)
  Memory Request: 1.1Gi
```
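The same calculation, expressed as a short Python sketch of the documented formula (this mirrors the formula above, not Kubeadapt's actual implementation):

```python
def recommended_request(base, daily_pct_change, cv):
    """Final Request = Base × Growth Buffer × Spike Buffer."""
    growth_buffer = 1.0 + min(max(daily_pct_change, 0.0), 1.0) * 0.20
    spike_buffer = 1.0 + min(cv, 1.0) * 0.35
    return base * growth_buffer * spike_buffer

# api-gateway example from above
cpu_m = recommended_request(base=120, daily_pct_change=0.75, cv=0.28)
mem_mi = recommended_request(base=850, daily_pct_change=0.75, cv=0.28)
print(f"CPU:    {cpu_m:.0f}m")     # ~152m before rounding up to 155m
print(f"Memory: {mem_mi:.0f}Mi")   # ~1073Mi ≈ 1.05Gi (the example above rounds intermediate steps)
```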
Why P95 for CPU, P99 for memory?
CPU:
- Throttling is non-fatal (just slower)
- P95 covers most spikes
- More aggressive sizing = more cost savings
Memory:
- OOMKill is fatal (pod restart)
- P99 provides safety for spikes
- Conservative sizing prevents service disruption
Why Aggressive Limits Can Be Harmful
Before discussing safety margins, it's important to understand why setting limits too aggressively can cause more problems than it solves.
CPU Throttling:
Setting CPU limits too tight causes throttling, even when the node has idle capacity:
```
Example: 300m CPU limit on 4-core node (4000m capacity)
- Container can use maximum 30ms of CPU per 100ms period
- Result: Performance degradation even when node has 3700m idle CPU
- Impact: Application slowdown, increased latency
```
Key Point: CPU throttling is non-fatal but degrades performance. The container runs slower, not crashes.
Memory OOM Kills:
Memory limits are fatal - any spike beyond the limit kills the pod:
```
Example: 2Gi memory limit
- Application normally uses 1.8Gi
- Traffic spike causes brief 2.1Gi usage
- Result: Pod immediately killed (OOMKill)
- Impact: Service disruption, pod restart, potential data loss
```
Key Point: OOM kills are fatal and cause service disruption. Unlike CPU, there's no "slow down" - the pod dies.
The Traditional Approach Problem:
Many teams set aggressive limits "to be safe":
- ❌ CPU limit = 2x request (causes frequent throttling)
- ❌ Memory limit = request (no room for spikes → OOMKills)
- ❌ Result: Performance issues despite having cluster capacity
Kubeadapt's Balanced Approach:
- Rightsize requests first (eliminates waste at the source)
  - Use P99 (memory) or P95 (CPU) + scaling trend
  - Ensures requests match actual utilization
  - This is where cost optimization happens
- Use controlled limits for anomaly protection only
  - Production: 110-130% of requests
  - Provides burst capacity without excessive overcommit
  - Prevents node exhaustion without aggressive throttling
- Avoid aggressive limits that harm more than help
  - Don't use limits to "force" cost savings
  - Right-sized requests already provide cost optimization
  - Limits should protect, not restrict normal operation
Summary: Rightsizing is achieved by optimizing requests (where costs occur), not by setting aggressive limits (which cause throttling/OOMKills).
Analysis Windows
Default: 7-day rolling window
Exceptions:
New workloads (<7 days old):
- Use available data (minimum 24 hours)
- Higher safety margins (+10%)
Seasonal workloads:
- Extend window to 30 days
- Capture weekly patterns
- Identify peak periods
Environment-Based Recommendations
Overview
Kubeadapt generates different recommendations based on environment type. The same deployment may receive different resource configurations in production vs. non-production environments.
Environment Classification:
- Production-like: Production, pre-production, staging (with production-equivalent load testing)
- Non-production-like: Development, testing, sandbox environments
Why Different Approaches:
Production and non-production have fundamentally different optimization priorities:
| Priority | Production-like | Non-production-like |
|---|---|---|
| Primary Goal | Reliable performance | Maximum cost reduction |
| Risk Tolerance | Low (avoid degradation) | High (acceptable issues) |
| Availability | High (HA required) | Low (single replica acceptable) |
| Node Overcommit | Conservative (120-130%) | Aggressive (300-1000%) |
Environment Detection
Kubeadapt uses a cluster-level policy to determine environment type:
Cluster Profile Selection
When creating a cluster in Kubeadapt UI, users select a profile that determines the Analysis Policy:
| Profile | Use Case | Philosophy |
|---|---|---|
| Production | Mission-critical workloads | Conservative: higher percentiles (P95/P99), tighter limit buffers |
| Non-Production | Dev/Test/Staging | Aggressive: lower percentiles (P50), larger limit buffers |
| Custom | User-defined | Per-analyzer threshold overrides via SaaS UI |
Example:
```
Cluster: prod-us-east-1
Profile: production
→ All workloads use production policy settings
  - CPU: P95 percentile, 1.2x limit buffer
  - Memory: P99 percentile, 1.3x limit buffer
```
```
Cluster: dev-us-east-1
Profile: non_production
→ All workloads use non-production policy settings
  - CPU: P50 percentile, 4.0x limit buffer
  - Memory: P50 percentile, 3.0x limit buffer
```
Policy Settings:
The cluster profile determines the `AnalysisPolicy` fields:
| Field | Production | Non-Production |
|---|---|---|
| `cpu_percentile` | P95 | P50 |
| `memory_percentile` | P99 | P50 |
| `cpu_limit_buffer` | 1.2x | 4.0x |
| `memory_limit_buffer` | 1.3x | 3.0x |
| `min_monthly_savings_usd` | $1.0 | $0.5 |
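These fields could be represented as simple configuration objects; the sketch below reuses the field names from the table, but the actual AnalysisPolicy structure may differ:

```python
ANALYSIS_POLICIES = {
    "production": {
        "cpu_percentile": 95,
        "memory_percentile": 99,
        "cpu_limit_buffer": 1.2,
        "memory_limit_buffer": 1.3,
        "min_monthly_savings_usd": 1.0,
    },
    "non_production": {
        "cpu_percentile": 50,
        "memory_percentile": 50,
        "cpu_limit_buffer": 4.0,
        "memory_limit_buffer": 3.0,
        "min_monthly_savings_usd": 0.5,
    },
}

# The cluster profile selects one policy, applied to every workload in that cluster
policy = ANALYSIS_POLICIES["production"]
print(policy["cpu_percentile"], policy["cpu_limit_buffer"])   # 95 1.2
```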
Production-like Environment Recommendations
Optimization Philosophy:
Reliable cost savings without compromising performance or availability.
1. Requests (Resource Allocation)
Calculation Method:
```
Request = Base × GrowthBuffer × SpikeBuffer

Where:
- Base: P95 (CPU) or P99 (Memory) from policy percentile
- GrowthBuffer: 1.0x - 1.20x based on daily % change
- SpikeBuffer: 1.0x - 1.35x based on CV (stddev/avg)

Growth Buffer (scale-agnostic, percentage-based):
- Stable/Decline (≤0%): 1.0x → No growth buffer needed
- Growing (0%-1%): 1.0x - 1.20x → Linear interpolation
- Rapid (≥1%): 1.20x → Maximum growth buffer
```
Example:
```
Workload: api-gateway
Base (P95 CPU): 750m
Daily % change: 0.8% (growing)
CV: 0.3

Growth Buffer = 1.0 + 0.8 × 0.20 = 1.16x
Spike Buffer = 1.0 + 0.3 × 0.35 = 1.105x

Recommended CPU Request:
750m × 1.16 × 1.105 ≈ 962m → 1000m
```
Why P95 for CPU, P99 for Memory:
- CPU (P95): Throttling is non-fatal, more aggressive optimization acceptable
- Memory (P99): OOM kill is fatal, conservative approach prevents pod restarts
- Growth buffer adjusts for daily growth projections
- Spike buffer handles usage variability
2. Limits (Resource Boundaries)
Strategy: Burstable QoS with Controlled Overcommit (near-Guaranteed)
Kubernetes QoS classes affect pod eviction priorities:
| QoS Class | Condition | Eviction Priority |
|---|---|---|
| Guaranteed | requests = limits | Lowest (most protected) |
| Burstable | requests < limits | Medium |
| BestEffort | No requests/limits | Highest (first to evict) |
Kubeadapt Approach:
```
Recommended Limit = Request × 1.1 to 1.3

Examples:
Request: 1000m → Limit: 1100m-1300m (110-130% of request)
Request: 2Gi   → Limit: 2.2Gi-2.6Gi (110-130% of request)
```
Why not requests = limits (pure Guaranteed)?
- Pure Guaranteed prevents any burst capacity
- Small limit buffer handles anomalies without eviction
- Node-level overcommit: 110-130% allows denser packing
- Balance: Protection from eviction + anomaly tolerance
Node Capacity Planning:
```
Node capacity: 8 cores
Total pod requests: 7 cores (87% utilization)
Total pod limits: 9 cores (112% overcommit)

Result:
- Normal operation: All pods run smoothly
- Anomaly (multiple pods spike): Limits prevent node exhaustion
- Overcommit level: Conservative 112% (safe range: 110-130%)
```
Risk Mitigation:
- Limits close to requests (110-130%) keep pods in eviction Group 2, minimizing eviction risk
- Moderate limits protect node from resource exhaustion
3. QoS Class
Recommendation: Burstable QoS with Controlled Limits
Configuration approach:
- Requests: Right-sized based on P99 usage + scaling trend
- Limits: 110-130% of requests (production)
- QoS Classification: Burstable (requests < limits)
Why not pure Guaranteed (requests = limits)?
Pure Guaranteed prevents any burst capacity for handling anomalies. Kubeadapt's approach provides:
- Eviction protection: Limits close to requests (110-130%) provide strong protection
- Anomaly tolerance: 10-30% burst capacity handles unexpected spikes
- Node-level optimization: Allows controlled overcommit (120-130% node capacity)
Eviction Resistance:
Pods with limits 110-130% of requests are in eviction Group 2 (evicted after BestEffort and over-request Burstable pods), providing strong protection while maintaining burst capacity.
4. Node Capacity Targets
Goal: 90-100% Allocated, 120-130% Overcommitted
```
Example Node: 8 cores, 32 GB

Target Allocation (requests):
- CPU requests: 7-8 cores (87-100% of capacity)
- Memory requests: 28-32 GB (87-100% of capacity)

Actual Limits (overcommit):
- CPU limits: 9-10.4 cores (112-130% of capacity)
- Memory limits: 33-42 GB (103-131% of capacity)

Overcommit ratio: 120-130%
```
Why This Range:
- Too low (<110%): Wasted node capacity, higher costs
- Sweet spot (120-130%): Dense packing + anomaly tolerance
- Too high (>150%): Performance degradation risk if multiple pods spike
CPU Throttling Risk:
If all pods simultaneously hit their limits:
```
Total limits: 10 cores
Node capacity: 8 cores
→ 2 cores of throttling distributed across pods

Impact: Minor latency increase (non-fatal)
```
Memory OOM Risk:
Conservative memory overcommit (110-130%) ensures:
- Memory limits rarely all reached simultaneously
- Production workloads use Burstable QoS with controlled limits (110-130% of requests)
- Right-sized requests keep usage < requests under normal conditions
- This places pods in eviction Group 2 (lower eviction priority)
- If node pressure occurs, Group 1 pods evicted first (BestEffort + over-request Burstable)
- Controlled limits (110-130%) provide safe area for anomalies without pod crashes
Non-production-like Environment Recommendations
Optimization Philosophy:
Maximum cost reduction, accepting higher risk and lower availability.
Key Differences from Production
| Aspect | Production | Non-production |
|---|---|---|
| Base Percentile | P95 (CPU), P99 (Memory) | P50 (CPU & Memory) |
| CPU Limit Buffer | 1.2x (from policy) | 4.0x (from policy) |
| Memory Limit Buffer | 1.3x (from policy) | 3.0x (from policy) |
| Node Overcommit | 120-130% | 300-1000% |
| QoS | Burstable (controlled limits) | BestEffort acceptable |
1. Requests (Policy-Driven Allocation)
Calculation (same formula, different policy values):
```
Request = Base × GrowthBuffer × SpikeBuffer

Where:
- Base: P50 (CPU & Memory) from policy percentile
- GrowthBuffer: 1.0x - 1.20x based on daily % change
- SpikeBuffer: 1.0x - 1.35x based on CV (stddev/avg)

Example:
Dev environment workload
P50 CPU usage: 20m
P50 Memory usage: 180Mi
Daily % change: 0.5% (light growth)
CV: 0.2

Growth Buffer = 1.0 + 0.5 × 0.20 = 1.10x
Spike Buffer = 1.0 + 0.2 × 0.35 = 1.07x

Recommended CPU Request: 20m × 1.10 × 1.07 ≈ 24m
Recommended Memory Request: 180Mi × 1.10 × 1.07 ≈ 212Mi
```
Why P50 for both CPU and Memory:
- Cost priority: Minimize request allocation (what you pay for)
- Safety buffer: High policy limit buffers (4.0x CPU, 3.0x Memory) provide headroom
- Dev workload nature: Idle most of the time, occasional bursts acceptable
- Aggressive overcommit: Nodes can be 300-1000% overcommitted in non-production
Rationale:
- Developers run sporadic tests
- Most of the time: idle or minimal usage
- Occasional spikes: High limits handle bursts without OOM
- Priority: Minimize allocated cost
2. Limits (Policy-Driven High Headroom)
Strategy: High limits configured via cluster policy
```
CPU Limit    = CPU Request × cpu_limit_buffer (4.0x from policy)
Memory Limit = Memory Request × memory_limit_buffer (3.0x from policy)

Example (from P50-based request above):
CPU Request: 24m
CPU Limit: 24m × 4.0 = 96m

Memory Request: 212Mi
Memory Limit: 212Mi × 3.0 = 636Mi

Comparison to production:
Production: CPU Request 150m (P95), Limit 180m (1.2x buffer)
Non-prod:   CPU Request 24m (P50), Limit 96m (4.0x buffer)
```
Rationale:
- Developers need burst capacity for testing
- Low requests (P50-based) minimize cost
- High policy-driven limits prevent OOM/throttling during test bursts
- Node overcommit acceptable (dev pods less critical)
- Safety buffer comes from policy limit buffers, not from conservative percentile choice
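A short sketch of the policy-driven limit calculation for the example above (illustrative helper, not Kubeadapt's code):

```python
def limits_from_policy(cpu_request_m, mem_request_mi, policy):
    """Derive limits from right-sized requests using the policy limit buffers."""
    return {
        "cpu_limit_m": cpu_request_m * policy["cpu_limit_buffer"],
        "memory_limit_mi": mem_request_mi * policy["memory_limit_buffer"],
    }

non_prod_policy = {"cpu_limit_buffer": 4.0, "memory_limit_buffer": 3.0}
print(limits_from_policy(24, 212, non_prod_policy))
# {'cpu_limit_m': 96.0, 'memory_limit_mi': 636.0}
```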
3. Node Capacity (Aggressive Overcommit)
Target: 300-1000% Overcommit
```
Example Node: 8 cores, 32 GB

Allocated requests:
- CPU: 8 cores (100% utilization)
- Memory: 32 GB (100% utilization)

Total limits:
- CPU: 64 cores (800% overcommit!)
- Memory: 128 GB (400% overcommit)

Overcommit ratio: 400-800%
```
Why This Works:
- Dev workloads mostly idle (25-100m actual usage)
- Developers test intermittently (not all pods simultaneously)
- Even if all pods spike: Throttling acceptable in non-prod
- Cost savings: 4-8x more pods per node
Risk Acceptance:
- CPU throttling during tests: Acceptable
- Pod evictions possible: Acceptable (no production impact)
- Performance degradation: Acceptable
Example Comparison: Same Deployment, Different Environments
Production Environment:
Deployment Configuration:
- Name: api-server
- Namespace: production
- CPU request: 1000m (P95 × scaling trend)
- Memory request: 2Gi (P99 × scaling trend)
- CPU limit: 1200m (120% of request)
- Memory limit: 2.4Gi (120% of request)
Non-production Environment:
Deployment Configuration:
- Name: api-server
- Namespace: development
- CPU request: 100m (P50 minimal usage)
- Memory request: 256Mi (P50 minimal usage)
- CPU limit: 600m (6x request for developer test headroom)
- Memory limit: 1Gi (4x request)
Cost Comparison:
```
Production (per pod):
Request: 1000m CPU, 2Gi memory
Baseline cost: High (prioritizes reliability)

Non-production (per pod):
Request: 100m CPU, 256Mi memory
Baseline cost: ~90% lower (prioritizes cost)

Cost reduction: 90% (development vs. production)
```
Special Cases
StatefulSets
StatefulSets often have different resource needs per pod (e.g., primary vs. replicas in databases).
Example: PostgreSQL StatefulSet
- Replicas: 3
- Pod-0 (primary): High CPU/memory usage (handles writes)
- Pod-1, Pod-2 (replicas): Lower CPU/memory usage (read-only replicas)
Challenge:
All pods in a StatefulSet share the same resource specification - you cannot set different requests/limits per pod index.
Kubeadapt's Right-sizing Approach:
Uniform Sizing (Only Option):
- Size all pods for the highest usage pod (typically the primary)
- Simpler management and operational consistency
- Some resource waste on lower-usage replicas (read replicas)
- Trade-off: Operational simplicity vs. marginal cost savings
Why This Is Acceptable:
- Primary pod rightsizing already provides significant savings vs. over-provisioned baseline
- Complexity of managing separate StatefulSets per role often outweighs marginal savings
- Workload-specific tools (like database operators) can handle role-specific sizing if needed
Advanced Alternative (Manual):
For teams requiring per-role optimization:
- Deploy separate StatefulSets for different roles (e.g., primary StatefulSet + replica StatefulSet)
- Use pod affinity/anti-affinity rules to ensure distribution
- Note: Adds operational complexity, only recommended for very large deployments
Burstable vs. Guaranteed QoS
Quality of Service classes affect right-sizing:
Burstable (requests < limits):
- Example: requests 500m CPU / 1Gi memory, limits 2000m CPU / 4Gi memory
- Can burst above requests when node has capacity
- May be throttled/OOMKilled if node is full
- More cost-efficient (pay for requests, use limits opportunistically)
Guaranteed (requests = limits):
- Example: requests = limits = 2000m CPU / 4Gi memory
- Reserved resources, fully allocated on node
- Never throttled or evicted (unless it exceeds its limits)
- More expensive (pay for maximum capacity 24/7)
- No burst capacity for anomalies
Kubeadapt approach:
- Right-size requests (primary optimization goal)
- Suggest limit strategy based on workload:
  - Critical services: limits = 2x requests
  - Burstable workloads: limits = 3-4x requests
  - Batch jobs: No limits (use all available)
Multi-Container Pods
Pods with sidecars:
Pods may contain multiple containers (main app + sidecars for logging, metrics, proxies, etc.).
Example Pod:
- App container: 1000m CPU / 2Gi memory
- Logging sidecar: 100m CPU / 128Mi memory
- Metrics sidecar: 50m CPU / 64Mi memory
- Total pod requests: 1150m CPU / 2.2Gi memory
Right-sizing approach:
- Analyze each container separately
  - App: Usage-based recommendation
  - Sidecars: May have fixed resource needs
- Generate per-container recommendations
  ```
  app:             1000m → 600m (usage-based)
  logging-sidecar: 100m  → 100m (keep, minimal overhead)
  metrics-sidecar: 50m   → 50m  (keep, minimal overhead)
  ```
- Total pod resource reduction
  ```
  Current:     1150m CPU, 2.2Gi memory
  Recommended: 750m CPU,  2.2Gi memory
  Reduction:   35% CPU, 0% memory
  ```
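A small sketch of how per-container recommendations roll up to the pod total, using the example values above (not Kubeadapt's data model):

```python
# Per-container CPU recommendations in millicores: (current, recommended)
containers = {
    "app": (1000, 600),              # usage-based reduction
    "logging-sidecar": (100, 100),   # kept: minimal overhead
    "metrics-sidecar": (50, 50),     # kept: minimal overhead
}

current = sum(cur for cur, _ in containers.values())
recommended = sum(rec for _, rec in containers.values())
print(f"Current: {current}m, Recommended: {recommended}m, "
      f"Reduction: {100 * (current - recommended) / current:.0f}%")
# Current: 1150m, Recommended: 750m, Reduction: 35%
```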
Recommendation Lifecycle
Recommendations follow a state machine lifecycle from creation to resolution.
States
| State | Description | User Action |
|---|---|---|
| Pending | Active recommendation, continuously updated by analyzer | Review in UI |
| Applied | User confirmed they applied the recommendation | "Mark as Applied" button |
| Dismissed | User doesn't want recommendations for this resource | "Dismiss" button |
| Archived | Previous applied recommendation, preserved for history | Automatic |
State Transitions
```
              ┌─────────────────────────────────────┐
              │                                     │
              ▼                                     │
[Created] ──► Pending ──► Applied ──► Archived      │
              │              │                      │
              │              └── Cooldown Period ───┘
              │                  (new rec after cooldown)
              ▼
          Dismissed ◄──► Pending (un-dismiss)
```
Transition Rules:
- Pending → Applied: User clicks "Mark as Applied" in UI
- Applied → Archived: Cooldown period passes + new recommendation generated
- Pending → Dismissed: User clicks "Dismiss" in UI
- Dismissed → Pending: User clicks "Un-dismiss" to re-enable recommendations
Cooldown Period
After applying a recommendation, the analyzer waits before generating new recommendations for the same workload:
| Policy Field | Production | Non-Production | Purpose |
|---|---|---|---|
| `cooldown_days` | 7 days | 7 days | Let system stabilize with new config |
Cooldown Logic:
```
IF status = 'applied' THEN
  IF applied_at + cooldown_days < now() THEN
    -- Run analyzer calculation
    IF monthly_savings >= min_monthly_savings_usd THEN
      -- Move current to archived, create new pending
      Generate new recommendation
    END IF
  ELSE
    Skip (still in cooldown period)
  END IF
END IF
```
Re-generation Rules
| Previous State | Re-generation Rule | Rationale |
|---|---|---|
| Pending | ✅ Always upsert | Normal operation - keep recommendations current |
| Applied | ⏳ Cooldown only | Wait for cooldown, then generate if threshold met |
| Archived | ❌ Never touch | History record - immutable |
| Dismissed | ❌ Block all | User explicitly rejected - respect their decision |
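The re-generation rules and the cooldown check could be expressed roughly as follows; a sketch of the documented logic with hypothetical parameter names:

```python
from datetime import datetime, timedelta

def should_regenerate(status, applied_at, cooldown_days, monthly_savings,
                      min_monthly_savings_usd, now):
    """Decide whether a new recommendation may be generated for a workload."""
    if status == "pending":
        return True                      # always keep the current recommendation fresh
    if status in ("archived", "dismissed"):
        return False                     # history records and user rejections are respected
    if status == "applied":
        if applied_at + timedelta(days=cooldown_days) > now:
            return False                 # still in the cooldown period
        return monthly_savings >= min_monthly_savings_usd
    return False

print(should_regenerate("applied", datetime(2025, 1, 1), 7, 3.2, 1.0, datetime(2025, 1, 20)))
# True: cooldown elapsed and savings exceed the production threshold
```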
Savings Threshold
Recommendations are only generated when monthly savings meet the policy threshold:
| Policy Field | Production | Non-Production |
|---|---|---|
| `min_monthly_savings_usd` | $1.0 | $0.5 |
This prevents recommendation noise for workloads with trivial savings potential.
Summary
Kubeadapt's right-sizing approach in a nutshell:
Two-Tier Pattern Analysis:
- Daily Growth Trend - Adds 0-20% future growth buffer
- Intra-Hour Volatility - Adds 0-35% spike safety buffer
Recommendation Formula:
```
Final Request = Base (P95/P99) × Growth Buffer × Spike Buffer
```
Key principle:
- Optimize requests (where cost occurs)
- Control limits (for anomaly protection only)
- Never use limits for cost optimization (causes throttling/OOMKills)
Learn More
Related Documentation:
- Cost Attribution - How costs are calculated
- Resource Efficiency - Efficiency metrics explained
- Available Savings - Review recommendations in the UI
Workflows:
- Right-sizing Guide - Step-by-step optimization process
Reference: