Right-sizing


Overview

Right-sizing is the continuous process of adjusting resource requests to match actual usage, eliminating waste while maintaining performance and reliability. It's not a one-time activity, but an ongoing practice that adapts to changing workload patterns, traffic fluctuations, and application evolution.

Kubeadapt's Goal: Reliable Cost Optimization

The primary objective is not aggressive cost cutting, but reliable cost savings that maintain application performance and availability. Recommendations balance:

  • Cost efficiency - Reducing waste and unnecessary resource allocation
  • Performance reliability - Avoiding degradation and maintaining SLAs
  • Environment-specific requirements - Production vs. non-production needs

The challenge:

  • Set requests too high → pay for unused resources
  • Set requests too low → pods crash or get evicted
  • Workloads change over time → yesterday's optimal settings become wasteful

Kubeadapt's continuous approach:

  1. Analyze actual resource usage over time
  2. Identify optimal requests based on evolving patterns
  3. Assess risk levels dynamically
  4. Generate ready-to-apply recommendations automatically
  5. Re-evaluate as workloads change and grow

Kubeadapt's Core Philosophy: Request-Centric Rightsizing

The fundamental principle: Rightsizing is achieved by optimizing requests, not limits.

Why Requests Are the Primary Target

1. Requests Determine Allocated Cost

  • Cloud providers charge for requested resources (whether utilized or not)
  • Over-provisioned requests = wasted money
  • Under-provisioned requests = scheduling failures

2. Limits Are a Secondary Concern

  • Limits should protect against anomalies, not restrict normal operation
  • Tight limits cause throttling (CPU) and OOM kills (memory)
  • Properly sized requests reduce the need for aggressive limits

3. CPU vs Memory Behavior

CPU (Compressible Resource):

  • Throttling is non-fatal → performance degradation only
  • Can be shared and stretched across processes
  • Kubeadapt uses the P95 (95th percentile) for more aggressive optimization

Memory (Non-compressible Resource):

  • OOM kill is fatal → pod restart
  • Cannot be shared, must be available when needed
  • Kubeadapt uses the P99 (99th percentile) for a conservative approach

The Kubeadapt Method

Step 1: Rightsize Requests

  • Use P99 (memory) or P95 (CPU) usage + scaling trend coefficient
  • Ensures requests match actual utilization
  • Primary cost optimization target

Step 2: Configure Limits for Protection

  • Production: 110-130% of requests (anomaly tolerance)
  • Non-production: 300-600% of requests (developer flexibility)
  • Secondary configuration for safety

Result:

  • Optimized costs (right-sized requests)
  • Reliable performance (controlled limits)
  • Eviction prevention (proper QoS class)

The Right-sizing Problem

Why Manual Right-sizing Fails

Without tooling, manual right-sizing requires:

  1. Monitoring every workload manually

    bash
    kubectl top pods --namespace=production
    • Tedious for 100+ deployments
    • Only shows current usage, not historical patterns
  2. Analyzing patterns over time

    • What was peak usage last week?
    • Are there daily/weekly spikes?
    • Is usage growing or stable?
  3. Calculating safe requests

    • How much headroom is safe?
    • What if traffic spikes?
    • Will this cause OOMKills?
  4. Updating YAML files

    • Find the right deployment file
    • Update requests
    • Apply and monitor

Result: Most teams set high requests "to be safe" and leave them forever.

Typical waste: 60-80% of requested resources go unused.
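
As a rough illustration of where that waste figure comes from, the sketch below compares requested CPU against observed usage for a few hypothetical deployments (names and values are invented):

python
# Hypothetical illustration: waste = share of requested CPU that goes unused.
requested_m = {"api": 2000, "worker": 1000, "cron": 500}   # CPU requests (millicores)
observed_p95_m = {"api": 400, "worker": 250, "cron": 150}  # observed P95 usage (millicores)

for name, req in requested_m.items():
    waste_pct = (1 - observed_p95_m[name] / req) * 100
    print(f"{name}: {waste_pct:.0f}% of requested CPU unused")
# Over-provisioned clusters commonly land in the 60-80% range cited above.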


How Kubeadapt Right-sizing Works

Background: Vertical Right-sizing

Kubeadapt provides vertical (per-pod resource) right-sizing recommendations:

Current Kubernetes Limitations:

  • Applying VPA recommendations requires pod restarts (rolling update)
  • Rolling updates take time and may cause brief service disruptions

Kubeadapt's Current Approach:

  • Provide vertical right-sizing recommendations (requests/limits) for manual or GitOps application
  • Recommendations applied via deployment YAML updates

Future: In-place Resource Adjustment

Kubernetes 1.33 (Beta):

  • In-place resource adjustment allows changing pod requests/limits without restart
  • Requires kubelet flag:
    text
    --feature-gates=InPlacePodVerticalScaling=true
  • Kubeadapt can recommend in-place adjustments where supported

Kubernetes 1.35 (Expected Stable):

  • Feature expected to reach GA (General Availability)
  • Kubeadapt plans to offer automated in-place right-sizing
  • No pod restarts = faster optimization with zero downtime


Current Recommendation: GitOps-Friendly Workflow (Alpha)

For clusters without in-place adjustment, the workflow is:

  1. Kubeadapt generates recommendations
  2. Recommendations exported to Git repository (audit trail)
  3. CI/CD pipeline applies changes with tracking
  4. Rolling update occurs (pod restarts)
  5. Periodic application (daily/weekly recommended)

Alpha Feature: GitOps integration for automated recommendation application with full audit/rollback capabilities.
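
A minimal sketch of step 2 of this workflow, assuming a hypothetical recommendation payload and repository layout (Kubeadapt's actual export format may differ):

python
import json
from pathlib import Path

# Hypothetical recommendation produced by the analyzer (illustrative values only).
recommendation = {
    "deployment": "api-gateway",
    "namespace": "production",
    "container": "app",
    "requests": {"cpu": "155m", "memory": "1.1Gi"},
    "limits": {"cpu": "186m", "memory": "1.43Gi"},  # 1.2x / 1.3x production limit buffers
}

# Write into a Git-tracked directory; a CI/CD job turns this into a manifest patch
# and the resulting commit history provides the audit trail.
out_dir = Path("rightsizing-recommendations") / recommendation["namespace"]
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / f"{recommendation['deployment']}.json"
out_path.write_text(json.dumps(recommendation, indent=2))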


The 4-Step Process

text
Step 1: Data Collection (Continuous)
Step 2: Pattern Analysis (Daily)
Step 3: Recommendation Generation
Step 4: Risk Assessment (Per recommendation)

Each step is explained in detail below.


Step 1: Data Collection

Data Collection Process

For every pod, every 60 seconds:

1. CPU Usage

  • Current CPU usage (millicores)
  • CPU throttling events
  • Requested CPU
  • Limit (if set)

2. Memory Usage

  • Current memory usage (bytes)
  • Memory limit (if set)
  • Requested memory
  • OOMKill events

3. Context

  • Pod phase (Running, Pending, Failed)
  • Restart count
  • Node placement
  • HPA status (if applicable)
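
A minimal sketch of one such sample as a record, with hypothetical field names (the actual collector schema is not shown here):

python
from dataclasses import dataclass

@dataclass
class PodSample:
    """One per-pod data point, collected every 60 seconds (illustrative schema)."""
    pod: str
    timestamp: int                 # Unix epoch seconds
    cpu_usage_m: int               # current CPU usage (millicores)
    cpu_request_m: int | None      # requested CPU, if set
    cpu_limit_m: int | None        # CPU limit, if set
    cpu_throttled: bool            # throttling observed in this interval
    mem_usage_bytes: int           # current memory usage
    mem_request_bytes: int | None
    mem_limit_bytes: int | None
    oom_killed: bool               # OOMKill event in this interval
    phase: str                     # Running, Pending, Failed
    restart_count: int
    node: str                      # node placement
    hpa_active: bool               # HPA attached to the owning workload

sample = PodSample("api-gateway-7d9f", 1718000000, 120, 150, 180, False,
                   900_000_000, 1_000_000_000, 1_200_000_000, False,
                   "Running", 0, "node-a", True)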

Step 2: Pattern Analysis

Overview: Kubeadapt analyzes workload usage patterns across two dimensions to generate accurate, context-aware recommendations.

| Dimension | Scope | Purpose | Metric |
| --- | --- | --- | --- |
| Daily Growth Trend | Day-to-day | Future growth buffer | Daily % change (0%-1%+) |
| Intra-Hour Volatility | Within 1 hour | Spike safety buffer | Coefficient of Variation |

Why both are needed:

  • Daily Trend: Detects if workload is growing (needs future headroom)
  • Intra-Hour CV: Detects spike unpredictability (needs safety margin)

Both dimensions are independent and can vary separately for the same workload.


Statistical Analysis

For each workload, the following metrics are calculated:

1. Percentile Distribution

text
P50 (median): 250m CPU
P90 (90th percentile): 380m CPU
P95 (95th percentile): 420m CPU
P99 (99th percentile): 480m CPU
Max (100th percentile): 650m CPU

Why percentiles matter:

  • P50 (median) = typical usage
  • P95 = most spikes covered
  • P99 = almost all spikes covered
  • Max = absolute peak (might be anomaly)
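
For example, these percentiles can be computed directly from the collected CPU samples (a sketch using NumPy; the generated samples are purely illustrative):

python
import numpy as np

# ~7 days of 60-second samples (10,080 points) from a skewed synthetic distribution.
cpu_samples_m = np.random.default_rng(0).gamma(shape=4, scale=70, size=10_080)

for p in (50, 90, 95, 99, 100):
    print(f"P{p}: {np.percentile(cpu_samples_m, p):.0f}m CPU")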

2. Daily Growth Trend (Macro Growth Analysis)

Purpose: Detect if workload is growing or shrinking over time to add future growth buffer.

Method: Linear regression on daily average usage across 7-30 days.

Trend calculation (Scale-Agnostic Percentage-Based):

text
Daily % Change = (slope / mean_value) × 100

Where:
- slope = Linear regression slope of daily averages
- mean_value = Mean of daily averages over lookback period

Example:
Day 1 average: 300m CPU
Day 7 average: 500m CPU
Slope: ~28.6m/day
Mean: ~400m
Daily % Change: (28.6 / 400) × 100 ≈ 7.15%

Why percentage-based (not angle-based):

  • Angle-based (arctan(slope)) is scale-dependent: same growth rate yields different angles for CPU (millicores) vs Memory (bytes)
  • Percentage-based is scale-agnostic: 10% growth is 10% whether it's CPU or Memory

Trend classifications:

| Daily % Change | Classification | Growth Buffer | Use Case |
| --- | --- | --- | --- |
| ≤ 0% | Stable/Decline | 1.0x (no buffer) | Mature workload, no growth |
| 0% - 0.5% | Light growth | 1.0x - 1.10x | Slow, steady growth |
| 0.5% - 1.0% | Steady growth | 1.10x - 1.20x | Consistent expansion |
| ≥ 1% | Rapid growth | 1.20x (+20%) | Fast-growing service |

Buffer is calculated via linear interpolation between 1.0x and 1.20x based on daily % change (capped at 1%).
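
A sketch of that calculation, assuming daily average usage values as input (linear regression via NumPy, buffer interpolation capped at 1% daily change):

python
import numpy as np

def growth_buffer(daily_averages: list[float]) -> float:
    """Daily % change from a linear regression over daily averages, mapped onto 1.0x-1.20x."""
    days = np.arange(len(daily_averages))
    slope = np.polyfit(days, daily_averages, deg=1)[0]        # units per day
    daily_pct_change = slope / np.mean(daily_averages) * 100  # scale-agnostic percentage
    capped = min(max(daily_pct_change, 0.0), 1.0)             # ≤0% → no buffer, ≥1% → max
    return 1.0 + capped * 0.20

print(growth_buffer([300, 330, 365, 400, 430, 465, 500]))  # fast-growing workload → 1.20x
print(growth_buffer([400, 398, 401, 402, 399, 400, 401]))  # stable workload → ~1.0x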

Why this matters:

  • Growing workloads need extra headroom for future usage
  • Stable workloads can be sized more aggressively
  • Declining workloads may need downsize recommendations

3. Intra-Hour Volatility (Spike Analysis)

Purpose: Detect unpredictable spikes within the same time period to add safety margin.

Method: Calculate Coefficient of Variation (CV) from usage metrics over the lookback period.

Coefficient of Variation formula:

text
CV = Standard Deviation / Mean

Example - CPU usage across lookback period:

text
Day 1: 100m, 150m, 120m, 180m, 110m → Mean: 132m, StdDev: 31m
Day 2: 105m, 145m, 125m, 175m, 115m → Mean: 133m, StdDev: 28m
...
Day 7: 110m, 155m, 130m, 185m, 120m → Mean: 140m, StdDev: 30m

Aggregate CV = 30 / 135 = 0.22 (medium volatility)

Volatility classifications (linear interpolation):

| CV Value | Spike Buffer | Behavior |
| --- | --- | --- |
| 0.0 | 1.0x | Perfectly predictable |
| 0.2 | 1.07x | Stable, minor variation |
| 0.4 | 1.14x | Moderate variation |
| 0.6 | 1.21x | Frequent spikes |
| 0.8 | 1.28x | High unpredictability |
| 1.0+ | 1.35x (max) | Extremely variable |

Formula:

text
buffer = 1.0 + min(cv, 1.0) × 0.35
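
The same idea as a runnable sketch, computing CV from raw usage samples and mapping it onto the 1.0x-1.35x range:

python
from statistics import mean, pstdev

def spike_buffer(samples_m: list[float]) -> float:
    """Coefficient of Variation (stddev / mean), capped at 1.0, mapped onto 1.0x-1.35x."""
    cv = pstdev(samples_m) / mean(samples_m)
    return 1.0 + min(cv, 1.0) * 0.35

# Day 1 samples from the example above: CV ≈ 0.22 → buffer ≈ 1.08x
print(spike_buffer([100, 150, 120, 180, 110]))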

Why this matters:

  • Low CV: Usage is predictable → Can size closer to P95
  • High CV: Usage is spiky → Need larger buffer above P95 for safety

Key difference from Daily Growth Trend:

  • Daily Trend: Measures direction over days (macro)
  • Intra-Hour CV: Measures predictability within same hour (micro)
  • Both can be high independently (smooth growth vs spiky but stable average)

Visual Comparison: Data Granularity

text
      SPIKE BUFFER (CV)                      GROWTH BUFFER (Trend)

      ^                                      ^
      |      *          *                    |                        o (Day 7 avg)
      |    *      *        *                 |                    o
      |  *    *      *    *     *            |                o
CPU   |*   *    *   *    *    *    *     CPU |            o
      |       *          *                   |        o
      |            *                         |    o
      +----------------------------->        +----------------------------->
        ~10,080 raw data points                7 daily aggregated points
        (stddev/avg -> CV)                     (linear regression -> slope)
  • Spike Buffer: Calculates CV from ALL individual metrics (e.g., 7 days × 24h × 60min = ~10k points)
  • Growth Buffer: First aggregates to daily averages, then fits trend line to 7 points

4. QoS-Based Eviction Priority

Understanding how Kubernetes evicts pods is critical for rightsizing:

Kubelet Eviction Order (During Node Pressure):

Group 1 (Evicted First):

  • BestEffort pods (no requests/limits set)
  • Burstable pods where usage > requests

Group 2 (Evicted Second):

  • Burstable pods where usage < requests
  • Guaranteed pods (only if system services need resources)

Key Takeaways:

  • Very low requests → pod likely in Group 1 → higher eviction risk
  • Properly sized requests → pod in Group 2 → lower eviction risk
  • Guaranteed pods (requests = limits) are most protected but prevent burst capacity

Kubeadapt's Strategy:

  • Production: Burstable QoS with limits 110-130% of requests
  • Behaves nearly identically to Guaranteed for eviction purposes
  • Provides critical anomaly tolerance without full resource reservation

Step 3: Recommendation Generation

The Core Algorithm

Two-tier recommendation formula:

text
Final Request = Base Request × Growth Buffer × Spike Buffer

Where:
  Base Request = Percentile from policy configuration
    - CPU: P95 (throttling is non-fatal)
    - Memory: P99 (OOMKill is fatal)

  Growth Buffer = Daily % change coefficient (scale-agnostic)
    Formula: dailyPctChange = (slope / meanValue) × 100
    - Stable/Decline (≤0%): 1.0x (no buffer)
    - Growing (0%-1%): 1.0x - 1.20x (linear interpolation)
    - Rapid (≥1%): 1.20x (+20%)

  Spike Buffer = CV coefficient (linear interpolation)
    Formula: buffer = 1.0 + min(cv, 1.0) × 0.35
    - CV 0.0 - 1.0: 1.0x - 1.35x (linear)
    - CV ≥ 1.0: 1.35x (max)

Example calculation:

text
Workload: api-gateway

Step 1: Base Request (from lookback period metrics)
  CPU P95: 120m
  Memory P99: 850Mi

Step 2: Apply Growth Buffer
  Daily % change: 0.75% (growing)
  Growth buffer: 1.0 + 0.75 × 0.20 = 1.15x

  CPU with growth: 120m × 1.15 = 138m
  Memory with growth: 850Mi × 1.15 = 978Mi

Step 3: Apply Spike Buffer
  CV: 0.28
  Spike buffer: 1.0 + 0.28 × 0.35 = 1.10x

  Final CPU request: 138m × 1.10 = 152m
  Final Memory request: 978Mi × 1.10 = 1076Mi (≈1.05Gi)

Result:
  CPU Request: 155m (rounded)
  Memory Request: 1.1Gi
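
Putting the two buffers together, a sketch that reproduces the api-gateway calculation above (rounding of the final values is illustrative):

python
def final_request(base: float, daily_pct_change: float, cv: float) -> float:
    """Final Request = Base × Growth Buffer × Spike Buffer."""
    growth = 1.0 + min(max(daily_pct_change, 0.0), 1.0) * 0.20
    spike = 1.0 + min(cv, 1.0) * 0.35
    return base * growth * spike

cpu_m = final_request(base=120, daily_pct_change=0.75, cv=0.28)   # ≈ 152m
mem_mi = final_request(base=850, daily_pct_change=0.75, cv=0.28)  # ≈ 1073Mi (the worked example rounds the spike buffer to 1.10, giving 1076Mi)
print(f"CPU: {cpu_m:.0f}m, Memory: {mem_mi:.0f}Mi (≈ {mem_mi / 1024:.2f}Gi)")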

Why P95 for CPU, P99 for memory?

CPU:

  • Throttling is non-fatal (just slower)
  • P95 covers most spikes
  • More aggressive sizing = more cost savings

Memory:

  • OOMKill is fatal (pod restart)
  • P99 provides safety for spikes
  • Conservative sizing prevents service disruption

Why Aggressive Limits Can Be Harmful

Before discussing safety margins, it's important to understand why setting limits too aggressively can cause more problems than it solves.

CPU Throttling:

Setting CPU limits too tight causes throttling, even when the node has idle capacity:

text
Example: 300m CPU limit on 4-core node (4000m capacity)
- Container can use maximum 30ms of CPU per 100ms period
- Result: Performance degradation even when node has 3700m idle CPU
- Impact: Application slowdown, increased latency

Key Point: CPU throttling is non-fatal but degrades performance. The container runs slower; it does not crash.

Memory OOM Kills:

Memory limits are fatal - any spike beyond the limit kills the pod:

text
Example: 2Gi memory limit
- Application normally uses 1.8Gi
- Traffic spike causes brief 2.1Gi usage
- Result: Pod immediately killed (OOMKill)
- Impact: Service disruption, pod restart, potential data loss

Key Point: OOM kills are fatal and cause service disruption. Unlike CPU, there's no "slow down" - the pod dies.

The Traditional Approach Problem:

Many teams set aggressive limits "to be safe":

  • ❌ CPU limit = 2x request (causes frequent throttling)
  • ❌ Memory limit = request (no room for spikes → OOMKills)
  • ❌ Result: Performance issues despite having cluster capacity

Kubeadapt's Balanced Approach:

  1. Rightsize requests first (eliminates waste at the source)

    • Use P99 (memory) or P95 (CPU) + scaling trend
    • Ensures requests match actual utilization
    • This is where cost optimization happens
  2. Use controlled limits for anomaly protection only

    • Production: 110-130% of requests
    • Provides burst capacity without excessive overcommit
    • Prevents node exhaustion without aggressive throttling
  3. Avoid aggressive limits that harm more than help

    • Don't use limits to "force" cost savings
    • Right-sized requests already provide cost optimization
    • Limits should protect, not restrict normal operation

Summary: Rightsizing is achieved by optimizing requests (where costs occur), not by setting aggressive limits (which cause throttling/OOMKills).


Analysis Windows

Default: 7-day rolling window

Exceptions:

New workloads (<7 days old):

  • Use available data (minimum 24 hours)
  • Higher safety margins (+10%)

Seasonal workloads:

  • Extend window to 30 days
  • Capture weekly patterns
  • Identify peak periods

Environment-Based Recommendations

Overview

Kubeadapt generates different recommendations based on environment type. The same deployment may receive different resource configurations in production vs. non-production environments.

Environment Classification:

  • Production-like: Production, pre-production, staging (with production-equivalent load testing)
  • Non-production-like: Development, testing, sandbox environments

Why Different Approaches:

Production and non-production have fundamentally different optimization priorities:

| Aspect | Production-like | Non-production-like |
| --- | --- | --- |
| Primary Goal | Reliable performance | Maximum cost reduction |
| Risk Tolerance | Low (avoid degradation) | High (acceptable issues) |
| Availability | High (HA required) | Low (single replica acceptable) |
| Node Overcommit | Conservative (120-130%) | Aggressive (300-1000%) |

Environment Detection

Kubeadapt uses a cluster-level policy to determine environment type:

Cluster Profile Selection

When creating a cluster in Kubeadapt UI, users select a profile that determines the Analysis Policy:

| Profile | Use Case | Philosophy |
| --- | --- | --- |
| Production | Mission-critical workloads | Conservative: higher percentiles (P95/P99), tighter limit buffers |
| Non-Production | Dev/Test/Staging | Aggressive: lower percentiles (P50), larger limit buffers |
| Custom | User-defined | Per-analyzer threshold overrides via SaaS UI |

Example:

text
Cluster: prod-us-east-1
Profile: production
→ All workloads use production policy settings
  - CPU: P95 percentile, 1.2x limit buffer
  - Memory: P99 percentile, 1.3x limit buffer
text
Cluster: dev-us-east-1
Profile: non_production
→ All workloads use non-production policy settings
  - CPU: P50 percentile, 4.0x limit buffer
  - Memory: P50 percentile, 3.0x limit buffer

Policy Settings:

The cluster profile determines the AnalysisPolicy values that analyzers use:

| Field | Production | Non-Production |
| --- | --- | --- |
| cpu_percentile | P95 | P50 |
| memory_percentile | P99 | P50 |
| cpu_limit_buffer | 1.2x | 4.0x |
| memory_limit_buffer | 1.3x | 3.0x |
| min_monthly_savings_usd | $1.0 | $0.5 |
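
As a sketch, the two profiles can be expressed as plain data using the policy field names from the table (this representation is illustrative, not Kubeadapt's actual schema):

python
ANALYSIS_POLICIES = {
    "production": {
        "cpu_percentile": 95,
        "memory_percentile": 99,
        "cpu_limit_buffer": 1.2,
        "memory_limit_buffer": 1.3,
        "min_monthly_savings_usd": 1.0,
    },
    "non_production": {
        "cpu_percentile": 50,
        "memory_percentile": 50,
        "cpu_limit_buffer": 4.0,
        "memory_limit_buffer": 3.0,
        "min_monthly_savings_usd": 0.5,
    },
}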

Production-like Environment Recommendations

Optimization Philosophy:

Reliable cost savings without compromising performance or availability.

1. Requests (Resource Allocation)

Calculation Method:

text
Request = Base × GrowthBuffer × SpikeBuffer

Where:
- Base: P95 (CPU) or P99 (Memory) from policy percentile
- GrowthBuffer: 1.0x - 1.20x based on daily % change
- SpikeBuffer: 1.0x - 1.35x based on CV (stddev/avg)

Growth Buffer (scale-agnostic, percentage-based):
- Stable/Decline (≤0%): 1.0x → No growth buffer needed
- Growing (0%-1%): 1.0x - 1.20x → Linear interpolation
- Rapid (≥1%): 1.20x → Maximum growth buffer

Example:

text
Workload: api-gateway
Base (P95 CPU): 750m
Daily % change: 0.8% (growing)
CV: 0.3

Growth Buffer = 1.0 + 0.8 × 0.20 = 1.16x
Spike Buffer = 1.0 + 0.3 × 0.35 = 1.105x

Recommended CPU Request:
750m × 1.16 × 1.105 ≈ 962m → 1000m

Why P95 for CPU, P99 for Memory:

  • CPU (P95): Throttling is non-fatal, more aggressive optimization acceptable
  • Memory (P99): OOM kill is fatal, conservative approach prevents pod restarts
  • Growth buffer adjusts for daily growth projections
  • Spike buffer handles usage variability

2. Limits (Resource Boundaries)

Strategy: Near-Guaranteed Burstable QoS with Controlled Overcommit

Kubernetes QoS classes affect pod eviction priorities:

| QoS Class | Condition | Eviction Priority |
| --- | --- | --- |
| Guaranteed | requests = limits | Lowest (most protected) |
| Burstable | requests < limits | Medium |
| BestEffort | No requests/limits | Highest (first to evict) |

Kubeadapt Approach:

text
Recommended Limit = Request × 1.1 to 1.3

Examples:
Request: 1000m → Limit: 1100m-1300m (110-130% of request)
Request: 2Gi → Limit: 2.2Gi-2.6Gi (110-130% of request)

Why not requests = limits (pure Guaranteed)?

  • Pure Guaranteed prevents any burst capacity
  • Small limit buffer handles anomalies without eviction
  • Node-level overcommit: 110-130% allows denser packing
  • Balance: Protection from eviction + anomaly tolerance

Node Capacity Planning:

text
Node capacity: 8 cores
Total pod requests: 7 cores (87% utilization)
Total pod limits: 9 cores (112% overcommit)

Result:
- Normal operation: All pods run smoothly
- Anomaly (multiple pods spike): Limits prevent node exhaustion
- Overcommit level: Conservative 112% (safe range: 110-130%)

Risk Mitigation:

  • Near-Guaranteed QoS (limits close to requests) keeps eviction priority low
  • Moderate limits protect node from resource exhaustion

3. QoS Class

Recommendation: Burstable QoS with Controlled Limits

Configuration approach:

  • Requests: Right-sized based on P95 (CPU) / P99 (memory) usage + scaling trend
  • Limits: 110-130% of requests (production)
  • QoS Classification: Burstable (requests < limits)

Why not pure Guaranteed (requests = limits)?

Pure Guaranteed prevents any burst capacity for handling anomalies. Kubeadapt's approach provides:

  • Eviction protection: Limits close to requests (110-130%) provide strong protection
  • Anomaly tolerance: 10-30% burst capacity handles unexpected spikes
  • Node-level optimization: Allows controlled overcommit (120-130% node capacity)

Eviction Resistance:

Pods with limits 110-130% of requests are in eviction Group 2 (evicted after BestEffort and over-request Burstable pods), providing strong protection while maintaining burst capacity.


4. Node Capacity Targets

Goal: 90-100% Allocated, 120-130% Overcommitted

text
Example Node: 8 cores, 32 GB

Target Allocation (requests):
- CPU requests: 7-8 cores (87-100% of capacity)
- Memory requests: 28-32 GB (87-100% of capacity)

Actual Limits (overcommit):
- CPU limits: 9-10.4 cores (112-130% of capacity)
- Memory limits: 33-42 GB (103-131% of capacity)

Overcommit ratio: 120-130%

Why This Range:

  • Too low (<110%): Wasted node capacity, higher costs
  • Sweet spot (120-130%): Dense packing + anomaly tolerance
  • Too high (>150%): Performance degradation risk if multiple pods spike
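
A small sketch of how these allocation and overcommit ratios can be checked for a node (the helper and its parameter names are illustrative):

python
def node_cpu_ratios(capacity_cores: float, requests_cores: float, limits_cores: float) -> tuple[float, float]:
    """Return (allocation %, overcommit %) for a node's CPU."""
    return requests_cores / capacity_cores * 100, limits_cores / capacity_cores * 100

alloc, overcommit = node_cpu_ratios(capacity_cores=8, requests_cores=7.5, limits_cores=10)
print(f"Allocated: {alloc:.0f}%  Overcommitted: {overcommit:.0f}%")  # 94% / 125%, inside the target range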

CPU Throttling Risk:

If all pods simultaneously hit their limits:

text
Total limits: 10 cores
Node capacity: 8 cores
→ 2 cores of throttling distributed across pods

Impact: Minor latency increase (non-fatal)

Memory OOM Risk:

Conservative memory overcommit (110-130%) ensures:

  • Memory limits rarely all reached simultaneously
  • Production workloads use Burstable QoS with controlled limits (110-130% of requests)
  • Right-sized requests keep usage < requests under normal conditions
  • This places pods in eviction Group 2 (lower eviction priority)
  • If node pressure occurs, Group 1 pods evicted first (BestEffort + over-request Burstable)
  • Controlled limits (110-130%) provide safe area for anomalies without pod crashes

Non-production-like Environment Recommendations

Optimization Philosophy:

Maximum cost reduction, accepting higher risk and lower availability.

Key Differences from Production

| Aspect | Production | Non-production |
| --- | --- | --- |
| Base Percentile | P95 (CPU), P99 (Memory) | P50 (CPU & Memory) |
| CPU Limit Buffer | 1.2x (from policy) | 4.0x (from policy) |
| Memory Limit Buffer | 1.3x (from policy) | 3.0x (from policy) |
| Node Overcommit | 120-130% | 300-1000% |
| QoS | Burstable (controlled limits) | BestEffort acceptable |

1. Requests (Policy-Driven Allocation)

Calculation (same formula, different policy values):

text
Request = Base × GrowthBuffer × SpikeBuffer

Where:
- Base: P50 (CPU & Memory) from policy percentile
- GrowthBuffer: 1.0x - 1.20x based on daily % change
- SpikeBuffer: 1.0x - 1.35x based on CV (stddev/avg)

Example:
Dev environment workload
P50 CPU usage: 20m
P50 Memory usage: 180Mi
Daily % change: 0.5% (light growth)
CV: 0.2

Growth Buffer = 1.0 + 0.5 × 0.20 = 1.10x
Spike Buffer = 1.0 + 0.2 × 0.35 = 1.07x

Recommended CPU Request: 20m × 1.10 × 1.07 ≈ 24m
Recommended Memory Request: 180Mi × 1.10 × 1.07 ≈ 212Mi

Why P50 for both CPU and Memory:

  • Cost priority: Minimize request allocation (what you pay for)
  • Safety buffer: High policy limit buffers (4.0x CPU, 3.0x Memory) provide headroom
  • Dev workload nature: Idle most of the time, occasional bursts acceptable
  • Aggressive overcommit: Nodes can be 300-1000% overcommitted in non-production

Rationale:

  • Developers run sporadic tests
  • Most of the time: idle or minimal usage
  • Occasional spikes: High limits handle bursts without OOM
  • Priority: Minimize allocated cost

2. Limits (Policy-Driven High Headroom)

Strategy: High limits configured via cluster policy

text
CPU Limit = CPU Request × cpu_limit_buffer (4.0x from policy)
Memory Limit = Memory Request × memory_limit_buffer (3.0x from policy)

Example (from P50-based request above):
CPU Request: 24m
CPU Limit: 24m × 4.0 = 96m

Memory Request: 212Mi
Memory Limit: 212Mi × 3.0 = 636Mi

Comparison to production:
Production: CPU Request 150m (P95), Limit 180m (1.2x buffer)
Non-prod: CPU Request 24m (P50), Limit 96m (4.0x buffer)
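
A sketch of the same limit derivation, applying each profile's limit buffers to the request values above (helper name is illustrative):

python
def limits_from_policy(cpu_request_m: float, mem_request_mi: float, policy: dict) -> dict:
    """Limits are derived from requests via the policy's limit buffers."""
    return {
        "cpu_limit_m": round(cpu_request_m * policy["cpu_limit_buffer"]),
        "memory_limit_mi": round(mem_request_mi * policy["memory_limit_buffer"]),
    }

non_prod = {"cpu_limit_buffer": 4.0, "memory_limit_buffer": 3.0}
print(limits_from_policy(24, 212, non_prod))  # {'cpu_limit_m': 96, 'memory_limit_mi': 636}
# With the production buffers (1.2x / 1.3x), the 150m production request above gets a 180m limit instead.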

Rationale:

  • Developers need burst capacity for testing
  • Low requests (P50-based) minimize cost
  • High policy-driven limits prevent OOM/throttling during test bursts
  • Node overcommit acceptable (dev pods less critical)
  • Safety buffer comes from policy limit buffers, not from conservative percentile choice

3. Node Capacity (Aggressive Overcommit)

Target: 300-1000% Overcommit

text
Example Node: 8 cores, 32 GB

Allocated requests:
- CPU: 8 cores (100% utilization)
- Memory: 32 GB (100% utilization)

Total limits:
- CPU: 64 cores (800% overcommit!)
- Memory: 128 GB (400% overcommit)

Overcommit ratio: 400-800%

Why This Works:

  • Dev workloads mostly idle (25-100m actual usage)
  • Developers test intermittently (not all pods simultaneously)
  • Even if all pods spike: Throttling acceptable in non-prod
  • Cost savings: 4-8x more pods per node

Risk Acceptance:

  • CPU throttling during tests: Acceptable
  • Pod evictions possible: Acceptable (no production impact)
  • Performance degradation: Acceptable

Example Comparison: Same Deployment, Different Environments

Production Environment:

Deployment Configuration:

  • Name: api-server
  • Namespace: production
  • CPU request: 1000m (P95 × scaling trend)
  • Memory request: 2Gi (P99 × scaling trend)
  • CPU limit: 1200m (120% of request)
  • Memory limit: 2.4Gi (120% of request)

Non-production Environment:

Deployment Configuration:

  • Name: api-server
  • Namespace: development
  • CPU request: 100m (P50 minimal usage)
  • Memory request: 256Mi (P50 minimal usage)
  • CPU limit: 600m (6x request for developer test headroom)
  • Memory limit: 1Gi (4x request)

Cost Comparison:

text
Production (per pod):
Request: 1000m CPU, 2Gi memory
Baseline cost: High (prioritizes reliability)

Non-production (per pod):
Request: 100m CPU, 256Mi memory
Baseline cost: ~90% lower (prioritizes cost)

Cost reduction: ~90% (development vs. production)

Special Cases

StatefulSets

StatefulSets may have per-pod variation:

StatefulSets often have different resource needs per pod (e.g., primary vs replicas in databases).

Example: PostgreSQL StatefulSet

  • Replicas: 3
  • Pod-0 (primary): High CPU/memory usage (handles writes)
  • Pod-1, Pod-2 (replicas): Lower CPU/memory usage (read-only replicas)

Challenge:

All pods in a StatefulSet share the same resource specification - you cannot set different requests/limits per pod index.

Kubeadapt's Right-sizing Approach:

Uniform Sizing (Only Option):

  • Size all pods for the highest usage pod (typically the primary)
  • Simpler management and operational consistency
  • Some resource waste on lower-usage replicas (read replicas)
  • Trade-off: Operational simplicity vs. marginal cost savings

Why This Is Acceptable:

  • Primary pod rightsizing already provides significant savings vs. over-provisioned baseline
  • Complexity of managing separate StatefulSets per role often outweighs marginal savings
  • Workload-specific tools (like database operators) can handle role-specific sizing if needed

Advanced Alternative (Manual):

For teams requiring per-role optimization:

  • Deploy separate StatefulSets for different roles (e.g., primary StatefulSet + replica StatefulSet)
  • Use pod affinity/anti-affinity rules to ensure distribution
  • Note: Adds operational complexity, only recommended for very large deployments

Burstable vs. Guaranteed QoS

Quality of Service classes affect right-sizing:

Burstable (requests < limits):

  • Example: requests 500m CPU / 1Gi memory, limits 2000m CPU / 4Gi memory
  • Can burst above requests when node has capacity
  • May be throttled/OOMKilled if node is full
  • More cost-efficient (pay for requests, use limits opportunistically)

Guaranteed (requests = limits):

  • Example: requests = limits = 2000m CPU / 4Gi memory
  • Reserved resources, fully allocated on node
  • Never throttled or evicted (unless it exceeds its own limits)
  • More expensive (pay for maximum capacity 24/7)
  • No burst capacity for anomalies

Kubeadapt approach:

  1. Right-size requests (primary optimization goal)
  2. Suggest limit strategy based on workload:
    • Critical services: limits = 2x requests
    • Burstable workloads: limits = 3-4x requests
    • Batch jobs: No limits (use all available)

Multi-Container Pods

Pods with sidecars:

Pods may contain multiple containers (main app + sidecars for logging, metrics, proxies, etc.).

Example Pod:

  • App container: 1000m CPU / 2Gi memory
  • Logging sidecar: 100m CPU / 128Mi memory
  • Metrics sidecar: 50m CPU / 64Mi memory
  • Total pod requests: 1150m CPU / 2.2Gi memory

Right-sizing approach:

  1. Analyze each container separately

    • App: Usage-based recommendation
    • Sidecars: May have fixed resource needs
  2. Generate per-container recommendations

    text
    app: 1000m → 600m (usage-based)
    logging-sidecar: 100m → 100m (keep, minimal overhead)
    metrics-sidecar: 50m → 50m (keep, minimal overhead)
  3. Total pod resource reduction

    text
    Current: 1150m CPU, 2.2Gi memory
    Recommended: 750m CPU, 2.2Gi memory
    Reduction: 35% CPU, 0% memory
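
A sketch of how the per-container recommendations roll up to those pod totals (values mirror the example above):

python
containers = {
    # name: (current_cpu_m, recommended_cpu_m, current_mem_mi, recommended_mem_mi)
    "app":             (1000, 600, 2048, 2048),
    "logging-sidecar": (100,  100,  128,  128),
    "metrics-sidecar": (50,    50,   64,   64),
}

cur_cpu = sum(c[0] for c in containers.values())
rec_cpu = sum(c[1] for c in containers.values())
cur_mem = sum(c[2] for c in containers.values())
rec_mem = sum(c[3] for c in containers.values())

print(f"CPU: {cur_cpu}m -> {rec_cpu}m ({(1 - rec_cpu / cur_cpu) * 100:.0f}% reduction)")
print(f"Memory: {cur_mem}Mi -> {rec_mem}Mi ({(1 - rec_mem / cur_mem) * 100:.0f}% reduction)")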

Recommendation Lifecycle

Recommendations follow a state machine lifecycle from creation to resolution.

States

| State | Description | User Action |
| --- | --- | --- |
| Pending | Active recommendation, continuously updated by analyzer | Review in UI |
| Applied | User confirmed they applied the recommendation | "Mark as Applied" button |
| Dismissed | User doesn't want recommendations for this resource | "Dismiss" button |
| Archived | Previous applied recommendation, preserved for history | Automatic |

State Transitions

text
              ┌────────────────────────────────────┐
              │                                    │
              ▼                                    │
[Created] ──► Pending ──► Applied ──► Archived     │
              │              │                     │
              │              └── Cooldown Period ──┘
              │                  (new rec after cooldown)
              │
              Dismissed ◄──► Pending (un-dismiss)

Transition Rules:

  • Pending → Applied: User clicks "Mark as Applied" in UI
  • Applied → Archived: Cooldown period passes + new recommendation generated
  • Pending → Dismissed: User clicks "Dismiss" in UI
  • Dismissed → Pending: User clicks "Un-dismiss" to re-enable recommendations

Cooldown Period

After applying a recommendation, the analyzer waits before generating new recommendations for the same workload:

| Policy Field | Production | Non-Production | Purpose |
| --- | --- | --- | --- |
| cooldown_days | 7 days | 7 days | Let system stabilize with new config |

Cooldown Logic:

text
IF status = 'applied' THEN
  IF applied_at + cooldown_days < now() THEN
    -- Run analyzer calculation
    IF monthly_savings >= min_monthly_savings_usd THEN
      -- Move current to archived, create new pending
      Generate new recommendation
    END IF
  ELSE
    Skip (still in cooldown period)
  END IF
END IF
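
The same check as a runnable sketch in Python (field and function names are illustrative):

python
from datetime import datetime, timedelta, timezone

def cooldown_allows_new_recommendation(status: str, applied_at: datetime,
                                       cooldown_days: int, monthly_savings: float,
                                       min_monthly_savings_usd: float) -> bool:
    """Mirror of the pseudocode above for recommendations in the 'applied' state."""
    if status != "applied":
        return False
    if applied_at + timedelta(days=cooldown_days) > datetime.now(timezone.utc):
        return False  # still in cooldown period
    return monthly_savings >= min_monthly_savings_usd  # only regenerate above the savings threshold

applied_at = datetime.now(timezone.utc) - timedelta(days=10)
print(cooldown_allows_new_recommendation("applied", applied_at,
                                         cooldown_days=7, monthly_savings=3.2,
                                         min_monthly_savings_usd=1.0))  # True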

Re-generation Rules

| Previous State | Re-generation Rule | Rationale |
| --- | --- | --- |
| Pending | ✅ Always upsert | Normal operation - keep recommendations current |
| Applied | ⏳ Cooldown only | Wait for cooldown, then generate if threshold met |
| Archived | ❌ Never touch | History record - immutable |
| Dismissed | ❌ Block all | User explicitly rejected - respect their decision |

Savings Threshold

Recommendations are only generated when monthly savings meet the policy threshold:

| Policy Field | Production | Non-Production |
| --- | --- | --- |
| min_monthly_savings_usd | $1.0 | $0.5 |

This prevents recommendation noise for workloads with trivial savings potential.


Summary

Kubeadapt's right-sizing approach in a nutshell:

Two-Tier Pattern Analysis:

  1. Daily Growth Trend - Adds 0-20% future growth buffer
  2. Intra-Hour Volatility - Adds 0-35% spike safety buffer

Recommendation Formula:

text
Final Request = Base (P95/P99) × Growth Buffer × Spike Buffer

Key principle:

  • Optimize requests (where cost occurs)
  • Control limits (for anomaly protection only)
  • Never use limits for cost optimization (causes throttling/OOMKills)
