Right-sizing


Overview

Right-sizing is the continuous process of adjusting resource requests to match actual usage, eliminating waste while maintaining performance and reliability. It's not a one-time activity, but an ongoing practice that adapts to changing workload patterns, traffic fluctuations, and application evolution.

Kubeadapt's Goal: Reliable Cost Optimization

The primary objective is not aggressive cost cutting, but reliable cost savings that maintain application performance and availability. Recommendations balance:

  • Cost efficiency - Reducing waste and unnecessary resource allocation
  • Performance reliability - Avoiding degradation and maintaining SLAs
  • Environment-specific requirements - Production vs. non-production needs

The challenge:

  • Set requests too high → pay for unused resources
  • Set requests too low → pods crash or get evicted
  • Workloads change over time → yesterday's optimal settings become wasteful

Kubeadapt's continuous approach:

  1. Analyze actual resource usage over time
  2. Identify optimal requests based on evolving patterns
  3. Assess risk levels dynamically
  4. Generate ready-to-apply recommendations automatically
  5. Re-evaluate as workloads change and grow

Kubeadapt's Core Philosophy: Request-Centric Rightsizing

The fundamental principle: Rightsizing is achieved by optimizing requests, not limits.

Why Requests Are the Primary Target

1. Requests Determine Allocated Cost

  • Cloud providers charge for requested resources (whether utilized or not)
  • Over-provisioned requests = wasted money
  • Under-provisioned requests = scheduling failures

2. Limits Are a Secondary Concern

  • Limits should protect against anomalies, not restrict normal operation
  • Tight limits cause throttling (CPU) and OOM kills (memory)
  • Properly sized requests reduce the need for aggressive limits

3. CPU vs Memory Behavior

CPU (Compressible Resource):

  • Throttling is non-fatal → performance degradation only
  • Can be shared and stretched across processes
  • Kubeadapt uses the P95 (95th percentile) for more aggressive optimization

Memory (Non-compressible Resource):

  • OOM kill is fatal → pod restart
  • Cannot be shared, must be available when needed
  • Kubeadapt uses the P99 (99th percentile) for a conservative approach

The Kubeadapt Method

Step 1: Rightsize Requests

  • Use P99 (memory) or P95 (CPU) usage + scaling trend coefficient
  • Ensures requests match actual utilization
  • Primary cost optimization target

Step 2: Configure Limits for Protection

  • Production: 110-130% of requests (anomaly tolerance)
  • Non-production: 300-600% of requests (developer flexibility)
  • Secondary configuration for safety

Result:

  • Optimized costs (right-sized requests)
  • Reliable performance (controlled limits)
  • Eviction prevention (proper QoS class)

The Right-sizing Problem

Why Manual Right-sizing Fails

Without tooling, manual right-sizing requires:

  1. Monitoring every workload manually

    bash
    kubectl top pods --namespace=production
    • Tedious for 100+ deployments
    • Only shows current usage, not historical patterns
  2. Analyzing patterns over time

    • What was peak usage last week?
    • Are there daily/weekly spikes?
    • Is usage growing or stable?
  3. Calculating safe requests

    • How much headroom is safe?
    • What if traffic spikes?
    • Will this cause OOMKills?
  4. Updating YAML files

    • Find the right deployment file
    • Update requests
    • Apply and monitor

Result: Most teams set high requests "to be safe" and leave them forever.

Typical waste: 60-80% of requested resources go unused.
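
As a rough illustration of where that waste figure comes from, the sketch below compares requested CPU against observed usage for a few hypothetical deployments (names and values are invented):

python
# Hypothetical illustration: waste = share of requested CPU that goes unused.
requested_m = {"api": 2000, "worker": 1000, "cron": 500}   # CPU requests (millicores)
observed_p95_m = {"api": 400, "worker": 250, "cron": 150}  # observed P95 usage (millicores)

for name, req in requested_m.items():
    waste_pct = (1 - observed_p95_m[name] / req) * 100
    print(f"{name}: {waste_pct:.0f}% of requested CPU unused")
# Over-provisioned clusters commonly land in the 60-80% range cited above.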


How Kubeadapt Right-sizing Works

Background: Vertical Right-sizing

Kubeadapt provides vertical (per-pod resource) right-sizing recommendations:

Current Kubernetes Limitations:

  • Applying VPA recommendations requires pod restarts (rolling update)
  • Rolling updates take time and may cause brief service disruptions

Kubeadapt's Current Approach:

  • Provide vertical right-sizing recommendations (requests/limits) for manual or GitOps application
  • Recommendations applied via deployment YAML updates

Future: In-place Resource Adjustment

Kubernetes 1.33 (Beta):

  • In-place resource adjustment allows changing pod requests/limits without restart
  • Requires kubelet flag:
    text
    --feature-gates=InPlacePodVerticalScaling=true
  • Kubeadapt can recommend in-place adjustments where supported

Kubernetes 1.35 (Expected Stable):

  • Feature expected to reach GA (General Availability)
  • Kubeadapt plans to offer automated in-place right-sizing
  • No pod restarts = faster optimization with zero downtime


Current Recommendation: GitOps-Friendly Workflow (Alpha)

For clusters without in-place adjustment, the workflow is:

  1. Kubeadapt generates recommendations
  2. Recommendations exported to Git repository (audit trail)
  3. CI/CD pipeline applies changes with tracking
  4. Rolling update occurs (pod restarts)
  5. Periodic application (daily/weekly recommended)

Alpha Feature: GitOps integration for automated recommendation application with full audit/rollback capabilities.
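
A minimal sketch of step 2 of this workflow, assuming a hypothetical recommendation payload and repository layout (Kubeadapt's actual export format may differ):

python
import json
from pathlib import Path

# Hypothetical recommendation produced by the analyzer (illustrative values only).
recommendation = {
    "deployment": "api-gateway",
    "namespace": "production",
    "container": "app",
    "requests": {"cpu": "155m", "memory": "1.1Gi"},
    "limits": {"cpu": "186m", "memory": "1.43Gi"},  # 1.2x / 1.3x production limit buffers
}

# Write into a Git-tracked directory; a CI/CD job turns this into a manifest patch
# and the resulting commit history provides the audit trail.
out_dir = Path("rightsizing-recommendations") / recommendation["namespace"]
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / f"{recommendation['deployment']}.json"
out_path.write_text(json.dumps(recommendation, indent=2))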


The 4-Step Process

text
Step 1: Data Collection (Continuous)
Step 2: Pattern Analysis (Daily)
Step 3: Recommendation Generation
Step 4: Risk Assessment (Per recommendation)

Each step is explained in detail below.


Step 1: Data Collection

Data Collection Process

For every pod, every 60 seconds:

1. CPU Usage

  • Current CPU usage (millicores)
  • CPU throttling events
  • Requested CPU
  • Limit (if set)

2. Memory Usage

  • Current memory usage (bytes)
  • Memory limit (if set)
  • Requested memory
  • OOMKill events

3. Context

  • Pod phase (Running, Pending, Failed)
  • Restart count
  • Node placement
  • HPA status (if applicable)
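
A minimal sketch of one such sample as a record, with hypothetical field names (the actual collector schema is not shown here):

python
from dataclasses import dataclass

@dataclass
class PodSample:
    """One per-pod data point, collected every 60 seconds (illustrative schema)."""
    pod: str
    timestamp: int                 # Unix epoch seconds
    cpu_usage_m: int               # current CPU usage (millicores)
    cpu_request_m: int | None      # requested CPU, if set
    cpu_limit_m: int | None        # CPU limit, if set
    cpu_throttled: bool            # throttling observed in this interval
    mem_usage_bytes: int           # current memory usage
    mem_request_bytes: int | None
    mem_limit_bytes: int | None
    oom_killed: bool               # OOMKill event in this interval
    phase: str                     # Running, Pending, Failed
    restart_count: int
    node: str                      # node placement
    hpa_active: bool               # HPA attached to the owning workload

sample = PodSample("api-gateway-7d9f", 1718000000, 120, 150, 180, False,
                   900_000_000, 1_000_000_000, 1_200_000_000, False,
                   "Running", 0, "node-a", True)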

Step 2: Pattern Analysis

Overview: Kubeadapt analyzes workload usage patterns across two dimensions to generate accurate, context-aware recommendations.

| Dimension | Scope | Purpose | Metric |
| --- | --- | --- | --- |
| Daily Growth Trend | Day-to-day | Future growth buffer | Daily % change (0%-1%+) |
| Intra-Hour Volatility | Within 1 hour | Spike safety buffer | Coefficient of Variation |

Why both are needed:

  • Daily Trend: Detects if workload is growing (needs future headroom)
  • Intra-Hour CV: Detects spike unpredictability (needs safety margin)

Both dimensions are independent and can vary separately for the same workload.


Statistical Analysis

For each workload, the following metrics are calculated:

1. Percentile Distribution

text
P50 (median): 250m CPU
P90 (90th percentile): 380m CPU
P95 (95th percentile): 420m CPU
P99 (99th percentile): 480m CPU
Max (100th percentile): 650m CPU

Why percentiles matter:

  • P50 (median) = typical usage
  • P95 = most spikes covered
  • P99 = almost all spikes covered
  • Max = absolute peak (might be anomaly)
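
For example, these percentiles can be computed directly from the collected CPU samples (a sketch using NumPy; the generated samples are purely illustrative):

python
import numpy as np

# ~7 days of 60-second samples (10,080 points) from a skewed synthetic distribution.
cpu_samples_m = np.random.default_rng(0).gamma(shape=4, scale=70, size=10_080)

for p in (50, 90, 95, 99, 100):
    print(f"P{p}: {np.percentile(cpu_samples_m, p):.0f}m CPU")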

2. Daily Growth Trend (Macro Growth Analysis)

Purpose: Detect if workload is growing or shrinking over time to add future growth buffer.

Method: Linear regression on daily average usage across 7-30 days.

Trend calculation (Scale-Agnostic Percentage-Based):

text
Daily % Change = (slope / mean_value) × 100

Where:
- slope = Linear regression slope of daily averages
- mean_value = Mean of daily averages over lookback period

Example:
Day 1 average: 300m CPU
Day 7 average: 500m CPU
Slope: ~28.6m/day
Mean: ~400m
Daily % Change: (28.6 / 400) × 100 ≈ 7.15%

Why percentage-based (not angle-based):

  • Angle-based (arctan(slope)) is scale-dependent: same growth rate yields different angles for CPU (millicores) vs Memory (bytes)
  • Percentage-based is scale-agnostic: 10% growth is 10% whether it's CPU or Memory

Trend classifications:

| Daily % Change | Classification | Growth Buffer | Use Case |
| --- | --- | --- | --- |
| ≤ 0% | Stable/Decline | 1.0x (no buffer) | Mature workload, no growth |
| 0% - 0.5% | Light growth | 1.0x - 1.10x | Slow, steady growth |
| 0.5% - 1.0% | Steady growth | 1.10x - 1.20x | Consistent expansion |
| ≥ 1% | Rapid growth | 1.20x (+20%) | Fast-growing service |

Buffer is calculated via linear interpolation between 1.0x and 1.20x based on daily % change (capped at 1%).
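
A sketch of that calculation, assuming daily average usage values as input (linear regression via NumPy, buffer interpolation capped at 1% daily change):

python
import numpy as np

def growth_buffer(daily_averages: list[float]) -> float:
    """Daily % change from a linear regression over daily averages, mapped onto 1.0x-1.20x."""
    days = np.arange(len(daily_averages))
    slope = np.polyfit(days, daily_averages, deg=1)[0]        # units per day
    daily_pct_change = slope / np.mean(daily_averages) * 100  # scale-agnostic percentage
    capped = min(max(daily_pct_change, 0.0), 1.0)             # ≤0% → no buffer, ≥1% → max
    return 1.0 + capped * 0.20

print(growth_buffer([300, 330, 365, 400, 430, 465, 500]))  # fast-growing workload → 1.20x
print(growth_buffer([400, 398, 401, 402, 399, 400, 401]))  # stable workload → ~1.0x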

Why this matters:

  • Growing workloads need extra headroom for future usage
  • Stable workloads can be sized more aggressively
  • Declining workloads may need downsize recommendations

3. Intra-Hour Volatility (Spike Analysis)

Purpose: Detect unpredictable spikes within the same time period to add safety margin.

Method: Calculate Coefficient of Variation (CV) from usage metrics over the lookback period.

Coefficient of Variation formula:

text
CV = Standard Deviation / Mean

Example - CPU usage across lookback period:

text
Day 1: 100m, 150m, 120m, 180m, 110m → Mean: 132m, StdDev: 31m
Day 2: 105m, 145m, 125m, 175m, 115m → Mean: 133m, StdDev: 28m
...
Day 7: 110m, 155m, 130m, 185m, 120m → Mean: 140m, StdDev: 30m

Aggregate CV = 30 / 135 = 0.22 (medium volatility)

Volatility classifications (linear interpolation):

| CV Value | Spike Buffer | Behavior |
| --- | --- | --- |
| 0.0 | 1.0x | Perfectly predictable |
| 0.2 | 1.07x | Stable, minor variation |
| 0.4 | 1.14x | Moderate variation |
| 0.6 | 1.21x | Frequent spikes |
| 0.8 | 1.28x | High unpredictability |
| 1.0+ | 1.35x (max) | Extremely variable |

Formula:

text
buffer = 1.0 + min(cv, 1.0) × 0.35
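
The same idea as a runnable sketch, computing CV from raw usage samples and mapping it onto the 1.0x-1.35x range:

python
from statistics import mean, pstdev

def spike_buffer(samples_m: list[float]) -> float:
    """Coefficient of Variation (stddev / mean), capped at 1.0, mapped onto 1.0x-1.35x."""
    cv = pstdev(samples_m) / mean(samples_m)
    return 1.0 + min(cv, 1.0) * 0.35

# Day 1 samples from the example above: CV ≈ 0.22 → buffer ≈ 1.08x
print(spike_buffer([100, 150, 120, 180, 110]))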

Why this matters:

  • Low CV: Usage is predictable → Can size closer to P95
  • High CV: Usage is spiky → Need larger buffer above P95 for safety

Key difference from Daily Growth Trend:

  • Daily Trend: Measures direction over days (macro)
  • Intra-Hour CV: Measures predictability within same hour (micro)
  • Both can be high independently (smooth growth vs spiky but stable average)

Visual Comparison: Data Granularity

text
      SPIKE BUFFER (CV)                      GROWTH BUFFER (Trend)

      ^                                      ^
      |      *          *                    |                        o (Day 7 avg)
      |    *      *        *                 |                    o
      |  *    *      *    *     *            |                o
CPU   |*   *    *   *    *    *    *     CPU |            o
      |       *          *                   |        o
      |            *                         |    o
      +----------------------------->        +----------------------------->
        ~10,080 raw data points                7 daily aggregated points
        (stddev/avg -> CV)                     (linear regression -> slope)
  • Spike Buffer: Calculates CV from ALL individual metrics (e.g., 7 days × 24h × 60min = ~10k points)
  • Growth Buffer: First aggregates to daily averages, then fits trend line to 7 points

4. QoS-Based Eviction Priority

Understanding how Kubernetes evicts pods is critical for rightsizing:

Kubelet Eviction Order (During Node Pressure):

Group 1 (Evicted First):

  • BestEffort pods (no requests/limits set)
  • Burstable pods where usage > requests

Group 2 (Evicted Second):

  • Burstable pods where usage < requests
  • Guaranteed pods (only if system services need resources)

Key Takeaways:

  • Very low requests → pod likely in Group 1 → higher eviction risk
  • Properly sized requests → pod in Group 2 → lower eviction risk
  • Guaranteed pods (requests = limits) are most protected but prevent burst capacity

Kubeadapt's Strategy:

  • Production: Burstable QoS with limits 110-130% of requests
  • Behaves nearly identically to Guaranteed for eviction purposes
  • Provides critical anomaly tolerance without full resource reservation

Step 3: Recommendation Generation

The Core Algorithm

Two-tier recommendation formula:

text
Final Request = Base Request × Growth Buffer × Spike Buffer

Where:
  Base Request = Percentile from policy configuration
    - CPU: P95 (throttling is non-fatal)
    - Memory: P99 (OOMKill is fatal)

  Growth Buffer = Daily % change coefficient (scale-agnostic)
    Formula: dailyPctChange = (slope / meanValue) × 100
    - Stable/Decline (≤0%): 1.0x (no buffer)
    - Growing (0%-1%): 1.0x - 1.20x (linear interpolation)
    - Rapid (≥1%): 1.20x (+20%)

  Spike Buffer = CV coefficient (linear interpolation)
    Formula: buffer = 1.0 + min(cv, 1.0) × 0.35
    - CV 0.0 - 1.0: 1.0x - 1.35x (linear)
    - CV ≥ 1.0: 1.35x (max)

Example calculation:

text
Workload: api-gateway

Step 1: Base Request (from lookback period metrics)
  CPU P95: 120m
  Memory P99: 850Mi

Step 2: Apply Growth Buffer
  Daily % change: 0.75% (growing)
  Growth buffer: 1.0 + 0.75 × 0.20 = 1.15x

  CPU with growth: 120m × 1.15 = 138m
  Memory with growth: 850Mi × 1.15 = 978Mi

Step 3: Apply Spike Buffer
  CV: 0.28
  Spike buffer: 1.0 + 0.28 × 0.35 = 1.10x

  Final CPU request: 138m × 1.10 = 152m
  Final Memory request: 978Mi × 1.10 = 1076Mi (≈1.05Gi)

Result:
  CPU Request: 155m (rounded)
  Memory Request: 1.1Gi
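
Putting the two buffers together, a sketch that reproduces the api-gateway calculation above (rounding of the final values is illustrative):

python
def final_request(base: float, daily_pct_change: float, cv: float) -> float:
    """Final Request = Base × Growth Buffer × Spike Buffer."""
    growth = 1.0 + min(max(daily_pct_change, 0.0), 1.0) * 0.20
    spike = 1.0 + min(cv, 1.0) * 0.35
    return base * growth * spike

cpu_m = final_request(base=120, daily_pct_change=0.75, cv=0.28)   # ≈ 152m
mem_mi = final_request(base=850, daily_pct_change=0.75, cv=0.28)  # ≈ 1073Mi (the worked example rounds the spike buffer to 1.10, giving 1076Mi)
print(f"CPU: {cpu_m:.0f}m, Memory: {mem_mi:.0f}Mi (≈ {mem_mi / 1024:.2f}Gi)")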

Why P95 for CPU, P99 for memory?

CPU:

  • Throttling is non-fatal (just slower)
  • P95 covers most spikes
  • More aggressive sizing = more cost savings

Memory:

  • OOMKill is fatal (pod restart)
  • P99 provides safety for spikes
  • Conservative sizing prevents service disruption

Why Aggressive Limits Can Be Harmful

Before discussing safety margins, it's important to understand why setting limits too aggressively can cause more problems than it solves.

CPU Throttling:

Setting CPU limits too tight causes throttling, even when the node has idle capacity:

text
Example: 300m CPU limit on 4-core node (4000m capacity)
- Container can use maximum 30ms of CPU per 100ms period
- Result: Performance degradation even when node has 3700m idle CPU
- Impact: Application slowdown, increased latency

Key Point: CPU throttling is non-fatal but degrades performance. The container runs slower; it does not crash.

Memory OOM Kills:

Memory limits are fatal - any spike beyond the limit kills the pod:

text
Example: 2Gi memory limit
- Application normally uses 1.8Gi
- Traffic spike causes brief 2.1Gi usage
- Result: Pod immediately killed (OOMKill)
- Impact: Service disruption, pod restart, potential data loss

Key Point: OOM kills are fatal and cause service disruption. Unlike CPU, there's no "slow down" - the pod dies.

The Traditional Approach Problem:

Many teams set aggressive limits "to be safe":

  • ❌ CPU limit = 2x request (causes frequent throttling)
  • ❌ Memory limit = request (no room for spikes → OOMKills)
  • ❌ Result: Performance issues despite having cluster capacity

Kubeadapt's Balanced Approach:

  1. Rightsize requests first (eliminates waste at the source)

    • Use P99 (memory) or P95 (CPU) + scaling trend
    • Ensures requests match actual utilization
    • This is where cost optimization happens
  2. Use controlled limits for anomaly protection only

    • Production: 110-130% of requests
    • Provides burst capacity without excessive overcommit
    • Prevents node exhaustion without aggressive throttling
  3. Avoid aggressive limits that harm more than help

    • Don't use limits to "force" cost savings
    • Right-sized requests already provide cost optimization
    • Limits should protect, not restrict normal operation

Summary: Rightsizing is achieved by optimizing requests (where costs occur), not by setting aggressive limits (which cause throttling/OOMKills).


Analysis Windows

Default: 7-day rolling window

Exceptions:

New workloads (<7 days old):

  • Use available data (minimum 24 hours)
  • Higher safety margins (+10%)

Seasonal workloads:

  • Extend window to 30 days
  • Capture weekly patterns
  • Identify peak periods

Environment-Based Recommendations

Overview

Kubeadapt generates different recommendations based on environment type. The same deployment may receive different resource configurations in production vs. non-production environments.

Environment Classification:

  • Production-like: Production, pre-production, staging (with production-equivalent load testing)
  • Non-production-like: Development, testing, sandbox environments

Why Different Approaches:

Production and non-production have fundamentally different optimization priorities:

| Aspect | Production-like | Non-production-like |
| --- | --- | --- |
| Primary Goal | Reliable performance | Maximum cost reduction |
| Risk Tolerance | Low (avoid degradation) | High (acceptable issues) |
| Availability | High (HA required) | Low (single replica acceptable) |
| Node Overcommit | Conservative (120-130%) | Aggressive (300-1000%) |

Environment Detection

Kubeadapt uses a cluster-level policy to determine environment type:

Cluster Profile Selection

When creating a cluster in Kubeadapt UI, users select a profile that determines the Analysis Policy:

| Profile | Use Case | Philosophy |
| --- | --- | --- |
| Production | Mission-critical workloads | Conservative: higher percentiles (P95/P99), tighter limit buffers |
| Non-Production | Dev/Test/Staging | Aggressive: lower percentiles (P50), larger limit buffers |
| Custom | User-defined | Per-analyzer threshold overrides via SaaS UI |

Example:

text
Cluster: prod-us-east-1
Profile: production
→ All workloads use production policy settings
  - CPU: P95 percentile, 1.2x limit buffer
  - Memory: P99 percentile, 1.3x limit buffer
text
Cluster: dev-us-east-1
Profile: non_production
→ All workloads use non-production policy settings
  - CPU: P50 percentile, 4.0x limit buffer
  - Memory: P50 percentile, 3.0x limit buffer

Policy Settings:

The cluster profile determines the AnalysisPolicy values that analyzers use:

| Field | Production | Non-Production |
| --- | --- | --- |
| cpu_percentile | P95 | P50 |
| memory_percentile | P99 | P50 |
| cpu_limit_buffer | 1.2x | 4.0x |
| memory_limit_buffer | 1.3x | 3.0x |
| min_monthly_savings_usd | $1.0 | $0.5 |
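
As a sketch, the two profiles can be expressed as plain data using the policy field names from the table (this representation is illustrative, not Kubeadapt's actual schema):

python
ANALYSIS_POLICIES = {
    "production": {
        "cpu_percentile": 95,
        "memory_percentile": 99,
        "cpu_limit_buffer": 1.2,
        "memory_limit_buffer": 1.3,
        "min_monthly_savings_usd": 1.0,
    },
    "non_production": {
        "cpu_percentile": 50,
        "memory_percentile": 50,
        "cpu_limit_buffer": 4.0,
        "memory_limit_buffer": 3.0,
        "min_monthly_savings_usd": 0.5,
    },
}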

Production-like Environment Recommendations

Optimization Philosophy:

Reliable cost savings without compromising performance or availability.

1. Requests (Resource Allocation)

Calculation Method:

text
Request = Base × GrowthBuffer × SpikeBuffer

Where:
- Base: P95 (CPU) or P99 (Memory) from policy percentile
- GrowthBuffer: 1.0x - 1.20x based on daily % change
- SpikeBuffer: 1.0x - 1.35x based on CV (stddev/avg)

Growth Buffer (scale-agnostic, percentage-based):
- Stable/Decline (≤0%): 1.0x → No growth buffer needed
- Growing (0%-1%): 1.0x - 1.20x → Linear interpolation
- Rapid (≥1%): 1.20x → Maximum growth buffer

Example:

text
Workload: api-gateway
Base (P95 CPU): 750m
Daily % change: 0.8% (growing)
CV: 0.3

Growth Buffer = 1.0 + 0.8 × 0.20 = 1.16x
Spike Buffer = 1.0 + 0.3 × 0.35 = 1.105x

Recommended CPU Request:
750m × 1.16 × 1.105 ≈ 962m → 1000m

Why P95 for CPU, P99 for Memory:

  • CPU (P95): Throttling is non-fatal, more aggressive optimization acceptable
  • Memory (P99): OOM kill is fatal, conservative approach prevents pod restarts
  • Growth buffer adjusts for daily growth projections
  • Spike buffer handles usage variability

2. Limits (Resource Boundaries)

Strategy: Near-Guaranteed Burstable QoS with Controlled Overcommit

Kubernetes QoS classes affect pod eviction priorities:

| QoS Class | Condition | Eviction Priority |
| --- | --- | --- |
| Guaranteed | requests = limits | Lowest (most protected) |
| Burstable | requests < limits | Medium |
| BestEffort | No requests/limits | Highest (first to evict) |

Kubeadapt Approach:

text
Recommended Limit = Request × 1.1 to 1.3

Examples:
Request: 1000m → Limit: 1100m-1300m (110-130% of request)
Request: 2Gi → Limit: 2.2Gi-2.6Gi (110-130% of request)

Why not requests = limits (pure Guaranteed)?

  • Pure Guaranteed prevents any burst capacity
  • Small limit buffer handles anomalies without eviction
  • Node-level overcommit: 110-130% allows denser packing
  • Balance: Protection from eviction + anomaly tolerance

Node Capacity Planning:

text
Node capacity: 8 cores
Total pod requests: 7 cores (87% utilization)
Total pod limits: 9 cores (112% overcommit)

Result:
- Normal operation: All pods run smoothly
- Anomaly (multiple pods spike): Limits prevent node exhaustion
- Overcommit level: Conservative 112% (safe range: 110-130%)

Risk Mitigation:

  • Near-Guaranteed QoS (limits close to requests) keeps eviction priority low
  • Moderate limits protect node from resource exhaustion

3. QoS Class

Recommendation: Burstable QoS with Controlled Limits

Configuration approach:

  • Requests: Right-sized based on P95 (CPU) / P99 (memory) usage + scaling trend
  • Limits: 110-130% of requests (production)
  • QoS Classification: Burstable (requests < limits)

Why not pure Guaranteed (requests = limits)?

Pure Guaranteed prevents any burst capacity for handling anomalies. Kubeadapt's approach provides:

  • Eviction protection: Limits close to requests (110-130%) provide strong protection
  • Anomaly tolerance: 10-30% burst capacity handles unexpected spikes
  • Node-level optimization: Allows controlled overcommit (120-130% node capacity)

Eviction Resistance:

Pods with limits 110-130% of requests are in eviction Group 2 (evicted after BestEffort and over-request Burstable pods), providing strong protection while maintaining burst capacity.


4. Node Capacity Targets

Goal: 90-100% Allocated, 120-130% Overcommitted

text
Example Node: 8 cores, 32 GB

Target Allocation (requests):
- CPU requests: 7-8 cores (87-100% of capacity)
- Memory requests: 28-32 GB (87-100% of capacity)

Actual Limits (overcommit):
- CPU limits: 9-10.4 cores (112-130% of capacity)
- Memory limits: 33-42 GB (103-131% of capacity)

Overcommit ratio: 120-130%

Why This Range:

  • Too low (<110%): Wasted node capacity, higher costs
  • Sweet spot (120-130%): Dense packing + anomaly tolerance
  • Too high (>150%): Performance degradation risk if multiple pods spike
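
A small sketch of how these allocation and overcommit ratios can be checked for a node (the helper and its parameter names are illustrative):

python
def node_cpu_ratios(capacity_cores: float, requests_cores: float, limits_cores: float) -> tuple[float, float]:
    """Return (allocation %, overcommit %) for a node's CPU."""
    return requests_cores / capacity_cores * 100, limits_cores / capacity_cores * 100

alloc, overcommit = node_cpu_ratios(capacity_cores=8, requests_cores=7.5, limits_cores=10)
print(f"Allocated: {alloc:.0f}%  Overcommitted: {overcommit:.0f}%")  # 94% / 125%, inside the target range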

CPU Throttling Risk:

If all pods simultaneously hit their limits:

text
Total limits: 10 cores
Node capacity: 8 cores
→ 2 cores of throttling distributed across pods

Impact: Minor latency increase (non-fatal)

Memory OOM Risk:

Conservative memory overcommit (110-130%) ensures:

  • Memory limits rarely all reached simultaneously
  • Production workloads use Burstable QoS with controlled limits (110-130% of requests)
  • Right-sized requests keep usage < requests under normal conditions
  • This places pods in eviction Group 2 (lower eviction priority)
  • If node pressure occurs, Group 1 pods evicted first (BestEffort + over-request Burstable)
  • Controlled limits (110-130%) provide safe area for anomalies without pod crashes

Non-production-like Environment Recommendations

Optimization Philosophy:

Maximum cost reduction, accepting higher risk and lower availability.

Key Differences from Production

| Aspect | Production | Non-production |
| --- | --- | --- |
| Base Percentile | P95 (CPU), P99 (Memory) | P50 (CPU & Memory) |
| CPU Limit Buffer | 1.2x (from policy) | 4.0x (from policy) |
| Memory Limit Buffer | 1.3x (from policy) | 3.0x (from policy) |
| Node Overcommit | 120-130% | 300-1000% |
| QoS | Burstable (controlled limits) | BestEffort acceptable |

1. Requests (Policy-Driven Allocation)

Calculation (same formula, different policy values):

text
Request = Base × GrowthBuffer × SpikeBuffer

Where:
- Base: P50 (CPU & Memory) from policy percentile
- GrowthBuffer: 1.0x - 1.20x based on daily % change
- SpikeBuffer: 1.0x - 1.35x based on CV (stddev/avg)

Example:
Dev environment workload
P50 CPU usage: 20m
P50 Memory usage: 180Mi
Daily % change: 0.5% (light growth)
CV: 0.2

Growth Buffer = 1.0 + 0.5 × 0.20 = 1.10x
Spike Buffer = 1.0 + 0.2 × 0.35 = 1.07x

Recommended CPU Request: 20m × 1.10 × 1.07 ≈ 24m
Recommended Memory Request: 180Mi × 1.10 × 1.07 ≈ 212Mi

Why P50 for both CPU and Memory:

  • Cost priority: Minimize request allocation (what you pay for)
  • Safety buffer: High policy limit buffers (4.0x CPU, 3.0x Memory) provide headroom
  • Dev workload nature: Idle most of the time, occasional bursts acceptable
  • Aggressive overcommit: Nodes can be 300-1000% overcommitted in non-production

Rationale:

  • Developers run sporadic tests
  • Most of the time: idle or minimal usage
  • Occasional spikes: High limits handle bursts without OOM
  • Priority: Minimize allocated cost

2. Limits (Policy-Driven High Headroom)

Strategy: High limits configured via cluster policy

text
CPU Limit = CPU Request × cpu_limit_buffer (4.0x from policy)
Memory Limit = Memory Request × memory_limit_buffer (3.0x from policy)

Example (from P50-based request above):
CPU Request: 24m
CPU Limit: 24m × 4.0 = 96m

Memory Request: 212Mi
Memory Limit: 212Mi × 3.0 = 636Mi

Comparison to production:
Production: CPU Request 150m (P95), Limit 180m (1.2x buffer)
Non-prod: CPU Request 24m (P50), Limit 96m (4.0x buffer)
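
A sketch of the same limit derivation, applying each profile's limit buffers to the request values above (helper name is illustrative):

python
def limits_from_policy(cpu_request_m: float, mem_request_mi: float, policy: dict) -> dict:
    """Limits are derived from requests via the policy's limit buffers."""
    return {
        "cpu_limit_m": round(cpu_request_m * policy["cpu_limit_buffer"]),
        "memory_limit_mi": round(mem_request_mi * policy["memory_limit_buffer"]),
    }

non_prod = {"cpu_limit_buffer": 4.0, "memory_limit_buffer": 3.0}
print(limits_from_policy(24, 212, non_prod))  # {'cpu_limit_m': 96, 'memory_limit_mi': 636}
# With the production buffers (1.2x / 1.3x), the 150m production request above gets a 180m limit instead.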

Rationale:

  • Developers need burst capacity for testing
  • Low requests (P50-based) minimize cost
  • High policy-driven limits prevent OOM/throttling during test bursts
  • Node overcommit acceptable (dev pods less critical)
  • Safety buffer comes from policy limit buffers, not from conservative percentile choice

3. Node Capacity (Aggressive Overcommit)

Target: 300-1000% Overcommit

text
Example Node: 8 cores, 32 GB

Allocated requests:
- CPU: 8 cores (100% utilization)
- Memory: 32 GB (100% utilization)

Total limits:
- CPU: 64 cores (800% overcommit!)
- Memory: 128 GB (400% overcommit)

Overcommit ratio: 400-800%

Why This Works:

  • Dev workloads mostly idle (25-100m actual usage)
  • Developers test intermittently (not all pods simultaneously)
  • Even if all pods spike: Throttling acceptable in non-prod
  • Cost savings: 4-8x more pods per node

Risk Acceptance:

  • CPU throttling during tests: Acceptable
  • Pod evictions possible: Acceptable (no production impact)
  • Performance degradation: Acceptable

Example Comparison: Same Deployment, Different Environments

Production Environment:

Deployment Configuration:

  • Name: api-server
  • Namespace: production
  • CPU request: 1000m (P95 × scaling trend)
  • Memory request: 2Gi (P99 × scaling trend)
  • CPU limit: 1200m (120% of request)
  • Memory limit: 2.4Gi (120% of request)

Non-production Environment:

Deployment Configuration:

  • Name: api-server
  • Namespace: development
  • CPU request: 100m (P50 minimal usage)
  • Memory request: 256Mi (P50 minimal usage)
  • CPU limit: 600m (6x request for developer test headroom)
  • Memory limit: 1Gi (4x request)

Cost Comparison:

text
Production (per pod):
Request: 1000m CPU, 2Gi memory
Baseline cost: High (prioritizes reliability)

Non-production (per pod):
Request: 100m CPU, 256Mi memory
Baseline cost: ~90% lower (prioritizes cost)

Cost reduction: ~90% (development vs. production)

Special Cases

StatefulSets

StatefulSets may have per-pod variation:

StatefulSets often have different resource needs per pod (e.g., primary vs replicas in databases).

Example: PostgreSQL StatefulSet

  • Replicas: 3
  • Pod-0 (primary): High CPU/memory usage (handles writes)
  • Pod-1, Pod-2 (replicas): Lower CPU/memory usage (read-only replicas)

Challenge:

All pods in a StatefulSet share the same resource specification - you cannot set different requests/limits per pod index.

Kubeadapt's Right-sizing Approach:

Uniform Sizing (Only Option):

  • Size all pods for the highest usage pod (typically the primary)
  • Simpler management and operational consistency
  • Some resource waste on lower-usage replicas (read replicas)
  • Trade-off: Operational simplicity vs. marginal cost savings

Why This Is Acceptable:

  • Primary pod rightsizing already provides significant savings vs. over-provisioned baseline
  • Complexity of managing separate StatefulSets per role often outweighs marginal savings
  • Workload-specific tools (like database operators) can handle role-specific sizing if needed

Advanced Alternative (Manual):

For teams requiring per-role optimization:

  • Deploy separate StatefulSets for different roles (e.g., primary StatefulSet + replica StatefulSet)
  • Use pod affinity/anti-affinity rules to ensure distribution
  • Note: Adds operational complexity, only recommended for very large deployments

Burstable vs. Guaranteed QoS

Quality of Service classes affect right-sizing:

Burstable (requests < limits):

  • Example: requests 500m CPU / 1Gi memory, limits 2000m CPU / 4Gi memory
  • Can burst above requests when node has capacity
  • May be throttled/OOMKilled if node is full
  • More cost-efficient (pay for requests, use limits opportunistically)

Guaranteed (requests = limits):

  • Example: requests = limits = 2000m CPU / 4Gi memory
  • Reserved resources, fully allocated on node
  • Never throttled or evicted (unless it exceeds its own limits)
  • More expensive (pay for maximum capacity 24/7)
  • No burst capacity for anomalies

Kubeadapt approach:

  1. Right-size requests (primary optimization goal)
  2. Suggest limit strategy based on workload:
    • Critical services: limits = 2x requests
    • Burstable workloads: limits = 3-4x requests
    • Batch jobs: No limits (use all available)

Multi-Container Pods

Pods with sidecars:

Pods may contain multiple containers (main app + sidecars for logging, metrics, proxies, etc.).

Example Pod:

  • App container: 1000m CPU / 2Gi memory
  • Logging sidecar: 100m CPU / 128Mi memory
  • Metrics sidecar: 50m CPU / 64Mi memory
  • Total pod requests: 1150m CPU / 2.2Gi memory

Right-sizing approach:

  1. Analyze each container separately

    • App: Usage-based recommendation
    • Sidecars: May have fixed resource needs
  2. Generate per-container recommendations

    text
    app: 1000m → 600m (usage-based)
    logging-sidecar: 100m → 100m (keep, minimal overhead)
    metrics-sidecar: 50m → 50m (keep, minimal overhead)
  3. Total pod resource reduction

    text
    Current: 1150m CPU, 2.2Gi memory
    Recommended: 750m CPU, 2.2Gi memory
    Reduction: 35% CPU, 0% memory
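
A sketch of how the per-container recommendations roll up to those pod totals (values mirror the example above):

python
containers = {
    # name: (current_cpu_m, recommended_cpu_m, current_mem_mi, recommended_mem_mi)
    "app":             (1000, 600, 2048, 2048),
    "logging-sidecar": (100,  100,  128,  128),
    "metrics-sidecar": (50,    50,   64,   64),
}

cur_cpu = sum(c[0] for c in containers.values())
rec_cpu = sum(c[1] for c in containers.values())
cur_mem = sum(c[2] for c in containers.values())
rec_mem = sum(c[3] for c in containers.values())

print(f"CPU: {cur_cpu}m -> {rec_cpu}m ({(1 - rec_cpu / cur_cpu) * 100:.0f}% reduction)")
print(f"Memory: {cur_mem}Mi -> {rec_mem}Mi ({(1 - rec_mem / cur_mem) * 100:.0f}% reduction)")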

Recommendation Lifecycle

Recommendations follow a state machine lifecycle from creation to resolution.

States

| State | Description | User Action |
| --- | --- | --- |
| Pending | Active recommendation, continuously updated by analyzer | Review in UI |
| Applied | User confirmed they applied the recommendation | "Mark as Applied" button |
| Dismissed | User doesn't want recommendations for this resource | "Dismiss" button |
| Archived | Previous applied recommendation, preserved for history | Automatic |

State Transitions

text
              ┌────────────────────────────────────┐
              │                                    │
              ▼                                    │
[Created] ──► Pending ──► Applied ──► Archived     │
              │              │                     │
              │              └── Cooldown Period ──┘
              │                  (new rec after cooldown)
              │
              Dismissed ◄──► Pending (un-dismiss)

Transition Rules:

  • Pending → Applied: User clicks "Mark as Applied" in UI
  • Applied → Archived: Cooldown period passes + new recommendation generated
  • Pending → Dismissed: User clicks "Dismiss" in UI
  • Dismissed → Pending: User clicks "Un-dismiss" to re-enable recommendations

Cooldown Period

After applying a recommendation, the analyzer waits before generating new recommendations for the same workload:

| Policy Field | Production | Non-Production | Purpose |
| --- | --- | --- | --- |
| cooldown_days | 7 days | 7 days | Let system stabilize with new config |

Cooldown Logic:

text
IF status = 'applied' THEN
  IF applied_at + cooldown_days < now() THEN
    -- Run analyzer calculation
    IF monthly_savings >= min_monthly_savings_usd THEN
      -- Move current to archived, create new pending
      Generate new recommendation
    END IF
  ELSE
    Skip (still in cooldown period)
  END IF
END IF
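
The same check as a runnable sketch in Python (field and function names are illustrative):

python
from datetime import datetime, timedelta, timezone

def cooldown_allows_new_recommendation(status: str, applied_at: datetime,
                                       cooldown_days: int, monthly_savings: float,
                                       min_monthly_savings_usd: float) -> bool:
    """Mirror of the pseudocode above for recommendations in the 'applied' state."""
    if status != "applied":
        return False
    if applied_at + timedelta(days=cooldown_days) > datetime.now(timezone.utc):
        return False  # still in cooldown period
    return monthly_savings >= min_monthly_savings_usd  # only regenerate above the savings threshold

applied_at = datetime.now(timezone.utc) - timedelta(days=10)
print(cooldown_allows_new_recommendation("applied", applied_at,
                                         cooldown_days=7, monthly_savings=3.2,
                                         min_monthly_savings_usd=1.0))  # True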

Re-generation Rules

| Previous State | Re-generation Rule | Rationale |
| --- | --- | --- |
| Pending | ✅ Always upsert | Normal operation - keep recommendations current |
| Applied | ⏳ Cooldown only | Wait for cooldown, then generate if threshold met |
| Archived | ❌ Never touch | History record - immutable |
| Dismissed | ❌ Block all | User explicitly rejected - respect their decision |

Savings Threshold

Recommendations are only generated when monthly savings meet the policy threshold:

| Policy Field | Production | Non-Production |
| --- | --- | --- |
| min_monthly_savings_usd | $1.0 | $0.5 |

This prevents recommendation noise for workloads with trivial savings potential.


Summary

Kubeadapt's right-sizing approach in a nutshell:

Two-Tier Pattern Analysis:

  1. Daily Growth Trend - Adds 0-20% future growth buffer
  2. Intra-Hour Volatility - Adds 0-35% spike safety buffer

Recommendation Formula:

text
Final Request = Base (P95/P99) × Growth Buffer × Spike Buffer

Key principle:

  • Optimize requests (where cost occurs)
  • Control limits (for anomaly protection only)
  • Never use limits for cost optimization (causes throttling/OOMKills)
