Smart Alerting

How Kubeadapt's alerts work — threshold checks against a rolling baseline, not z-score anomaly math — and the four alert types you can configure.

You want to hear about cost surprises before finance does. Smart Alerting is Kubeadapt's way of getting a message to the right channel when one of four things happens in your environment: spend spikes, a budget runs hot, a workload sits idle, or an expensive new workload appears. This page explains the model so the rest of the section makes sense.

Alerts are threshold-based, not anomaly-based

Smart Alerting compares observed values to a baseline and fires when the difference crosses a threshold. The baseline is a rolling window of recent history. The threshold is either configured by you (budget, idle percentage, monthly dollar limit) or derived automatically from the scope's spending pattern (cost spike).

This is deliberate. Kubeadapt does not use z-scores, IQR, MAD, or any learned model of what your spend usually looks like. A threshold against a rolling baseline is simple, predictable, and easy to reason about when an alert fires at 3 a.m. If spend doubled compared to recent days, you'll know it doubled. There's no need to guess what the algorithm decided was "abnormal".

The trade-off: you'll occasionally see false positives during real changes (a new team onboards, a major deployment ships). The system compensates by requiring spikes to persist across consecutive days before firing — noisy scopes automatically demand longer confirmation.

The four user-configurable alert types

Alert type	Fires when…	Common scope
`cost_spike`	Daily spend rises sharply against the rolling baseline for the same day type (weekday/weekend).	Cluster, team
`budget_threshold`	Period-to-date spend crosses 50%, 80%, 90%, or 100% of a budget you set.	Org, department
`unused_resources`	A scope's idle resources cross a configured percentage. Delivered as a weekly digest.	Cluster
`new_expensive_workload`	A workload appears that's projected to cost more than a configured monthly threshold.	Cluster, team

You configure each type's conditions on a Rule. One Rule can enable any combination of the four. See Rules for the rule shape.

How cost-spike actually works

Of the four, cost spike is the only one with a non-trivial baseline, so it is worth describing in detail.

For each scope, Kubeadapt looks at the last 30 days of daily cost, then computes the baseline as the median of historical samples bucketed by day type — weekday samples for weekday evaluation, weekend samples for weekend evaluation. The current day's cost is compared against that bucketed median.

Two reasons for the bucketed median:

Weekend traffic is structurally different from weekday traffic in most production environments. Comparing Sunday's spend to last Tuesday's baseline produces noise.
Taking the median within each day-type bucket holds up against a single one-off day. A Black Friday spike in a 30-day window won't poison the next 29 days' baselines, because a median ignores a lone outlier when most days are still ordinary.

If the scope's day-type bucket has fewer than two historical samples (cold start, very new organizations), Kubeadapt falls back to the median across all day types. The incident detail shows whether the fallback was used.
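
The baseline selection described above can be sketched in a few lines of Python. This is an illustrative sketch, not Kubeadapt's implementation: the function name `day_type_baseline`, its parameters, and the dictionary-of-daily-costs input are all assumptions made for the example, and it does not model the exclusion of spike days from the baseline.

```python
from datetime import date, timedelta
from statistics import median


def is_weekend(d: date) -> bool:
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return d.weekday() >= 5


def day_type_baseline(
    history: dict[date, float],
    today: date,
    window_days: int = 30,   # 30-day lookback, as described in the text
    min_samples: int = 2,    # fewer than two samples triggers the fallback
) -> tuple[float, bool]:
    """Return (baseline, used_fallback) for the day being evaluated."""
    start = today - timedelta(days=window_days)
    window = {d: cost for d, cost in history.items() if start <= d < today}
    if not window:
        raise ValueError("no history in the lookback window")

    # Keep only samples of the same day type as the day being evaluated.
    bucket = [cost for d, cost in window.items()
              if is_weekend(d) == is_weekend(today)]
    if len(bucket) >= min_samples:
        return median(bucket), False

    # Cold start: fall back to the median across all day types.
    return median(window.values()), True
```

With two weeks of history at 100 per weekday and 40 per weekend day, evaluating a Monday uses the weekday median and evaluating a Saturday uses the weekend median, so a normal quiet weekend is never compared against weekday spend.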

Confirmation windows

Cost spike fires only if the spike holds across consecutive completed days. The spike days are excluded from the baseline math, so they don't artificially inflate their own baseline. Noisy scopes — where day-to-day variation rivals the detection threshold — automatically require additional consecutive days before firing, keeping erratic scopes quiet without raising the threshold.

The threshold, the floor, and the confirmation window that decide what counts as "qualifying" all adapt to the scope's spending pattern — there are no knobs on the rule itself.
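
To make the confirmation idea concrete, here is a minimal sketch. Kubeadapt derives the threshold and the number of confirmation days automatically and does not document the exact values, so `ratio_threshold` and `required_days` below are hypothetical inputs chosen for illustration, as is the function name `confirmed_spike`.

```python
def confirmed_spike(
    recent_costs: list[float],   # completed days, oldest first
    baseline: float,
    ratio_threshold: float,      # hypothetical: e.g. 1.5 means 50% above baseline
    required_days: int,          # hypothetical: consecutive days needed to confirm
) -> bool:
    """True only if each of the last `required_days` days exceeds the threshold."""
    if required_days <= 0 or len(recent_costs) < required_days:
        return False
    limit = baseline * ratio_threshold
    return all(cost > limit for cost in recent_costs[-required_days:])
```

Under these assumptions, a noisy scope would simply be evaluated with a larger `required_days`, which is how it can stay quiet without the threshold itself being raised.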

The other three alert types

Budget threshold is the simplest. You set a budget for a period (monthly, quarterly, or fiscal year), pick which threshold percentages to be notified at (50%, 80%, 90%, 100% — any non-empty subset), and pick a cost mode (Fully Loaded vs Workload Only). The rule fires once when each selected threshold is first crossed; the counter resets at the start of the next period.
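
The fire-once behavior can be sketched as follows. The function name `newly_crossed`, the `already_fired` set, and the choice to treat reaching a percentage exactly as crossing it are assumptions for the example, not documented Kubeadapt behavior.

```python
def newly_crossed(
    period_spend: float,
    budget: float,
    selected: set[int],        # e.g. {50, 80, 100}
    already_fired: set[int],   # thresholds notified earlier in this period
) -> list[int]:
    """Return the selected thresholds crossed now that have not fired yet."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    pct = period_spend / budget * 100
    return sorted(t for t in selected if pct >= t and t not in already_fired)
```

At the start of each new period the caller would clear `already_fired`, which corresponds to the counter reset described above.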

Unused resources runs on a weekly schedule. It rolls up the scope's idle resources (low CPU/memory utilization combined with low network activity), filters by the minimum idle percentage you set, and delivers the top N most expensive idle items in a single digest. You configure the percentage and the cap; the cadence is fixed at weekly. Use this when you want a "did we leave money on the table this week?" recap instead of a real-time alert.
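
The filter-then-rank step of the digest can be sketched like this. The `IdleItem` fields and the `weekly_digest` function are hypothetical names for illustration; how Kubeadapt actually measures idleness and cost is not shown here.

```python
from dataclasses import dataclass


@dataclass
class IdleItem:
    name: str
    idle_pct: float       # share of the resource considered idle, 0 to 100
    monthly_cost: float   # cost used for ranking


def weekly_digest(items: list[IdleItem], min_idle_pct: float, top_n: int) -> list[IdleItem]:
    """Keep items at or above the idle percentage, then take the N most expensive."""
    qualifying = [i for i in items if i.idle_pct >= min_idle_pct]
    qualifying.sort(key=lambda i: i.monthly_cost, reverse=True)
    return qualifying[:top_n]
```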

New expensive workload watches for workloads that appear and exceed a monthly dollar projection. You configure the dollar threshold, a minimum age (so a workload has to run long enough for the cost projection to stabilize), and a list of owner Kinds to exclude. Defaults exclude short-lived built-ins (Job, CronJob) and common ML operators (Argo Workflows, Spark, Ray, Kubeflow). The auto-exclude option for custom-resource-owned workloads is on by default, which is the right setting for most teams running operators.
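
The three conditions combine as a simple gate. In the sketch below, `should_alert` and its parameter names are assumptions, and the default exclusion set lists only `Job` and `CronJob`; the Kinds used for the ML operators are not spelled out in this page, so they are left out rather than guessed.

```python
def should_alert(
    owner_kind: str,
    age_hours: float,
    projected_monthly_usd: float,
    *,
    threshold_usd: float,
    min_age_hours: float,
    excluded_kinds: frozenset[str] = frozenset({"Job", "CronJob"}),
) -> bool:
    """Alert only for non-excluded, old-enough workloads above the dollar threshold."""
    if owner_kind in excluded_kinds:
        return False   # short-lived or operator-owned workloads are skipped
    if age_hours < min_age_hours:
        return False   # projection has not had time to stabilize
    return projected_monthly_usd > threshold_usd
```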

Scope: where a rule applies

Every rule has a scope. The valid kinds:

organization — every cluster in your org.
cluster — a single cluster.
namespace — a namespace within a single cluster.
team — workloads attributed to a Team (via labels or Assignment Workbench).
department — workloads attributed to a Department.

Team and department scopes are dynamic. Kubeadapt resolves the membership at evaluation time, and the baseline math operates on a stable set of workloads — so attribution reshuffles (a team grows, a workload moves) don't create false cost-spike alerts on their own.

Where alerts go

A Rule decides what to fire on. A Policy decides where the alert goes. A Channel is the destination (Slack, email, webhook, in-app). The three layers are separate so you can route critical-severity alerts to PagerDuty-like channels and informational alerts to a digest mailing list without duplicating rule definitions.
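
One way to picture the separation is a policy table that maps an alert's severity to channel names, independent of any rule. This is a simplified sketch: the `Policy` shape, the matching on severity alone, and the channel names are hypothetical, and real policies may match on more than severity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    severity: str                # e.g. "critical" or "info"
    channels: tuple[str, ...]    # names of configured channels


def route(severity: str, policies: list[Policy]) -> list[str]:
    """Collect the channels of every policy matching the severity, without duplicates."""
    destinations: list[str] = []
    for policy in policies:
        if policy.severity != severity:
            continue
        for channel in policy.channels:
            if channel not in destinations:
                destinations.append(channel)
    return destinations
```

Because the rule never names a channel, changing where critical alerts land means editing one policy rather than every rule that can produce a critical alert.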

The full routing model is covered in Policies and Channels.

Incident lifecycle

When a rule fires, an incident is created. Incidents move through five states: pending (just created), firing (notification dispatched), acknowledged (a human marked it seen), snoozed (silenced for a window), resolved (the condition cleared). The state column is visible on every incident; transitions are logged on the timeline.
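
The five states can be modeled as a small state machine with a logged timeline. The state names come from this page, but the set of allowed transitions below is an assumption made for illustration; the page does not enumerate which transitions are valid.

```python
from enum import Enum


class IncidentState(Enum):
    PENDING = "pending"
    FIRING = "firing"
    ACKNOWLEDGED = "acknowledged"
    SNOOZED = "snoozed"
    RESOLVED = "resolved"


S = IncidentState
# Hypothetical transition table, not taken from the product documentation.
ALLOWED = {
    S.PENDING: {S.FIRING, S.RESOLVED},
    S.FIRING: {S.ACKNOWLEDGED, S.SNOOZED, S.RESOLVED},
    S.ACKNOWLEDGED: {S.SNOOZED, S.RESOLVED},
    S.SNOOZED: {S.FIRING, S.RESOLVED},
    S.RESOLVED: set(),
}


def transition(current: IncidentState, target: IncidentState,
               timeline: list[tuple[IncidentState, IncidentState]]) -> IncidentState:
    """Move to `target` if allowed and record the change on the timeline."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    timeline.append((current, target))
    return target
```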

A rule that's currently muted, disabled, or degraded won't fire new incidents — see Rules for the difference between those states.

Next steps

Rules — the shape of a rule, the four type-specific condition panels, and how previewing works.
Policies — routing from rule output to channels.
