The Quiet RI Bleed: Detecting Commitment Loss Before It Compounds

Most cloud cost alerts fire when something goes up. Commitment loss is the anomaly that goes the other direction — and that is precisely why it is so difficult to catch.

Your total bill can stay flat or even fall while commitment waste climbs from 8% to 93% of your Reserved Instance portfolio in a single week. Standard threshold alerting sees nothing. No spike. No new line item. No alert fires. The signal lives entirely in a ratio: what fraction of your committed spend is currently covering actual usage — and that ratio is not something most billing dashboards display by default.

Without active monitoring, RI portfolios drift to 40–60% utilization within 12 months. Break-even utilization is one minus the discount rate, so a commitment bought at a 40% discount stops saving money once utilization falls below 60%. From there it costs more than paying on-demand would have.

Two Failure Modes

Commitment loss appears in two distinct temporal patterns that require different detection logic:

Mode	Pattern	Common Trigger
The cliff	Utilization drops 40–100 percentage points within a single billing week	Workload decommissioned, instance family migration (m5 → Graviton), lift-and-shift to containers, VM scope change
The drift	Utilization declines 2–5 percentage points per week over months	Team headcount reduction, incremental right-sizing, seasonal off-peak, monolith refactored into microservices

Both produce the same eventual billing signature — growing CommitmentDiscountStatus = Unused rows — but the cliff is a level-break signal while the drift is a slope signal. Detecting one without the other misses half the problem.

What Appears in FOCUS Billing Data

The key insight about commitment waste: BilledCost does not show it. For unused commitment rows, BilledCost = $0. The commitment was pre-paid or amortized — billing is already accounted for. The signal lives in EffectiveCost, which carries the amortized cost of the commitment period regardless of whether the benefit was applied.

FOCUS Field	Healthy (90%+ utilization)	Commitment loss (<40% utilization)
`CommitmentDiscountStatus`	Mostly `Used`, small `Unused` remainder	Mostly `Unused`, minimal `Used`
`BilledCost` (Unused rows)	$0	$0 — unchanged; this is why the bill looks normal
`EffectiveCost` (Unused rows)	5–15% of commitment cost (normal buffer)	60–100% of commitment cost (most of the commitment is wasted)
`EffectiveCost` (Used rows)	85–95% of commitment cost (covered usage)	Small or zero (very little being covered)
`CommitmentDiscountId`	Active ARN/ID with matching usage	Same active ARN/ID — commitment is not expired, just unmatched

The waste ratio is the primary metric: SUM(EffectiveCost WHERE CommitmentDiscountStatus='Unused') / SUM(EffectiveCost WHERE CommitmentDiscountId IS NOT NULL). Healthy is below 0.15. Alert threshold: 0.30. Crisis threshold: 0.60.
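
As a rough illustration, the ratio can be computed from exported FOCUS rows in a few lines of Python. This is a minimal sketch rather than a reference implementation: the FocusRow class is a hypothetical, trimmed-down stand-in for a real export, and the "watch" label for the 0.15 to 0.30 band is an assumption, since the thresholds above leave that range unnamed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FocusRow:
    """Hypothetical, trimmed-down view of one FOCUS billing row."""
    effective_cost: float
    commitment_discount_id: Optional[str]
    commitment_discount_status: Optional[str]  # "Used", "Unused", or None


def waste_ratio(rows: Iterable[FocusRow]) -> float:
    """Unused commitment EffectiveCost divided by all commitment EffectiveCost."""
    unused = 0.0
    total = 0.0
    for row in rows:
        if row.commitment_discount_id is None:
            continue  # on-demand and other non-commitment charges
        total += row.effective_cost
        if row.commitment_discount_status == "Unused":
            unused += row.effective_cost
    return unused / total if total else 0.0


def classify(ratio: float) -> str:
    """Map a waste ratio onto the thresholds from the text."""
    if ratio >= 0.60:
        return "crisis"
    if ratio >= 0.30:
        return "alert"
    if ratio < 0.15:
        return "healthy"
    return "watch"  # assumption: the text does not name this band
```

Rows without a CommitmentDiscountId are skipped, so on-demand spend never dilutes the ratio.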

The Cliff in Detail

A concrete example: 10 EC2 m5.2xlarge Reserved Instances at $112/instance/month, running a fleet at 92% utilization. Monthly commitment waste: $89.60 (8% of $1,120). Acceptable.

The fleet migrates to EKS. EC2 instances are terminated. The 10 RIs remain active — they were purchased for 1 or 3 years. Within one billing week:

CommitmentDiscountStatus = Used fraction: 0.92 → 0.07
CommitmentDiscountStatus = Unused fraction: 0.08 → 0.93
Monthly commitment waste: $89.60 → $1,041.60
Waste ratio: 0.08 → 0.93

The total bill may actually fall, because the on-demand EC2 charges disappeared with the fleet. But commitment waste jumped more than 11× in a week ($89.60 to $1,041.60). Without a waste_ratio monitor, this goes undetected until someone reviews the RI dashboard at quarter-end.
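
The arithmetic behind the cliff example can be checked with a short sketch. The constants come from the example above; the function name is illustrative only.

```python
RI_COUNT = 10
MONTHLY_COST_PER_RI = 112.00  # USD, from the example above


def monthly_waste(unused_fraction: float) -> float:
    """Dollars of monthly commitment cost that cover no usage."""
    return round(RI_COUNT * MONTHLY_COST_PER_RI * unused_fraction, 2)


before = monthly_waste(0.08)
after = monthly_waste(0.93)
jump_pp = round((0.93 - 0.08) * 100)  # change in waste ratio, percentage points

print(f"before: ${before:,.2f}")
print(f"after:  ${after:,.2f}")
print(f"waste multiplied by {after / before:.1f}x, ratio up {jump_pp} pp")
```

An 85-point move in one week is far beyond anything a gradual trend produces, which is why the cliff is best treated as a level break rather than a slope.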

Common cliff triggers:

EC2 fleet migrated to EKS or Fargate — RI tied to m5.2xlarge finds no matching on-demand usage
Instance family upgrade: m5 → m7g Graviton — Standard RIs don't flex across instance families
Azure VM SKU change: D2_v3 → D2s_v3 — reservation benefit stops applying immediately
Azure scope misconfiguration: VM moved from Subscription A to Subscription B while reservation is scoped to A
GCP resource-based CUD: workload right-sized below committed vCPU — commitment fee continues, credit disappears

The Drift in Detail

The drift scenario is slower and harder to see precisely because it is gradual. No single event causes it. A team of 50 engineers shrinks to 30; their compute contracts proportionally. An annual RI portfolio is purchased at Q4 peak traffic; workload runs at 60% of peak for the next 11 months. A monolith is refactored into microservices over 6 months, each step reducing per-service compute requirements.

Drift trajectory over 90 days at roughly 4 percentage points per week:

Month	Utilization	Waste Ratio	Monthly Waste (on $1,120 portfolio)
Start	90%	0.10	$112
Month 1	75%	0.25	$280
Month 2	55%	0.45	$504
Month 3	40%	0.60	$672

By month 3, $672 of the $1,120 monthly commitment covers nothing, and only $448 covers actual usage. Unless the discount rate exceeds 60%, that usage would have cost less on-demand: the commitment is now working against you. This is the point where "I bought reservations to save money" becomes factually incorrect.
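
The table rows and the break-even point can be reproduced with a small sketch. It assumes a simplified pricing model in which the commitment costs (1 - discount) times the on-demand price of the same capacity; real discounts vary by term, payment option, and instance type, and the function names here are illustrative.

```python
from typing import Tuple

PORTFOLIO = 1120.00  # USD per month, from the example above


def split_commitment(utilization: float) -> Tuple[float, float]:
    """Return (used, wasted) dollars of the monthly commitment."""
    used = round(PORTFOLIO * utilization, 2)
    return used, round(PORTFOLIO - used, 2)


def break_even_utilization(discount: float) -> float:
    """Utilization below which on-demand would have been cheaper.

    Assumes the commitment costs (1 - discount) times the on-demand
    price of the same capacity, so the two costs meet at 1 - discount.
    """
    return 1.0 - discount


for label, utilization in [("Start", 0.90), ("Month 1", 0.75),
                           ("Month 2", 0.55), ("Month 3", 0.40)]:
    used, wasted = split_commitment(utilization)
    print(f"{label:8} used ${used:7.2f}  wasted ${wasted:7.2f}")
```

Under that assumption, a 40% discount breaks even at 60% utilization, so the Month 2 row is already past the point where the commitment pays for itself.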

Industry data (Flexera, FinOps Foundation): without active quarterly review, RI portfolios reach this state routinely. The FinOps Foundation's healthy utilization benchmark is 85%+. Most unmanaged portfolios fall below 70% within a year.

Provider Notes

FOCUS CommitmentDiscountStatus is the normalizing field across providers. The underlying CUR/billing export fields differ:

Provider	Commitment Type	Waste Field
AWS EC2 RIs	1- or 3-year, per-instance-type	`reservation/UnusedQuantity`, `reservation/UnusedRecurringFee`
AWS Savings Plans	$/hour commitment for 1 or 3 years	`savingsplan/SavingsPlanUnusedCommitment` per hour
Azure Reserved VM Instances	1- or 3-year, per-SKU	Reservation utilization % in Cost Management; "No Benefit" status for fully stranded
GCP Committed Use Discounts	1- or 3-year vCPU/memory or $/hour	Committed vCPU/memory not consumed; commitment fee rows without credit rows

A FOCUS-based query covers all four with a single CommitmentDiscountStatus = 'Unused' filter, normalized across providers by the schema.
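
A minimal sketch of that single-filter idea, in Python over rows keyed by FOCUS column names. The sample rows, IDs, and costs are made up for illustration, and a real export would carry many more columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

# Made-up sample rows keyed by FOCUS column names; IDs and costs are illustrative.
SAMPLE_ROWS = [
    {"ProviderName": "AWS", "CommitmentDiscountId": "ri-example",
     "CommitmentDiscountStatus": "Used", "EffectiveCost": 90.0},
    {"ProviderName": "AWS", "CommitmentDiscountId": "ri-example",
     "CommitmentDiscountStatus": "Unused", "EffectiveCost": 10.0},
    {"ProviderName": "Microsoft", "CommitmentDiscountId": "reservation-example",
     "CommitmentDiscountStatus": "Unused", "EffectiveCost": 40.0},
    {"ProviderName": "Google Cloud", "CommitmentDiscountId": "cud-example",
     "CommitmentDiscountStatus": "Used", "EffectiveCost": 25.0},
    {"ProviderName": "AWS", "CommitmentDiscountId": None,
     "CommitmentDiscountStatus": None, "EffectiveCost": 300.0},
]


def unused_cost_by_provider(rows: Iterable[Mapping]) -> Dict[str, float]:
    """Sum EffectiveCost of Unused commitment rows, grouped by provider."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        # The same status filter applies regardless of provider.
        if row.get("CommitmentDiscountStatus") == "Unused":
            totals[row["ProviderName"]] += row["EffectiveCost"]
    return dict(totals)


print(unused_cost_by_provider(SAMPLE_ROWS))
```

No provider-specific branch is needed, which is the practical benefit of normalizing on CommitmentDiscountStatus.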

Detection Logic

Three detection conditions, any of which is sufficient to flag:

-- Condition A: Cliff detection (level-break)
current 7-day waste_ratio - trailing 30-day waste_ratio > 0.20
(waste_ratio jumped 20+ percentage points in a week)

-- Condition B: Drift detection (slope)
linear slope of daily waste_ratio over 30 days > 0.005/day
(0.5 percentage points per day = 15pp per month — rapid drift)

-- Condition C: Total stranding
any CommitmentDiscountId with zero 'Used' rows
for 7+ consecutive days
(commitment generating zero benefit for a week or more)

-- Dollar floor for all conditions
SUM(Unused EffectiveCost) > $500/month
(suppress noise on micro-commitments)
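
Conditions A and B, together with the dollar floor, might look like this in Python. It is a sketch under stated assumptions, not a definitive implementation: it reads "trailing 30-day" as the 30 days before the current week, it averages daily ratios rather than recomputing a ratio of sums, and the function names are illustrative.

```python
from statistics import mean
from typing import List, Sequence

CLIFF_JUMP = 0.20      # Condition A: jump in waste ratio
DRIFT_SLOPE = 0.005    # Condition B: waste ratio increase per day
DOLLAR_FLOOR = 500.0   # USD of unused EffectiveCost per month


def slope_per_day(values: Sequence[float]) -> float:
    """Least-squares slope of the values against day index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def detect(daily_ratio: Sequence[float], monthly_unused_cost: float) -> List[str]:
    """Return which of conditions A and B fire for one commitment portfolio.

    daily_ratio holds one waste_ratio per day, oldest first, 37 days or more.
    """
    if len(daily_ratio) < 37:
        raise ValueError("need at least 37 daily values")
    if monthly_unused_cost <= DOLLAR_FLOOR:
        return []  # below the dollar floor: suppress
    fired = []
    current_week = mean(daily_ratio[-7:])
    # Assumption: "trailing 30-day" means the 30 days before the current week.
    baseline = mean(daily_ratio[-37:-7])
    if current_week - baseline > CLIFF_JUMP:
        fired.append("cliff")
    if slope_per_day(daily_ratio[-30:]) > DRIFT_SLOPE:
        fired.append("drift")
    return fired
```

On a step change such as 30 days at 0.08 followed by 7 days at 0.93, both checks fire, because a step also has a positive fitted slope. Since any one condition is sufficient to flag, the overlap is harmless.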

The cliff and drift require different logic because the signal is different: a sudden jump vs. a sustained trend. Condition C catches the worst case — a commitment that is completely unmatched — regardless of how it got there.
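
Condition C works per commitment rather than per portfolio, so it is sketched separately. The tuple layout of the input rows is an assumption for illustration, and the check requires billing rows on every one of the seven days so that a commitment purchased two days ago is not flagged.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set, Tuple

STRANDED_DAYS = 7


def stranded_commitments(
    rows: Iterable[Tuple[date, str, str]], as_of: date
) -> List[str]:
    """IDs billed on each of the last 7 days with no 'Used' row on any of them.

    Each row is (charge_date, CommitmentDiscountId, CommitmentDiscountStatus).
    """
    window_start = as_of - timedelta(days=STRANDED_DAYS - 1)
    days_seen: Dict[str, Set[date]] = {}
    used_recently: Set[str] = set()
    for charge_date, commitment_id, status in rows:
        if not (window_start <= charge_date <= as_of):
            continue  # outside the 7-day window
        days_seen.setdefault(commitment_id, set()).add(charge_date)
        if status == "Used":
            used_recently.add(commitment_id)
    return sorted(
        cid for cid, days in days_seen.items()
        if len(days) == STRANDED_DAYS and cid not in used_recently
    )
```

A single Used row inside the window is enough to clear a commitment, which keeps this check reserved for the fully unmatched case.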

What Does NOT Show Up in Billing

Understanding the gaps prevents false assumptions about what billing data can tell you:

The decommissioning event itself — stopping instances or running terraform destroy produces no billing line items. The event is invisible; only the absence of matching usage rows is observable.
CPU utilization within committed instances — a 100% committed RI covering an idle instance still shows 100% utilization in billing. Waste in billing means uncovered commitment, not inefficient instance usage. (That latter problem is QB1's domain.)
Size-flexible RI normalization: AWS applies a size-flexible RI across other sizes in the same instance family using normalization factors, so one large RI can cover several smaller instances. This can produce an apparent UnusedQuantity on a commitment that is actually providing coverage. Before flagging, confirm that the Unused rows carry a non-zero EffectiveCost.
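
To make the last point concrete, here is a small sketch of how normalization factors let one RI cover smaller instances. The factor values are AWS's published ones for these sizes. The function is a simplification that assumes every running instance is in the same family as the RI and that size flexibility applies to it (regional scope, Linux/Unix, default tenancy).

```python
from typing import Mapping

# AWS normalization factors for a subset of instance sizes.
NORMALIZATION_FACTOR = {
    "large": 4.0,
    "xlarge": 8.0,
    "2xlarge": 16.0,
    "4xlarge": 32.0,
}


def covered_fraction(ri_size: str, ri_count: int, running: Mapping[str, int]) -> float:
    """Fraction of the RI's normalized units absorbed by running instances.

    `running` maps instance size to count, all in the same family as the RI.
    """
    committed_units = NORMALIZATION_FACTOR[ri_size] * ri_count
    running_units = sum(NORMALIZATION_FACTOR[s] * n for s, n in running.items())
    return min(running_units / committed_units, 1.0)


# One m5.2xlarge RI (16 units) against four running m5.large (4 units each).
print(covered_fraction("2xlarge", 1, {"large": 4}))
```

In that example the RI is fully absorbed even though no m5.2xlarge instance is running, which is why a per-size match alone can overstate waste.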

Fix Checklist

For cliffs: Sell, modify, or exchange the stranded RI immediately. The AWS Reserved Instance Marketplace accepts Standard RIs only, subject to eligibility restrictions. Convertible RIs cannot be sold there, but they can be exchanged for a different instance family or type that your remaining workloads can absorb. Every day at zero utilization is the maximum possible waste rate.
For drifts: Do not buy new commitments until the existing portfolio is back above 80% utilization. Right-size the commitment portfolio by exchanging Convertible RIs or letting Standard RIs expire without renewal. Use on-demand for the coverage gap during the transition.
For Azure scope misconfiguration: Verify that reservation scope matches the subscription and resource group of the running VMs. A reservation scoped to "Subscription A" provides zero benefit to VMs in "Subscription B." Fix scope in the Azure portal under Reservations → Manage.
Establish a quarterly RI review cadence: commitment waste compounds silently. A 90-minute quarterly review of waste_ratio per commitment, flagging anything below 80% utilization, prevents the drift scenario from reaching crisis level.
For new commitments: buy Compute Savings Plans rather than instance-family RIs wherever possible. Compute Savings Plans flex across EC2 instance families, AZs, and OS types — they are significantly harder to strand via architectural change.

