The Quiet RI Bleed: Detecting Commitment Loss Before It Compounds

Most cloud cost alerts fire when something goes up. Commitment loss is the anomaly that goes the other direction, and that is precisely why it is so difficult to catch.

Your total bill can stay flat or even fall while commitment waste climbs from 8% to 93% of your Reserved Instance portfolio in a single week. Standard threshold alerting sees nothing. No spike. No new line item. No alert fires. The signal lives entirely in a ratio: what fraction of your committed spend is currently covering actual usage, and that ratio is not something most billing dashboards display by default.

Without active monitoring, RI portfolios drift to 40–60% utilization within 12 months. At roughly 60% utilization (the break-even point for a discount of about 40%), a commitment that was supposed to save money starts costing more than paying on-demand would have.

Two Failure Modes

Commitment loss appears in two distinct temporal patterns that require different detection logic:

| Mode | Pattern | Common Trigger |
| --- | --- | --- |
| The cliff | Utilization drops 40–100 percentage points within a single billing week | Workload decommissioned, instance family migration (m5 → Graviton), lift-and-shift to containers, VM scope change |
| The drift | Utilization declines 2–5 percentage points per week over months | Team headcount reduction, incremental right-sizing, seasonal off-peak, monolith refactored into microservices |

Both produce the same eventual billing signature (growing CommitmentDiscountStatus = Unused rows), but the cliff is a level-break signal while the drift is a slope signal. Detecting one without the other misses half the problem.

What Appears in FOCUS Billing Data

The key insight about commitment waste: BilledCost does not show it. For unused commitment rows, BilledCost = $0. The commitment was pre-paid or amortized, so billing is already accounted for. The signal lives in EffectiveCost, which carries the amortized cost of the commitment period regardless of whether the benefit was applied.

| FOCUS Field | Healthy (90%+ utilization) | Commitment loss (<40% utilization) |
| --- | --- | --- |
| CommitmentDiscountStatus | Mostly Used, small Unused remainder | Mostly Unused, minimal Used |
| BilledCost (Unused rows) | $0 | $0 (unchanged; this is why the bill looks normal) |
| EffectiveCost (Unused rows) | 5–15% of commitment cost (normal buffer) | 60–100% of commitment cost (most of the commitment is wasted) |
| EffectiveCost (Used rows) | 85–95% of commitment cost (covered usage) | Small or zero (very little being covered) |
| CommitmentDiscountId | Active ARN/ID with matching usage | Same active ARN/ID; the commitment is not expired, just unmatched |

The waste ratio is the primary metric: SUM(EffectiveCost WHERE CommitmentDiscountStatus='Unused') / SUM(EffectiveCost WHERE CommitmentDiscountId IS NOT NULL). Healthy is below 0.15. Alert threshold: 0.30. Crisis threshold: 0.60.
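A minimal Python sketch of that ratio, assuming billing rows have already been loaded as dictionaries keyed by FOCUS column names. The function names and the "watch" label for the band between the healthy and alert thresholds are illustrative, not part of FOCUS.

```python
from typing import Iterable, Mapping, Optional


def waste_ratio(rows: Iterable[Mapping[str, object]]) -> Optional[float]:
    """Unused commitment EffectiveCost divided by all commitment EffectiveCost."""
    unused = 0.0
    total = 0.0
    for row in rows:
        # Rows without a CommitmentDiscountId are not commitment-related.
        if row.get("CommitmentDiscountId") is None:
            continue
        cost = float(row["EffectiveCost"])
        total += cost
        if row.get("CommitmentDiscountStatus") == "Unused":
            unused += cost
    # No commitment spend at all: the ratio is undefined, not zero.
    return unused / total if total > 0 else None


def classify(ratio: float) -> str:
    """Map a waste ratio onto the thresholds quoted above."""
    if ratio >= 0.60:
        return "crisis"
    if ratio >= 0.30:
        return "alert"
    if ratio < 0.15:
        return "healthy"
    return "watch"  # hypothetical label for the 0.15 to 0.30 band


sample_rows = [
    {"CommitmentDiscountId": "ri-1", "CommitmentDiscountStatus": "Used", "EffectiveCost": 1030.40},
    {"CommitmentDiscountId": "ri-1", "CommitmentDiscountStatus": "Unused", "EffectiveCost": 89.60},
    {"CommitmentDiscountId": None, "CommitmentDiscountStatus": None, "EffectiveCost": 400.00},
]
ratio = waste_ratio(sample_rows)
print(round(ratio, 2), classify(ratio))
```

Returning None when there is no commitment spend keeps an account with no reservations from being reported as perfectly healthy.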

The Cliff in Detail

A concrete example: 10 EC2 m5.2xlarge Reserved Instances at $112/instance/month, running a fleet at 92% utilization. Monthly commitment waste: $89.60 (8% of $1,120). Acceptable.

The fleet migrates to EKS. EC2 instances are terminated. The 10 RIs remain active because they were purchased for 1 or 3 years. Within one billing week:

The total bill may actually fall, because the on-demand EC2 charges disappeared with the fleet. But commitment waste jumped 10× in a week. Without a waste_ratio monitor, this goes undetected until someone reviews the RI dashboard at quarter-end.
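The arithmetic behind the cliff can be sketched in a few lines of Python. The pre-migration figures come from the example above; the post-migration utilization of 20% is a hypothetical input, chosen only because it reproduces a 10× jump, and is not a number from the original data.

```python
def monthly_waste(instances: int, price_per_instance: float, utilization: float) -> float:
    """Dollar value of the commitment that covered no usage in the month."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return instances * price_per_instance * (1.0 - utilization)


# From the example: 10 RIs at $112/instance/month, 92% utilized.
before = monthly_waste(10, 112.0, 0.92)

# Hypothetical: utilization falls to 20% after the fleet is terminated.
after = monthly_waste(10, 112.0, 0.20)

print(f"before: ${before:.2f}  after: ${after:.2f}  jump: {after / before:.1f}x")
```

If nothing at all matches the reservations after the migration, utilization is zero and the full $1,120 is waste.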

Common cliff triggers:

The Drift in Detail

The drift scenario is slower and harder to see precisely because it is gradual. No single event causes it. A team of 50 engineers shrinks to 30; their compute contracts proportionally. An annual RI portfolio is purchased at Q4 peak traffic; workload runs at 60% of peak for the next 11 months. A monolith is refactored into microservices over 6 months, each step reducing per-service compute requirements.

Drift trajectory over three months, at roughly 4 percentage points per week:

| Month | Utilization | Waste Ratio | Monthly Waste (on $1,120 portfolio) |
| --- | --- | --- | --- |
| Start | 90% | 0.10 | $112 |
| Month 1 | 75% | 0.25 | $280 |
| Month 2 | 55% | 0.45 | $504 |
| Month 3 | 40% | 0.60 | $672 |

By month 3, you are paying $1,120/month for committed capacity, of which only $448 covers actual usage and $672 is waste. Depending on your discount rate, that usage would likely cost less at on-demand prices, which means the commitment is now working against you. This is the point where "I bought reservations to save money" becomes factually incorrect.
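The drift figures and the break-even point can be reproduced with a short Python sketch. The $1,120 portfolio and the utilization steps are the ones used in this section; the 40% discount is a hypothetical assumption, since the actual discount depends on term, payment option, and instance type.

```python
PORTFOLIO = 1120.0  # monthly commitment cost used in this section


def drift_row(utilization: float, portfolio: float = PORTFOLIO) -> tuple:
    """Return (waste_ratio, monthly_waste_dollars) for a utilization level."""
    ratio = 1.0 - utilization
    return round(ratio, 2), round(portfolio * ratio, 2)


def on_demand_equivalent(utilization: float, discount: float,
                         portfolio: float = PORTFOLIO) -> float:
    """What the usage actually covered would cost at on-demand prices."""
    return portfolio * utilization / (1.0 - discount)


def break_even_utilization(discount: float) -> float:
    """Below this utilization, the commitment costs more than on-demand."""
    return 1.0 - discount


for label, util in [("Start", 0.90), ("Month 1", 0.75), ("Month 2", 0.55), ("Month 3", 0.40)]:
    print(label, drift_row(util))

ASSUMED_DISCOUNT = 0.40  # hypothetical discount versus on-demand
print(break_even_utilization(ASSUMED_DISCOUNT))
print(round(on_demand_equivalent(0.40, ASSUMED_DISCOUNT), 2), "vs", PORTFOLIO)
```

Under that assumed discount, the usage covered at 40% utilization would cost about $747 on-demand, well below the $1,120 committed.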

Industry data (Flexera, FinOps Foundation): without active quarterly review, RI portfolios reach this state routinely. The FinOps Foundation's healthy utilization benchmark is 85%+. Most unmanaged portfolios fall below 70% within a year.

Provider Notes

FOCUS CommitmentDiscountStatus is the normalizing field across providers. The underlying CUR/billing export fields differ:

| Provider | Commitment Type | Waste Field |
| --- | --- | --- |
| AWS EC2 RIs | 1- or 3-year, per-instance-type | reservation/UnusedQuantity, reservation/UnusedRecurringFee |
| AWS Savings Plans | $/hour commitment for 1 or 3 years | savingsplan/SavingsPlanUnusedCommitment per hour |
| Azure Reserved VM Instances | 1- or 3-year, per-SKU | Reservation utilization % in Cost Management; "No Benefit" status for fully stranded |
| GCP Committed Use Discounts | 1- or 3-year vCPU/memory or $/hour | Committed vCPU/memory not consumed; commitment fee rows without credit rows |

A FOCUS-based query covers all four with a single CommitmentDiscountStatus = 'Unused' filter, normalized across providers by the schema.
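As a rough illustration of such a query, the sketch below runs it against an in-memory SQLite table from Python. The table name focus_billing, the sample rows, and the provider labels are hypothetical; it also assumes your export carries the FOCUS ProviderName column alongside the commitment fields.

```python
import sqlite3

QUERY = """
SELECT ProviderName,
       SUM(CASE WHEN CommitmentDiscountStatus = 'Unused'
                THEN EffectiveCost ELSE 0 END) AS unused_cost,
       SUM(EffectiveCost) AS commitment_cost
FROM focus_billing
WHERE CommitmentDiscountId IS NOT NULL
GROUP BY ProviderName
ORDER BY ProviderName
"""


def waste_by_provider(conn: sqlite3.Connection) -> dict:
    """Waste ratio per provider, skipping providers with zero commitment cost."""
    return {
        provider: unused / total
        for provider, unused, total in conn.execute(QUERY)
        if total
    }


def build_demo() -> sqlite3.Connection:
    """Create a tiny in-memory table with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE focus_billing (ProviderName TEXT, CommitmentDiscountId TEXT, "
        "CommitmentDiscountStatus TEXT, EffectiveCost REAL)"
    )
    conn.executemany(
        "INSERT INTO focus_billing VALUES (?, ?, ?, ?)",
        [
            ("AWS", "ri-1", "Used", 900.0),
            ("AWS", "ri-1", "Unused", 100.0),
            ("Microsoft", "res-1", "Used", 300.0),
            ("Microsoft", "res-1", "Unused", 700.0),
            ("AWS", None, None, 2500.0),  # on-demand row, ignored by the filter
        ],
    )
    return conn


print(waste_by_provider(build_demo()))
```

Grouping by CommitmentDiscountId instead of ProviderName gives the per-commitment view needed to decide which reservation to act on.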

Detection Logic

Three detection conditions, any of which is sufficient to flag:

-- Condition A: Cliff detection (level-break)
current 7-day waste_ratio - trailing 30-day waste_ratio > 0.20
(waste_ratio jumped 20+ percentage points in a week)

-- Condition B: Drift detection (slope)
linear slope of daily waste_ratio over 30 days > 0.005/day
(0.5 percentage points per day = 15pp per month: rapid drift)

-- Condition C: Total stranding
any CommitmentDiscountId with zero 'Used' rows
for 7+ consecutive days
(commitment generating zero benefit for a week or more)

-- Dollar floor for all conditions
SUM(Unused EffectiveCost) > $500/month
(suppress noise on micro-commitments)

The cliff and drift require different logic because the signal is different: a sudden jump vs. a sustained trend. Condition C catches the worst case, a commitment that is completely unmatched, regardless of how it got there.
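The three conditions and the dollar floor might be implemented along these lines in Python. This is a sketch under stated assumptions: the input is a list of daily waste_ratio values (most recent last), window averages stand in for windowed ratios, the 30-day baseline is taken as the 30 days before the current week, and all function names are illustrative.

```python
from statistics import mean
from typing import Dict, List, Sequence

CLIFF_JUMP = 0.20      # Condition A: jump of 20 percentage points
DRIFT_SLOPE = 0.005    # Condition B: waste_ratio increase per day
STRANDED_DAYS = 7      # Condition C: consecutive days with no 'Used' rows
DOLLAR_FLOOR = 500.0   # monthly unused EffectiveCost, in dollars


def slope_per_day(values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against day index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def is_cliff(daily_ratio: Sequence[float]) -> bool:
    # Assumption: the baseline excludes the current week, so 37 days are needed.
    if len(daily_ratio) < 37:
        return False
    return mean(daily_ratio[-7:]) - mean(daily_ratio[-37:-7]) > CLIFF_JUMP


def is_drift(daily_ratio: Sequence[float]) -> bool:
    if len(daily_ratio) < 30:
        return False
    return slope_per_day(daily_ratio[-30:]) > DRIFT_SLOPE


def stranded(used_rows_by_commitment: Dict[str, List[int]]) -> List[str]:
    """Commitment IDs whose last STRANDED_DAYS daily 'Used' row counts are all zero."""
    return sorted(
        cid for cid, counts in used_rows_by_commitment.items()
        if len(counts) >= STRANDED_DAYS and not any(counts[-STRANDED_DAYS:])
    )


def detect(daily_ratio: Sequence[float],
           used_rows_by_commitment: Dict[str, List[int]],
           monthly_unused_cost: float) -> List[str]:
    """Return the list of triggered conditions, or [] below the dollar floor."""
    if monthly_unused_cost <= DOLLAR_FLOOR:
        return []
    flags = []
    if is_cliff(daily_ratio):
        flags.append("cliff")
    if is_drift(daily_ratio):
        flags.append("drift")
    flags.extend(f"stranded:{cid}" for cid in stranded(used_rows_by_commitment))
    return flags
```

A sharp cliff usually trips the slope condition as well, because a step change inside the 30-day window produces a steep fitted line. That overlap is harmless when any single condition is enough to flag.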

What Does NOT Show Up in Billing

Understanding the gaps prevents false assumptions about what billing data can tell you:

Fix Checklist

  1. For cliffs: Sell, modify, or exchange the stranded RI immediately. The AWS Reserved Instance Marketplace accepts Standard RIs, subject to eligibility rules; Convertible RIs cannot be listed there, but they can be exchanged for a different instance type or family that your remaining workloads can absorb. Every day at zero utilization is the maximum possible waste rate.
  2. For drifts: Do not buy new commitments until the existing portfolio is back above 80% utilization. Right-size the commitment portfolio by exchanging convertible RIs or letting standard RIs expire without renewal. Use on-demand for the coverage gap during the transition.
  3. For Azure scope misconfiguration: Verify that reservation scope matches the subscription and resource group of the running VMs. A reservation scoped to "Subscription A" provides zero benefit to VMs in "Subscription B." Fix scope in the Azure portal under Reservations → Manage.
  4. Establish a quarterly RI review cadence: commitment waste compounds silently. A 90-minute quarterly review of waste_ratio per commitment, flagging anything below 80% utilization, prevents the drift scenario from reaching crisis level.
  5. For new commitments: buy Compute Savings Plans rather than instance-family RIs wherever possible. Compute Savings Plans flex across EC2 instance families, AZs, and OS types, so they are significantly harder to strand via architectural change.
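For the quarterly review in step 4, a per-commitment utilization report could look something like the following Python sketch. It assumes rows are dictionaries keyed by FOCUS column names; the function names and sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def utilization_by_commitment(rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Used EffectiveCost divided by total EffectiveCost, per CommitmentDiscountId."""
    used = defaultdict(float)
    total = defaultdict(float)
    for row in rows:
        cid = row.get("CommitmentDiscountId")
        if cid is None:
            continue  # not a commitment row
        cost = float(row["EffectiveCost"])
        total[cid] += cost
        if row.get("CommitmentDiscountStatus") == "Used":
            used[cid] += cost
    return {cid: used[cid] / total[cid] for cid in total if total[cid] > 0}


def review_list(rows: Iterable[Mapping[str, object]], floor: float = 0.80) -> List[str]:
    """Commitments below the utilization floor, worst first."""
    util = utilization_by_commitment(rows)
    return sorted((cid for cid, u in util.items() if u < floor), key=lambda cid: util[cid])


quarter_rows = [
    {"CommitmentDiscountId": "ri-a", "CommitmentDiscountStatus": "Used", "EffectiveCost": 95.0},
    {"CommitmentDiscountId": "ri-a", "CommitmentDiscountStatus": "Unused", "EffectiveCost": 5.0},
    {"CommitmentDiscountId": "ri-b", "CommitmentDiscountStatus": "Used", "EffectiveCost": 40.0},
    {"CommitmentDiscountId": "ri-b", "CommitmentDiscountStatus": "Unused", "EffectiveCost": 60.0},
    {"CommitmentDiscountId": "sp-c", "CommitmentDiscountStatus": "Used", "EffectiveCost": 70.0},
    {"CommitmentDiscountId": "sp-c", "CommitmentDiscountStatus": "Unused", "EffectiveCost": 30.0},
]
print(review_list(quarter_rows))
```

Sorting worst-first puts the commitments that are candidates for sale, exchange, or non-renewal at the top of the review.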
