The Quiet RI Bleed: Detecting Commitment Loss Before It Compounds
Most cloud cost alerts fire when something goes up. Commitment loss is the anomaly that goes the other direction, and that is precisely why it is so difficult to catch.
Your total bill can stay flat or even fall while commitment waste climbs from 8% to 93% of your Reserved Instance portfolio in a single week. Standard threshold alerting sees nothing. No spike. No new line item. No alert fires. The signal lives entirely in a ratio: what fraction of your committed spend is currently covering actual usage. That ratio is not something most billing dashboards display by default.
Without active monitoring, RI portfolios drift to 40-60% utilization within 12 months. Once utilization falls below the break-even point (roughly 60% for a commitment priced about 40% below on-demand), a commitment that was supposed to save money costs more than paying on-demand would have.
Two Failure Modes
Commitment loss appears in two distinct temporal patterns that require different detection logic:
| Mode | Pattern | Common Trigger |
|---|---|---|
| The cliff | Utilization drops 40-100 percentage points within a single billing week | Workload decommissioned, instance family migration (m5 → Graviton), lift-and-shift to containers, VM scope change |
| The drift | Utilization declines 2-5 percentage points per week over months | Team headcount reduction, incremental right-sizing, seasonal off-peak, monolith refactored into microservices |
Both produce the same eventual billing signature (growing CommitmentDiscountStatus = Unused rows), but the cliff is a level-break signal while the drift is a slope signal. Detecting one without the other misses half the problem.
What Appears in FOCUS Billing Data
The key insight about commitment waste: BilledCost does not show it. For unused commitment rows, BilledCost = $0. The commitment was pre-paid or amortized, so billing is already accounted for. The signal lives in EffectiveCost, which carries the amortized cost of the commitment period regardless of whether the benefit was applied.
| FOCUS Field | Healthy (90%+ utilization) | Commitment loss (<40% utilization) |
|---|---|---|
| CommitmentDiscountStatus | Mostly Used, small Unused remainder | Mostly Unused, minimal Used |
| BilledCost (Unused rows) | $0 | $0, unchanged; this is why the bill looks normal |
| EffectiveCost (Unused rows) | 5-15% of commitment cost (normal buffer) | 60-100% of commitment cost (most of the commitment is wasted) |
| EffectiveCost (Used rows) | 85-95% of commitment cost (covered usage) | Small or zero (very little being covered) |
| CommitmentDiscountId | Active ARN/ID with matching usage | Same active ARN/ID; commitment is not expired, just unmatched |
The waste ratio is the primary metric: SUM(EffectiveCost WHERE CommitmentDiscountStatus='Unused') / SUM(EffectiveCost WHERE CommitmentDiscountId IS NOT NULL). Healthy is below 0.15. Alert threshold: 0.30. Crisis threshold: 0.60.
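The waste ratio formula can be sketched in a few lines of Python. This is a minimal illustration, assuming FOCUS rows have already been loaded as dictionaries keyed by column name. The function names, the sample IDs, and the "watch" label for the 0.15-0.30 band (which the thresholds above leave unnamed) are hypothetical.

```python
HEALTHY_MAX = 0.15  # below this the portfolio is considered healthy
ALERT_MIN = 0.30    # alert threshold from the text
CRISIS_MIN = 0.60   # crisis threshold from the text


def waste_ratio(rows):
    """Unused EffectiveCost divided by all commitment-related EffectiveCost."""
    unused = 0.0
    total = 0.0
    for row in rows:
        if row.get("CommitmentDiscountId") is None:
            continue  # row is not tied to any commitment
        cost = float(row["EffectiveCost"])
        total += cost
        if row.get("CommitmentDiscountStatus") == "Unused":
            unused += cost
    return unused / total if total else 0.0


def classify(ratio):
    """Map a waste ratio onto the thresholds above."""
    if ratio >= CRISIS_MIN:
        return "crisis"
    if ratio >= ALERT_MIN:
        return "alert"
    if ratio < HEALTHY_MAX:
        return "healthy"
    return "watch"  # hypothetical label for the unnamed 0.15-0.30 band


# Illustrative rows only; the IDs and amounts are made up for the sketch.
rows = [
    {"CommitmentDiscountId": "ri-example-1", "CommitmentDiscountStatus": "Used", "EffectiveCost": 1030.40},
    {"CommitmentDiscountId": "ri-example-1", "CommitmentDiscountStatus": "Unused", "EffectiveCost": 89.60},
    {"CommitmentDiscountId": None, "CommitmentDiscountStatus": None, "EffectiveCost": 250.00},
]
ratio = waste_ratio(rows)
print(round(ratio, 2), classify(ratio))
```

Rows without a CommitmentDiscountId are skipped so that on-demand spend does not dilute the ratio.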
The Cliff in Detail
A concrete example: 10 EC2 m5.2xlarge Reserved Instances at $112/instance/month, running a fleet at 92% utilization. Monthly commitment waste: $89.60 (8% of $1,120). Acceptable.
The fleet migrates to EKS. EC2 instances are terminated. The 10 RIs remain active because they were purchased for 1 or 3 years. Within one billing week:
- CommitmentDiscountStatus = Used fraction: 0.92 → 0.07
- CommitmentDiscountStatus = Unused fraction: 0.08 → 0.93
- Monthly commitment waste: $89.60 → $1,041.60
- Waste ratio: 0.08 → 0.93
The total bill may actually fall, because the on-demand EC2 charges disappeared with the fleet. But commitment waste jumped more than 10× in a week. Without a waste_ratio monitor, this goes undetected until someone reviews the RI dashboard at quarter-end.
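The arithmetic behind this example can be checked with a short sketch. The constants come from the example; the function and variable names are illustrative only.

```python
RI_COUNT = 10
MONTHLY_COST_PER_RI = 112.00  # dollars per instance per month, from the example


def monthly_waste(used_fraction):
    """Dollars of monthly commitment cost not matched by usage."""
    commitment = RI_COUNT * MONTHLY_COST_PER_RI
    return commitment * (1.0 - used_fraction)


before = monthly_waste(0.92)  # fleet still running on EC2
after = monthly_waste(0.07)   # fleet migrated to EKS
print(round(before, 2), round(after, 2), round(after / before, 1))
```

The commitment cost itself never changes; only the share of it matched by usage does, which is why the jump is invisible in BilledCost.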
Common cliff triggers:
- EC2 fleet migrated to EKS or Fargate: RI tied to m5.2xlarge finds no matching on-demand usage
- Instance family upgrade: m5 → m7g Graviton. Standard RIs don't flex across instance families
- Azure VM SKU change: D2_v3 → D2s_v3; reservation benefit stops applying immediately
- Azure scope misconfiguration: VM moved from Subscription A to Subscription B while reservation is scoped to A
- GCP resource-based CUD: workload right-sized below committed vCPU; commitment fee continues, credit disappears
The Drift in Detail
The drift scenario is slower and harder to see precisely because it is gradual. No single event causes it. A team of 50 engineers shrinks to 30, and its compute footprint shrinks proportionally. An annual RI portfolio is purchased at Q4 peak traffic; workload runs at 60% of peak for the next 11 months. A monolith is refactored into microservices over 6 months, each step reducing per-service compute requirements.
Drift trajectory over three months at roughly 4 percentage points per week:
| Month | Utilization | Waste Ratio | Monthly Waste (on $1,120 portfolio) |
|---|---|---|---|
| Start | 90% | 0.10 | $112 |
| Month 1 | 75% | 0.25 | $280 |
| Month 2 | 55% | 0.45 | $504 |
| Month 3 | 40% | 0.60 | $672 |
By month 3, $672 of the $1,120 monthly commitment is wasted and only $448 covers actual usage. At typical RI discount rates you would pay less on-demand; the commitment is now working against you. This is the point where "I bought reservations to save money" becomes factually incorrect.
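Whether a commitment is really worse than on-demand depends on the discount it was bought at. If committed usage is priced at (1 - discount) times the on-demand rate, the commitment breaks even when utilization equals 1 - discount. The sketch below works through the month 3 figures under an assumed 40% discount; that rate is an illustration, not a figure from the example, and the function names are hypothetical.

```python
def break_even_utilization(discount_rate):
    """Utilization below which the commitment costs more than on-demand."""
    return 1.0 - discount_rate


def on_demand_equivalent(covered_commitment_cost, discount_rate):
    """What the covered usage would have cost at on-demand rates."""
    return covered_commitment_cost / (1.0 - discount_rate)


ASSUMED_DISCOUNT = 0.40  # assumption for illustration only
COMMITMENT = 1120.00     # monthly portfolio cost from the table
COVERED = 448.00         # month 3: 40% of the commitment matched by usage

print(break_even_utilization(ASSUMED_DISCOUNT))
print(round(on_demand_equivalent(COVERED, ASSUMED_DISCOUNT), 2), "vs", COMMITMENT)
```

Under that assumption the same usage would cost about $747 on-demand against $1,120 committed. A deeper discount lowers the break-even utilization, so the crossover point should be computed per commitment rather than taken as a fixed number.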
Industry data (Flexera, FinOps Foundation): without active quarterly review, RI portfolios reach this state routinely. The FinOps Foundation's healthy utilization benchmark is 85%+. Most unmanaged portfolios fall below 70% within a year.
Provider Notes
FOCUS CommitmentDiscountStatus is the normalizing field across providers. The underlying CUR/billing export fields differ:
| Provider | Commitment Type | Waste Field |
|---|---|---|
| AWS EC2 RIs | 1- or 3-year, per-instance-type | reservation/UnusedQuantity, reservation/UnusedRecurringFee |
| AWS Savings Plans | $/hour commitment for 1 or 3 years | Unused commitment per hour, derived as savingsPlan/TotalCommitmentToDate minus savingsPlan/UsedCommitment |
| Azure Reserved VM Instances | 1- or 3-year, per-SKU | Reservation utilization % in Cost Management; "No Benefit" status for fully stranded |
| GCP Committed Use Discounts | 1- or 3-year vCPU/memory or $/hour | Committed vCPU/memory not consumed; commitment fee rows without credit rows |
A FOCUS-based query covers all four with a single CommitmentDiscountStatus = 'Unused' filter, normalized across providers by the schema.
Detection Logic
Three detection conditions, any of which is sufficient to flag:
-- Condition A: Cliff detection (level-break)
current 7-day waste_ratio - trailing 30-day waste_ratio > 0.20
(waste_ratio jumped 20+ percentage points in a week)
-- Condition B: Drift detection (slope)
linear slope of daily waste_ratio over 30 days > 0.005/day
(0.5 percentage points per day = 15pp per month: rapid drift)
-- Condition C: Total stranding
any CommitmentDiscountId with zero 'Used' rows
for 7+ consecutive days
(commitment generating zero benefit for a week or more)
-- Dollar floor for all conditions
SUM(Unused EffectiveCost) > $500/month
(suppress noise on micro-commitments)
The cliff and drift require different logic because the signal is different: a sudden jump vs. a sustained trend. Condition C catches the worst case, a commitment that is completely unmatched, regardless of how it got there.
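The three conditions and the dollar floor can be sketched in Python as follows. This is one possible reading, not a reference implementation. It assumes a daily waste_ratio series (oldest first) for a single commitment or portfolio, takes the "trailing 30-day" baseline to mean the 30 days before the current 7-day window, and uses an ordinary least-squares fit for the slope. All function and parameter names are hypothetical.

```python
from typing import List, Sequence

CLIFF_JUMP = 0.20     # Condition A: 20 percentage point level break
DRIFT_SLOPE = 0.005   # Condition B: waste_ratio increase per day
STRANDED_DAYS = 7     # Condition C: consecutive days with no 'Used' rows
DOLLAR_FLOOR = 500.0  # monthly Unused EffectiveCost, in dollars


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def ols_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against day index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def detect(daily_waste_ratio: Sequence[float],
           daily_used_rows: Sequence[int],
           monthly_unused_cost: float) -> List[str]:
    """Return the names of the conditions that fire for one commitment."""
    flags: List[str] = []
    if monthly_unused_cost <= DOLLAR_FLOOR:
        return flags  # dollar floor: suppress micro-commitments

    current = daily_waste_ratio[-7:]
    baseline = daily_waste_ratio[-37:-7]  # assumed: the 30 days before 'current'
    if current and baseline and mean(current) - mean(baseline) > CLIFF_JUMP:
        flags.append("cliff")

    if ols_slope(daily_waste_ratio[-30:]) > DRIFT_SLOPE:
        flags.append("drift")

    recent = daily_used_rows[-STRANDED_DAYS:]
    if len(recent) == STRANDED_DAYS and not any(recent):
        flags.append("stranded")
    return flags


# The cliff example: 30 days at 0.08, then one week at 0.93.
print(detect([0.08] * 30 + [0.93] * 7, [10] * 30 + [1] * 7, 1041.60))
```

A sharp cliff also pushes the 30-day slope past the drift threshold, so the example fires both conditions. Since any one condition is sufficient to flag, the overlap is harmless.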
What Does NOT Show Up in Billing
Understanding the gaps prevents false assumptions about what billing data can tell you:
- The decommissioning event itself: stopping instances or running terraform destroy produces no billing line items. The event is invisible; only the absence of matching usage rows is observable.
- CPU utilization within committed instances: a 100% committed RI covering an idle instance still shows 100% utilization in billing. Waste in billing means uncovered commitment, not inefficient instance usage. (That latter problem is QB1's domain.)
- Size-flexible RI normalization: AWS applies large RIs across multiple smaller instances using normalization factors. This can produce apparent UnusedQuantity that represents actual coverage. Cross-check against EffectiveCost non-zero before flagging.
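For the size-flexibility point, coverage is easier to reason about in normalized units than in instance counts. The sketch below uses AWS's published normalization factors for a handful of sizes and assumes every instance is in the same family as the RI and eligible for size flexibility. The function name is hypothetical, and the model ignores hourly matching.

```python
# Normalization factors per instance size (subset of AWS's published table).
NORMALIZATION_FACTOR = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2,
    "large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32,
}


def covered_fraction(ri_size, ri_count, running_sizes):
    """Share of the committed normalized units matched by running instances."""
    committed_units = NORMALIZATION_FACTOR[ri_size] * ri_count
    running_units = sum(NORMALIZATION_FACTOR[size] for size in running_sizes)
    return min(running_units, committed_units) / committed_units


# One 2xlarge RI (16 units) against four large instances (4 units each).
print(covered_fraction("2xlarge", 1, ["large"] * 4))
# The same RI against only two large instances.
print(covered_fraction("2xlarge", 1, ["large"] * 2))
```

A monitor that counts instances rather than units would read the first case as one unused 2xlarge RI, even though the commitment is fully applied.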
Fix Checklist
- For cliffs: Sell, modify, or exchange the stranded RI immediately. The AWS Reserved Instance Marketplace accepts Standard RIs only, subject to eligibility rules; Convertible RIs cannot be sold there but can be exchanged for a different instance type or family that your remaining workloads can absorb. Every day at zero utilization is the maximum possible waste rate.
- For drifts: Do not buy new commitments until the existing portfolio is back above 80% utilization. Right-size the commitment portfolio by exchanging Convertible RIs or letting Standard RIs expire without renewal. Use on-demand for the coverage gap during the transition.
- For Azure scope misconfiguration: Verify that reservation scope matches the subscription and resource group of the running VMs. A reservation scoped to "Subscription A" provides zero benefit to VMs in "Subscription B." Fix scope in the Azure portal under Reservations → Manage.
- Establish a quarterly RI review cadence: commitment waste compounds silently. A 90-minute quarterly review of waste_ratio per commitment, flagging anything below 80% utilization, prevents the drift scenario from reaching crisis level.
- For new commitments: buy Compute Savings Plans rather than instance-family RIs wherever possible. Compute Savings Plans flex across EC2 instance families, AZs, and OS types, so they are significantly harder to strand via architectural change.
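The quarterly review item in the checklist can be supported by a per-commitment utilization report. The sketch below groups FOCUS rows by CommitmentDiscountId and returns the commitments under the 80% mark. It assumes rows are dictionaries keyed by FOCUS column name and that purchase rows carry zero EffectiveCost. The function name and sample IDs are hypothetical.

```python
from collections import defaultdict

REVIEW_THRESHOLD = 0.80  # flag commitments below 80% utilization


def commitments_needing_review(rows):
    """Return {CommitmentDiscountId: utilization} for under-used commitments."""
    used = defaultdict(float)
    total = defaultdict(float)
    for row in rows:
        commitment_id = row.get("CommitmentDiscountId")
        if commitment_id is None:
            continue  # on-demand or otherwise uncommitted spend
        cost = float(row["EffectiveCost"])
        total[commitment_id] += cost
        if row.get("CommitmentDiscountStatus") == "Used":
            used[commitment_id] += cost
    report = {}
    for commitment_id, total_cost in total.items():
        utilization = used[commitment_id] / total_cost if total_cost else 0.0
        if utilization < REVIEW_THRESHOLD:
            report[commitment_id] = utilization
    return report


# Illustrative rows only; IDs and amounts are made up.
sample = [
    {"CommitmentDiscountId": "ri-a", "CommitmentDiscountStatus": "Used", "EffectiveCost": 90.0},
    {"CommitmentDiscountId": "ri-a", "CommitmentDiscountStatus": "Unused", "EffectiveCost": 10.0},
    {"CommitmentDiscountId": "ri-b", "CommitmentDiscountStatus": "Used", "EffectiveCost": 40.0},
    {"CommitmentDiscountId": "ri-b", "CommitmentDiscountStatus": "Unused", "EffectiveCost": 60.0},
    {"CommitmentDiscountId": None, "CommitmentDiscountStatus": None, "EffectiveCost": 500.0},
]
print(commitments_needing_review(sample))
```

Reporting per commitment rather than per portfolio keeps one fully stranded RI from hiding behind an otherwise healthy average.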