A Deep Dive into AWS Billing: CUR, FOCUS, and the Dataset Most Teams Underuse
Your AWS billing export is not an invoice. It is a timestamped, service-attributed, resource-attributed event log: the same dataset can surface both cost anomalies and security breaches, if you know which fields to query.
Most teams set up Cost Explorer, check the monthly total, and call it visibility. Cost Explorer is useful for trend charts and high-level breakdowns. It is not useful for answering the questions that actually matter: why did this specific resource start costing more on Tuesday, what is the cross-service billing signature of this AI workload, and does this data transfer pattern indicate exfiltration or legitimate use?
Those questions require the raw billing dataset, and knowing how to structure it for analysis.
Two Export Formats: CUR 2.0 and FOCUS
AWS provides two billing export formats through AWS Data Exports. They are not interchangeable: they serve different use cases and have meaningfully different schemas.
| Dimension | CUR 2.0 (Legacy path) | FOCUS 1.2 (Recommended path) |
|---|---|---|
| Schema standard | AWS-proprietary, AWS-only | FinOps Open Cost and Usage Specification: same schema across AWS, Azure, GCP |
| Column count | 300+ columns, many service-specific | 48 columns (43 FOCUS spec + 5 AWS x_ columns): compact and queryable |
| ServiceName format | Inconsistent casing and naming across services | Standardized (Amazon OpenSearch Service, Amazon Bedrock): reliable for pattern matching |
| AI billing fields | Usage type in lineItem/UsageType; format varies per service | x_UsageType (SearchOCU, InvokeModelInference): queryable with LIKE patterns |
| Multi-cloud use | AWS only; requires schema normalization to join with Azure or GCP data | Join directly with Azure and GCP FOCUS exports: same column names, same value formats |
| Split cost allocation (ECS/EKS) | Supported | Not yet supported in FOCUS; use CUR 2.0 if this is required |
| Recommended for new implementations | Only if you need split cost allocation | Yes, for all other use cases |
If you are building a new billing pipeline today, start with FOCUS 1.2. If you have an existing CUR-based pipeline, run FOCUS exports in parallel before migrating, because the field mapping requires attention and some queries need to be rewritten. CUR 2.0 is not being deprecated imminently, but FOCUS is the direction AWS and every major cloud provider are converging on.
Setting Up the Pipeline: AWS Data Exports → S3 → Athena
The standard architecture for billing analysis is a three-layer stack: raw export files in S3, an Athena table pointing at those files, and SQL queries for analysis. The entire setup takes under an hour the first time.
Step 1: Enable AWS Data Exports
AWS Console → AWS Cost Management → Data Exports → Create export. Configuration choices that matter:
- Export type: FOCUS 1.2 with AWS columns (or CUR 2.0 if you need split cost allocation)
- Granularity: Hourly. Daily exports miss intra-day anomaly signals. Hourly exports are larger but enable detection of cost events within the same billing day they occur.
- File format: Parquet. It is 10 to 20 times smaller than CSV and significantly faster to query in Athena. There is no cost argument for CSV.
- S3 bucket: A dedicated billing data bucket with versioning enabled and no public access. Never mix billing data with application data; the IAM separation matters for audit purposes.
- Delivery frequency: AWS delivers updated exports up to three times daily. The export is a snapshot of all charges to date for the billing period, and each delivery overwrites the prior one for that billing period.
Step 2: Create the Athena table (FOCUS schema)
CREATE EXTERNAL TABLE focus_billing (
billingaccountid STRING,
billingaccountname STRING,
billingcurrency STRING,
billingperiodend TIMESTAMP,
billingperiodstart TIMESTAMP,
chargeclass STRING,
chargecategory STRING,
chargedescription STRING,
chargefrequency STRING,
chargeperiodend TIMESTAMP,
chargeperiodstart TIMESTAMP,
commitmentdiscountcategory STRING,
commitmentdiscountid STRING,
commitmentdiscountname STRING,
commitmentdiscounttype STRING,
consumedquantity DOUBLE,
consumedunit STRING,
contractedcost DOUBLE,
contractedunitprice DOUBLE,
effectivecost DOUBLE,
invoiceissuer STRING,
listcost DOUBLE,
listunitprice DOUBLE,
pricingcategory STRING,
pricingquantity DOUBLE,
pricingunit STRING,
provider STRING,
publisher STRING,
regionid STRING,
regionname STRING,
resourceid STRING,
resourcename STRING,
resourcetype STRING,
servicecategory STRING,
servicename STRING,
skuid STRING,
skuname STRING,
subaccountid STRING,
subaccountname STRING,
tags STRING,
billedcost DOUBLE,
-- AWS x_ columns
x_costcategories STRING,
x_discounts STRING,
x_operation STRING,
x_servicecode STRING,
x_usagetype STRING
)
STORED AS PARQUET
LOCATION 's3://YOUR-BILLING-BUCKET/focus-export/'
TBLPROPERTIES ('parquet.compress'='SNAPPY');
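After creating the table, run MSCK REPAIR TABLE focus_billing to load existing partitions. This only has an effect on a partitioned table: the DDL above declares no partitions, so if your export is delivered under partition-style prefixes and you want partition pruning, add a matching PARTITIONED BY clause first. For ongoing updates, set up an AWS Glue crawler on the same S3 path; it will pick up new export files automatically without manual table updates. Before relying on the table, compare the column names and types in the DDL against a delivered export file, since a mismatch can surface as NULL values or query errors.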
Step 3: Verify the data with a baseline query
-- Sanity check: total billed cost by service, last 30 days
SELECT
    servicename,
    SUM(billedcost) AS total_cost,
    COUNT(DISTINCT subaccountid) AS account_count,
    COUNT(DISTINCT resourceid) AS resource_count
FROM focus_billing
WHERE chargeperiodstart >= CAST(DATE_ADD('day', -30, CURRENT_DATE) AS TIMESTAMP)
  AND chargecategory = 'Usage'
  -- FOCUS sets ChargeClass to 'Correction' for corrections and leaves it NULL otherwise,
  -- so regular charges are selected with IS NULL rather than a 'Regular' literal
  AND chargeclass IS NULL
GROUP BY servicename
ORDER BY total_cost DESC
LIMIT 20;
If this returns results, the pipeline is working. If the table is empty, check S3 bucket permissions, the Glue crawler state, and whether the export has delivered its first file (new exports can take up to 24 hours for the first delivery).
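The same check can be run from a script rather than the console. The following is a minimal sketch, assuming a boto3 Athena client is passed in; the function names, the database name, and the results location are illustrative and not part of the original setup. It uses the standard `start_query_execution`, `get_query_execution`, and `get_query_results` calls and reads only the first page of results.

```python
import time


def rows_to_dicts(rows):
    """Convert Athena result rows (first row is the header) into a list of dicts."""
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start a query, poll until it finishes, and return the first page of rows.

    `athena` is expected to be boto3.client("athena"); `database` and
    `output_location` are whatever you configured (hypothetical values below).
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        status = athena.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {state}: {reason}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    # Only the first page of results is read; paginate for larger result sets.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return rows_to_dicts(result["ResultSet"]["Rows"])
```

A call might look like `run_athena_query(boto3.client("athena"), baseline_sql, "billing", "s3://YOUR-ATHENA-RESULTS-BUCKET/checks/")`, where both the database name and the results bucket are placeholders. An empty list back from the baseline query is the cue to work through the checks listed above.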
The Fields That Actually Matter
A FOCUS export has 48 columns. In practice, 10 to 12 drive almost all useful analysis. Understanding what each actually contains, and what it does not, prevents the most common query mistakes.
For cost attribution and anomaly detection
| Field | What it actually contains | Common mistake |
|---|---|---|
| ServiceName | The AWS service responsible for the charge: Amazon OpenSearch Service, Amazon Bedrock, Amazon EC2. Standardized in FOCUS, so it is reliable for LIKE pattern matching across accounts. | Treating ServiceName as sufficient attribution for AI workloads. A Bedrock Knowledge Base charge appears as Amazon OpenSearch Service, so ServiceName alone misleads. |
| x_UsageType | The specific billing dimension within a service: SearchOCU, IndexingOCU, InvokeModelInference, BoxUsage:ml.g4dn.xlarge, NatGateway-Bytes. This is the field that distinguishes what is actually being billed. | Filtering only on ServiceName and missing the usage type breakdown. Two services can share a ServiceName but have entirely different cost drivers in x_UsageType. |
| ResourceId | The ARN or ID of the specific resource being billed. For OpenSearch collections: bedrock-knowledge-base-<uuid>. For EC2: instance ID. Required for identifying the specific resource to remediate. | Aggregating by ServiceName and assuming all resources in a service behave the same. Anomalies are almost always resource-specific. |
| ConsumedQuantity vs BilledCost | ConsumedQuantity is how much was used (OCU-hours, GB, tokens). BilledCost is what was charged. For OCU billing, ConsumedQuantity is always 1.0: the billing floor is in BilledCost, not in usage quantity. | Using ConsumedQuantity to detect idle resources in OCU-billed services. A perfectly idle OpenSearch collection still shows ConsumedQuantity = 1.0 every hour. |
| SubAccountId | The AWS account ID that incurred the charge. For organizations with many accounts, this is the primary grouping key for cross-account analysis. | Analyzing only at the payer account level and missing account-specific anomalies that average out in aggregation. |
| RegionId | The AWS region where the resource ran: us-east-1, eu-west-1. For security detection: any region that has never appeared in an account's billing history is a zero-threshold signal. | Not tracking region history. New-region charges are the earliest billing signal of credential compromise, but only detectable if you know what regions are "normal" for each account. |
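To make the ServiceName pitfall concrete, here is a minimal sketch of attributing rows to workloads using ResourceId and x_UsageType together. It assumes rows are dicts keyed by the lowercase column names from the Athena table; the workload labels and function names are hypothetical, and the matching rules simply restate the examples in the table above rather than a complete attribution scheme.

```python
from collections import defaultdict


def attribute_workload(row):
    """Return an illustrative workload label for one billing row."""
    service = row.get("servicename") or ""
    resource = row.get("resourceid") or ""
    usage_type = row.get("x_usagetype") or ""

    # Per the table above: a Knowledge Base vector store bills under OpenSearch,
    # and its collection ID carries a "bedrock-knowledge-base-" prefix.
    if service == "Amazon OpenSearch Service" and "bedrock-knowledge-base-" in resource:
        return "Bedrock Knowledge Base (vector store)"
    if service == "Amazon Bedrock" and "InvokeModelInference" in usage_type:
        return "Bedrock inference"
    # Fall back to the service name when no finer rule applies.
    return service or "Unattributed"


def cost_by_workload(rows):
    """Sum billedcost per workload label."""
    totals = defaultdict(float)
    for row in rows:
        totals[attribute_workload(row)] += float(row.get("billedcost") or 0.0)
    return dict(totals)
```

Grouping by this label instead of servicename keeps the OpenSearch charges created by a Knowledge Base next to the Bedrock spend that caused them.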
The Security Signal Layer
AWS explicitly documents the use of billing data for security risk identification. The AWS Security Maturity Model includes billing alarms as a Quick Win: the first tier of security controls, before SIEM, before GuardDuty, before any advanced tooling. The rationale: when an attacker compromises AWS credentials, the first observable evidence is almost always a cost event, not a log event.
Three billing signals that have a direct security interpretation:
New-region zero-threshold rule
Any RegionId that has never appeared in an account's billing history is an immediate escalation: not a tunable alert threshold, but a threshold of zero. A legitimate new region deployment goes through a change management process; it does not appear silently in the billing data. Cryptomining campaigns, credential compromise, and shadow IT deployments all share this signature: new region, new service, first charge appearing with no corresponding internal authorization.
-- New region detection: find region/account combinations with no prior 90-day history
WITH historical_regions AS (
    SELECT DISTINCT subaccountid, regionid
    FROM focus_billing
    WHERE chargeperiodstart < CAST(DATE_ADD('day', -7, CURRENT_DATE) AS TIMESTAMP)
      AND chargeperiodstart >= CAST(DATE_ADD('day', -90, CURRENT_DATE) AS TIMESTAMP)
      AND chargecategory = 'Usage'
),
recent_regions AS (
    -- GROUP BY already yields one row per account/region, so DISTINCT is not needed
    SELECT subaccountid, regionid, MIN(chargeperiodstart) AS first_seen
    FROM focus_billing
    WHERE chargeperiodstart >= CAST(DATE_ADD('day', -7, CURRENT_DATE) AS TIMESTAMP)
      AND chargecategory = 'Usage'
      -- rows without a region can never match the join below and would always look "new"
      AND regionid IS NOT NULL
    GROUP BY subaccountid, regionid
)
SELECT
    r.subaccountid,
    r.regionid,
    r.first_seen,
    'NEW_REGION' AS signal_type
FROM recent_regions r
LEFT JOIN historical_regions h
    ON r.subaccountid = h.subaccountid AND r.regionid = h.regionid
WHERE h.regionid IS NULL
ORDER BY r.first_seen DESC;
Data transfer egress spike
Exfiltration generates DataTransfer-Out-Bytes charges. The signal is not absolute volume but a ratio change: egress increasing faster than compute or API call volume in the same account and period. A legitimate application that grows has both compute and egress growing together. An exfiltration event has egress growing against flat or declining compute.
-- Egress-to-compute ratio: daily egress spend relative to all other usage spend
-- compute_cost is every non-transfer charge, used here as a proxy for compute/API activity
-- The output is the ratio per day; compare days within an account to see the change
WITH daily_costs AS (
    SELECT
        subaccountid,
        DATE(chargeperiodstart) AS charge_date,
        SUM(CASE WHEN x_usagetype LIKE '%DataTransfer-Out%' THEN billedcost ELSE 0 END) AS egress_cost,
        SUM(CASE WHEN x_usagetype NOT LIKE '%DataTransfer%' THEN billedcost ELSE 0 END) AS compute_cost
    FROM focus_billing
    WHERE chargeperiodstart >= CAST(DATE_ADD('day', -14, CURRENT_DATE) AS TIMESTAMP)
      AND chargecategory = 'Usage'
    GROUP BY subaccountid, DATE(chargeperiodstart)
)
SELECT
    subaccountid,
    charge_date,
    egress_cost,
    compute_cost,
    CASE WHEN compute_cost > 0 THEN egress_cost / compute_cost ELSE NULL END AS egress_ratio
FROM daily_costs
WHERE egress_cost > 5.0
ORDER BY egress_ratio DESC NULLS LAST
LIMIT 50;
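The query above returns the ratio for each day, while the signal described is a change in that ratio. One way to close the gap is to post-process the rows. The sketch below assumes the query output has been loaded as dicts with the same column names; the function name, the trailing-mean baseline, and the 2.0 jump factor are illustrative choices, not part of the original method.

```python
from collections import defaultdict
from statistics import mean


def egress_ratio_changes(daily_rows, min_jump=2.0):
    """Flag days whose egress/compute ratio is at least `min_jump` times the
    mean ratio of the earlier days seen for the same account."""
    by_account = defaultdict(list)
    for row in daily_rows:
        compute = float(row["compute_cost"])
        if compute <= 0:
            continue  # ratio undefined; mirrors the NULL branch in the SQL
        ratio = float(row["egress_cost"]) / compute
        by_account[row["subaccountid"]].append((str(row["charge_date"]), ratio))

    flagged = []
    for account, series in by_account.items():
        series.sort()  # ISO dates (YYYY-MM-DD) sort chronologically as strings
        for i in range(1, len(series)):
            baseline = mean(r for _, r in series[:i])
            day, ratio = series[i]
            # Accounts with no earlier egress (baseline 0) are skipped here.
            if baseline > 0 and ratio >= min_jump * baseline:
                flagged.append({
                    "subaccountid": account,
                    "charge_date": day,
                    "egress_ratio": ratio,
                    "baseline_ratio": baseline,
                    "jump": ratio / baseline,
                })
    return flagged
```

Because the SQL filters out days with less than $5 of egress, an account's first appearance in the output has no baseline to compare against and is worth a manual look in its own right.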
New service in an account
GPU-class compute appearing in an account with no prior GPU spend is a cryptomining signal. A new managed AI service appearing with no prior activity is either shadow IT or credential compromise. The zero-threshold rule applies: any ServiceName + x_UsageType combination not seen in the prior 30 days is worth reviewing within 24 hours.
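The new-service rule reduces to a set difference between the combinations seen in the prior 30 days and those seen recently. The following is a minimal sketch under the assumption that both windows have already been queried into lists of dicts with subaccountid, servicename, and x_usagetype keys. The function name and the GPU-matching pattern are hypothetical: the pattern only looks for instance families starting with g or p in the usage type and is not an exhaustive list.

```python
import re

# Illustrative only: matches usage types such as "BoxUsage:g4dn.xlarge" or
# "BoxUsage:ml.p3.2xlarge". Extend or replace for your own fleet.
GPU_USAGE = re.compile(r":(?:ml\.)?[gp]\d[a-z0-9-]*\.")


def new_service_signals(history_rows, recent_rows):
    """Return account/service/usage-type combinations seen recently but not in history."""
    def combo(row):
        return (row["subaccountid"], row["servicename"], row.get("x_usagetype") or "")

    known = {combo(r) for r in history_rows}
    fresh = sorted({combo(r) for r in recent_rows} - known)
    return [
        {
            "subaccountid": account,
            "servicename": service,
            "x_usagetype": usage_type,
            "gpu_like": bool(GPU_USAGE.search(usage_type)),
        }
        for account, service, usage_type in fresh
    ]
```

Rows with `gpu_like` set are the cryptomining candidates; the rest are the shadow IT or credential compromise candidates to review within the 24-hour window.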
The Latency Problem, and How to Work Around It
AWS Cost Anomaly Detection, the native AWS service for billing-based alerting, has a published detection latency of up to 24 hours, because it operates on billing data that is delivered with that lag. For a cryptomining campaign that generates even $100 per hour, 24-hour detection latency means $2,400 or more in damage before the first alert fires.
The workaround is to run behavioral queries directly against fresh FOCUS export files. AWS delivers updated billing exports up to three times per day. A Lambda function triggered by an S3 event on new export delivery can run the new-region query within minutes of a new charge appearing in the export, reducing detection latency from 24 hours to under 30 minutes for the most critical security patterns.
The architecture: S3 event notification → Lambda trigger → Athena query → SNS alert. The entire pipeline runs in under 60 seconds per export delivery. The Lambda function needs only a narrow set of permissions: S3 read on the billing bucket, Athena query execution (which in practice also means read access to the Glue Data Catalog table and write access to the Athena query results location), and SNS publish. Cost: negligible. Athena charges $5 per TB scanned, and a behavioral query against a filtered partition of billing data scans megabytes, not terabytes.
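The architecture described above could be sketched as a single Lambda handler. This is a sketch under stated assumptions, not a reference implementation: the environment variable names (NEW_REGION_QUERY_ID, ATHENA_OUTPUT, ALERT_TOPIC_ARN) are hypothetical, the new-region query is assumed to be saved as an Athena named query, and the function's timeout is assumed to be raised well above the Lambda default so the polling loop can finish.

```python
import os
import time
import urllib.parse


def export_keys(event):
    """Return (bucket, key) pairs for Parquet objects in an S3 event notification."""
    keys = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(s3.get("object", {}).get("key", ""))
        if key.endswith(".parquet"):
            keys.append((s3.get("bucket", {}).get("name", ""), key))
    return keys


def format_alert(findings):
    """Build the SNS message body from [account, region, first_seen, ...] rows."""
    lines = [f"{len(findings)} new account/region combination(s) in billing data:"]
    for finding in findings:
        account, region, first_seen = finding[0], finding[1], finding[2]
        lines.append(f"- account {account}, region {region}, first seen {first_seen}")
    return "\n".join(lines)


def handler(event, context):
    if not export_keys(event):
        return {"alerts": 0}  # no Parquet export file in this notification

    import boto3  # imported lazily so the helpers above run without AWS access

    athena = boto3.client("athena")
    saved = athena.get_named_query(
        NamedQueryId=os.environ["NEW_REGION_QUERY_ID"]
    )["NamedQuery"]
    query_id = athena.start_query_execution(
        QueryString=saved["QueryString"],
        QueryExecutionContext={"Database": saved["Database"]},
        ResultConfiguration={"OutputLocation": os.environ["ATHENA_OUTPUT"]},
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state; the Lambda timeout bounds this loop.
    state = "QUEUED"
    while state in ("QUEUED", "RUNNING"):
        time.sleep(2)
        state = athena.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]["Status"]["State"]
    if state != "SUCCEEDED":
        raise RuntimeError(f"new-region query ended in state {state}")

    # The first result row is the header; the rest are findings.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"][1:]
    findings = [[cell.get("VarCharValue", "") for cell in row["Data"]] for row in rows]
    if findings:
        boto3.client("sns").publish(
            TopicArn=os.environ["ALERT_TOPIC_ARN"],
            Subject="New AWS region in billing data",
            Message=format_alert(findings),
        )
    return {"alerts": len(findings)}
```

Filtering on the `.parquet` suffix keeps the function from firing on manifest or metadata objects delivered alongside the data files. A single export delivery can still produce several data files and therefore several invocations, so de-duplicating alerts downstream is worth considering.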
What to Build First
- Enable FOCUS 1.2 exports today. AWS Data Exports → Create export → FOCUS 1.2 with AWS columns → Hourly → Parquet. Run in parallel with any existing CUR export until you have validated the FOCUS data matches.
- Create the Athena table. Use the DDL above. Run the baseline query to confirm data is flowing. This is your billing source of truth, and everything else depends on it.
- Add the new-region detection query. Save it as an Athena named query and run it daily from a scheduler (named queries have no built-in schedule), or wrap it in a Lambda triggered by new export delivery. Route results to an SNS topic that hits your security channel. This is the highest-ROI security control you can add in under an hour.
- Build service-specific cost baselines. For each major service in your account, compute a 30-day average daily cost by service and region. Any day more than 2× baseline is an anomaly worth investigating. This catches cost spikes before they become month-end surprises.
- Add AI-specific behavioral queries. The cross-service patterns (OpenSearch OCU with no Bedrock inference, SageMaker endpoint hours with no invocations) are not detectable in Cost Explorer; they require joining across service rows in the billing dataset. These are the patterns that standard FinOps tooling misses.
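The baseline rule from the list above (a 30-day average per service and region, flagged at more than 2× the average) can be sketched as follows. The input is assumed to be daily totals already aggregated by a query into dicts with servicename, regionid, charge_date, and cost keys; the function name and the minimum of seven days of history are illustrative choices, and days with no spend are not zero-filled.

```python
from collections import defaultdict
from statistics import mean


def baseline_anomalies(daily_costs, multiplier=2.0, window_days=30, min_history_days=7):
    """Flag days whose cost exceeds `multiplier` times the mean of the
    preceding (up to `window_days`) observed days for the same service/region."""
    series = defaultdict(dict)
    for row in daily_costs:
        key = (row["servicename"], row["regionid"])
        series[key][str(row["charge_date"])] = float(row["cost"])

    anomalies = []
    for (service, region), by_day in series.items():
        days = sorted(by_day)  # ISO dates sort chronologically as strings
        for i, day in enumerate(days):
            history = [by_day[d] for d in days[max(0, i - window_days):i]]
            if len(history) < min_history_days:
                continue  # not enough data to call anything a baseline yet
            baseline = mean(history)
            if baseline > 0 and by_day[day] > multiplier * baseline:
                anomalies.append({
                    "servicename": service,
                    "regionid": region,
                    "charge_date": day,
                    "cost": by_day[day],
                    "baseline": baseline,
                })
    return anomalies
```

A flat multiplier is deliberately crude. Services with strong weekly cycles may need a same-weekday baseline instead, which is a small change to how `history` is selected.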
See which billing patterns your current setup can detect
The DropInFinOps free assessment maps your billing export configuration against the behavioral query library, showing which cost and security patterns are detectable today and which require a FOCUS migration or additional instrumentation.