Cloud Billing ETL Architecture: Building the Pipeline That Feeds Cost and Security Detection
One FOCUS-native pipeline โ AWS Data Exports โ S3 โ Glue โ Athena โ feeds both cost anomaly detection and security billing signals. Here is how to build it correctly, including the idempotency trap most implementations get wrong.
A billing ETL pipeline is not a reporting tool. It is the data layer that makes every downstream capability possible: cost anomaly detection, security billing signals, AI spend attribution, commitment utilization tracking. Get the pipeline right and all of those come for free. Build it wrong โ wrong format, wrong partition strategy, wrong idempotency model โ and every query that runs against it is compromised at the foundation.
This article covers the production architecture for a FOCUS-native billing pipeline: the three stages, the configuration choices that matter, the idempotency quirk specific to billing data, and the Terraform to deploy it.
Why FOCUS, Not CUR
The architectural choice at the top of the pipeline โ CUR 2.0 or FOCUS โ determines the schema everything downstream depends on. The case for FOCUS 1.2 as the pipeline format:
- Stable, standardized schema. FOCUS 1.2 has 48 columns. CUR 2.0 has 300+, with service-specific columns added and removed as AWS changes billing mechanics. A FOCUS pipeline has a stable DDL; a CUR pipeline requires ongoing schema maintenance as CUR evolves.
- ServiceName is normalized. CUR's
product/serviceNamehas inconsistent casing and formatting across services. FOCUSServiceNameis standardized โAmazon OpenSearch Service,Amazon Bedrockโ making LIKE pattern matching reliable without account-specific normalization tables. - Multi-cloud compatible. Azure and GCP both publish FOCUS exports. The same Athena table schema, the same detection queries, the same ETL pipeline handles all three providers. CUR is AWS-only.
- CUR 2.0 has one advantage. Split Cost Allocation Data for ECS and EKS container-level cost breakdown is only available in CUR 2.0. If you need container-level attribution, run CUR 2.0 in parallel for that specific use case. For all other detection and analysis, FOCUS 1.2 is the right choice.
The Idempotency Quirk You Must Understand First
AWS billing exports are snapshots, not appends. Each delivery does not contain only new charges since the last delivery โ it contains all charges for the entire billing period up to the delivery timestamp. AWS delivers updated snapshots up to three times daily, and each delivery replaces the prior snapshot for that billing period.
This means a naive ETL that appends each delivery to a database table will create duplicate rows. The correct model: each delivery lands in S3 at a path that includes the billing period and delivery timestamp. The downstream Athena table points at the latest delivery per billing period. On each new delivery, you either replace the partition or let Athena's S3 path partitioning handle reading only the latest file.
The practical implementation: partition your S3 bucket by billing period (billing-period=2026-04), and within each billing period, overwrite with each new delivery. Athena reads all files in the partition โ which is fine as long as each delivery is a complete replacement, not a delta. Configure your Glue crawler to update the partition definition rather than add to it.
Stage 1 โ AWS Data Exports to S3
# Terraform: S3 bucket for billing exports
resource "aws_s3_bucket" "billing" {
bucket = "${var.prefix}-billing-exports"
tags = {
Purpose = "billing-etl-source"
ManagedBy = "terraform"
Environment = var.environment
}
}
resource "aws_s3_bucket_versioning" "billing" {
bucket = aws_s3_bucket.billing.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_public_access_block" "billing" {
bucket = aws_s3_bucket.billing.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
After creating the bucket, configure the export in AWS Console โ Cost Management โ Data Exports โ Create export:
- Type: FOCUS 1.2 with AWS columns
- Granularity: Hourly (daily misses intra-day anomaly signals)
- Format: Parquet with SNAPPY compression
- S3 path prefix:
focus/ - Delivery: Overwrite existing report
AWS will deliver files at a path like: s3://bucket/focus/FOCUS-YYYY-MM/YYYYMMDDTHHMMSSZ/. The timestamp subfolder is the delivery timestamp โ multiple deliveries per billing period each land in their own subfolder.
Stage 2 โ Glue Catalog
The Glue Data Catalog is the metadata layer that makes the S3 files queryable in Athena. A Glue crawler scans the S3 path, detects the Parquet schema, and creates or updates the table definition automatically.
resource "aws_glue_catalog_database" "billing" {
name = "${var.prefix}_billing"
description = "FOCUS billing data โ cost and security detection"
}
resource "aws_iam_role" "glue_crawler" {
name = "${var.prefix}-glue-crawler"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { Service = "glue.amazonaws.com" }
Action = "sts:AssumeRole"
}]
})
}
resource "aws_iam_role_policy_attachment" "glue_service" {
role = aws_iam_role.glue_crawler.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
}
resource "aws_iam_role_policy" "glue_s3" {
name = "glue-s3-billing-access"
role = aws_iam_role.glue_crawler.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["s3:GetObject", "s3:ListBucket"]
Resource = [
aws_s3_bucket.billing.arn,
"${aws_s3_bucket.billing.arn}/*"
]
}]
})
}
resource "aws_glue_crawler" "billing" {
name = "${var.prefix}-billing-crawler"
role = aws_iam_role.glue_crawler.arn
database_name = aws_glue_catalog_database.billing.name
s3_target {
path = "s3://${aws_s3_bucket.billing.bucket}/focus/"
}
# IMPORTANT: Use MERGE_NEW_COLUMNS, not NONE.
# AWS adds x_ columns to FOCUS exports without notice.
# MERGE_NEW_COLUMNS adds them to the catalog without breaking existing queries.
schema_change_policy {
update_behavior = "UPDATE_IN_DATABASE"
delete_behavior = "DEPRECATE_IN_DATABASE"
}
configuration = jsonencode({
Version = 1.0
CrawlerOutput = {
Partitions = { AddOrUpdateBehavior = "InheritFromTable" }
}
Grouping = {
TableGroupingPolicy = "CombineCompatibleSchemas"
}
})
# Run three times daily to match AWS export delivery frequency
schedule = "cron(0 6,12,18 * * ? *)"
tags = {
ManagedBy = "terraform"
Environment = var.environment
}
}
The schema change policy matters. AWS silently adds new columns to FOCUS exports when they update the spec. UPDATE_IN_DATABASE ensures new columns are picked up automatically. DEPRECATE_IN_DATABASE on delete means removed columns are marked deprecated rather than dropped โ preserving query compatibility with historical data that still contains those columns.
Stage 3 โ Athena Workgroup
resource "aws_athena_workgroup" "billing" {
name = "${var.prefix}-billing"
description = "Billing analytics โ cost detection and security signals"
configuration {
enforce_workgroup_configuration = true
publish_cloudwatch_metrics_enabled = true
result_configuration {
output_location = "s3://${aws_s3_bucket.billing.bucket}/athena-results/"
encryption_configuration {
encryption_option = "SSE_S3"
}
}
# Cap per-query data scanned to prevent accidental full-table scans
# $5/TB ร 100GB cap = $0.50 max per query
bytes_scanned_cutoff_per_query = 107374182400 # 100 GB
}
tags = {
ManagedBy = "terraform"
Environment = var.environment
}
}
output "athena_database" {
value = aws_glue_catalog_database.billing.name
}
output "athena_workgroup" {
value = aws_athena_workgroup.billing.name
}
Near-Real-Time Trigger: S3 Event โ Lambda
The Glue crawler runs on a schedule (three times daily). For security signals โ new-region detection, credential compromise โ you want to run behavioral queries within minutes of a new export landing, not hours. An S3 event notification triggers a Lambda function on each new export delivery:
resource "aws_s3_bucket_notification" "billing_export" {
bucket = aws_s3_bucket.billing.id
lambda_function {
lambda_function_arn = aws_lambda_function.billing_processor.arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "focus/"
filter_suffix = ".parquet"
}
}
resource "aws_lambda_permission" "s3_invoke" {
statement_id = "AllowS3Invoke"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.billing_processor.function_name
principal = "s3.amazonaws.com"
source_arn = aws_s3_bucket.billing.arn
}
The Lambda function that receives this event kicks off a short Athena query โ the new-region detection query from the CUR Deep Dive article โ against the freshly landed partition. Latency from export delivery to alert: under 90 seconds. Compare this to AWS Cost Anomaly Detection's 24-hour latency โ for security signals, the difference matters.
Partition Strategy for Multi-Account Organizations
For organizations with multiple AWS accounts, the partition structure determines query performance and cost. Two viable strategies:
| Strategy | S3 path | Best for | Tradeoff |
|---|---|---|---|
| Billing period only | focus/billing-period=2026-04/ |
Small orgs, consolidated billing, most queries span all accounts | Every query scans all accounts in the period โ Athena cost scales with org size |
| Billing period + account | focus/billing-period=2026-04/account=123456789/ |
Large orgs, account-scoped detection, per-account security alerting | More partitions, faster account-scoped queries, Glue crawler takes longer |
For most teams: start with billing-period-only partitioning. Add account-level partitioning when Athena query costs on the billing data exceed $10/month โ that signals query volume high enough to justify the partition overhead.
Two Workloads, One Pipeline
The same Athena table feeds both cost detection and security detection. This is not incidental โ it is the architectural advantage of treating billing data as a first-class event log rather than an accounting artifact.
Cost workload queries (daily schedule): orphaned KB detection (OpenSearch OCU with no Bedrock inference), idle SageMaker endpoint hours (endpoint-hours with zero invocations cross-referenced in CloudWatch), commitment utilization drift (EffectiveCost diverging from BilledCost on Reserved Instance rows).
Security workload queries (triggered on new export delivery): new-region zero-threshold detection (SubAccountId + RegionId combination not seen in prior 90 days), new service type (ServiceName not seen in prior 30 days), data transfer spike relative to compute (egress-to-compute ratio inversion).
Both workloads run against the same table, the same partitions, the same schema. There is no second pipeline for security, no separate data store, no additional ingestion cost. The billing data is the security sensor โ if you have already built the cost pipeline, the security layer costs nothing to add.
Schema Evolution
AWS updates FOCUS exports when the spec advances (1.0 โ 1.1 โ 1.2 โ 1.3). The Glue crawler with UPDATE_IN_DATABASE handles new column additions automatically. What requires manual handling:
- Column renames: Rare in FOCUS (the spec is designed for stability), but if a column is renamed, existing queries using the old name will fail. The Glue crawler will add the new column and deprecate the old one โ update detection queries to use the new name.
- Value format changes: More common than column renames. If AWS changes how a field like
x_UsageTypeis formatted (e.g. adds a region prefix to usage type values), LIKE pattern queries may need updating. Test detection queries against a fresh export after any AWS billing announcement. - New AWS x_ columns: AWS adds proprietary columns (prefixed
x_) to the FOCUS export without versioning them. The crawler picks these up automatically. No action needed unless you want to use them in queries.
See which detection patterns your pipeline is ready to run
The DropInFinOps free assessment maps your current billing export format and pipeline setup against the behavioral query library โ showing which cost and security patterns are queryable today.
Take the free assessment โ