Cloud Billing ETL Architecture: Building the Pipeline That Feeds Cost and Security Detection

One FOCUS-native pipeline โ€” AWS Data Exports โ†’ S3 โ†’ Glue โ†’ Athena โ€” feeds both cost anomaly detection and security billing signals. Here is how to build it correctly, including the idempotency trap most implementations get wrong.

A billing ETL pipeline is not a reporting tool. It is the data layer that makes every downstream capability possible: cost anomaly detection, security billing signals, AI spend attribution, commitment utilization tracking. Get the pipeline right and all of those come for free. Build it wrong โ€” wrong format, wrong partition strategy, wrong idempotency model โ€” and every query that runs against it is compromised at the foundation.

This article covers the production architecture for a FOCUS-native billing pipeline: the three stages, the configuration choices that matter, the idempotency quirk specific to billing data, and the Terraform to deploy it.

Why FOCUS, Not CUR

The architectural choice at the top of the pipeline โ€” CUR 2.0 or FOCUS โ€” determines the schema everything downstream depends on. The case for FOCUS 1.2 as the pipeline format:

The Idempotency Quirk You Must Understand First

AWS billing exports are snapshots, not appends. Each delivery does not contain only new charges since the last delivery โ€” it contains all charges for the entire billing period up to the delivery timestamp. AWS delivers updated snapshots up to three times daily, and each delivery replaces the prior snapshot for that billing period.

This means a naive ETL that appends each delivery to a database table will create duplicate rows. The correct model: each delivery lands in S3 at a path that includes the billing period and delivery timestamp. The downstream Athena table points at the latest delivery per billing period. On each new delivery, you either replace the partition or let Athena's S3 path partitioning handle reading only the latest file.

The practical implementation: partition your S3 bucket by billing period (billing-period=2026-04), and within each billing period, overwrite with each new delivery. Athena reads all files in the partition โ€” which is fine as long as each delivery is a complete replacement, not a delta. Configure your Glue crawler to update the partition definition rather than add to it.

Stage 1 โ€” AWS Data Exports to S3

# Terraform: S3 bucket for billing exports
resource "aws_s3_bucket" "billing" {
  bucket = "${var.prefix}-billing-exports"

  tags = {
    Purpose     = "billing-etl-source"
    ManagedBy   = "terraform"
    Environment = var.environment
  }
}

resource "aws_s3_bucket_versioning" "billing" {
  bucket = aws_s3_bucket.billing.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "billing" {
  bucket                  = aws_s3_bucket.billing.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

After creating the bucket, configure the export in AWS Console โ†’ Cost Management โ†’ Data Exports โ†’ Create export:

AWS will deliver files at a path like: s3://bucket/focus/FOCUS-YYYY-MM/YYYYMMDDTHHMMSSZ/. The timestamp subfolder is the delivery timestamp โ€” multiple deliveries per billing period each land in their own subfolder.

Stage 2 โ€” Glue Catalog

The Glue Data Catalog is the metadata layer that makes the S3 files queryable in Athena. A Glue crawler scans the S3 path, detects the Parquet schema, and creates or updates the table definition automatically.

resource "aws_glue_catalog_database" "billing" {
  name        = "${var.prefix}_billing"
  description = "FOCUS billing data โ€” cost and security detection"
}

resource "aws_iam_role" "glue_crawler" {
  name = "${var.prefix}-glue-crawler"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "glue.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "glue_service" {
  role       = aws_iam_role.glue_crawler.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
}

resource "aws_iam_role_policy" "glue_s3" {
  name = "glue-s3-billing-access"
  role = aws_iam_role.glue_crawler.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:ListBucket"]
      Resource = [
        aws_s3_bucket.billing.arn,
        "${aws_s3_bucket.billing.arn}/*"
      ]
    }]
  })
}

resource "aws_glue_crawler" "billing" {
  name          = "${var.prefix}-billing-crawler"
  role          = aws_iam_role.glue_crawler.arn
  database_name = aws_glue_catalog_database.billing.name

  s3_target {
    path = "s3://${aws_s3_bucket.billing.bucket}/focus/"
  }

  # IMPORTANT: Use MERGE_NEW_COLUMNS, not NONE.
  # AWS adds x_ columns to FOCUS exports without notice.
  # MERGE_NEW_COLUMNS adds them to the catalog without breaking existing queries.
  schema_change_policy {
    update_behavior = "UPDATE_IN_DATABASE"
    delete_behavior = "DEPRECATE_IN_DATABASE"
  }

  configuration = jsonencode({
    Version = 1.0
    CrawlerOutput = {
      Partitions = { AddOrUpdateBehavior = "InheritFromTable" }
    }
    Grouping = {
      TableGroupingPolicy = "CombineCompatibleSchemas"
    }
  })

  # Run three times daily to match AWS export delivery frequency
  schedule = "cron(0 6,12,18 * * ? *)"

  tags = {
    ManagedBy   = "terraform"
    Environment = var.environment
  }
}

The schema change policy matters. AWS silently adds new columns to FOCUS exports when they update the spec. UPDATE_IN_DATABASE ensures new columns are picked up automatically. DEPRECATE_IN_DATABASE on delete means removed columns are marked deprecated rather than dropped โ€” preserving query compatibility with historical data that still contains those columns.

Stage 3 โ€” Athena Workgroup

resource "aws_athena_workgroup" "billing" {
  name        = "${var.prefix}-billing"
  description = "Billing analytics โ€” cost detection and security signals"

  configuration {
    enforce_workgroup_configuration    = true
    publish_cloudwatch_metrics_enabled = true

    result_configuration {
      output_location = "s3://${aws_s3_bucket.billing.bucket}/athena-results/"

      encryption_configuration {
        encryption_option = "SSE_S3"
      }
    }

    # Cap per-query data scanned to prevent accidental full-table scans
    # $5/TB ร— 100GB cap = $0.50 max per query
    bytes_scanned_cutoff_per_query = 107374182400  # 100 GB
  }

  tags = {
    ManagedBy   = "terraform"
    Environment = var.environment
  }
}

output "athena_database" {
  value = aws_glue_catalog_database.billing.name
}

output "athena_workgroup" {
  value = aws_athena_workgroup.billing.name
}

Near-Real-Time Trigger: S3 Event โ†’ Lambda

The Glue crawler runs on a schedule (three times daily). For security signals โ€” new-region detection, credential compromise โ€” you want to run behavioral queries within minutes of a new export landing, not hours. An S3 event notification triggers a Lambda function on each new export delivery:

resource "aws_s3_bucket_notification" "billing_export" {
  bucket = aws_s3_bucket.billing.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.billing_processor.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "focus/"
    filter_suffix       = ".parquet"
  }
}

resource "aws_lambda_permission" "s3_invoke" {
  statement_id  = "AllowS3Invoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.billing_processor.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.billing.arn
}

The Lambda function that receives this event kicks off a short Athena query โ€” the new-region detection query from the CUR Deep Dive article โ€” against the freshly landed partition. Latency from export delivery to alert: under 90 seconds. Compare this to AWS Cost Anomaly Detection's 24-hour latency โ€” for security signals, the difference matters.

Partition Strategy for Multi-Account Organizations

For organizations with multiple AWS accounts, the partition structure determines query performance and cost. Two viable strategies:

StrategyS3 pathBest forTradeoff
Billing period only focus/billing-period=2026-04/ Small orgs, consolidated billing, most queries span all accounts Every query scans all accounts in the period โ€” Athena cost scales with org size
Billing period + account focus/billing-period=2026-04/account=123456789/ Large orgs, account-scoped detection, per-account security alerting More partitions, faster account-scoped queries, Glue crawler takes longer

For most teams: start with billing-period-only partitioning. Add account-level partitioning when Athena query costs on the billing data exceed $10/month โ€” that signals query volume high enough to justify the partition overhead.

Two Workloads, One Pipeline

The same Athena table feeds both cost detection and security detection. This is not incidental โ€” it is the architectural advantage of treating billing data as a first-class event log rather than an accounting artifact.

Cost workload queries (daily schedule): orphaned KB detection (OpenSearch OCU with no Bedrock inference), idle SageMaker endpoint hours (endpoint-hours with zero invocations cross-referenced in CloudWatch), commitment utilization drift (EffectiveCost diverging from BilledCost on Reserved Instance rows).

Security workload queries (triggered on new export delivery): new-region zero-threshold detection (SubAccountId + RegionId combination not seen in prior 90 days), new service type (ServiceName not seen in prior 30 days), data transfer spike relative to compute (egress-to-compute ratio inversion).

Both workloads run against the same table, the same partitions, the same schema. There is no second pipeline for security, no separate data store, no additional ingestion cost. The billing data is the security sensor โ€” if you have already built the cost pipeline, the security layer costs nothing to add.

Schema Evolution

AWS updates FOCUS exports when the spec advances (1.0 โ†’ 1.1 โ†’ 1.2 โ†’ 1.3). The Glue crawler with UPDATE_IN_DATABASE handles new column additions automatically. What requires manual handling:


See which detection patterns your pipeline is ready to run

The DropInFinOps free assessment maps your current billing export format and pipeline setup against the behavioral query library โ€” showing which cost and security patterns are queryable today.

Take the free assessment โ†’