The Hidden Cross-AZ Tax: Why Network Costs Keep Growing Silently
Most cloud cost anomalies arrive suddenly: a spike, a new billing line, an unexpected region appearing in your export. Data transfer misconfiguration is different. It was baked into your architecture on day one, and it has been compounding quietly with every gigabyte of traffic ever since.
There is no alert to fire. No threshold to cross. The billing pattern looks completely normal: networking costs that track proportionally with compute, the way you would expect them to. The only signal is a ratio that is higher than it should be, visible only if you know to look for it.
For many engineering teams, this pattern represents 15-25% of their total cloud bill (more than their database costs, more than their Kubernetes nodes), paid for architecture decisions made years ago that nobody has revisited.
Two Failure Modes, One Signal
Data transfer misconfiguration shows up in two distinct forms. Both produce the same diagnostic signal: networking costs disproportionate to compute costs.
Failure Mode 1: Cross-AZ Traffic
When resources in different Availability Zones communicate, AWS (and GCP, and Azure) charge for every byte that crosses the AZ boundary. On AWS the rate is $0.01/GB in each direction, or $0.02/GB effective, because both the sending side and the receiving side are billed.
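To make the effective rate concrete, here is a minimal sketch that turns a monthly traffic volume into an estimated cross-AZ charge. The function name, the 50,000 GB volume, and the two-thirds crossing fraction (pods spread evenly over three AZs calling a dependency in one AZ) are illustrative assumptions, not figures from any real account.

```python
# Per-direction inter-AZ rate quoted in the text; billed on both sides.
CROSS_AZ_RATE_PER_GB_EACH_WAY = 0.01


def cross_az_monthly_cost(gb_per_month: float, cross_az_fraction: float) -> float:
    """Estimate monthly cross-AZ spend.

    cross_az_fraction is the share of traffic that leaves its own AZ
    (an assumption you supply; billing data does not state it directly).
    """
    if not 0.0 <= cross_az_fraction <= 1.0:
        raise ValueError("cross_az_fraction must be between 0 and 1")
    crossing_gb = gb_per_month * cross_az_fraction
    # Sender and receiver are each billed, so the effective rate doubles.
    return crossing_gb * CROSS_AZ_RATE_PER_GB_EACH_WAY * 2


# Hypothetical workload: 50,000 GB/month, two of every three calls cross an AZ.
estimate = cross_az_monthly_cost(50_000, 2 / 3)
print(f"Estimated cross-AZ charge: ${estimate:,.2f}/month")
```

The crossing fraction is the lever: the same volume with AZ-local routing drives the estimate toward zero.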
Common architectures that generate this cost invisibly:
- Kubernetes pods distributed across 3 AZs calling a database primary in a single AZ: every pod in the "wrong" AZ pays the cross-AZ toll on every request
- Application servers in AZ-A connecting to Redis (AZ-B) and PostgreSQL replica (AZ-C) on every page load
- Load balancers routing traffic to backend instances in different AZs than the originating request
- Istio or service mesh certificate distribution broadcasting across pods spanning multiple AZs
The last example is worth understanding in detail. In one documented case, a single Kubernetes deployment caught in CrashLoopBackOff caused Istio to repeatedly push certificates to all pods across three AZs. The cross-AZ transfer cost for that certificate churn became the single largest line item in the account, exceeding the cost of a 100-node EKS cluster. After enabling Istio topology-aware routing to prefer same-AZ communication: 95% reduction in daily cross-AZ traffic, 23% reduction in overall AWS costs.
Failure Mode 2: NAT Gateway Misrouting
Private subnets need to reach AWS services (S3, DynamoDB, ECR) or the internet. Many accounts route this traffic through a NAT Gateway, incurring a $0.045/GB processing fee. For S3 and DynamoDB specifically, this is entirely avoidable: Gateway VPC Endpoints route traffic directly to those services at $0.00/GB, bypassing the NAT Gateway entirely.
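As a rough sketch of what adding an S3 Gateway Endpoint involves, the helper below builds the request parameters for the EC2 `create_vpc_endpoint` API. The helper name and the VPC and route table IDs are placeholders invented for illustration; the actual API call is left in a comment because it needs boto3 and AWS credentials.

```python
def s3_gateway_endpoint_params(vpc_id: str, region: str, route_table_ids: list[str]) -> dict:
    """Build parameters for an S3 Gateway VPC Endpoint (hypothetical helper)."""
    if not route_table_ids:
        # A Gateway Endpoint only takes effect through route table entries.
        raise ValueError("at least one route table ID is required")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder identifiers; substitute the IDs of your own private subnets' route tables.
params = s3_gateway_endpoint_params(
    "vpc-0123456789abcdef0",
    "us-east-1",
    ["rtb-0aaaaaaaaaaaaaaaa", "rtb-0bbbbbbbbbbbbbbbb"],
)
print(params)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**params)
```

Passing every private route table matters: a subnet whose route table is left out keeps sending S3 traffic through the NAT Gateway.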
The compounding failure: a single NAT Gateway deployed in one AZ, serving subnets in all three AZs. This generates two charges simultaneously for cross-AZ subnets:
- Cross-AZ transfer charge: $0.02/GB (to reach the NAT Gateway in the other AZ)
- NAT Gateway processing: $0.045/GB
- Combined: $0.065/GB for traffic that could be $0.00 (S3 via Gateway Endpoint)
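The stacking of charges can be written out as a small calculation. This is a sketch using only the rates quoted above; the function names and the 5,000 GB example volume are assumptions for illustration.

```python
NAT_PROCESSING_PER_GB = 0.045    # NAT Gateway data processing fee
CROSS_AZ_PER_GB = 0.02           # effective cross-AZ rate (both directions billed)
GATEWAY_ENDPOINT_PER_GB = 0.0    # S3/DynamoDB Gateway Endpoints carry no per-GB fee


def per_gb_cost(via_nat: bool, nat_in_same_az: bool) -> float:
    """Per-GB cost of reaching S3 from a private subnet on each routing path."""
    if not via_nat:
        return GATEWAY_ENDPOINT_PER_GB
    cost = NAT_PROCESSING_PER_GB
    if not nat_in_same_az:
        # Traffic first crosses an AZ boundary to reach the shared NAT Gateway.
        cost += CROSS_AZ_PER_GB
    return cost


def monthly_cost(gb: float, via_nat: bool, nat_in_same_az: bool) -> float:
    return gb * per_gb_cost(via_nat, nat_in_same_az)


for label, via_nat, same_az in [
    ("Gateway Endpoint", False, True),
    ("NAT, same AZ", True, True),
    ("NAT, other AZ", True, False),
]:
    print(f"{label:18s} ${monthly_cost(5_000, via_nat, same_az):8.2f}/month")
```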
What Appears in FOCUS Billing Data
Cross-AZ traffic
| FOCUS Field | AZ-aligned (healthy) | Cross-AZ misconfiguration |
|---|---|---|
| ServiceName | EC2, minimal DataTransfer rows | EC2, DataTransfer-Regional-Bytes dominates |
| X_UsageType | Minimal *-DataTransfer-Regional-Bytes | Growing *-DataTransfer-Regional-Bytes rows, two per GB (InterZone-In + InterZone-Out) |
| BilledCost (networking vs compute) | <5% of service compute cost | 15-30%+ of service compute cost |
| Temporal pattern | Flat, minimal | Grows proportionally with request volume: no spike, just steady accumulation |
NAT Gateway misrouting
| FOCUS Field | Gateway Endpoint (healthy) | NAT routing (misconfigured) |
|---|---|---|
| ServiceName = "VPC" | Only NatGateway-Hours (hourly provisioning fee, expected) | NatGateway-Bytes dominates: 80%+ of VPC spend is byte processing, not hourly fees |
| NatGateway-Bytes cost trend | Flat (no traffic through NAT for S3) | Tracks exactly with S3 access volume, growing in lockstep |
| Dollar impact | <$50/month (just hourly overhead) | $225-$8,000+/month per account |
The NAT-S3 cost coupling is the key diagnostic. When NatGateway-Bytes billing moves in lockstep with S3 GetObject charges, S3 traffic is routing through NAT. Add a Gateway Endpoint and that coupling disappears: NAT costs stay flat while S3 costs continue to scale with your data access patterns.
Real-World Incidents
Cross-AZ as the largest AWS line item (Devora Roth Goldshmidt, engineering case study): For a production Kubernetes workload, cross-AZ transfer charges exceeded the cost of a 100-node EKS cluster, representing 25% of total AWS billing. Root cause: Istio certificate churn across AZ-misaligned pods. Fix: topology-aware routing to prefer same-AZ pod communication. Result: 95% reduction in daily cross-AZ traffic, 23% reduction in total AWS costs.
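ECR through NAT: $8,010/month (CloudZero case study): A containerized workload pulling 178,000 GB of container images from ECR through a NAT Gateway generated $8,010/month in NAT processing charges (178,000 GB at $0.045/GB). Switching to VPC Endpoints for ECR eliminated the NAT charge entirely. ECR image layers are served from S3, so the bulk of that traffic moves to a free S3 Gateway Endpoint; the ECR Interface Endpoints themselves carry a small hourly and per-GB fee. Implementation time: under an hour.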
S3 through NAT: $225/month eliminated in 30 minutes: A production account routing 5 TB/month of S3 traffic through NAT Gateway generated $225/month in NAT processing fees. A free S3 Gateway Endpoint eliminated the charge. The endpoint took under 30 minutes to configure.
GCP cross-zone reduction: 20% total bill reduction (Sachin Arote, GCP case study): Kubernetes pod scheduling changes to prefer same-zone communication, combined with eliminating unnecessary cross-zone logging and monitoring traffic, reduced GCP Network Inter Zone Data Transfer Out charges by 45%, producing a 20% reduction in total GCP spend.
Detection: Three Sub-Signals
The query looks for any of three conditions in a 30-day billing window:
-- Signal 1: NAT byte processing dominates VPC spend
-- (indicates traffic flowing through NAT that shouldn't be)
SUM(NatGateway-Bytes cost) / SUM(total VPC cost) > 0.60
-- Signal 2: Networking cost disproportionate to compute
-- (indicates cross-AZ architectural misalignment)
SUM(DataTransfer-Regional-Bytes cost) / SUM(EC2 compute cost) > 0.15
-- Signal 3: NAT-S3 cost coupling
-- (confirms S3 traffic routing through NAT Gateway)
CORR(daily NatGateway-Bytes cost, daily S3 access cost) > 0.75
-- Dollar floor: suppress noise on tiny accounts
combined networking waste > $100/month
Any one signal is sufficient to flag. Signal 3 (NAT-S3 coupling) is the most actionable: it pinpoints the fix (add S3 Gateway Endpoint) with near certainty. Signals 1 and 2 require more investigation to determine whether the root cause is NAT misrouting, AZ misalignment, or both.
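The three conditions and the dollar floor can be combined into a single check. This is a sketch over pre-aggregated 30-day totals, not the query itself: the `BillingWindow` fields, the signal labels, and the definition of "combined networking waste" as NAT byte cost plus regional transfer cost are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BillingWindow:
    """30-day totals, in dollars, aggregated from billing rows (hypothetical shape)."""
    nat_bytes_cost: float
    vpc_total_cost: float
    regional_transfer_cost: float
    ec2_compute_cost: float
    nat_s3_correlation: float  # daily NAT bytes cost vs daily S3 access cost


def flag_transfer_misconfiguration(w: BillingWindow, dollar_floor: float = 100.0) -> list[str]:
    """Return the names of the sub-signals that fire; empty list means no flag."""
    # Assumption: "combined networking waste" = NAT byte fees + cross-AZ transfer.
    if w.nat_bytes_cost + w.regional_transfer_cost <= dollar_floor:
        return []
    signals = []
    if w.vpc_total_cost > 0 and w.nat_bytes_cost / w.vpc_total_cost > 0.60:
        signals.append("nat_dominates_vpc")      # Signal 1
    if w.ec2_compute_cost > 0 and w.regional_transfer_cost / w.ec2_compute_cost > 0.15:
        signals.append("network_vs_compute")     # Signal 2
    if w.nat_s3_correlation > 0.75:
        signals.append("nat_s3_coupling")        # Signal 3
    return signals


# Illustrative accounts, not real data.
nat_heavy = BillingWindow(225.0, 260.0, 40.0, 2_000.0, 0.90)
tiny = BillingWindow(20.0, 25.0, 5.0, 100.0, 0.90)
print(flag_transfer_misconfiguration(nat_heavy))
print(flag_transfer_misconfiguration(tiny))
```

Returning the list of fired signals rather than a boolean keeps the distinction the text draws: a coupling hit points at the Gateway Endpoint fix, while the other two call for further investigation.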
Provider Notes
GCP: Look for the SKU "Network Inter Zone Data Transfer Out" under Compute Engine billing. Intra-zone traffic is free; any cross-zone traffic generates this charge. Detection ratio: if this SKU exceeds 15% of Compute Engine spend for the same project, AZ-alignment is worth investigating.
Azure: Look for MeterCategory = "Virtual Network" with MeterName = "VNet Peering". Regional VNet peering costs $0.01/GB at both ends, or $0.02/GB effective. Global VNet peering is $0.035-$0.07/GB. Hub-and-spoke architectures with Azure Firewall can multiply charges by routing all spoke-to-spoke traffic through the hub twice.
Fix Checklist
- S3 and DynamoDB: add Gateway VPC Endpoints first. This is the highest-ROI change in FinOps. Gateway Endpoints are free. They redirect S3 and DynamoDB traffic from the NAT Gateway routing path to a direct path. Implementation: create the endpoint in the VPC console, select the route tables for each subnet, done. Under 30 minutes. Verify by watching NatGateway-Bytes drop within 24 hours.
- For ECR, SQS, Secrets Manager, and other AWS services: Interface VPC Endpoints (not free: $0.01/GB + $0.01/hr provisioning) reduce NAT processing costs for high-volume services where the NAT charge exceeds the endpoint cost.
- For Kubernetes workloads: Enable topology-aware routing (Kubernetes ≥1.27) or Istio locality-weighted load balancing. These features preferentially route pod-to-pod traffic to same-AZ endpoints, reducing cross-AZ requests without changing application code.
- For cross-AZ RDS/ElastiCache: Verify that your application tier and your database/cache primaries are in the same AZ. Read replicas can be cross-AZ by design (HA), but the primary traffic path should be AZ-local. In EKS, use node affinity rules to colocate pods with their dependencies.
- For single-NAT-multi-AZ architectures: Deploy one NAT Gateway per AZ. The additional NAT hourly cost ($0.045 × 24 = $1.08/day per NAT) is typically offset by eliminating the cross-AZ transfer charge on NAT-bound traffic from the non-gateway AZs.
- Verify with billing data: After changes, the NatGateway-Bytes line item should decouple from S3 access trends. The DataTransfer-Regional-Bytes ratio to EC2 compute should drop below 0.10. If it doesn't, there are additional cross-AZ paths not yet addressed.
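Two items in the checklist trade a fixed hourly fee for a lower per-GB rate, so each has a break-even volume. The sketch below computes it from the rates quoted in the checklist; the function name is illustrative, and the calculation assumes a single endpoint or NAT Gateway with no other fees involved.

```python
def break_even_gb_per_day(extra_hourly_fee: float, saving_per_gb: float) -> float:
    """Daily volume above which a fixed hourly fee pays for itself."""
    if saving_per_gb <= 0:
        raise ValueError("the change must save something per GB to break even")
    return extra_hourly_fee * 24 / saving_per_gb


# Extra NAT Gateway in an AZ: $0.045/hr buys back the $0.02/GB cross-AZ charge.
nat_per_az = break_even_gb_per_day(0.045, 0.02)

# Interface Endpoint: $0.01/hr, and per-GB cost falls from $0.045 (NAT) to $0.01.
interface_endpoint = break_even_gb_per_day(0.01, 0.045 - 0.01)

print(f"Extra NAT per AZ pays off above ~{nat_per_az:.0f} GB/day from that AZ")
print(f"Interface Endpoint pays off above ~{interface_endpoint:.1f} GB/day")
```

Under these assumptions an extra NAT Gateway needs roughly 54 GB/day of NAT-bound traffic from its AZ to pay for itself, while an Interface Endpoint breaks even at under 7 GB/day.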
See if this pattern is in your billing data
The 5-question DropInFinOps assessment takes 2 minutes and tells you which anomaly patterns your current billing setup is positioned to catch, and which ones are slipping through.