Common AWS Cost Mistakes (And How to Avoid Them)

AWS billing surprises follow patterns. The same five mistakes appear repeatedly across engineering teams, and they're getting more expensive as AI workloads enter the picture.

AWS offers more than 200 services, each with its own pricing model, usage type taxonomy, and billing quirks. Most engineering teams learn the same lessons the hard way: a surprise line item in Cost Explorer, a month-end bill that doesn't match expectations, or an AWS support ticket asking for a refund on something that should have been obvious in hindsight.

What follows are five mistakes that consistently show up in AWS billing data. Each comes with documented incidents or a well-known pattern, the specific billing signal to watch for, and what to do about it. The list skews toward AI workloads because that is where the most expensive surprises are happening right now.


Mistake 1: Routing AI workload traffic through NAT Gateway instead of VPC Endpoints

What happens: A NAT Gateway charges for every GB of data it processes ($0.045/GB in us-east-1), on top of its hourly rate ($0.045/hour in us-east-1; both rates vary by region). For most web traffic this is negligible. For AI workloads it is not. Large language model container images run 10–30 GB each. Embedding pipelines move gigabytes of document data. Bedrock API calls made from private subnets route through the NAT Gateway unless a VPC Endpoint is in place. The usage type is NatGateway-Bytes. It is easy to overlook because it appears under "EC2-Other" in Cost Explorer, not under the service that caused the traffic.
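A quick calculation shows why the per-GB charge, not the hourly one, dominates for AI traffic. This is a minimal sketch that assumes us-east-1 list prices of $0.045 per NAT Gateway hour and $0.045 per GB processed. Both vary by region, and the function name is illustrative.

```python
"""Back-of-the-envelope NAT Gateway cost model (assumed us-east-1 list prices)."""

NAT_HOURLY_USD = 0.045   # assumption: us-east-1 price per NAT Gateway hour
NAT_PER_GB_USD = 0.045   # assumption: us-east-1 price per GB processed
HOURS_PER_MONTH = 730    # average hours in a month


def nat_monthly_cost(gb_processed: float, gateways: int = 1) -> dict:
    """Split one month of NAT Gateway charges into hourly and per-GB parts."""
    hourly = gateways * HOURS_PER_MONTH * NAT_HOURLY_USD
    processing = gb_processed * NAT_PER_GB_USD
    return {
        "hourly_usd": round(hourly, 2),
        "processing_usd": round(processing, 2),
        "total_usd": round(hourly + processing, 2),
    }


if __name__ == "__main__":
    # Hypothetical: a node pool pulls a 20 GB model image 500 times a month.
    print(nat_monthly_cost(gb_processed=20 * 500))
```

In this hypothetical, the per-GB part is roughly 93% of the total. That is why the fix targets the traffic path rather than the number of gateways.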

The incident: Geocodio, a geocoding API company, saw 20,167 GB of NAT Gateway data transfers in a single day. That came to $907.53 in one day and over $1,000 month-to-date. The cause: S3 traffic was routing through the NAT Gateway instead of through a VPC Endpoint. Nothing in the billing row suggested the traffic was S3-related; it simply appeared as NatGateway-Bytes. AWS refunded the charge after Geocodio explained the misconfiguration. A team running EKS-based AI workloads that pull large model containers from ECR will see the same billing shape. One team reported $10,000/month in NAT Gateway charges for container image pulls, reduced to $2,000/month after adding a VPC Interface Endpoint for ECR ($70,000 in annual savings).

The billing signal: NatGateway-Bytes in the AmazonEC2 service line. It often appears as the largest data transfer line item in an account that doesn't seem to move large volumes of data. To find the source, correlate against VPC Flow Logs for the NAT Gateway's network interface, or against S3 server access logs and ECR activity in CloudTrail.

The fix: Add VPC Endpoints for every AWS service your workloads access from within a VPC. The priority list for AI workloads:

- S3: a Gateway Endpoint, which is free.
- ECR: Interface Endpoints for ecr.api and ecr.dkr. Each costs roughly $7/month per AZ plus a small per-GB charge, trivial next to NAT cost. ECR serves image layers from S3, so the S3 Gateway Endpoint is needed as well.
- Bedrock: an Interface Endpoint, which takes Bedrock API calls made inside the VPC off the NAT Gateway.
- SageMaker Runtime.

A VPC Gateway Endpoint for S3 takes about 5 minutes to configure. It immediately eliminates NAT Gateway charges for same-region S3 traffic from every subnet whose route table it is associated with.
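The priority list can be scripted. The sketch below assumes you already know the VPC, route table, subnet (one per AZ) and security group IDs. It builds keyword arguments for boto3's EC2 `create_vpc_endpoint` call and then applies them. The helper names are hypothetical. The service names follow AWS's `com.amazonaws.<region>.<service>` convention. Private DNS requires DNS support and DNS hostnames to be enabled on the VPC.

```python
"""Sketch: build create_vpc_endpoint requests for the AI-workload priority list."""

INTERFACE_SERVICES = ("ecr.api", "ecr.dkr", "bedrock-runtime", "sagemaker.runtime")


def endpoint_requests(region, vpc_id, route_table_ids, subnet_ids, security_group_ids):
    """Return kwargs dicts for boto3's EC2 create_vpc_endpoint call."""
    requests = [{
        # S3 Gateway Endpoint: no hourly charge, attached via route tables.
        # ECR image layers are served from S3, so ECR pulls need this too.
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }]
    # Interface Endpoints: ENIs in each subnet, billed per AZ-hour plus per GB.
    for service in INTERFACE_SERVICES:
        requests.append({
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
            # Private DNS makes the default SDK hostnames resolve to the endpoint.
            "PrivateDnsEnabled": True,
        })
    return requests


def create_endpoints(requests):
    """Apply the requests. Needs boto3 and ec2:CreateVpcEndpoint permission."""
    import boto3  # imported lazily so the builder works without AWS access
    ec2 = boto3.client("ec2")
    return [ec2.create_vpc_endpoint(**req)["VpcEndpoint"]["VpcEndpointId"]
            for req in requests]
```

Interface endpoints still bill per AZ-hour and per GB. Create them only in VPCs that actually call these services.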


Mistake 2: Leaving AI inference endpoints running when no one is using them

What happens: SageMaker Real-Time Endpoints provision instances that run 24/7 from the moment of deployment until explicit deletion, regardless of whether any requests arrive. A single ml.g4dn.xlarge endpoint costs $0.736/hour: $17.66/day, $530/month, at zero queries. A team running three endpoints for a model comparison experiment will accumulate $1,590/month in endpoint costs before a single production request is served. Unlike EC2, there is no "stopped" state. The only options are running (billing) or deleted (not billing).

The incident: This is not a single named incident. It is an industry-wide pattern, well documented by SageMaker practitioners. Suppose a developer deploys an endpoint for a Friday afternoon experiment and forgets to delete it over the weekend. By Monday morning it has generated roughly $47 in charges (about 64 hours at $0.736/hour) with zero requests processed. At scale, organizations with multiple ML teams running model comparisons can accumulate dozens of idle endpoints.

AWS introduced a scale-to-zero path via the Inference Components architecture at re:Invent 2024, which supports MinInstanceCount: 0. This applies only to the newer Inference Components deployment model. Classic real-time endpoint configurations are the most common pattern and the default in most AWS tutorials and Quick Create flows. They still require a minimum of one instance and must be explicitly deleted to stop billing.

The billing signal: SageMaker endpoint hours with no corresponding InvokeEndpoint API calls in the same account and time window. In FOCUS billing data, look for ServiceName = 'Amazon SageMaker' rows with x_UsageType containing Endpoint-Hours. Cross-reference them against the CloudWatch Invocations metric for the endpoint (zero invocations, non-zero billing). The cross-service signal is the tell: a running endpoint with no invocations for 3+ days is almost always idle.
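The FOCUS half of that cross-check is a join between billing rows and invocation counts. This is a minimal sketch, assuming FOCUS rows parsed into dicts (for example with `csv.DictReader`) and invocation sums already pulled from CloudWatch. The usage-type markers are an assumption to verify against your own export: SageMaker hosting usage types may instead look like `USE1-Host:ml.g4dn.xlarge`, so the sketch matches both.

```python
"""Sketch: flag SageMaker endpoints that bill in FOCUS data but get no invocations."""
from collections import defaultdict

# Assumption: substrings identifying endpoint-hour usage; verify in your export.
USAGE_MARKERS = ("Endpoint-Hours", "Host:")


def idle_endpoints(focus_rows, invocations_by_endpoint, min_cost=1.0):
    """Return {endpoint_name: billed_cost} for billed endpoints with zero invocations.

    focus_rows: dicts with ServiceName, x_UsageType, ResourceId, BilledCost.
    invocations_by_endpoint: CloudWatch Invocations sums keyed by endpoint name.
    """
    cost = defaultdict(float)
    for row in focus_rows:
        if row.get("ServiceName") != "Amazon SageMaker":
            continue
        if not any(m in (row.get("x_UsageType") or "") for m in USAGE_MARKERS):
            continue
        arn = row.get("ResourceId") or ""
        if ":endpoint/" not in arn:
            continue
        # Compare lowercased names, since ARNs may not preserve the original case.
        name = arn.split(":endpoint/", 1)[1].lower()
        cost[name] += float(row.get("BilledCost") or 0)

    invocations = {k.lower(): v for k, v in invocations_by_endpoint.items()}
    return {
        name: round(total, 2)
        for name, total in cost.items()
        if total >= min_cost and invocations.get(name, 0) == 0
    }
```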

The fix: Three layers.

- First, for new endpoints built using Inference Components, set MinInstanceCount: 0. This enables scale-to-zero after the configured idle timeout (default 15–20 minutes). For classic real-time endpoint configurations, deletion is the only control.
- Second, add an AWS Budgets alert specifically on SageMaker endpoint hours, with a threshold based on your active model count.
- Third, run a weekly Lambda function that lists all SageMaker endpoints, cross-references each against its CloudWatch Invocations metric for the past 7 days, and posts a Slack message for any endpoint with zero invocations. This takes less than 50 lines of boto3 and prevents endpoint sprawl from compounding.
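The third layer might look like the following Lambda sketch. It assumes:

- the Lambda Python runtime, which bundles boto3;
- IAM permissions for `sagemaker:ListEndpoints`, `sagemaker:DescribeEndpoint` and `cloudwatch:GetMetricStatistics`;
- a Slack incoming webhook URL in a hypothetical `SLACK_WEBHOOK_URL` environment variable.

SageMaker publishes `Invocations` per endpoint and variant, so the sketch sums across variants. Endpoints built on Inference Components may report per-component metrics instead, which this does not cover.

```python
"""Sketch of the weekly idle-endpoint Lambda (boto3 plus a Slack incoming webhook)."""
import datetime
import json
import os
import urllib.request


def zero_invocation_endpoints(invocation_sums):
    """Names whose 7-day Invocations sum is zero, given {name: sum}."""
    return sorted(name for name, total in invocation_sums.items() if total == 0)


def weekly_invocations(sm, cw, name, days=7):
    """Sum the AWS/SageMaker Invocations metric across an endpoint's variants."""
    now = datetime.datetime.now(datetime.timezone.utc)
    total = 0.0
    variants = sm.describe_endpoint(EndpointName=name).get("ProductionVariants", [])
    for variant in variants:
        stats = cw.get_metric_statistics(
            Namespace="AWS/SageMaker",
            MetricName="Invocations",
            Dimensions=[
                {"Name": "EndpointName", "Value": name},
                {"Name": "VariantName", "Value": variant["VariantName"]},
            ],
            StartTime=now - datetime.timedelta(days=days),
            EndTime=now,
            Period=86400,  # one datapoint per day
            Statistics=["Sum"],
        )
        # No traffic usually means no datapoints, which sums to zero here.
        total += sum(point["Sum"] for point in stats["Datapoints"])
    return total


def handler(event, context):
    import boto3  # bundled with the Lambda Python runtime
    sm, cw = boto3.client("sagemaker"), boto3.client("cloudwatch")
    sums = {}
    paginator = sm.get_paginator("list_endpoints")
    for page in paginator.paginate(StatusEquals="InService"):
        for ep in page["Endpoints"]:
            sums[ep["EndpointName"]] = weekly_invocations(sm, cw, ep["EndpointName"])
    idle = zero_invocation_endpoints(sums)
    if idle:
        text = "Idle SageMaker endpoints (0 invocations in 7 days): " + ", ".join(idle)
        req = urllib.request.Request(
            os.environ["SLACK_WEBHOOK_URL"],  # hypothetical variable name
            data=json.dumps({"text": text}).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=10)
    return {"idle": idle}
```

Schedule it with an EventBridge rule (for example weekly). The pure `zero_invocation_endpoints` helper keeps the decision logic testable without AWS access.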


Mistake 3: AI service charges appearing under a different service name than expected

What happens: AWS does not always bill an AI feature under the service you used to create it. The most documented example: creating an Amazon Bedrock Knowledge Base using the default Quick-create flow silently provisions an Amazon OpenSearch Serverless collection as the vector store. The OpenSearch collection bills at a minimum of 2 OCUs ($0.24/OCU/hour each), producing a floor cost of $345/month. Every line item appears under Amazon OpenSearch Service, not Amazon Bedrock. There is no "Bedrock Knowledge Base" line in the bill. FinOps tools scanning for Bedrock anomalies find nothing. The waste is invisible unless you know to look under the wrong service name.

The incidents: Five independently documented cases:

- A developer who discovered a $107 charge under "Amazon OpenSearch Service" after building a RAG pipeline he believed was purely a Bedrock project (Hemanth D, Medium).
- A near-$200 bill after minimal document embedding and query testing (MaruAI, Medium).
- A $10 charge for 10 queries racked up in 21 hours (Reddit user).
- A user whose monthly bill jumped from $20 to over $300 (AWS re:Post). Two more users in the same thread reported $260, and an orphaned collection still billing 2 days after they deleted the Knowledge Base.
- A FinOps analysis estimating $170–$700+/month per idle Knowledge Base (Binadox).

The billing disguise is the mechanism: the waste accumulates specifically because it doesn't show up where anyone is looking.

The billing signal: Amazon OpenSearch Service rows with x_UsageType of SearchOCU or IndexingOCU and a ResourceId containing bedrock-knowledge-base-, in an account with low or zero Amazon Bedrock inference charges in the same period. The OCU charges are perfectly flat (zero variance, no diurnal pattern), which distinguishes them from a real OpenSearch workload.
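The flatness test can be automated against FOCUS rows. This sketch rests on a few assumptions:

- Rows are dicts with the columns named above plus `ChargePeriodStart` and `BilledCost`.
- The knowledge-base marker appears in `ResourceId` as described. If your export carries collection IDs instead, match on IDs from an inventory of collections.
- The 1% coefficient-of-variation threshold and the $10 Bedrock threshold are arbitrary starting points, not AWS guidance.

```python
"""Sketch: flag flat OpenSearch Serverless OCU spend tied to Bedrock Knowledge Bases."""
from collections import defaultdict
from statistics import pstdev

KB_MARKER = "bedrock-knowledge-base-"  # assumption: appears in ResourceId


def flat_kb_collections(focus_rows, bedrock_spend, max_cv=0.01, min_days=3,
                        max_bedrock_usd=10.0):
    """Return ResourceIds whose daily OCU cost is essentially constant.

    focus_rows: dicts with ServiceName, x_UsageType, ResourceId,
        ChargePeriodStart (ISO 8601 string) and BilledCost.
    bedrock_spend: Amazon Bedrock spend over the same window.
    """
    if bedrock_spend > max_bedrock_usd:
        return []  # Bedrock is in active use; flat OCU spend may be intentional
    daily = defaultdict(lambda: defaultdict(float))
    for row in focus_rows:
        usage = row.get("x_UsageType") or ""
        resource = row.get("ResourceId") or ""
        if row.get("ServiceName") != "Amazon OpenSearch Service":
            continue
        if "SearchOCU" not in usage and "IndexingOCU" not in usage:
            continue
        if KB_MARKER not in resource:
            continue
        day = row["ChargePeriodStart"][:10]  # YYYY-MM-DD
        daily[resource][day] += float(row["BilledCost"])

    flagged = []
    for resource, by_day in daily.items():
        costs = list(by_day.values())
        mean = sum(costs) / len(costs)
        # Coefficient of variation near zero: no diurnal or weekly pattern.
        if len(costs) >= min_days and mean > 0 and pstdev(costs) / mean <= max_cv:
            flagged.append(resource)
    return sorted(flagged)
```

A daily cost of 0.24 × 24 = $5.76 per OCU is what a minimum-capacity collection would show every day, with no variation.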

The fix: Set a monthly AWS Budgets alert on Amazon OpenSearch Service spend above $50. Any OCU charges in an account that doesn't deliberately run OpenSearch Serverless are almost certainly from a Bedrock Knowledge Base.

To clean up, go to AWS Console → Amazon OpenSearch Service → Serverless → Collections. Delete any collection named bedrock-knowledge-base-* that has no corresponding active Knowledge Base.

For new projects, use Amazon S3 Vectors as the vector store (available as of December 2025, up to 90% cheaper, no OCU floor).


Mistake 4: Data egress from AI workloads that doesn't get modeled upfront

What happens: Egress pricing is well-known but systematically under-modeled for AI workloads specifically. Traditional application egress is predictable β€” a web server serving 10 GB/day of user traffic generates a consistent bill. AI workloads are different: embedding large document sets moves data from S3 through Lambda or ECS into the Bedrock or SageMaker API; RAG pipeline retrievals fetch document chunks and return them in model context windows; streaming AI responses to end users over WebSockets generate sustained egress that compounds with user count. None of these patterns show up in standard capacity planning models.

The incidents: Recall.ai found that WebSockets were costing them roughly $1 million per year on their AWS bill, a cost they had not modeled when designing the system. In their case the driver was the compute spent pushing raw video data through WebSocket connections rather than egress, but it is the same failure to model data movement.

A developer running what started as a $23/month personal site saw costs jump to $2,657 overnight and eventually $12,800/month as traffic grew. The culprit was cross-region data transfers that seemed negligible at low volume but compounded linearly with scale.

37signals (Basecamp/HEY) accumulated a $250,000 egress bill that AWS ultimately comped; DHH posted about it publicly on LinkedIn.

These are not edge cases. They are what happens when networking cost is treated as an afterthought in architecture decisions.

The billing signal: DataTransfer-Out-Bytes rows growing faster than the corresponding compute or inference spend. For WebSocket-based AI streaming, look for USE1-DataTransfer-Out-Bytes (or your region's equivalent) in Cost Explorer, grouped by service. Watch for a growth rate that exceeds your user growth rate; that ratio inversion is the signal that the egress model is broken. Cross-AZ transfer appears as DataTransfer-Regional-Bytes. Cross-region transfer appears under region-pair usage types such as USE1-USW2-AWS-Out-Bytes. Both are attributed to the originating service.
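The ratio inversion is simple to compute once you have two periods of egress volume and active users. Below is a sketch with hypothetical function names. The 5-percentage-point tolerance is an arbitrary buffer against noise, and where the user count comes from is up to you.

```python
"""Sketch: detect egress growing faster than the user base ("ratio inversion")."""


def growth_rate(previous: float, current: float) -> float:
    """Fractional growth between two periods, e.g. 0.25 for +25%."""
    if previous <= 0:
        raise ValueError("previous period must be positive")
    return (current - previous) / previous


def egress_ratio_inverted(prev_egress_gb, cur_egress_gb, prev_users, cur_users,
                          tolerance=0.05):
    """True when egress growth exceeds user growth by more than `tolerance`."""
    return growth_rate(prev_egress_gb, cur_egress_gb) > (
        growth_rate(prev_users, cur_users) + tolerance
    )


if __name__ == "__main__":
    # Hypothetical month-over-month numbers: egress +80%, users +20%.
    print(egress_ratio_inverted(1_000, 1_800, 10_000, 12_000))
```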

The fix: Three architecture-level interventions.

- First, keep AI data processing in the same region as your storage, and in the same AZ as the instances and databases it exchanges data with. Cross-AZ transfer is $0.01/GB in each direction and cross-region is $0.02–$0.09/GB, and both add up at AI data volumes.
- Second, for WebSocket AI streaming, model the egress cost explicitly before launch and put it in your architecture doc as a line item. Average response tokens × bytes per token × expected requests per day gives bytes per day. Converting that to GB and multiplying by $0.09/GB (the first internet egress tier in us-east-1) gives the daily cost.
- Third, set a DataTransfer budget alert by service. AWS Budgets supports filtering by usage type, so you can alert specifically on DataTransfer-Out-Bytes above a baseline that reflects your current architecture, not a round number.
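The launch-time estimate is a one-line formula, shown here as a function with hypothetical inputs. `bytes_per_token` should be the on-the-wire size of each streamed token, including any JSON envelope and framing. That can be far larger than the few bytes of raw text. The 200-byte and 4-byte figures below are illustrative assumptions, and the sketch uses decimal GB (10^9 bytes).

```python
"""Sketch: launch-time estimate of monthly egress for streamed AI responses."""

BYTES_PER_GB = 1e9  # decimal GB; an assumption for estimation purposes


def monthly_streaming_egress_usd(avg_response_tokens, bytes_per_token,
                                 requests_per_day, usd_per_gb=0.09, days=30):
    """Response tokens x bytes per token x requests per day, priced per GB."""
    bytes_per_day = avg_response_tokens * bytes_per_token * requests_per_day
    gb_per_day = bytes_per_day / BYTES_PER_GB
    return gb_per_day * usd_per_gb * days


if __name__ == "__main__":
    # 500-token responses, 1M requests/day, ~200 bytes per streamed token event
    print(round(monthly_streaming_egress_usd(500, 200, 1_000_000), 2))  # 270.0
    # Same traffic if only raw text (~4 bytes per token) went over the wire
    print(round(monthly_streaming_egress_usd(500, 4, 1_000_000), 2))  # 5.4
```

The traffic is identical, yet the two per-token assumptions differ by a factor of 50. Measuring real frame sizes before launch matters more than the price per GB.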


Mistake 5: Untagged AI resources that make cost alerts route to nobody

What happens: AWS Quick-create flows, including Bedrock Knowledge Bases, SageMaker notebooks, and Bedrock Agents, provision resources without enforcing the tagging standards applied to IaC-managed infrastructure. The resulting resources have no owner, team, environment, or project tags. When an alert fires on the associated cost, there is no routing target. It arrives in a shared inbox, gets ignored, or reaches someone who doesn't recognize the resource. When the resource becomes an anomaly (idle endpoint, orphaned collection, runaway inference), the investigation starts from zero: who created this, when, for what project, and is it still needed?

The pattern: Industry data consistently shows 30–50% of cloud resources are untagged or under-tagged at any given time. For AI resources specifically, the rate is higher because the dominant provisioning path, console Quick-create, has no tagging step.

The compounding effect: untagged AI resources are also invisible to security tooling. Suppose a Bedrock endpoint provisioned without tags is compromised via stolen credentials. It will not route a cost anomaly alert to the owning team, will not be caught by tag-based IAM policies, and will not be matched to a known project in a post-incident review. The cost alert and the security alert both fail at the same layer: missing attribution metadata.

The billing signal: Amazon Bedrock, Amazon OpenSearch Service, or Amazon SageMaker line items in Cost Explorer with no corresponding tag values for your required tag keys. In FOCUS data, look for rows where ServiceName matches an AI service and the Tags column is empty or contains only AWS-generated tags (no user-defined keys). These rows have spend but no owner; they are the highest-risk line items in any account running AI workloads.
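The FOCUS check is a filter over the `Tags` column. This sketch rests on a few assumptions:

- `Tags` arrives as a JSON object string or is empty.
- AWS-generated keys carry an `aws:` prefix.
- User-defined keys may appear bare or with a `user:` prefix, depending on the export.

Adjust the parsing to what your export actually contains. The function names are hypothetical.

```python
"""Sketch: AI-service spend in FOCUS data that lacks required user-defined tags."""
import json
from collections import defaultdict

AI_SERVICES = {"Amazon Bedrock", "Amazon OpenSearch Service", "Amazon SageMaker"}


def user_tag_keys(tags):
    """User-defined tag keys from a FOCUS Tags value (JSON string or dict)."""
    if isinstance(tags, str):
        tags = json.loads(tags) if tags.strip() else {}
    keys = set()
    for key in tags or {}:
        if key.startswith("aws:"):
            continue  # AWS-generated, not ownership metadata
        keys.add(key.removeprefix("user:"))  # str.removeprefix: Python 3.9+
    return keys


def unowned_ai_spend(focus_rows, required_keys):
    """Sum BilledCost per ResourceId for AI rows missing any required tag key."""
    required = set(required_keys)
    spend = defaultdict(float)
    for row in focus_rows:
        if row.get("ServiceName") not in AI_SERVICES:
            continue
        if required - user_tag_keys(row.get("Tags")):
            resource = row.get("ResourceId") or "(no ResourceId)"
            spend[resource] += float(row.get("BilledCost") or 0)
    # Largest unowned spend first, so the riskiest items lead the report.
    return dict(sorted(spend.items(), key=lambda kv: kv[1], reverse=True))
```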

The fix: Enforce tagging at provisioning time, not at billing review time. Three mechanisms:

- First, AWS Config rules that flag any Bedrock, SageMaker, or OpenSearch resource missing required tags within 24 hours of creation, with Config sending an SNS notification to the team's channel.
- Second, require IaC (Terraform or CDK) for any AI resource in production or staging environments, with no console Quick-create. IaC enforces tags at the template level before any resource is created.
- Third, run a weekly Lambda that scans AI service resources via boto3: list_knowledge_bases() on the bedrock-agent client, sagemaker.list_endpoints(), and opensearchserverless.list_collections(). It cross-references each resource against your required tag keys and escalates untagged resources to the team's channel with the resource ARN and creation timestamp.

Fixing each resource takes about 10 minutes; the value is in finding it before it becomes a 3-month mystery line item.
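Here is a minimal sketch of the weekly scan, shown for SageMaker endpoints only. It assumes a hypothetical tag standard in `REQUIRED_KEYS`. `tag_keys` normalizes three tag payload shapes: `Key`/`Value` lists (as returned by SageMaker's `list_tags`), lowercase `key`/`value` lists, and plain maps. That lets Knowledge Bases (via the `bedrock-agent` client) and OpenSearch Serverless collections follow the same pattern with their own list and tag calls. Check each service's response shape before relying on it. Posting results to a channel is left out.

```python
"""Sketch: report SageMaker endpoints missing required tag keys (weekly Lambda body)."""

REQUIRED_KEYS = {"owner", "team", "environment", "project"}  # hypothetical standard


def tag_keys(tags):
    """Normalize tag payloads: [{'Key': ..}], [{'key': ..}] or {key: value}."""
    if isinstance(tags, dict):
        return set(tags)
    return {t.get("Key", t.get("key")) for t in tags}


def missing_keys(tags, required=REQUIRED_KEYS):
    """Required tag keys (case-sensitive) absent from a resource's tags."""
    return sorted(set(required) - tag_keys(tags))


def untagged_sagemaker_endpoints(sm, required=REQUIRED_KEYS):
    """Yield (arn, creation_time, missing_keys) for under-tagged endpoints."""
    for page in sm.get_paginator("list_endpoints").paginate():
        for ep in page["Endpoints"]:
            tags = sm.list_tags(ResourceArn=ep["EndpointArn"]).get("Tags", [])
            missing = missing_keys(tags, required)
            if missing:
                yield ep["EndpointArn"], ep["CreationTime"], missing


if __name__ == "__main__":
    import boto3
    for arn, created, missing in untagged_sagemaker_endpoints(boto3.client("sagemaker")):
        print(arn, created, "missing:", ", ".join(missing))
```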


The Pattern Behind the Mistakes

These five mistakes share a structure: a billing charge that does not appear where you expect it, for a resource that does not show up in your standard monitoring, accumulating cost against a budget no one is watching. That structure is not accidental. It is the result of AWS's service architecture and billing taxonomy, which were designed for composability rather than cost transparency.

The defense is not a better dashboard. It is a set of explicit signals:

- which service names carry AI-adjacent charges;
- which usage types are the cost drivers;
- which resource naming patterns distinguish intended from unintended resources;
- which cross-service ratios are the anomaly indicators.

Once those signals are defined, they can be automated: budget alerts, behavioral queries against FOCUS billing data, Lambda cleanup functions, and Config rules that catch tagging gaps within 24 hours of provisioning.

The billing data has all of this. The question is whether you're asking the right questions of it.


Find out which of these patterns are in your billing data right now

The DropInFinOps free assessment takes 2 minutes and maps your current billing export setup against the behavioral query library. It shows which of these cost patterns your current tooling is positioned to catch, and which ones require additional instrumentation.

Take the free assessment →