Architecture Guide · April 2026

Claude on AWS Bedrock: Enterprise Architecture Guide (2026)

AWS Bedrock is the preferred deployment path for Claude in enterprise environments — particularly for Latin American organizations that need data residency, VPC isolation, and integration with existing AWS infrastructure. This guide covers the complete architecture: from network design to RAG pipelines, MCP integration, cost optimization, and monitoring. Whether you’re deploying your first Claude workload or scaling an existing implementation, this is the reference architecture.

Why AWS Bedrock for Claude

You can access Claude through two paths: Anthropic’s direct API or AWS Bedrock. For enterprise, Bedrock is almost always the right choice. The reasons are structural, not cosmetic.

Data stays in your perimeter. When you call Claude through Bedrock, the request never leaves your AWS account’s region. For a bank in Mexico City processing credit applications, this means customer PII travels from your VPC to Bedrock’s endpoint in the same AWS region — not to Anthropic’s servers in the US. Combined with Claude’s native Zero Data Retention (data is never stored or used for training), this creates a double layer of data protection.

Enterprise controls you already know.Bedrock integrates with IAM for access control, CloudTrail for audit logging, CloudWatch for monitoring, and AWS Config for compliance. Your security team doesn’t need to learn a new vendor’s console — Claude fits into the same governance framework as your other AWS services.

Consolidated billing. Claude usage appears on your existing AWS bill, tracked by tags, cost allocation, and existing FinOps processes. No separate Anthropic invoice to reconcile.

Direct API vs AWS Bedrock: Comparison

Dimension	Anthropic Direct API	AWS Bedrock
Data Path	Your app → Anthropic servers (US)	Your VPC → Bedrock endpoint (same region)
Authentication	API key	IAM roles & policies
Network Isolation	Internet-facing	VPC Endpoint (PrivateLink) — no internet
Monitoring	Anthropic console + custom	CloudWatch, CloudTrail, X-Ray
Audit Logging	Custom implementation	CloudTrail (automatic)
Data Residency	US-based	São Paulo (sa-east-1), US, EU, APAC
Billing	Separate Anthropic invoice	Consolidated AWS bill with tags
RAG Support	Build your own	Knowledge Bases for Amazon Bedrock (native)
Prompt Caching	Supported	Supported (same pricing)
Setup Complexity	Low (API key + SDK)	Medium (VPC, IAM, endpoint config)
Best For	Prototyping, startups, simple apps	Enterprise, regulated industries, production

Bottom line: Use the direct API for prototyping and development. Use Bedrock for anything that touches production data, especially in regulated industries. The extra setup is a one-time cost; the security and governance benefits are ongoing.

Enterprise Architecture: Layer by Layer

A production Claude deployment on Bedrock follows a layered architecture. Each layer serves a specific function and can be configured independently.

Layer 1: Network (VPC & Connectivity)

The foundation is a VPC with private subnets. Claude requests travel through a VPC Endpoint for Bedrock (powered by AWS PrivateLink), meaning traffic never traverses the public internet. This is a hard requirement for financial institutions and healthcare organizations.

VPC Endpoint: Create a com.amazonaws.region.bedrock-runtime interface endpoint in your private subnets
Security Groups: Allow HTTPS (443) from your application subnets to the VPC endpoint only
No NAT Gateway needed for Bedrock traffic — PrivateLink keeps everything internal
DNS: Enable private DNS on the VPC endpoint so the standard Bedrock SDK endpoints resolve to private IPs

Layer 2: Identity & Access (IAM)

IAM controls who and what can invoke Claude. Best practices for enterprise:

Service roles: Each application that calls Claude gets its own IAM role with bedrock:InvokeModel permission scoped to specific model ARNs
Least privilege: Don’t grant bedrock:* — scope to bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream
Model access: Use Bedrock Model Access to explicitly enable only the Claude models your organization needs (Opus, Sonnet, Haiku)
Resource tags: Tag each invocation with project, environment, and cost center for billing granularity
SCPs: Use Service Control Policies to prevent teams from enabling unauthorized models or regions

Layer 3: Monitoring & Observability (CloudWatch)

Production Claude deployments need three types of monitoring:

Operational metrics: Invocation count, latency (P50/P95/P99), error rate, throttling — all available via CloudWatch Bedrock metrics
Cost tracking: Input/output token counts per invocation, tagged by application and model. Set CloudWatch alarms for unexpected cost spikes
Quality monitoring: Log prompts and responses (encrypted at rest in S3 or CloudWatch Logs) for quality review and prompt optimization. Use Bedrock Model Invocation Logging for automated capture

Set up CloudWatch dashboards that show: requests per minute by model, P95 latency trends, error rates, and daily cost by application. This gives operations teams visibility without needing to check multiple consoles.

São Paulo Region: LATAM Data Residency

The AWS São Paulo region (sa-east-1) is the critical enabler for Claude in Latin America. When you deploy Claude on Bedrock in sa-east-1, data stays physically in Brazil — satisfying data residency requirements across LATAM.

Why this matters for specific regulations:

LGPD (Brazil): Personal data processing must occur within Brazilian territory or in countries with adequate data protection. sa-east-1 satisfies the territorial requirement directly
CNBV (Mexico): Financial data must be processed in controlled environments with full audit trails. Bedrock + CloudTrail in sa-east-1 provides this
SFC (Colombia): Cloud usage for financial services requires demonstrable data controls. Bedrock’s VPC isolation and IAM provide the necessary governance

Latency consideration: For applications in Mexico and Central America, sa-east-1 adds ~60-80ms of network latency compared to us-east-1. For batch processing and async workflows, this is negligible. For real-time chatbots, consider whether the data residency requirement mandates sa-east-1 or if us-east-1 (Virginia) is acceptable for your regulatory context.

RAG Architecture on Bedrock

Retrieval-Augmented Generation is the most common enterprise Claude pattern — particularly for compliance, knowledge management, and customer service. Bedrock provides native RAG support through Knowledge Bases for Amazon Bedrock.

Architecture Components

Data Sources: S3 buckets containing your documents (PDFs, regulatory texts, policies, manuals). Bedrock supports automatic ingestion and chunking
Embeddings: Bedrock generates vector embeddings using Amazon Titan Embeddings or Cohere Embed. These are stored automatically in your chosen vector database
Vector Store: Amazon OpenSearch Serverless, Aurora PostgreSQL with pgvector, or Pinecone. OpenSearch Serverless is the lowest-maintenance option for most enterprise deployments
Generation: Claude (Sonnet or Opus) receives the retrieved context and generates the response with citations

Compliance RAG: Best Practices

For regulatory RAG systems (CNBV circulars, LGPD requirements, SFC dispositions), follow these design principles:

Citation-mandatory responses: Configure Claude’s system prompt to always cite the specific document section and page number. This is non-negotiable for compliance use cases
Chunk overlap: Use 20% overlap between chunks to avoid splitting regulatory provisions across chunk boundaries
Metadata filtering: Tag each document with regulation type, effective date, and jurisdiction. Claude can then filter retrieval to only relevant regulations
Version control: When regulations update, maintain both versions in the knowledge base with date tags. Claude can answer questions about current vs. previous requirements

MCP Integration Patterns

MCP (Model Context Protocol) is Anthropic’s open standard for connecting Claude to external systems. In a Bedrock architecture, MCP servers run as separate services within your VPC, and Claude invokes them through tool use.

Common MCP Patterns for Enterprise

Database MCP Server: Connects Claude to your PostgreSQL, MySQL, or DynamoDB. Claude can query data directly without your application building custom data retrieval logic
CRM MCP Server: Connects Claude to Salesforce, HubSpot, or custom CRM. Enables customer service agents to ask Claude questions about customer history in natural language
ERP MCP Server: Connects Claude to SAP, Oracle, or NetSuite. Claude can pull inventory data, financial records, or supply chain information on demand
Internal API Gateway: A single MCP server that proxies multiple internal APIs, providing Claude access to your entire internal service mesh through one integration point

MCP servers should run in private subnets alongside your application tier. They authenticate to backend systems using IAM roles or secrets stored in AWS Secrets Manager — never hardcoded credentials. See our services page for implementation support.

Cost Optimization Strategies

Claude on Bedrock offers the same token-based pricing as the direct API. The key to cost optimization is choosing the right model tier and leveraging caching.

Model Selection: Opus, Sonnet, or Haiku

Model	Input/Output (per M tokens)	Best For
Claude Opus 4	$15 / $75	Complex regulatory analysis, credit decisions, multi-step reasoning
Claude Sonnet 4	$3 / $15	RAG queries, document summarization, customer service agents — the sweet spot for most enterprise
Claude Haiku	$0.80 / $4	High-volume classification, routing, simple extraction, intent detection

The cascading model pattern: Route requests through Haiku first for classification and simple tasks. Escalate to Sonnet for medium-complexity tasks. Reserve Opus for tasks that require deep reasoning. A well-designed cascade can reduce average cost per request by 60-70% compared to using Opus for everything.

Prompt Caching

Prompt caching is the single highest-impact cost optimization available. When you include a large system prompt or reference context (regulatory text, company policies, product catalogs), prompt caching stores these tokens after the first request. Subsequent requests that include the same cached prefix pay only 10% of the standard input price.

Minimum cache size: 1,024 tokens for Haiku, 2,048 for Sonnet and Opus
Cache TTL: 5 minutes (refreshed on each use)
Savings example: A compliance RAG system with a 50K-token regulatory context prefix saves $135 per 1,000 Sonnet queries through caching

Additional Cost Controls

Max tokens: Set max_tokens to the minimum needed for each use case. A classification task needs 50 tokens, not 4,096
Streaming: Use streaming for long responses to reduce perceived latency without increasing cost
Batching: For offline processing (document review, report generation), use Bedrock batch inference for lower per-token pricing
Budget alerts: Set AWS Budgets alerts at 50%, 80%, and 100% of projected monthly spend

Security Best Practices

Security for Claude on Bedrock extends beyond network isolation. A comprehensive security posture includes:

Encryption in transit: TLS 1.2+ for all Bedrock API calls (enforced by default)
Encryption at rest: Enable KMS encryption for CloudWatch Logs and S3 buckets storing prompt/response logs. Use customer-managed keys (CMK) for regulated workloads
Input validation: Sanitize user inputs before sending to Claude. Prevent prompt injection by validating input structure and content
Output filtering: Implement post-processing checks on Claude’s responses. For financial services, validate that generated recommendations include required disclaimers
Guardrails: Use Amazon Bedrock Guardrails to define content policies, PII detection, and topic restrictions. Guardrails run before and after Claude’s response, adding a defense-in-depth layer
Access reviews: Quarterly review of IAM policies granting Bedrock access. Remove unused roles and permissions
Incident response: Include Bedrock in your incident response playbook. Monitor CloudTrail for unusual invocation patterns (volume spikes, off-hours usage, new model access)

For a deeper discussion of compliance in LATAM financial services, see our guide on Claude vs ChatGPT for Enterprise.

Production Monitoring Architecture

A production Claude deployment requires four monitoring dimensions:

Availability: Bedrock endpoint health, VPC endpoint status, DNS resolution. Alert on any 5xx errors or connection timeouts
Performance: P50/P95/P99 latency by model and use case. Track latency trends to detect degradation before users complain
Cost: Daily token consumption by model, application, and environment. Compare against forecasts and alert on anomalies
Quality: Sample-based review of prompt/response pairs. Track key metrics: citation accuracy for RAG, classification precision for routing, customer satisfaction for chatbots

Deploy a CloudWatch dashboard per application with: invocations/minute (real-time), P95 latency (5-minute average), error rate (1-minute resolution), and daily cost (cumulative). Set SNS alerts for: error rate > 1%, P95 latency > 10s, daily cost > 120% of forecast. For more on how VORANTIS implements monitoring, see our services overview.

Getting Started: Implementation Roadmap

A typical enterprise Bedrock deployment follows this timeline:

Week 1-2: Infrastructure setup — VPC endpoints, IAM roles, CloudWatch configuration, model access enablement
Week 2-3: First workload — deploy a single Claude Sonnet use case (e.g., document summarization or compliance Q&A) in a dev environment
Week 3-6: Production hardening — add monitoring, Guardrails, prompt caching, cost optimization, security review
Week 6-12: Scale — add RAG pipelines, MCP integrations, additional use cases, cascading model routing

VORANTIS specializes in Claude-on-Bedrock architecture for LATAM enterprises. We handle the infrastructure setup, RAG design, MCP integration, and security hardening so your team can focus on the business logic. Visit our FAQ for common questions or contact us to start a Discovery engagement.

Frequently Asked Questions

Is Claude available on AWS Bedrock in Latin America?

Yes. Claude is available on AWS Bedrock in the São Paulo region (sa-east-1), providing data residency in South America. This is critical for LATAM enterprises subject to data localization requirements under LGPD (Brazil), CNBV (Mexico), and SFC (Colombia). All Claude model tiers — Opus, Sonnet, and Haiku — are accessible through Bedrock.

What is the difference between Claude direct API and AWS Bedrock?

The direct API connects to Anthropic’s servers and is simpler to set up. AWS Bedrock runs Claude within your AWS account, providing VPC isolation, IAM-based access control, CloudWatch monitoring, and consolidated billing. Bedrock is preferred for enterprise because it keeps data within your AWS perimeter and integrates with existing infrastructure.

How much does Claude on Bedrock cost compared to the direct API?

Bedrock pricing is comparable to direct API pricing: Sonnet 4 costs $3/$15 per million input/output tokens on both. Bedrock adds no markup but you pay for associated AWS infrastructure (VPC, NAT Gateway, CloudWatch). Prompt caching on Bedrock can reduce input costs by up to 90% for repeated context.

Can I build a RAG system with Claude on Bedrock?

Yes. AWS Bedrock provides native RAG support through Knowledge Bases for Amazon Bedrock. You can connect S3 data sources, use Amazon OpenSearch or Aurora pgvector for vector storage, and query with Claude as the generation model. This is the recommended architecture for compliance RAG systems in regulated industries.