Architecture Guide · April 2026
Claude on AWS Bedrock: Enterprise Architecture Guide (2026)
AWS Bedrock is the preferred deployment path for Claude in enterprise environments — particularly for Latin American organizations that need data residency, VPC isolation, and integration with existing AWS infrastructure. This guide covers the complete architecture: from network design to RAG pipelines, MCP integration, cost optimization, and monitoring. Whether you’re deploying your first Claude workload or scaling an existing implementation, this is the reference architecture.
Why AWS Bedrock for Claude
You can access Claude through two paths: Anthropic’s direct API or AWS Bedrock. For enterprise, Bedrock is almost always the right choice. The reasons are structural, not cosmetic.
Data stays in your perimeter. When you call Claude through Bedrock, the request never leaves your AWS account’s region. For a bank in Mexico City processing credit applications, this means customer PII travels from your VPC to Bedrock’s endpoint in the same AWS region — not to Anthropic’s servers in the US. Combined with Claude’s native Zero Data Retention (data is never stored or used for training), this creates a double layer of data protection.
Enterprise controls you already know.Bedrock integrates with IAM for access control, CloudTrail for audit logging, CloudWatch for monitoring, and AWS Config for compliance. Your security team doesn’t need to learn a new vendor’s console — Claude fits into the same governance framework as your other AWS services.
Consolidated billing. Claude usage appears on your existing AWS bill, tracked by tags, cost allocation, and existing FinOps processes. No separate Anthropic invoice to reconcile.
Direct API vs AWS Bedrock: Comparison
| Dimension | Anthropic Direct API | AWS Bedrock |
|---|---|---|
| Data Path | Your app → Anthropic servers (US) | Your VPC → Bedrock endpoint (same region) |
| Authentication | API key | IAM roles & policies |
| Network Isolation | Internet-facing | VPC Endpoint (PrivateLink) — no internet |
| Monitoring | Anthropic console + custom | CloudWatch, CloudTrail, X-Ray |
| Audit Logging | Custom implementation | CloudTrail (automatic) |
| Data Residency | US-based | São Paulo (sa-east-1), US, EU, APAC |
| Billing | Separate Anthropic invoice | Consolidated AWS bill with tags |
| RAG Support | Build your own | Knowledge Bases for Amazon Bedrock (native) |
| Prompt Caching | Supported | Supported (same pricing) |
| Setup Complexity | Low (API key + SDK) | Medium (VPC, IAM, endpoint config) |
| Best For | Prototyping, startups, simple apps | Enterprise, regulated industries, production |
Bottom line: Use the direct API for prototyping and development. Use Bedrock for anything that touches production data, especially in regulated industries. The extra setup is a one-time cost; the security and governance benefits are ongoing.
Enterprise Architecture: Layer by Layer
A production Claude deployment on Bedrock follows a layered architecture. Each layer serves a specific function and can be configured independently.
Layer 1: Network (VPC & Connectivity)
The foundation is a VPC with private subnets. Claude requests travel through a VPC Endpoint for Bedrock (powered by AWS PrivateLink), meaning traffic never traverses the public internet. This is a hard requirement for financial institutions and healthcare organizations.
- VPC Endpoint: Create a
com.amazonaws.region.bedrock-runtimeinterface endpoint in your private subnets - Security Groups: Allow HTTPS (443) from your application subnets to the VPC endpoint only
- No NAT Gateway needed for Bedrock traffic — PrivateLink keeps everything internal
- DNS: Enable private DNS on the VPC endpoint so the standard Bedrock SDK endpoints resolve to private IPs
Layer 2: Identity & Access (IAM)
IAM controls who and what can invoke Claude. Best practices for enterprise:
- Service roles: Each application that calls Claude gets its own IAM role with
bedrock:InvokeModelpermission scoped to specific model ARNs - Least privilege: Don’t grant
bedrock:*— scope tobedrock:InvokeModelandbedrock:InvokeModelWithResponseStream - Model access: Use Bedrock Model Access to explicitly enable only the Claude models your organization needs (Opus, Sonnet, Haiku)
- Resource tags: Tag each invocation with project, environment, and cost center for billing granularity
- SCPs: Use Service Control Policies to prevent teams from enabling unauthorized models or regions
Layer 3: Monitoring & Observability (CloudWatch)
Production Claude deployments need three types of monitoring:
- Operational metrics: Invocation count, latency (P50/P95/P99), error rate, throttling — all available via CloudWatch Bedrock metrics
- Cost tracking: Input/output token counts per invocation, tagged by application and model. Set CloudWatch alarms for unexpected cost spikes
- Quality monitoring: Log prompts and responses (encrypted at rest in S3 or CloudWatch Logs) for quality review and prompt optimization. Use Bedrock Model Invocation Logging for automated capture
Set up CloudWatch dashboards that show: requests per minute by model, P95 latency trends, error rates, and daily cost by application. This gives operations teams visibility without needing to check multiple consoles.
São Paulo Region: LATAM Data Residency
The AWS São Paulo region (sa-east-1) is the critical enabler for Claude in Latin America. When you deploy Claude on Bedrock in sa-east-1, data stays physically in Brazil — satisfying data residency requirements across LATAM.
Why this matters for specific regulations:
- LGPD (Brazil): Personal data processing must occur within Brazilian territory or in countries with adequate data protection. sa-east-1 satisfies the territorial requirement directly
- CNBV (Mexico): Financial data must be processed in controlled environments with full audit trails. Bedrock + CloudTrail in sa-east-1 provides this
- SFC (Colombia): Cloud usage for financial services requires demonstrable data controls. Bedrock’s VPC isolation and IAM provide the necessary governance
Latency consideration: For applications in Mexico and Central America, sa-east-1 adds ~60-80ms of network latency compared to us-east-1. For batch processing and async workflows, this is negligible. For real-time chatbots, consider whether the data residency requirement mandates sa-east-1 or if us-east-1 (Virginia) is acceptable for your regulatory context.
RAG Architecture on Bedrock
Retrieval-Augmented Generation is the most common enterprise Claude pattern — particularly for compliance, knowledge management, and customer service. Bedrock provides native RAG support through Knowledge Bases for Amazon Bedrock.
Architecture Components
- Data Sources: S3 buckets containing your documents (PDFs, regulatory texts, policies, manuals). Bedrock supports automatic ingestion and chunking
- Embeddings: Bedrock generates vector embeddings using Amazon Titan Embeddings or Cohere Embed. These are stored automatically in your chosen vector database
- Vector Store: Amazon OpenSearch Serverless, Aurora PostgreSQL with pgvector, or Pinecone. OpenSearch Serverless is the lowest-maintenance option for most enterprise deployments
- Generation: Claude (Sonnet or Opus) receives the retrieved context and generates the response with citations
Compliance RAG: Best Practices
For regulatory RAG systems (CNBV circulars, LGPD requirements, SFC dispositions), follow these design principles:
- Citation-mandatory responses: Configure Claude’s system prompt to always cite the specific document section and page number. This is non-negotiable for compliance use cases
- Chunk overlap: Use 20% overlap between chunks to avoid splitting regulatory provisions across chunk boundaries
- Metadata filtering: Tag each document with regulation type, effective date, and jurisdiction. Claude can then filter retrieval to only relevant regulations
- Version control: When regulations update, maintain both versions in the knowledge base with date tags. Claude can answer questions about current vs. previous requirements
MCP Integration Patterns
MCP (Model Context Protocol) is Anthropic’s open standard for connecting Claude to external systems. In a Bedrock architecture, MCP servers run as separate services within your VPC, and Claude invokes them through tool use.
Common MCP Patterns for Enterprise
- Database MCP Server: Connects Claude to your PostgreSQL, MySQL, or DynamoDB. Claude can query data directly without your application building custom data retrieval logic
- CRM MCP Server: Connects Claude to Salesforce, HubSpot, or custom CRM. Enables customer service agents to ask Claude questions about customer history in natural language
- ERP MCP Server: Connects Claude to SAP, Oracle, or NetSuite. Claude can pull inventory data, financial records, or supply chain information on demand
- Internal API Gateway: A single MCP server that proxies multiple internal APIs, providing Claude access to your entire internal service mesh through one integration point
MCP servers should run in private subnets alongside your application tier. They authenticate to backend systems using IAM roles or secrets stored in AWS Secrets Manager — never hardcoded credentials. See our services page for implementation support.
Cost Optimization Strategies
Claude on Bedrock offers the same token-based pricing as the direct API. The key to cost optimization is choosing the right model tier and leveraging caching.
Model Selection: Opus, Sonnet, or Haiku
| Model | Input/Output (per M tokens) | Best For |
|---|---|---|
| Claude Opus 4 | $15 / $75 | Complex regulatory analysis, credit decisions, multi-step reasoning |
| Claude Sonnet 4 | $3 / $15 | RAG queries, document summarization, customer service agents — the sweet spot for most enterprise |
| Claude Haiku | $0.80 / $4 | High-volume classification, routing, simple extraction, intent detection |
The cascading model pattern: Route requests through Haiku first for classification and simple tasks. Escalate to Sonnet for medium-complexity tasks. Reserve Opus for tasks that require deep reasoning. A well-designed cascade can reduce average cost per request by 60-70% compared to using Opus for everything.
Prompt Caching
Prompt caching is the single highest-impact cost optimization available. When you include a large system prompt or reference context (regulatory text, company policies, product catalogs), prompt caching stores these tokens after the first request. Subsequent requests that include the same cached prefix pay only 10% of the standard input price.
- Minimum cache size: 1,024 tokens for Haiku, 2,048 for Sonnet and Opus
- Cache TTL: 5 minutes (refreshed on each use)
- Savings example: A compliance RAG system with a 50K-token regulatory context prefix saves $135 per 1,000 Sonnet queries through caching
Additional Cost Controls
- Max tokens: Set
max_tokensto the minimum needed for each use case. A classification task needs 50 tokens, not 4,096 - Streaming: Use streaming for long responses to reduce perceived latency without increasing cost
- Batching: For offline processing (document review, report generation), use Bedrock batch inference for lower per-token pricing
- Budget alerts: Set AWS Budgets alerts at 50%, 80%, and 100% of projected monthly spend
Security Best Practices
Security for Claude on Bedrock extends beyond network isolation. A comprehensive security posture includes:
- Encryption in transit: TLS 1.2+ for all Bedrock API calls (enforced by default)
- Encryption at rest: Enable KMS encryption for CloudWatch Logs and S3 buckets storing prompt/response logs. Use customer-managed keys (CMK) for regulated workloads
- Input validation: Sanitize user inputs before sending to Claude. Prevent prompt injection by validating input structure and content
- Output filtering: Implement post-processing checks on Claude’s responses. For financial services, validate that generated recommendations include required disclaimers
- Guardrails: Use Amazon Bedrock Guardrails to define content policies, PII detection, and topic restrictions. Guardrails run before and after Claude’s response, adding a defense-in-depth layer
- Access reviews: Quarterly review of IAM policies granting Bedrock access. Remove unused roles and permissions
- Incident response: Include Bedrock in your incident response playbook. Monitor CloudTrail for unusual invocation patterns (volume spikes, off-hours usage, new model access)
For a deeper discussion of compliance in LATAM financial services, see our guide on Claude vs ChatGPT for Enterprise.
Production Monitoring Architecture
A production Claude deployment requires four monitoring dimensions:
- Availability: Bedrock endpoint health, VPC endpoint status, DNS resolution. Alert on any 5xx errors or connection timeouts
- Performance: P50/P95/P99 latency by model and use case. Track latency trends to detect degradation before users complain
- Cost: Daily token consumption by model, application, and environment. Compare against forecasts and alert on anomalies
- Quality: Sample-based review of prompt/response pairs. Track key metrics: citation accuracy for RAG, classification precision for routing, customer satisfaction for chatbots
Deploy a CloudWatch dashboard per application with: invocations/minute (real-time), P95 latency (5-minute average), error rate (1-minute resolution), and daily cost (cumulative). Set SNS alerts for: error rate > 1%, P95 latency > 10s, daily cost > 120% of forecast. For more on how VORANTIS implements monitoring, see our services overview.
Getting Started: Implementation Roadmap
A typical enterprise Bedrock deployment follows this timeline:
- Week 1-2: Infrastructure setup — VPC endpoints, IAM roles, CloudWatch configuration, model access enablement
- Week 2-3: First workload — deploy a single Claude Sonnet use case (e.g., document summarization or compliance Q&A) in a dev environment
- Week 3-6: Production hardening — add monitoring, Guardrails, prompt caching, cost optimization, security review
- Week 6-12: Scale — add RAG pipelines, MCP integrations, additional use cases, cascading model routing
VORANTIS specializes in Claude-on-Bedrock architecture for LATAM enterprises. We handle the infrastructure setup, RAG design, MCP integration, and security hardening so your team can focus on the business logic. Visit our FAQ for common questions or contact us to start a Discovery engagement.
Frequently Asked Questions
Is Claude available on AWS Bedrock in Latin America?
Yes. Claude is available on AWS Bedrock in the São Paulo region (sa-east-1), providing data residency in South America. This is critical for LATAM enterprises subject to data localization requirements under LGPD (Brazil), CNBV (Mexico), and SFC (Colombia). All Claude model tiers — Opus, Sonnet, and Haiku — are accessible through Bedrock.
What is the difference between Claude direct API and AWS Bedrock?
The direct API connects to Anthropic’s servers and is simpler to set up. AWS Bedrock runs Claude within your AWS account, providing VPC isolation, IAM-based access control, CloudWatch monitoring, and consolidated billing. Bedrock is preferred for enterprise because it keeps data within your AWS perimeter and integrates with existing infrastructure.
How much does Claude on Bedrock cost compared to the direct API?
Bedrock pricing is comparable to direct API pricing: Sonnet 4 costs $3/$15 per million input/output tokens on both. Bedrock adds no markup but you pay for associated AWS infrastructure (VPC, NAT Gateway, CloudWatch). Prompt caching on Bedrock can reduce input costs by up to 90% for repeated context.
Can I build a RAG system with Claude on Bedrock?
Yes. AWS Bedrock provides native RAG support through Knowledge Bases for Amazon Bedrock. You can connect S3 data sources, use Amazon OpenSearch or Aurora pgvector for vector storage, and query with Claude as the generation model. This is the recommended architecture for compliance RAG systems in regulated industries.
Related Articles
¿Listo para implementar Claude en tu empresa?
Agenda un Discovery Call de 30 minutos sin compromiso.
Book a Discovery Call