Deploying a clinical RAG to AWS with Terraform

Building the demo was the fun part. Now let's deploy it to AWS.

In the previous post I built a patient-scoped clinical RAG: a NestJS API running an agentic loop over Bedrock, Postgres with pgvector for hybrid retrieval, and a React frontend that streams responses. It ran great on localhost. This post is about getting it onto AWS, with a pattern I'd recommend for any demo: deploy once, validate, destroy. Spin it up, prove it works, take the screenshots, tear it down before the bill grows.

It worked. Here are six gotchas worth knowing, and how to handle each one.

The architecture

Everything lives in one AWS account, one region (us-west-2):

  • A VPC with two public and two private subnets across two AZs. No NAT Gateway: the private subnets reach AWS services through VPC endpoints instead. That decision avoids the NAT Gateway's hourly charge of about $32/month and shrinks the attack surface, since the tasks have no route to the public internet.

  • ECS Fargate runs three things: the web service (nginx serving the React build), the query service (the NestJS API), and a run-to-completion ingestion task.

  • RDS Postgres 16 with the pgvector extension, on a db.t3.micro.

  • An Application Load Balancer with path-based routing: /api/* goes to the query service, everything else to the web service.

  • EventBridge fires the ingestion task whenever a FHIR bundle lands in the S3 docs bucket.

  • Secrets Manager holds the database credentials, injected into the task at runtime.
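
The path-based routing from the list above could be expressed roughly like this. It's a minimal sketch, not the repository's actual alb module: the resource names (aws_lb.main, aws_lb_target_group.web, aws_lb_target_group.query) and the rule priority are assumptions, and those resources would be defined elsewhere.

```hcl
# Default action: anything that matches no rule goes to the web (nginx) target group.
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn # hypothetical ALB resource
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn # hypothetical target group
  }
}

# Listener rule: /api/* is forwarded to the query (NestJS) target group.
resource "aws_lb_listener_rule" "api" {
  listener_arn = aws_lb_listener.http.arn
  priority     = 10 # arbitrary; lower numbers are evaluated first

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.query.arn # hypothetical target group
  }

  condition {
    path_pattern {
      values = ["/api/*"]
    }
  }
}
```

Putting the web service in the default action means a new frontend route never needs a new listener rule; only API prefixes do.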

The Terraform module layout

I split the infrastructure into focused modules:

infra/
├── bootstrap/            # one-time: S3 state bucket (local state, chicken-and-egg)
├── modules/
│   ├── network/          # VPC, subnets, SGs, 5 VPC endpoints
│   ├── ecr/              # 3 image repos
│   ├── storage/          # S3 docs bucket + EventBridge notifications
│   ├── database/         # RDS + Secrets Manager
│   ├── alb/              # ALB + 2 target groups + listener rule
│   ├── ecs/              # cluster, IAM, task defs, services
│   └── ingestion_trigger/ # EventBridge rule -> ECS RunTask
└── environments/dev/     # wires it all together

The state backend is S3 with use_lockfile = true, the native locking introduced in Terraform 1.10. No DynamoDB table for locks anymore, which is one less thing to provision.
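
For reference, a backend block using native locking might look like the sketch below. The bucket name and state key are placeholders I made up, not the values from the repository.

```hcl
terraform {
  required_version = ">= 1.10" # use_lockfile needs Terraform 1.10 or newer

  backend "s3" {
    bucket       = "example-rag-tfstate"                # hypothetical; created by bootstrap/
    key          = "environments/dev/terraform.tfstate" # hypothetical state key
    region       = "us-west-2"
    encrypt      = true
    use_lockfile = true # S3-native lock file instead of a DynamoDB lock table
  }
}
```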

The deploy script

This demo uses a bash script for deployment, since it's a deploy-and-destroy pattern in a short window. For ongoing CI/CD I'd recommend GitHub Actions with OIDC instead, and never long-lived access keys in CI. Here's the script's interface:

./infra/deploy.sh help

# bootstrap   Create the S3 state bucket (one-time per AWS account)
# apply       terraform apply the dev environment
# secrets     Populate Bedrock creds into Secrets Manager
# push        Build + push 3 images to ECR
# migrate     Run Prisma migration as a one-shot ECS task
# redeploy    Force new deployment of query + web services
# smoke       Curl /health, /, /api/patients via the ALB
# destroy     terraform destroy (state bucket preserved)
# all         Run the whole sequence end to end

The most important safety feature in the script: a pre-flight check that prints the active AWS account and identity before any destructive operation. When you have more than one AWS profile, this is the difference between "deployed to the demo account" and "deployed to the wrong account."
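
A complementary guard can live in Terraform itself. To be clear, this is a different mechanism from the script's pre-flight check, not a copy of it: the AWS provider's allowed_account_ids argument makes plan and apply fail when the active credentials belong to any other account. The variable name and the output are assumptions for illustration.

```hcl
variable "expected_account_id" {
  description = "12-digit ID of the AWS account this environment may be deployed to"
  type        = string
}

provider "aws" {
  region = "us-west-2"

  # Terraform refuses to run if the credentials resolve to a different account.
  allowed_account_ids = [var.expected_account_id]
}

# Surfaces the identity Terraform is actually using, for a quick visual check.
data "aws_caller_identity" "current" {}

output "active_identity_arn" {
  value = data.aws_caller_identity.current.arn
}
```

The script check catches the mistake before anything runs; the provider guard catches it even when someone bypasses the script and calls terraform directly.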

Six gotchas worth knowing

1. An em-dash crashed a security group

InvalidParameterValue: Invalid rule description. Valid descriptions are
strings less than 256 characters from the following set:
a-zA-Z0-9. _-:/()#,@[]+=&;{}!$*

AWS imposes an easy-to-miss charset restriction on security group rule descriptions. Only the ASCII subset shown in the error is accepted; common Unicode characters like em-dashes (—) and arrows (→) are not. The error message doesn't surface which specific character triggered the failure.

Fix: replace every → with "to" and every — with "-". Takeaway: keep AWS descriptions in plain ASCII. The charset rules are non-obvious and the error doesn't tell you which character is the problem.
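
One way to catch this at plan time rather than apply time is a variable validation. This is a sketch under an assumption: that rule descriptions are passed in through a variable (sg_rule_description is a name I invented). The regex mirrors the character set and length limit quoted in the error message above.

```hcl
variable "sg_rule_description" {
  description = "Description for a security group rule (hypothetical input)"
  type        = string

  validation {
    # Same set as the AWS error: a-zA-Z0-9. _-:/()#,@[]+=&;{}!$* and fewer than 256 characters.
    condition     = can(regex("^[a-zA-Z0-9. _:/()#,@\\[\\]+=&;{}!$*-]{0,255}$", var.sg_rule_description))
    error_message = "Security group rule descriptions must be under 256 characters and use only plain ASCII from the set AWS allows."
  }
}
```

With this in place, a description containing an arrow or em-dash fails during terraform plan with a message that names the variable, which is more than the AWS error does.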

2. Fargate couldn't pull images through the VPC endpoints

The tasks sat in PENDING for three minutes and then died:

CannotPullContainerError: ... dial tcp <public-S3-IP>:443: i/o timeout

Pulling ECR images through VPC endpoints requires a non-obvious security group rule, and the symptoms are misleading: the ECR endpoints are reachable, DNS resolves, and the security groups look fine. The clue is in the failure address: the task is trying to reach a public S3 IP.

ECR stores image layers in S3. Pulling an image means talking to S3. The S3 Gateway endpoint routes that traffic privately, but a gateway endpoint has no security group of its own. The source security group must explicitly allow egress to the AWS-managed S3 prefix list. The route table entry alone isn't enough.

resource "aws_vpc_security_group_egress_rule" "ecs_to_s3" {
  security_group_id = aws_security_group.ecs.id
  prefix_list_id    = aws_vpc_endpoint.s3.prefix_list_id
  from_port         = 443
  to_port           = 443
  ip_protocol       = "tcp"
  description       = "ECS to S3 gateway endpoint (ECR layer storage)"
}

Takeaway: gateway endpoints need an egress rule to the prefix list, or you'll spend 30 minutes convinced the problem is anywhere except where it is.
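
For context, the endpoints that rule depends on might be declared along these lines. This is a sketch, not the repository's network module: aws_vpc.main, aws_route_table.private, aws_subnet.private, and aws_security_group.vpc_endpoints are hypothetical names, and only the S3 gateway endpoint and the two ECR interface endpoints are shown.

```hcl
# S3 gateway endpoint: adds routes to the private route table and has no security group.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id # hypothetical VPC resource
  service_name      = "com.amazonaws.us-west-2.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id] # hypothetical route table
}

# ECR interface endpoints: network interfaces in the private subnets, guarded by a security group.
resource "aws_vpc_endpoint" "ecr" {
  for_each = toset(["ecr.api", "ecr.dkr"])

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-west-2.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id              # hypothetical; assumes subnets created with count
  security_group_ids  = [aws_security_group.vpc_endpoints.id] # hypothetical; must allow 443 from the ECS security group
  private_dns_enabled = true
}
```

The asymmetry is the whole gotcha: interface endpoints are protected by their own security group, while the gateway endpoint relies entirely on the caller's egress rules.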

3. Prisma + Alpine + pnpm: a three-way Docker bug

The migration task exited 1 with:

Error: request to https://binaries.prisma.sh/.../linux-musl-openssl-3.0.x/
libquery_engine.so.node.gz.sha256 failed

Prisma tried to download its query engine at runtime, from a Fargate task with no public internet. The download failed, predictably.

The root cause is a chain. node:24-alpine ships libssl.so.3 but no openssl binary on the PATH. Prisma's @prisma/engines postinstall detects the OpenSSL version to pick the right engine binary, and without the openssl binary it guessed wrong (openssl-1.1.x) and only downloaded the legacy engine. At runtime, with OpenSSL 3 actually present, Prisma wanted the openssl-3.0.x engine that was never bundled, so it tried to fetch it.

Fix: install openssl in both the builder stage (so postinstall picks the right binary) and the runner stage, and pin binaryTargets = ["native", "linux-musl-openssl-3.0.x"] in the schema as belt and suspenders.

Takeaway: if you ship Prisma on Alpine, apk add openssl twice.

4. The IAM 10-policy wall

AWS caps managed policies at 10 per IAM user by default. A demo stack with services across ECS, RDS, ECR, S3, Secrets Manager, and Bedrock trips this quickly:

LimitExceeded: Cannot exceed quota for PoliciesPerUser: 10

Fix: create an IAM Group, attach the policies to the group, add the user to the group. Group policies don't count against the user's quota.

Takeaway: if you're attaching more than five policies to one user, stop and use a group.
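
In Terraform, the group-based fix could look like the sketch below. The group name and both variables are assumptions; the policy ARNs would be whatever the deploy user needs.

```hcl
variable "deployer_user_name" {
  description = "Existing IAM user that runs the deploy script (hypothetical)"
  type        = string
}

variable "deployer_policy_arns" {
  description = "Managed policy ARNs the deployer needs (hypothetical list)"
  type        = list(string)
}

resource "aws_iam_group" "deployers" {
  name = "rag-demo-deployers" # hypothetical group name
}

# One attachment per policy, on the group rather than on the user.
resource "aws_iam_group_policy_attachment" "deployers" {
  for_each = toset(var.deployer_policy_arns)

  group      = aws_iam_group.deployers.name
  policy_arn = each.value
}

resource "aws_iam_user_group_membership" "deployer" {
  user   = var.deployer_user_name
  groups = [aws_iam_group.deployers.name]
}
```

Groups carry their own attachment quota, so a very long policy list may still need to be split across groups or consolidated into fewer customer-managed policies.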

5. Healthy container, 504 from the load balancer

The ALB-to-ECS security group rule needs one ingress entry per container port. When this stack runs a second service on a new port (web on 80, with the existing query on 3000), the SG must allow both. If it only allows the first port, the ALB can't reach the second container at all.

This failure mode is tricky because the container's own health check still passes: wget localhost/health inside the task returns 200. The ALB returns 504 and marks the target unhealthy with "Request timed out." The container is healthy, the load balancer just can't reach it.

resource "aws_vpc_security_group_ingress_rule" "ecs_from_alb_web" {
  security_group_id            = aws_security_group.ecs.id
  referenced_security_group_id = aws_security_group.alb.id
  from_port                    = 80
  to_port                      = 80
  ip_protocol                  = "tcp"
  description                  = "ALB to web (nginx) container"
}

After adding the rule, the target group goes healthy.

Takeaway: a container health check tells you the app is running, not that the load balancer can reach it; they're different network paths, so test both.
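
To make a forgotten port harder to miss, the per-port rules can be generated from a single map. This is an alternative sketch rather than what the repository does; the local name is mine, and the ports are the two mentioned above.

```hcl
locals {
  # Container ports the ALB must reach, keyed by service name.
  alb_target_ports = {
    query = 3000
    web   = 80
  }
}

# One ingress rule per entry in the map.
resource "aws_vpc_security_group_ingress_rule" "ecs_from_alb" {
  for_each = local.alb_target_ports

  security_group_id            = aws_security_group.ecs.id
  referenced_security_group_id = aws_security_group.alb.id
  from_port                    = each.value
  to_port                      = each.value
  ip_protocol                  = "tcp"
  description                  = "ALB to ${each.key} container on port ${each.value}"
}
```

Adding a third service then means adding one line to the map. Note that switching existing rules to this form changes their resource addresses, so Terraform would recreate them unless moved blocks are added.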

6. RDS deprecated my Postgres version

InvalidParameterCombination: Cannot find version 16.4 for postgres

AWS retires Postgres patch versions on a rolling basis to push security updates. On May 31, 2026, for example, RDS deprecated minor versions 16.1 through 16.7. Anyone with one of those hardcoded in Terraform now gets Cannot find version 16.X for postgres on the next apply.

Fix: bump to a current version (16.13 at the time). Better: query aws rds describe-db-engine-versions --engine postgres before deciding.

Takeaway: hardcoded Postgres patch versions in Terraform have a half-life. Expect the patch number to change under you.
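
One way to avoid pinning a patch version at all is to give RDS only the major version. This sketch relies on the AWS provider accepting a version prefix when auto_minor_version_upgrade is enabled; the identifier, variables, and storage size are assumptions, and networking arguments (subnet group, security groups) are omitted for brevity.

```hcl
variable "db_username" {
  type = string
}

variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "postgres" {
  identifier     = "rag-demo-postgres" # hypothetical
  engine         = "postgres"
  instance_class = "db.t3.micro"

  # Major-version prefix only: RDS picks a currently available 16.x minor version.
  engine_version             = "16"
  auto_minor_version_upgrade = true

  allocated_storage   = 20 # assumed size in GiB
  username            = var.db_username
  password            = var.db_password
  storage_encrypted   = true
  publicly_accessible = false
  skip_final_snapshot = true
  deletion_protection = false
}

# The minor version RDS actually chose.
output "postgres_engine_version_actual" {
  value = aws_db_instance.postgres.engine_version_actual
}
```

The trade-off is that the minor version can move during a maintenance window. For a deploy-and-destroy demo that's fine; for production you may prefer an explicit version that you bump deliberately.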

Closing the ingestion loop

One more piece worth flagging, less a bug than a missing handler. The EventBridge rule fires the ingestion task on S3 uploads, but the task needs a handler that reads the S3 key from the event. The handler is a small script that reads the S3_INGEST_BUCKET and S3_INGEST_KEY env vars EventBridge injects, downloads the bundle, and ingests it.

Then I uploaded one synthetic patient to S3 and watched the whole chain fire:

S3 PutObject → EventBridge rule → ECS RunTask → handler downloads and parses the bundle → Postgres. No manual step in the chain.
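
The wiring for that chain might look roughly like the sketch below. I'm assuming the env vars are injected through an EventBridge input transformer that builds ECS container overrides; the resource names, the container name "ingestion", the IAM role, and var.private_subnet_ids are all hypothetical.

```hcl
# S3 only publishes to EventBridge once this is switched on for the bucket.
resource "aws_s3_bucket_notification" "docs" {
  bucket      = aws_s3_bucket.docs.id # hypothetical docs bucket
  eventbridge = true
}

resource "aws_cloudwatch_event_rule" "docs_uploaded" {
  name = "rag-demo-docs-uploaded" # hypothetical

  event_pattern = jsonencode({
    source        = ["aws.s3"]
    "detail-type" = ["Object Created"]
    detail = {
      bucket = { name = [aws_s3_bucket.docs.bucket] }
    }
  })
}

resource "aws_cloudwatch_event_target" "run_ingestion" {
  rule     = aws_cloudwatch_event_rule.docs_uploaded.name
  arn      = aws_ecs_cluster.main.arn         # hypothetical cluster
  role_arn = aws_iam_role.events_run_task.arn # hypothetical; needs ecs:RunTask and iam:PassRole

  ecs_target {
    task_definition_arn = aws_ecs_task_definition.ingestion.arn # hypothetical
    launch_type         = "FARGATE"
    task_count          = 1

    network_configuration {
      subnets          = var.private_subnet_ids # hypothetical variable
      security_groups  = [aws_security_group.ecs.id]
      assign_public_ip = false
    }
  }

  # Copies the bucket and key from the S3 event into the container's environment.
  input_transformer {
    input_paths = {
      bucket = "$.detail.bucket.name"
      key    = "$.detail.object.key"
    }
    input_template = <<-EOT
      {
        "containerOverrides": [{
          "name": "ingestion",
          "environment": [
            {"name": "S3_INGEST_BUCKET", "value": <bucket>},
            {"name": "S3_INGEST_KEY", "value": <key>}
          ]
        }]
      }
    EOT
  }
}
```

The placeholders in the template are left unquoted on purpose: in a JSON template EventBridge substitutes each one with the quoted string value from the event.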

Security posture: locked down vs. demo trade-offs

What's hardened, and what's a deliberate trade-off:

Locked down:

  • No NAT Gateway: tasks have no route to the internet, only to AWS services via endpoints

  • Security groups reference each other by ID, never by CIDR

  • Specific ports only, no 0-65535 rules

  • RDS is private and encrypted at rest

  • Secrets in Secrets Manager, never in task definition env vars

  • Task roles are least-privilege (S3 read on the docs bucket + Bedrock invoke, nothing else)

  • S3 docs bucket blocks all public access
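
As an illustration of the Secrets Manager item above, a task definition can reference a secret by ARN instead of embedding its value. This is a sketch: the family name, role names, CPU and memory sizes, image tag, and the DATABASE_URL variable name are assumptions.

```hcl
resource "aws_ecs_task_definition" "query" {
  family                   = "rag-demo-query" # hypothetical
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512  # assumed size
  memory                   = 1024 # assumed size

  # The execution role, not the task role, needs secretsmanager:GetSecretValue on the secret.
  execution_role_arn = aws_iam_role.task_execution.arn # hypothetical
  task_role_arn      = aws_iam_role.query_task.arn     # hypothetical

  container_definitions = jsonencode([
    {
      name         = "query"
      image        = "${aws_ecr_repository.query.repository_url}:latest"
      essential    = true
      portMappings = [{ containerPort = 3000, protocol = "tcp" }]

      # Resolved by ECS at task start; the value never appears in the task definition.
      secrets = [
        {
          name      = "DATABASE_URL" # hypothetical variable name
          valueFrom = aws_secretsmanager_secret.db.arn
        }
      ]
    }
  ])
}
```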

Demo trade-offs (I'd change these for production):

  • The ALB is HTTP only. HTTPS needs a domain + ACM certificate.

  • rds.force_ssl = 0 allows non-SSL database connections for dev convenience.

  • No WAF on the ALB.

  • Single-AZ RDS (multi-AZ doubles the cost).

  • deletion_protection = false so terraform destroy runs clean.

These are deliberate choices for a deploy-and-destroy demo, not oversights. I'd flip every one of them for anything real.
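
As one example of flipping a trade-off, forcing SSL on the database is a parameter group change. A minimal sketch, with a made-up parameter group name:

```hcl
resource "aws_db_parameter_group" "postgres16" {
  name   = "rag-demo-postgres16" # hypothetical
  family = "postgres16"

  # 1 rejects non-SSL connections; the demo sets this to 0 for dev convenience.
  parameter {
    name  = "rds.force_ssl"
    value = "1"
  }
}
```

The database instance would then point at it through parameter_group_name, and clients need to connect with SSL enabled (for example sslmode=require in the connection string).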

Cost

Left running 24/7, the stack costs about $93/month (or ~$80 with the 12-month RDS free tier). The line that surprises people is the VPC interface endpoints: about $35/month for five of them. For the deploy-validate-destroy pattern, my actual bill for a couple of hours was well under a dollar.

Component                            Monthly
RDS db.t3.micro                      ~$13 (free 12 mo if eligible)
ECS Fargate query (24/7)             ~$18
ECS Fargate web (24/7, 0.25 vCPU)    ~$9
ECS Fargate ingestion (per run)      ~$0.01
ALB                                  ~$16
5 interface VPC endpoints            ~$35
S3, EventBridge, ECR, Secrets, Logs  ~$2
Total                                ~$93/mo (or ~$80/mo on free tier)

Two recommendations for anyone running this stack

  • Start with the architecture diagram. For any new stack on AWS, mapping the network topology before writing modules surfaces missing security group rules and routing assumptions before they become deploy-time errors.

  • Lint before apply. Add a deploy-script subcommand that checks for known misconfigurations before the actual terraform apply.

Try it yourself

The whole thing is on GitHub: github.com/codeanding/healthcare-rag. Clone it, run ./infra/deploy.sh all (it'll warn you about cost), and you'll have the same stack. Open an issue if you hit a gotcha I didn't document; there are surely more.

Keep coding and learning!