Deploying MongoDB on Amazon EKS with Terraform, Helm & Ingress: A Story of Stateful Apps on Kubernetes

Have you ever wondered how stateful applications like databases fit into the world of Kubernetes? Or maybe you’ve been curious about what happens when a database pod is replicated: where does the data actually live?

I had these same questions. Stateless apps were easy for me to understand: spin up a pod, scale it up and down, and if one dies, Kubernetes simply replaces it. But with a database? Things get trickier. Data can’t just disappear when a pod restarts.

That curiosity led me into this project: deploying MongoDB on Amazon EKS. Not only MongoDB, but also Mongo Express, a web-based admin interface to interact with the database. And I wanted to automate it end-to-end using Terraform, Helm, and Kubernetes manifests.

Here’s how the journey unfolded:

The Challenge: Stateful vs Stateless Apps

  • Stateless apps (like an API or frontend) don’t care if pods are destroyed and recreated — the state is stored elsewhere (e.g., S3, DynamoDB).

  • Stateful apps (like MongoDB, PostgreSQL, MySQL) are different. They need persistent storage that survives pod restarts.

This means when you run a database in Kubernetes, you need to think about:

  • How data is stored (Persistent Volumes)

  • How pods are managed (StatefulSets)

  • How storage integrates with your cloud provider (AWS EBS in my case)

My Approach

Here’s the stack I used to make this work:

  • Terraform – Provisioned the VPC, subnets, and Amazon EKS cluster.

  • AWS EBS CSI Driver – Enabled dynamic provisioning of EBS volumes for persistent storage.

  • Helm – Deployed MongoDB as a StatefulSet with customized values.

  • Mongo Express – A simple web-based UI to interact with MongoDB.

  • NGINX Ingress Controller – Exposed Mongo Express via an AWS Load Balancer.

Architecture Diagram

Here’s what the setup looks like:

Architecture diagram

Step 1: Provision Infrastructure with Terraform

Terraform took care of the networking and cluster setup. At a high level:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.2"
  # Public & private subnets defined here...
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.24.2"
  cluster_name    = "mongo-eks-cluster"
  cluster_version = "1.30"
  # Managed node groups defined here...
}

Once applied, I had a working VPC + EKS cluster.

Step 2: Install AWS EBS CSI Driver

kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/ecr/?ref=release-1.22"
kubectl get pods -n kube-system -l app=ebs-csi-controller

The first command pulls the stable release manifests and applies all the necessary resources (controller, DaemonSet, RBAC, etc.); the second verifies the installation.

Without the driver installed, claiming a persistent volume is like asking for storage that doesn’t exist: the cluster cannot dynamically provision an EBS volume without a provisioner, and that provisioner is the EBS CSI driver.
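To see where the provisioner is referenced, here is a minimal StorageClass sketch; the name ebs-gp3 and the gp3 volume type are my own illustrative choices, not part of the original setup:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                          # hypothetical name
provisioner: ebs.csi.aws.com             # the EBS CSI driver installed above
parameters:
  type: gp3                              # EBS volume type
volumeBindingMode: WaitForFirstConsumer  # bind volumes only when a pod is scheduled
reclaimPolicy: Delete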

Step 3: Attach policy to the Node IAM role

aws iam attach-role-policy \
  --role-name <NodeInstanceRoleName> \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy

This command gives EKS worker nodes permission to create and manage EBS volumes so MongoDB (or any pod) can use persistent storage.

Step 4: Install MongoDB via Helm

Before installing MongoDB, set up mongodb-values.yaml with the required replica set and persistent volume settings. In case you are wondering what a values.yaml file is: it is the configuration file that overrides a chart’s default configuration when you install an app with Helm.
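A minimal mongodb-values.yaml could look like the sketch below; the replica count, password, storage class name, and volume size are illustrative assumptions, not the values from my repo:

architecture: replicaset         # run MongoDB as a replica set (StatefulSet)
replicaCount: 3                  # one primary + two secondaries
auth:
  rootPassword: changeme         # placeholder; use a real secret in practice
persistence:
  enabled: true
  storageClass: ebs-gp3          # assumes an EBS-backed StorageClass (see the Step 2 sketch)
  size: 8Gi                      # size of each dynamically provisioned EBS volume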

helm install my-mongodb oci://registry-1.docker.io/bitnamicharts/mongodb \
  -f mongodb-values.yaml

Step 5: Create a Service for the MongoDB deployment

After installing MongoDB on the cluster, the chart creates a headless service (with clusterIP: None), which is useful for internal communication between MongoDB pods, for example when they need to discover each other in a replica set. However, this is not suitable for client applications like Mongo Express, which need a stable endpoint to connect to MongoDB.

In a StatefulSet-backed replica set, one pod (the primary) handles both reads and writes, while the secondary pods serve reads only to reduce the load on the primary. If the headless-service address of a secondary pod were used in the ConfigMap, for example, the UI might be reachable but you would be unable to write to it, limiting the app’s functionality. It is important to connect to the database as a whole rather than to an individual pod.

To make MongoDB accessible to other applications inside the cluster, we need a ClusterIP service. This service provides a single stable DNS name and load balances connections across the MongoDB pods.
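A sketch of mongodb-svc.yaml might look like this; the service name mongodb-service and the selector label are assumptions based on the Bitnami chart’s default labels:

apiVersion: v1
kind: Service
metadata:
  name: mongodb-service                  # hypothetical name, referenced later in the ConfigMap
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: mongodb      # assumed Bitnami chart pod label
  ports:
    - port: 27017
      targetPort: 27017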

kubectl apply -f mongodb-svc.yaml

Check all the pods and services to confirm they are running:

kubectl get pod
kubectl get svc
kubectl get all

Step 6: Deploy the Secret and ConfigMap

These are needed for Mongo Express to connect to MongoDB; some of the values here correspond to what we defined in the values.yaml.
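For illustration, the two manifests could look roughly like this; the resource names, keys, and base64 values are placeholders rather than the actual ones from the project:

# secret.yaml -- placeholder base64-encoded credentials
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-secret
type: Opaque
data:
  mongo-root-username: YWRtaW4=          # "admin"
  mongo-root-password: cGFzc3dvcmQ=      # "password"
---
# configmap.yaml -- points Mongo Express at the MongoDB ClusterIP service
apiVersion: v1
kind: ConfigMap
metadata:
  name: mongodb-configmap
data:
  database_url: mongodb-service          # assumed service name from Step 5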

kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml

Step 7: Deploy Mongo Express

Mongo Express is the UI for the database; it is deployed using a Kubernetes Deployment.
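Here is a sketch of mongoexpress.yaml, assuming the Secret and ConfigMap names above and the classic ME_CONFIG_* environment variables; the Service name mongo-express-service is also an assumption and is referenced again in the Ingress rule:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo-express
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo-express
  template:
    metadata:
      labels:
        app: mongo-express
    spec:
      containers:
        - name: mongo-express
          image: mongo-express
          ports:
            - containerPort: 8081
          env:
            - name: ME_CONFIG_MONGODB_ADMINUSERNAME
              valueFrom:
                secretKeyRef:
                  name: mongodb-secret         # assumed Secret from Step 6
                  key: mongo-root-username
            - name: ME_CONFIG_MONGODB_ADMINPASSWORD
              valueFrom:
                secretKeyRef:
                  name: mongodb-secret
                  key: mongo-root-password
            - name: ME_CONFIG_MONGODB_SERVER
              valueFrom:
                configMapKeyRef:
                  name: mongodb-configmap      # assumed ConfigMap from Step 6
                  key: database_url
---
apiVersion: v1
kind: Service
metadata:
  name: mongo-express-service               # hypothetical name, used in the Ingress rule
spec:
  type: ClusterIP
  selector:
    app: mongo-express
  ports:
    - port: 8081
      targetPort: 8081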

kubectl apply -f mongoexpress.yaml

Step 8: Install the NGINX Ingress Controller via Helm

The NGINX Ingress Controller automatically spins up a cloud load balancer (in my case, an AWS load balancer) and, behind the scenes, routes traffic from the load balancer to the services defined in the Ingress rules.

helm install ingress-nginx oci://ghcr.io/nginxinc/charts/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace

Step 9: Deploy Ingress rule

This file contains the rules the Ingress Controller applies, i.e. how traffic needs to be routed. It is exposed to everyone rather than behind a domain name because this is a testing environment.
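A minimal ingress.yaml along these lines (the host is omitted so the rule matches any address; mongo-express-service and port 8081 follow the assumed names above, and the nginx class name assumes that is what the controller registered):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mongo-express-ingress            # hypothetical name
spec:
  ingressClassName: nginx                # assumes the controller registered the "nginx" class
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mongo-express-service   # assumed Service from Step 7
                port:
                  number: 8081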

kubectl apply -f ingress.yaml

Step 10: Access the Mongo Express UI

kubectl get all

The address on the ingress controller’s load balancer service can be opened in the browser, where you can view and store data in the database. Mongo Express credentials (default): admin / pass.

Bonus: try scaling the replica set down after saving data in the database, confirm there is no pod running, then scale it back up; you will find the data attached to the pods again. That is the power of persistent volumes in the cloud: a pod or node restarting does not mean storage is lost.

Limitations of Self-Hosting

While self-hosting gives control and flexibility, it also introduces challenges such as:

  • Backup and restore processes.
  • Security patching and updates.
  • Monitoring and scaling for production workloads.
  • High availability and failover management.

Key Lessons Learned

  • Linode vs AWS Storage Provisioning – Linode’s Kubernetes engine comes with a default storage provisioner, but AWS EKS requires installing and configuring the EBS CSI Driver with proper IAM permissions.
  • StatefulSet Basics – Understanding PVC binding, PV lifecycle, and how pods reconnect to the same volumes after restarts.
  • Ingress Integration – An AWS load balancer plus the NGINX Ingress Controller can seamlessly route external traffic to services inside the cluster.

Improvements

  • mongodb-values.yaml, Terraform lock files and state files, and .tfvars should all be listed in .gitignore.
  • The EBS CSI driver installation, the IAM policy attachment, and the MongoDB Helm release can all be managed with Terraform, which helps keep versions consistent.
  • Secrets can be managed with a third-party tool like AWS Secrets Manager and referenced from secret.yaml instead of being exposed in plain text.
  • Lastly, the application can be accessed via a domain name, configured in the Ingress rule.
  • TLS for secured connections, using cert-manager.

Next Steps

To take this setup closer to production readiness, the following can be implemented:

  • Automated Backups – configure scheduled snapshots or use tools like Velero.
  • Monitoring & Alerting – integrate with Prometheus + Grafana for insights.
  • Maintenance – apply updates and patches regularly.
  • Scaling – configure replica sets for fault tolerance.
  • Migration to Managed Services – consider Amazon DocumentDB for reduced operational overhead.

You can find all the code snippets in my GitHub repo.
