Good news for developers, SREs, and cloud engineers: the Amazon CloudWatch agent now supports collecting detailed performance statistics for EBS volumes attached to EC2 instances and EKS nodes.
This means you can finally monitor and troubleshoot your EBS storage like a pro, with visibility into NVMe-level metrics such as:
- 🔁 I/O operations (read/write counts, from which you can derive IOPS)
- 📦 Throughput (bytes read/written)
- ⏱️ Time spent on read and write I/O
- 🎯 Queue depth
Let’s break it down with a real-world example.
🔧 Use Case: App is Slow, But CPU & RAM Look Fine?
You’re running a production web app on EC2 with a gp3 EBS volume.
The app gets sluggish during peak hours, but CloudWatch shows:
- CPU: fine
- Memory: fine
- Network: fine
Now, thanks to the new update, you can collect EBS disk-level metrics and discover the real problem.
🧪 Step-by-Step Example
Step 1: Enable EBS Metrics in CloudWatch Agent
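Update your amazon-cloudwatch-agent.json config. The EBS NVMe statistics are collected in the diskio section, under measurement names that start with diskio_ebs_:

    {
      "metrics": {
        "metrics_collected": {
          "diskio": {
            "resources": ["*"],
            "measurement": [
              "diskio_ebs_total_read_ops",
              "diskio_ebs_total_write_ops",
              "diskio_ebs_total_read_bytes",
              "diskio_ebs_total_write_bytes",
              "diskio_ebs_total_read_time",
              "diskio_ebs_total_write_time",
              "diskio_ebs_volume_performance_exceeded_iops",
              "diskio_ebs_volume_performance_exceeded_tp",
              "diskio_ebs_volume_queue_length"
            ],
            "metrics_collection_interval": 60
          }
        }
      }
    }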
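Before loading the config, a quick sanity check can catch measurement names the agent is unlikely to accept. The sketch below is only illustrative: the helper `unknown_measurements` is hypothetical, and the two name sets are an assumption transcribed from my reading of the CloudWatch agent documentation, so verify them against the docs for your agent version.

```python
import json

# Assumption: classic diskio measurement names, per the agent documentation.
CLASSIC_DISKIO = {
    "reads", "writes", "read_bytes", "write_bytes",
    "read_time", "write_time", "io_time", "iops_in_progress",
}

# Assumption: EBS NVMe measurement names, per the agent documentation.
EBS_NVME = {
    "diskio_ebs_total_read_ops", "diskio_ebs_total_write_ops",
    "diskio_ebs_total_read_bytes", "diskio_ebs_total_write_bytes",
    "diskio_ebs_total_read_time", "diskio_ebs_total_write_time",
    "diskio_ebs_volume_performance_exceeded_iops",
    "diskio_ebs_volume_performance_exceeded_tp",
    "diskio_ebs_ec2_instance_performance_exceeded_iops",
    "diskio_ebs_ec2_instance_performance_exceeded_tp",
    "diskio_ebs_volume_queue_length",
}


def unknown_measurements(config):
    """Return diskio measurement names that are in neither known set."""
    diskio = (
        config.get("metrics", {})
        .get("metrics_collected", {})
        .get("diskio", {})
    )
    known = CLASSIC_DISKIO | EBS_NVME
    unknown = []
    for entry in diskio.get("measurement", []):
        # Entries may be plain strings or objects with a "name" key.
        name = entry["name"] if isinstance(entry, dict) else entry
        if name not in known:
            unknown.append(name)
    return unknown


if __name__ == "__main__":
    sample = json.loads(
        '{"metrics": {"metrics_collected": {"diskio": {"measurement": '
        '["reads", "await", "diskio_ebs_volume_queue_length"]}}}}'
    )
    print(unknown_measurements(sample))
```

In practice you would pass it the result of `json.load()` on your own config file and fix anything it reports before restarting the agent.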
Then restart the agent:
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
      -a fetch-config -m ec2 -c file:/path/to/config.json -s
Step 2: View in CloudWatch
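You'll now see custom metrics in the CWAgent namespace (the agent's default). The EBS NVMe statistics carry a diskio_ebs_ prefix, for example:
- diskio_ebs_volume_queue_length → how many I/O operations are waiting
- diskio_ebs_total_read_time, diskio_ebs_total_write_time → total time spent on read and write operations, in microseconds
- diskio_ebs_total_read_ops, diskio_ebs_total_write_ops → completed operations (divide by the period to get IOPS)
- diskio_ebs_total_read_bytes, diskio_ebs_total_write_bytes → data throughput
There is no ready-made await metric: average latency is total time divided by completed operations.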
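The NVMe statistics report totals rather than ready-made rates, so latency, IOPS, and throughput have to be derived. Here is a minimal sketch of that arithmetic, assuming the time values are in microseconds and that all inputs cover the same collection period; the function names are hypothetical.

```python
def avg_latency_ms(total_time_us, total_ops):
    """Average time per I/O in milliseconds for one collection period."""
    if total_ops <= 0:
        return 0.0  # no completed operations, so no meaningful latency
    return total_time_us / total_ops / 1000.0


def iops(total_ops, period_s):
    """Operations per second over a period of period_s seconds."""
    return total_ops / period_s


def throughput_mib_s(total_bytes, period_s):
    """Throughput in MiB/s over a period of period_s seconds."""
    return total_bytes / period_s / (1024 * 1024)


if __name__ == "__main__":
    # Illustrative numbers only: 6,000 writes taking 720,000,000 us in 60 s.
    print(avg_latency_ms(720_000_000, 6_000))  # 120.0
    print(iops(6_000, 60))                     # 100.0
```

Total time can exceed the wall-clock period because it sums the time of operations that were in flight concurrently.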
Step 3: Analyze & Act
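During peak load:
- queue length (diskio_ebs_volume_queue_length) = 22 (too high)
- average I/O latency = 120 ms (total I/O time divided by completed operations; delays are noticeable)
- bytes written (diskio_ebs_total_write_bytes) drop sharply
🧠 Root cause: the EBS volume is the bottleneck. Time to provision more IOPS or throughput on the gp3 volume, or switch from gp3 to io2.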
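To decide whether to raise IOPS or throughput, it helps to compare what you observe against what the volume is provisioned for. The sketch below assumes the gp3 baseline of 3,000 IOPS and 125 MiB/s as defaults; the 90% threshold and the function names are hypothetical, so substitute your volume's actual provisioned values.

```python
GP3_BASELINE_IOPS = 3000
GP3_BASELINE_THROUGHPUT_MIB_S = 125


def utilization(observed, provisioned):
    """Fraction of the provisioned limit currently in use."""
    if provisioned <= 0:
        raise ValueError("provisioned limit must be positive")
    return observed / provisioned


def saturated_limits(
    observed_iops,
    observed_mib_s,
    provisioned_iops=GP3_BASELINE_IOPS,
    provisioned_mib_s=GP3_BASELINE_THROUGHPUT_MIB_S,
    threshold=0.9,  # hypothetical cut-off for "close to the limit"
):
    """Return which limits ("iops", "throughput") are at or above threshold."""
    hits = []
    if utilization(observed_iops, provisioned_iops) >= threshold:
        hits.append("iops")
    if utilization(observed_mib_s, provisioned_mib_s) >= threshold:
        hits.append("throughput")
    return hits


if __name__ == "__main__":
    # Illustrative numbers: near the IOPS baseline, well below throughput.
    print(saturated_limits(observed_iops=2950, observed_mib_s=40))
```

If only "iops" comes back, provisioning more IOPS on the gp3 volume is the natural first step; if both limits are saturated, a different volume type may be worth evaluating.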
✅ Why This Matters
| Benefit | Impact |
|---|---|
| Granular storage insights | Understand app latency at disk level |
| Real-time metrics | Catch slowdowns before users do |
| Automation ready | Build alarms & dashboards |
| Works with EC2 + EKS | Great for both VMs & containers |
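As an illustration of the "automation ready" row, here is a rough sketch that builds the keyword arguments for boto3's `put_metric_alarm` on queue length. The helper name, the threshold of 16, and the dimension set are assumptions; the dimensions must match exactly what your agent publishes, so check the metric in the CloudWatch console first.

```python
def queue_length_alarm(instance_id, volume_id, threshold=16.0,
                       period_s=60, evaluation_periods=5):
    """Build put_metric_alarm kwargs for sustained high EBS queue length."""
    return {
        "AlarmName": f"ebs-queue-length-{volume_id}",
        "Namespace": "CWAgent",  # the agent's default namespace
        "MetricName": "diskio_ebs_volume_queue_length",
        # Assumption: adjust to the dimensions your agent actually emits.
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "VolumeId", "Value": volume_id},
        ],
        "Statistic": "Average",
        "Period": period_s,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,  # hypothetical value, tune per workload
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
    }


if __name__ == "__main__":
    params = queue_length_alarm("i-0123456789abcdef0", "vol-0123456789abcdef0")
    print(params["AlarmName"])
    # To create the alarm (requires boto3 and AWS credentials):
    # import boto3
    # boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter building separate from the API call makes the alarm definition easy to review and unit test before anything touches your account.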
🧾 TL;DR
🚀 CloudWatch Agent now supports:
- NVMe-based EBS performance metrics
- Queue depth, IOPS, throughput, and more
- Alarms, dashboards, and smarter diagnostics
No more guessing: now you can see and solve storage bottlenecks confidently.
Have you started using EBS metrics in your monitoring stack?
Drop your setup or questions in the comments.