AlertInsightHub: AI-Powered Cloud Alert Triage & Visualization Platform Using Postmark
Shankar Somasundaram (@cloudcraftcurator)

Publish Date: May 25

This is a submission for the Postmark Challenge: Inbox Innovators.

AlertInsightHub: AI-driven cloud alert triage and visualization for faster incident resolution in hybrid environments.

💡 What I Built

AlertInsightHub is an AI-powered cloud alert triage and visualization platform built for SREs and cloud engineers managing hybrid, multi-cloud, and on-premises infrastructure. It processes infrastructure monitoring alerts, such as AWS SNS email notifications, by integrating with Postmark’s inbound webhook system.

AlertInsightHub uses an AI agent to extract critical metadata (such as service name, resource name, and alert metric) from incoming alert emails. These alerts are transformed into structured, actionable insights, stored in a backend datastore, and visualized through a real-time, interactive dashboard.

This enables engineering teams to triage cloud incidents faster by filtering and investigating alerts based on cloud account, service, instance, and metric type.
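As a rough illustration of what "structured" means here, the sketch below models one parsed alert and filters a list of them by account, service, instance, and metric. The `StructuredAlert` class, its field names, and the sample values are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StructuredAlert:
    """Hypothetical shape of one parsed alert; field names are assumptions."""
    account: str   # cloud account identifier
    service: str   # e.g. "EC2"
    instance: str  # resource or instance name
    metric: str    # e.g. "CPUUtilization"
    subject: str   # original email subject, kept for context


def filter_alerts(
    alerts: Iterable[StructuredAlert],
    account: Optional[str] = None,
    service: Optional[str] = None,
    instance: Optional[str] = None,
    metric: Optional[str] = None,
) -> List[StructuredAlert]:
    """Return the alerts that match every criterion that is not None."""
    wanted = {"account": account, "service": service,
              "instance": instance, "metric": metric}
    return [
        alert for alert in alerts
        if all(value is None or getattr(alert, key) == value
               for key, value in wanted.items())
    ]


# Made-up sample data for the demonstration.
alerts = [
    StructuredAlert("111111111111", "EC2", "i-0abc", "CPUUtilization", "ALARM: high CPU"),
    StructuredAlert("111111111111", "RDS", "orders-db", "FreeStorageSpace", "ALARM: low disk"),
    StructuredAlert("222222222222", "EC2", "i-0def", "CPUUtilization", "ALARM: high CPU"),
]

ec2_cpu = filter_alerts(alerts, service="EC2", metric="CPUUtilization")
print([alert.instance for alert in ec2_cpu])
```

Leaving a criterion as `None` means "any value", so the same function covers both broad views (all EC2 alerts) and narrow ones (one metric on one instance).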


📘 Use Case

In hybrid cloud environments, operations teams face a deluge of alerts originating from various monitoring tools. These alerts—typically delivered via email—are unstructured and require manual processing, which is inefficient and error-prone. AlertInsightHub streamlines this process by transforming unstructured alert emails into structured, actionable insights and providing a unified dashboard for triage and resolution.

Manual triage of these high-volume, plain-text alert emails leads to:

  • Alert fatigue due to excessive, repetitive, and often low-priority notifications
  • Delayed response times to critical incidents
  • Inconsistent resolution paths for similar recurring issues
  • Lack of historical traceability and difficulty analyzing trends over time

These challenges are amplified in hybrid environments where alert sources span cloud platforms, legacy systems, and third-party tools.

Target Users

  • DevOps engineers and SREs in hybrid/multi-cloud environments
  • NOC teams managing high-volume monitoring and alerting systems
  • Incident response teams seeking structured, AI-assisted remediation workflows

Solution Summary

AlertInsightHub addresses these challenges through:

  • Automated Alert Ingestion & Processing
  • Intelligent Alert Insights
  • Unified Incident Response

Key Benefits

  • Reduced MTTR: Faster issue identification and resolution using structured alert data
  • Operational Efficiency: Frees teams from repetitive manual alert triage
  • Pattern Visibility: Enables insight into recurring incidents and noisy systems
  • Institutional Knowledge: Builds a repository of incidents and resolutions for future reference
  • Hybrid Flexibility: Supports cloud, on-premises, and hybrid environments seamlessly

🏗️ System Design & Architecture

The system is built with a layered architecture:

  • Data Collection Layer: Captures incoming Postmark webhooks and queues them for processing.
  • Processing Layer: Converts Postmark webhooks into standardized alerts and generates remediation suggestions using the GroqCloud API.
  • Data Storage Layer: Uses DynamoDB to store raw data, alerts, configurations, and AI-driven recommendations.
  • Presentation Layer: Offers a web dashboard for real-time alert monitoring, queue tracking, and settings configuration.

AlertInsightHub High Level Design Architecture


🧪 Demo Video

Watch a walkthrough of AlertInsightHub showing how incoming AWS SNS topic alerts are automatically processed via Postmark, parsed by an AI agent, and visualized on an interactive dashboard.

CloudAlert Dashboard: Summary of AWS Account Alerts


🧪 Testing Instructions

  1. Send a formatted alert email (e.g., SNS notification) to: support@cloudcraftcurator.tech (or Postmark Inbound Email Address)
  2. Postmark routes the email to a webhook endpoint.
  3. AI agent processes the content and stores parsed data.
  4. View the alert on the dashboard.
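For local testing without sending a real email, steps 3 and 4 can be exercised by posting a hand-built payload straight to the webhook. The sketch below assumes the app is listening on `localhost:8000`; only the `/api/webhook` path comes from this write-up. The keys are standard Postmark inbound field names, while all values are invented sample data.

```python
import json
import urllib.request

# Assumption: the app runs locally on port 8000. Adjust to your deployment.
WEBHOOK_URL = "http://localhost:8000/api/webhook"


def build_sample_payload() -> dict:
    """Return a trimmed-down payload using Postmark inbound field names."""
    return {
        "From": "no-reply@sns.amazonaws.com",
        "To": "support@cloudcraftcurator.tech",
        "Subject": 'ALARM: "demo-high-cpu" in US East (N. Virginia)',
        "MessageID": "00000000-0000-0000-0000-000000000000",
        "Date": "Sun, 25 May 2025 10:00:00 +0000",
        # Invented body text, only loosely shaped like an alarm email.
        "TextBody": "Alarm Name: demo-high-cpu\nState Change: OK -> ALARM",
    }


def post_payload(payload: dict, url: str = WEBHOOK_URL) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Once the app is running, send the sample with:
#     post_payload(build_sample_payload())
```

A real Postmark delivery carries more fields (headers, attachments, HTML body), so a payload captured from an actual inbound email is the better fixture once one is available.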

Refer to the README for detailed instructions.


🧰 Code Repository

🔗 GitHub – AlertInsightHub

You can find the complete source code for the project here. Feel free to explore, contribute, or even fork the project to adapt it for your own needs. The repository is regularly updated and pull requests are always welcome!


⚙️ How I Built It

  • Postmark Inbound Webhook: Receives alert emails sent to support@cloudcraftcurator.tech and delivers each one to the webhook endpoint.
  • FastAPI App + AI Agent: Processes inbound JSON payloads using a lightweight AI agent hosted locally. The agent identifies key metadata from the alert—service, resource, metric type, etc.
  • DynamoDB (Local): Stores structured alert records efficiently with support for fast queries and grouping.
  • Interactive Dashboard (React/Streamlit): Renders alert summaries and supports drill-down views by account → service → instance → metric.
  • Devcontainer Support: Local development uses Docker and devcontainer.json to install dependencies (e.g., DynamoDB local, FastAPI app, etc.).
  • Self-hosted Webhook: Deployed using Docker and exposed securely through a service such as ngrok or a custom domain.
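The metadata extraction step can be sketched roughly as follows. This is not the project's agent code: the prompt wording, the `REQUIRED_FIELDS` names, and the `complete` callable (a stand-in for whatever sends a prompt to the model and returns its text reply) are all assumptions. The point of the sketch is the defensive part, validating the model's reply before anything is stored.

```python
import json
from typing import Callable, Dict

REQUIRED_FIELDS = ("service", "resource", "metric")  # assumed field names

PROMPT_TEMPLATE = (
    "Extract metadata from the alert email below and reply with a single "
    "JSON object with the keys: service, resource, metric.\n\n"
    "Subject: {subject}\n\nBody:\n{body}"
)


def extract_metadata(
    payload: Dict[str, str], complete: Callable[[str], str]
) -> Dict[str, str]:
    """Ask the model for metadata and validate its reply.

    `complete` is a hypothetical callable that sends a prompt to the LLM
    backend in use and returns the raw text of the reply.
    """
    prompt = PROMPT_TEMPLATE.format(
        subject=payload.get("Subject", ""),
        body=payload.get("TextBody", ""),
    )
    reply = complete(prompt)
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply was not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("model reply was not a JSON object")
    missing = [field for field in REQUIRED_FIELDS if not parsed.get(field)]
    if missing:
        raise ValueError(f"model reply is missing fields: {missing}")
    return {field: str(parsed[field]) for field in REQUIRED_FIELDS}


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return '{"service": "EC2", "resource": "i-0abc", "metric": "CPUUtilization"}'


metadata = extract_metadata(
    {"Subject": "ALARM: high CPU", "TextBody": "..."}, fake_complete
)
print(metadata)
```

Passing the model call in as a function keeps the parsing logic testable without network access and makes it easier to swap model providers later.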

📬 Postmark Features

In the AlertInsightHub project, I've leveraged several key Postmark features to create a robust webhook processing system:

1. 🔗 Webhook Integration

  • Inbound Webhook Endpoint: Implemented a dedicated endpoint (/api/webhook) that receives and processes incoming Postmark webhook payloads.
  • Webhook URL Display: Created a user-friendly display of the webhook URL in the dashboard with a copy button for easy integration with Postmark.

2. ⚙️ Event Processing

  • Event Queueing: All incoming webhook events are immediately stored in a queue system (DynamoDB) for reliable processing.
  • Asynchronous Processing: Implemented a non-blocking design where webhooks are received quickly and processed asynchronously to prevent timeouts.
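The receive-fast, process-later pattern described above can be sketched with the standard library alone. Here an in-memory `asyncio.Queue` stands in for the DynamoDB-backed queue, and the short sleep stands in for the slow AI call; the function names are assumptions for illustration.

```python
import asyncio
from typing import Dict, List


async def receive_webhook(queue: asyncio.Queue, payload: dict) -> Dict[str, str]:
    """Enqueue the payload and acknowledge at once; no parsing happens here."""
    await queue.put(payload)
    return {"status": "queued"}


async def worker(queue: asyncio.Queue, processed: List[str]) -> None:
    """Drain the queue in the background, one payload at a time."""
    while True:
        payload = await queue.get()
        try:
            await asyncio.sleep(0.01)  # placeholder for the slow AI call
            processed.append(payload["MessageID"])
        finally:
            queue.task_done()


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    processed: List[str] = []
    task = asyncio.create_task(worker(queue, processed))

    # The "endpoint" returns immediately for each incoming webhook.
    for i in range(3):
        await receive_webhook(queue, {"MessageID": f"msg-{i}"})

    await queue.join()  # wait until the worker has handled everything
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return processed


result = asyncio.run(main())
print(result)
```

An in-memory queue loses its contents on restart, which is why persisting events first, as the project does with DynamoDB, matters for reliability.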

3. 🗃️ Data Storage & Management

  • Raw Data Preservation: All incoming webhook payloads are stored in their original form in the postmark_data table for audit and debugging purposes.
  • Structured Data: Extracted and stored relevant metadata (timestamp, status, etc.) in the webhook_queue table for efficient querying.
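One way to picture the two-table split is a function that turns a single inbound payload into a raw record and a lightweight queue record sharing the same ID. The table names come from this post, but every attribute name below (`id`, `received_at`, `payload`, `status`, `subject`) is an assumption, and the actual write to DynamoDB is left out.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, Tuple


def build_items(payload: dict) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (raw_item, queue_item) for one inbound payload.

    raw_item is meant for the postmark_data table and queue_item for the
    webhook_queue table. Attribute names are assumed for this sketch.
    """
    webhook_id = str(uuid.uuid4())
    received_at = datetime.now(timezone.utc).isoformat()

    raw_item = {
        "id": webhook_id,
        "received_at": received_at,
        # The untouched payload, serialized so it can be replayed later.
        "payload": json.dumps(payload, sort_keys=True),
    }
    queue_item = {
        "id": webhook_id,
        "received_at": received_at,
        "status": "pending",
        "subject": payload.get("Subject", ""),
    }
    return raw_item, queue_item


raw_item, queue_item = build_items({"Subject": "ALARM: high CPU", "TextBody": "..."})
print(queue_item["status"], raw_item["id"] == queue_item["id"])
```

Keeping the queue record small makes status queries cheap, while the shared ID lets a failed event be traced back to its original payload.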

4. 📊 Monitoring & Visualization

  • Status Tracking: Implemented a comprehensive status tracking system (pending, processed, error) for all webhook events.
  • Dashboard Visualization: Created charts showing webhook distribution by status and date.
  • Filtering Capabilities: Added filtering by date and status to help analyze webhook data effectively.
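The numbers behind such charts and filters amount to simple grouping. The sketch below counts events by status and by calendar date, then filters on both; the event dictionaries and their keys are made-up sample data, with only the three status values taken from the list above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Made-up sample events; "received_at" is assumed to be an ISO 8601 string.
events = [
    {"id": "a", "status": "processed", "received_at": "2025-05-24T09:15:00+00:00"},
    {"id": "b", "status": "error", "received_at": "2025-05-24T11:40:00+00:00"},
    {"id": "c", "status": "processed", "received_at": "2025-05-25T08:05:00+00:00"},
    {"id": "d", "status": "pending", "received_at": "2025-05-25T08:06:00+00:00"},
]


def event_date(event: Dict[str, str]) -> str:
    """Return the YYYY-MM-DD part of an ISO 8601 timestamp."""
    return event["received_at"][:10]


def filter_events(
    items: Iterable[Dict[str, str]],
    status: Optional[str] = None,
    date: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Keep events matching the given status and/or date (None means any)."""
    return [
        e for e in items
        if (status is None or e["status"] == status)
        and (date is None or event_date(e) == date)
    ]


by_status = Counter(e["status"] for e in events)
by_date = Counter(event_date(e) for e in events)
processed_on_25th = filter_events(events, status="processed", date="2025-05-25")

print(dict(by_status))
print(dict(by_date))
print([e["id"] for e in processed_on_25th])
```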

5. 🚨 Error Handling & Resilience

  • Error Tracking: Captured and stored error messages when webhook processing fails.
  • Reprocessing Capability: Implemented a reprocessing feature for failed webhooks.
  • Graceful Degradation: The system continues to function even when some components fail.
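A minimal sketch of how error tracking and reprocessing might fit together is shown below. It uses plain dictionaries in place of stored records, and the function names and the `error` attribute are assumptions; the status values (pending, processed, error) are the ones listed earlier in this post.

```python
from typing import Callable, Dict, List


def process_event(event: Dict, handler: Callable[[dict], None]) -> None:
    """Run the handler and record the outcome on the event itself."""
    try:
        handler(event["payload"])
    except Exception as exc:  # a failure here must not stop other events
        event["status"] = "error"
        event["error"] = f"{type(exc).__name__}: {exc}"
    else:
        event["status"] = "processed"
        event["error"] = None


def reprocess_failed(events: List[Dict], handler: Callable[[dict], None]) -> int:
    """Retry every event in the error state; return how many were retried."""
    retried = 0
    for event in events:
        if event["status"] == "error":
            process_event(event, handler)
            retried += 1
    return retried


def strict_handler(payload: dict) -> None:
    payload["Subject"]  # raises KeyError when the subject is missing


def lenient_handler(payload: dict) -> None:
    payload.get("Subject", "(no subject)")  # tolerates a missing subject


events = [
    {"id": "a", "status": "pending", "error": None, "payload": {"Subject": "ALARM"}},
    {"id": "b", "status": "pending", "error": None, "payload": {}},
]

for event in events:
    process_event(event, strict_handler)
first_pass = [(e["id"], e["status"]) for e in events]
first_error = events[1]["error"]

retried = reprocess_failed(events, lenient_handler)
second_pass = [(e["id"], e["status"]) for e in events]
print(first_pass, retried, second_pass)
```

Because each failure is recorded on its own event, one bad payload does not block the rest, and a fix to the handler can be applied to the failed events afterwards.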

🔄 Key Iterative Enhancements

  • Iteration 1: Integrated Postmark with Cloudflare, built alert hub with sample data.
  • Iteration 2: Integrated GroqCloud AI agent, updated AI roles and processing logic, created webhook_queue table for raw data storage and reprocessing.
  • Future: Support multiple AI agents to improve alert classification accuracy and provide enhanced remediation suggestions.

💡 Lessons Learned

  • Managing email routing with Postmark requires careful DNS configuration to avoid conflicts.
  • Running local development environments with DynamoDB local and FastAPI in Docker greatly improves reproducibility.
  • AI agents can significantly reduce manual alert triage but require continuous refinement to handle diverse alert formats.
  • Building an intuitive drill-down dashboard enhances operational visibility and speeds incident investigation.

✉️ Contact

For questions or suggestions, please reach out via GitHub Issues or open a discussion in the repository. Thank you!
