Spring Batch Clustering With Zero Messaging: Introducing spring-batch-db-cluster-partitioning (v1.0.0)
Janardhan Chejarla


Publish Date: Jul 8

⚡ TL;DR

A production-ready Spring Batch extension for dynamic clustering that uses only the database: no Kafka, no RabbitMQ, no coordination servers. Version 1.0.0 is now available on Maven Central!

🚀 Why I Built This
Distributed batch processing often requires complex infrastructure such as messaging systems, ZooKeeper, or centralized schedulers. In many real-world deployments, especially in financial institutions, this adds cost, risk, and tight operational coupling.

To solve this, I built a lightweight, pluggable Spring Batch extension that enables cluster-aware partitioned step execution using nothing but a shared database as the coordination layer.

🛠️ Key Features

Cluster Coordination Using DB Only
Each node registers, heartbeats, and participates in job execution using lightweight database tables — no brokers required.
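As a rough illustration of heartbeat-based liveness, here is a minimal in-memory sketch. It is not the library's implementation: the class `NodeRegistry`, its methods, and the timeout value are hypothetical, and a `TreeMap` stands in for the database table that would hold each node's last heartbeat time.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a node table; a real deployment would
// read and write these timestamps through the shared database.
final class NodeRegistry {
    // nodeId -> time of the most recent heartbeat
    private final Map<String, Instant> lastHeartbeat = new TreeMap<>();
    private final Duration timeout;

    NodeRegistry(Duration timeout) {
        this.timeout = timeout;
    }

    // Registration and heartbeat are the same write: upsert the timestamp.
    void heartbeat(String nodeId, Instant now) {
        lastHeartbeat.put(nodeId, now);
    }

    // A node counts as active if its last heartbeat falls inside the timeout window.
    List<String> activeNodes(Instant now) {
        Instant cutoff = now.minus(timeout);
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastHeartbeat.entrySet()) {
            if (!entry.getValue().isBefore(cutoff)) {
                active.add(entry.getKey());
            }
        }
        return active;
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic. With a database-backed table, the database clock would be the natural single source of time across nodes.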

Dynamic Partition Assignment
Partitions are assigned and rebalanced at runtime based on node availability — perfect for ephemeral cloud-native deployments.
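One simple way to picture runtime assignment is a round-robin spread over whichever nodes are currently active. The sketch below assumes that strategy for illustration only; the library's actual assignment policy may differ, and `PartitionAssigner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: distributes partition ids across the active nodes.
final class PartitionAssigner {
    // Assigns partitions in round-robin order so node loads differ by at most one.
    static Map<String, List<Integer>> assign(List<Integer> partitionIds, List<String> activeNodes) {
        if (activeNodes.isEmpty()) {
            throw new IllegalArgumentException("no active nodes to assign partitions to");
        }
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        for (String node : activeNodes) {
            plan.put(node, new ArrayList<>());
        }
        for (int i = 0; i < partitionIds.size(); i++) {
            String node = activeNodes.get(i % activeNodes.size());
            plan.get(node).add(partitionIds.get(i));
        }
        return plan;
    }
}
```

Rebalancing then amounts to recomputing the plan for the partitions that have not started yet whenever the set of active nodes changes.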

Failover & Recovery
If a node dies mid-job, remaining nodes detect the loss and reassign unfinished partitions.
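The reassignment step can be sketched as follows, under the assumption that each partition row records its owner and whether it has completed. `FailoverPlanner`, `PartitionRow`, and the least-loaded selection rule are hypothetical choices made for this example, not the library's documented behaviour.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical planner: moves unfinished partitions off nodes that are no longer alive.
final class FailoverPlanner {
    // One row of an assumed partition table: its owner and completion flag.
    record PartitionRow(int partitionId, String ownerNode, boolean completed) {}

    // Returns partitionId -> new owner for every unfinished partition whose owner is dead.
    static Map<Integer, String> reassign(List<PartitionRow> rows, Set<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalStateException("no live nodes left to take over work");
        }
        // Unfinished load per live node; TreeMap makes tie-breaking deterministic.
        Map<String, Integer> load = new TreeMap<>();
        for (String node : liveNodes) {
            load.put(node, 0);
        }
        for (PartitionRow row : rows) {
            if (!row.completed() && liveNodes.contains(row.ownerNode())) {
                load.merge(row.ownerNode(), 1, Integer::sum);
            }
        }
        Map<Integer, String> newOwners = new TreeMap<>();
        for (PartitionRow row : rows) {
            boolean orphaned = !row.completed() && !liveNodes.contains(row.ownerNode());
            if (!orphaned) {
                continue;
            }
            // Pick the live node with the fewest unfinished partitions.
            String target = null;
            for (Map.Entry<String, Integer> entry : load.entrySet()) {
                if (target == null || entry.getValue() < load.get(target)) {
                    target = entry.getKey();
                }
            }
            newOwners.put(row.partitionId(), target);
            load.merge(target, 1, Integer::sum);
        }
        return newOwners;
    }
}
```

Completed partitions are left alone, so work that finished before the node died is never redone.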

Pluggable Coordination Tables
Custom schemas (BATCH_NODES, BATCH_PARTITIONS, BATCH_JOB_COORDINATION) provide fine-grained visibility and control.

Zero External Dependencies
Built purely on Spring Batch 5.x, Spring JDBC, and standard transaction semantics.

📦 Released Artifacts

<dependency>
  <groupId>io.github.jchejarla</groupId>
  <artifactId>clustering-core</artifactId>
  <version>1.0.0</version>
</dependency>

Available now on Maven Central.

🔍 Architecture Overview

  1. Master Node: automatically elected (no ZooKeeper!) based on which node launches the job.
  2. Coordinator Tables: shared DB tables used to track:
    • Active nodes
    • Step partition states
    • Ongoing executions
  3. PartitionHandler: custom implementation that uses SQL to dynamically assign work.
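A common way to assign work through SQL without a broker is a conditional update that only succeeds while a row is still unclaimed. The sketch below imitates that pattern in memory with a compare-and-set on a `ConcurrentHashMap`. The class `PartitionClaims`, the `PENDING` marker, and the column names in the SQL comment are assumptions for illustration; the library's actual statements are not shown in this post.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of claiming rows in a partition table.
final class PartitionClaims {
    private static final String PENDING = "PENDING";

    // partitionId -> owning node id, or PENDING while unclaimed
    private final Map<Integer, String> owners = new ConcurrentHashMap<>();

    void addPending(int partitionId) {
        owners.put(partitionId, PENDING);
    }

    // Mirrors a conditional update such as (column names are assumed):
    //   UPDATE BATCH_PARTITIONS SET owner = ? WHERE id = ? AND owner = 'PENDING'
    // The swap happens only if the row is still pending, so exactly one node wins.
    boolean tryClaim(int partitionId, String nodeId) {
        return owners.replace(partitionId, PENDING, nodeId);
    }

    String ownerOf(int partitionId) {
        return owners.get(partitionId);
    }
}
```

In a JDBC version, the equivalent success check would be whether the update reported exactly one affected row.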

✅ Example Code: see the examples directory on GitHub for ready-to-run Spring Boot projects demonstrating cluster partitioning in action.

Architecture Diagram

💡 Use Cases

  • Spring Batch jobs in Kubernetes (nodes scale up/down)
  • FinTech ETL pipelines where messaging systems are overkill
  • On-prem enterprise environments with restricted tech stacks
  • Batch workloads on serverless compute (like AWS Fargate or Google Cloud Run)

🧪 Example: Dynamic Step Execution

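@Bean
public Step partitionedStep(JobRepository jobRepository) {
    // Spring Batch 5.x builds steps with StepBuilder and an explicit JobRepository
    // (StepBuilderFactory is deprecated in 5.0).
    return new StepBuilder("partitionedStep", jobRepository)
        .partitioner("workerStep", customPartitioner())
        // Hands partition distribution to the cluster-aware handler
        .partitionHandler(clusterAwarePartitionHandler())
        .build();
}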

The clusterAwarePartitionHandler is where the DB magic happens.
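The `customPartitioner()` bean in the step definition is not shown in this post. In Spring Batch, a `Partitioner` implements `partition(int gridSize)` and returns a `Map<String, ExecutionContext>` with one entry per partition. As a dependency-free sketch of the splitting logic such a partitioner might contain, the hypothetical `RangeSplitter` below divides an inclusive id range into contiguous chunks and returns plain arrays in place of `ExecutionContext` objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the kind of range-splitting logic a Partitioner could wrap.
final class RangeSplitter {
    // Splits the inclusive range [minId, maxId] into at most gridSize contiguous ranges.
    // Each value is a two-element array: {startId, endId}.
    static Map<String, long[]> split(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("gridSize must be positive and maxId >= minId");
        }
        long total = maxId - minId + 1;
        long chunk = (total + gridSize - 1) / gridSize; // ceiling division
        Map<String, long[]> ranges = new LinkedHashMap<>();
        long start = minId;
        for (int i = 0; start <= maxId; i++) {
            long end = Math.min(start + chunk - 1, maxId);
            ranges.put("partition" + i, new long[] {start, end});
            start = end + 1;
        }
        return ranges;
    }
}
```

A real partitioner would copy each range into an `ExecutionContext` (for example under keys such as `minId` and `maxId`, names chosen here for illustration) so the worker step can read its slice.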

📈 What’s Next
🔹 Monitoring endpoints
🔹 Retry policies and smarter failure detection

If you’re interested in how this framework works under the hood, I’ve started a detailed article series:
📘 Distributed Spring Batch Coordination: Part 1 – The Problem with Traditional Scaling

More parts coming soon!

🧠 Behind the Scenes

This project was born from practical needs in building scalable ETL pipelines in a large financial services ecosystem. By removing the messaging layer, we reduced infrastructure cost and simplified failover handling without giving up reliability.

👋 Final Thoughts
If you're tired of spinning up Kafka just to run partitioned jobs, this is for you.

This library is designed for engineers, architects, and teams who want reliability without orchestration overload. Try it out, share feedback, and help shape the roadmap.
