Optimizing API Performance – Part 2: Load Balancing
Fredrick Oladipupo

Publish Date: Mar 25

APIs are the backbone of modern applications, handling thousands (or even millions) of requests daily. But as traffic grows, a single server can quickly become overwhelmed, leading to slow responses, timeouts, crashes, and ultimately lost revenue.

This is where load balancing comes in. A load balancer distributes traffic across multiple servers so that your API stays fast, scalable, and highly available. This tutorial is part of a series on optimizing API performance; Part 1 of the series covers caching.


In this guide, we’ll explore:

  1. When you should consider load balancing in your application
  2. How load balancing works
  3. Load balancing strategies
  4. Real-World Case Study: PayU
  5. Common load balancing issues and fixes
  6. Best practices
  7. Conclusion & Resources

1. When Should I Consider Load Balancing?

Not every API needs load balancing from day one. But you should strongly consider it if:

  • Your API receives 1,000+ requests per second.
  • Your server CPU usage consistently exceeds 70%.
  • You need high availability (e.g., financial transactions, streaming).
  • You’ve experienced downtime due to traffic spikes. Those graphs are telling you something.
  • You need reliability. Robust load balancers such as Elastic Load Balancer (ELB) can run health checks on upstream servers before sending them requests, which helps maximise performance and resiliency.
  • You need an added layer of security. Load balancers can act as a buffer against DDoS attacks and enable features such as SSL/TLS encryption and WAF integration. They can also be instructed to drop requests from suspicious sources, so that malicious requests never reach your core application resources.

If any of these apply to your system, it’s time to implement a load balancer.
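
As a rough illustration of the source-filtering idea from the list above, here is a minimal Python sketch of the check a load balancer might run before forwarding a request. The blocklist contents and the function name `should_forward` are hypothetical, and the addresses come from ranges reserved for documentation. A real load balancer or WAF would use its own rule format.

```python
import ipaddress

# Hypothetical blocklist for illustration only (documentation address ranges).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # an entire suspicious subnet
    ipaddress.ip_network("198.51.100.7/32"),  # a single suspicious host
]


def should_forward(client_ip: str) -> bool:
    """Return False when the client address falls inside a blocked network."""
    address = ipaddress.ip_address(client_ip)
    return not any(address in network for network in BLOCKED_NETWORKS)


# Example usage: requests from blocked sources never reach the backends.
for ip in ("203.0.113.50", "192.0.2.1"):
    action = "forward" if should_forward(ip) else "drop"
    print(f"{ip}: {action}")
```

Doing this at the load balancer keeps the filtering logic in one place instead of repeating it on every backend.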


2. How Load Balancing Works

A load balancer acts as a middle layer between clients and backend servers. It distributes incoming network or application requests across multiple servers, or "targets", based on predefined rules.

A load balancer prevents overload and improves the performance, availability, and scalability of an application. It acts as a single point of contact for clients, monitors the health of the servers behind it, and routes requests only to healthy ones.

Example:

  • Without load balancing: Requests go to one server, which can get overloaded.
  • With load balancing: Requests are spread across multiple servers, which improves speed and reliability and makes it easier to scale.

Here’s a basic visualisation of how it works:

Client Requests  --->  Load Balancer  --->  Server 1
                                      --->  Server 2
                                      --->  Server 3

Nginx as a Load Balancer:
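# The events block is required in a complete nginx.conf; the defaults are fine here.
events {}

http {
    # Pool of backend API servers that share the traffic.
    upstream backend_servers {
        server api-server-1.local;
        server api-server-2.local;
    }

    server {
        # Accept client traffic on port 80.
        listen 80;

        location / {
            # Forward each request to a server from the pool above.
            proxy_pass http://backend_servers;
        }
    }
}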


This setup distributes requests between api-server-1.local and api-server-2.local.
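
To make the "single point of contact" idea concrete, here is a minimal Python sketch of the routing logic, not of how Nginx is implemented. The `LoadBalancer` class, the `make_backend` helper, and the use of plain functions as stand-ins for servers are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# A backend is modelled as a function from a request path to a response body.
Backend = Callable[[str], str]


class LoadBalancer:
    """Hypothetical in-process balancer: one entry point, many backends."""

    def __init__(self, backends: Dict[str, Backend]) -> None:
        self._backends = backends
        self._names: List[str] = list(backends)
        self._healthy = set(self._names)
        self._next = 0  # index of the backend to try next

    def mark_unhealthy(self, name: str) -> None:
        self._healthy.discard(name)

    def mark_healthy(self, name: str) -> None:
        if name in self._backends:
            self._healthy.add(name)

    def handle(self, path: str) -> str:
        """Forward the request to the next healthy backend in rotation."""
        for _ in range(len(self._names)):
            name = self._names[self._next]
            self._next = (self._next + 1) % len(self._names)
            if name in self._healthy:
                return self._backends[name](path)
        raise RuntimeError("no healthy backends available")


def make_backend(name: str) -> Backend:
    """Build a fake server that reports which backend answered."""
    def backend(path: str) -> str:
        return f"{name} handled {path}"
    return backend


balancer = LoadBalancer(
    {name: make_backend(name) for name in ("api-server-1.local", "api-server-2.local")}
)
print(balancer.handle("/orders"))
print(balancer.handle("/orders"))
```

Clients only ever call `handle`; which server answers, and whether a server is skipped because it is unhealthy, stays hidden behind the balancer.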


3. Load Balancing Strategies

Load balancing strategies are the methods a load balancer uses to distribute network or application traffic across multiple targets. Each strategy defines rules intended to optimize resource utilisation, improve performance, and enhance the reliability of the system as a whole. We will explore a few of the common strategies below.

1. Round Robin (Default & Simple)

Sends each new request to the next server in the list, cycling back to the first, so requests are spread evenly across all servers.
Best for: APIs with similar server capacities.
Nginx illustration:

upstream backend {
    server api1.example.com;
    server api2.example.com;
}
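
The rotation itself is simple enough to sketch in a few lines of Python. This is an illustration of the round-robin idea, not Nginx's implementation, and the variable names are chosen only for this example.

```python
from collections import Counter
from itertools import cycle

servers = ["api1.example.com", "api2.example.com"]

# cycle() yields the servers in order and starts over when the list ends.
rotation = cycle(servers)

# Assign six incoming requests to servers in turn.
assignments = [next(rotation) for _ in range(6)]
counts = Counter(assignments)

print(assignments)
print(counts)
```

Each server receives the same number of requests regardless of how long each request takes. This is why round robin suits pools of similar servers handling requests of similar cost.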


2. Least Connections (Great for Heavy Requests)

Sends traffic to the server with the fewest active connections.
Best for: APIs with long-running or high-latency requests.
Nginx illustration:

upstream backend {
    least_conn;
    server api1.example.com;
    server api2.example.com;
}
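
A minimal Python sketch of the selection rule, assuming the balancer tracks how many requests are in flight per server. The `LeastConnections` class and its method names are hypothetical.

```python
from typing import Dict, List


class LeastConnections:
    """Pick the server that currently has the fewest active connections."""

    def __init__(self, servers: List[str]) -> None:
        self.active: Dict[str, int] = {server: 0 for server in servers}

    def acquire(self) -> str:
        """Choose a server for a new request and count the connection."""
        # On a tie, min() returns the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a request finishes so the server's count drops again."""
        if self.active[server] > 0:
            self.active[server] -= 1


pool = LeastConnections(["api1.example.com", "api2.example.com"])
slow = pool.acquire()   # a long-running request lands on api1
fast = pool.acquire()   # the next request goes to api2
pool.release(fast)      # api2 finishes quickly
print(pool.acquire())   # api2 is picked again because api1 is still busy
```

Unlike round robin, the choice depends on how busy each server is right now, so a server stuck on slow requests is passed over until it catches up.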


3. IP Hash (Good for Session Persistence)

Assigns each client’s request to the same server based on their IP.
Best for: APIs that need session consistency (e.g. authentication, sticky sessions).
Nginx illustration:

upstream backend {
    ip_hash;
    server api1.example.com;
    server api2.example.com;
}
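
The mapping from client address to server can be sketched in Python as below. Nginx's `ip_hash` keys on the first three octets of an IPv4 address (or the whole IPv6 address), and this sketch mirrors that choice of key. The SHA-256 hash and the `pick_server` name are assumptions for illustration; Nginx uses its own hash function.

```python
import hashlib
import ipaddress

servers = ["api1.example.com", "api2.example.com"]


def pick_server(client_ip: str) -> str:
    """Map a client address to a server so repeat visits hit the same one."""
    address = ipaddress.ip_address(client_ip)
    # IPv4: key on the first three octets; IPv6: key on the full address.
    key = address.packed[:3] if address.version == 4 else address.packed
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


print(pick_server("192.0.2.10"))
print(pick_server("192.0.2.10"))   # same client, same server
```

Because the mapping depends on the size of the pool, adding or removing a server changes where some clients land. Session data that must survive such changes is safer in a shared store than on a single server.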


4. Weighted Load Balancing (Fine-Tuned Traffic Control)

Some servers are more powerful than others. This method sends more traffic to the stronger servers, in proportion to the weight assigned to each.
Best for: APIs running on mixed-capacity servers.
Nginx illustration:

upstream backend {
    server api1.example.com weight=3;
    server api2.example.com weight=1;
}


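Here, api1.example.com receives roughly three out of every four requests, and api2.example.com receives the rest.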
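
One way to turn the weights 3 and 1 into a request order is the "smooth" weighted round-robin technique, sketched below in Python. The `SmoothWeightedRoundRobin` class is a hypothetical illustration of that technique and is not claimed to match Nginx's internals line for line.

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Spread picks in proportion to weights without long runs on one server."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = dict(weights)
        self.current = {server: 0 for server in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every server gains its own weight on each round.
        for server, weight in self.weights.items():
            self.current[server] += weight
        # The server with the highest running score wins (first one on a tie).
        chosen = max(self.current, key=self.current.get)
        # The winner pays back the total weight, which lets the others catch up.
        self.current[chosen] -= self.total
        return chosen


balancer = SmoothWeightedRoundRobin({"api1.example.com": 3, "api2.example.com": 1})
print([balancer.pick() for _ in range(4)])
```

Over any four consecutive picks the heavier server is chosen three times and the lighter one once, with the lighter server's turn interleaved rather than pushed to the end.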


4. Real-World Case Study: PayU

PayU, one of India's leading payments and fintech companies, serves over 450,000 merchants using more than 100 payment methods and cannot afford API slowdowns. PayU uses application load balancers to scale its service 10x to meet demand during peak periods.


5. Common Load Balancing Issues & Fixes

| Issue | Fix |
|---|---|
| Some servers get more traffic than others | Use Least Connections instead of Round Robin. |
| High latency even with load balancing | Enable Geo-DNS to route users to nearby servers. |
| Users randomly get logged out | Use session persistence (sticky sessions). |
| Unhealthy target servers | Implement health checks, and use auto scaling groups so that new servers are spun up when one or more servers become unhealthy. |
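
The health-check fix in the last row can be sketched in Python as a failure counter per server. The `HealthChecker` class, the `max_failures` threshold, and the injected `probe` function are assumptions for illustration. In practice the probe would be an HTTP request to a health endpoint, and the load balancer or cloud provider would run it on a schedule.

```python
from typing import Callable, Dict, List


class HealthChecker:
    """Remove a server from rotation after repeated failed probes."""

    def __init__(
        self,
        servers: List[str],
        probe: Callable[[str], bool],
        max_failures: int = 3,
    ) -> None:
        self.probe = probe
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {server: 0 for server in servers}

    def run_checks(self) -> None:
        """Probe every server once; a success resets its failure count."""
        for server in self.failures:
            if self.probe(server):
                self.failures[server] = 0
            else:
                self.failures[server] += 1

    def healthy_servers(self) -> List[str]:
        """Servers that have not yet reached the failure threshold."""
        return [s for s, n in self.failures.items() if n < self.max_failures]


# Example usage with a fake probe: api2 is down for three rounds in a row.
down = {"api2.example.com"}
checker = HealthChecker(
    ["api1.example.com", "api2.example.com"],
    probe=lambda server: server not in down,
)
for _ in range(3):
    checker.run_checks()
print(checker.healthy_servers())
```

Requiring several consecutive failures avoids removing a server because of a single dropped probe. Resetting the count on success lets a recovered server rejoin the pool automatically.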

6. Best Practices for Load Balancing APIs

  • Choose the right strategy – Round Robin is simple, but Least Connections or Weighted Load Balancing may be better.
  • Enable health checks – Automatically remove failing servers from the pool.
  • Use caching alongside load balancing – Combining both boosts performance further.
  • Monitor and tweak – Use tools like Prometheus & Grafana to track load balancer health.

7. Conclusion & Resources

Load balancing is essential for scaling APIs and ensuring high availability. Whether you’re handling a small SaaS app or a global platform like PayU, the right load balancing strategy prevents slowdowns, reduces costs, and improves user experience.

Part 3 of this series continues with asynchronous processing and queues.

