Service Mesh
Surendra Kumar

Publish Date: Jul 19

Imagine you are an engineer at a fast-growing e-commerce company like Amazon. Your team started with a monolithic application: a single, large codebase handling user requests, inventory, and payments. As the company grew, you migrated to a microservices architecture with separate services for:

  • User Service (handles user accounts)
  • Order Service (processes orders)
  • Inventory Service (tracks stock)
  • Payment Service (handles payments)

At first, everything ran smoothly. However, as complexity increased, several problems emerged:

  • 🔴 Problem #0: Authentication & Authorization

    Each microservice must handle logins and permissions separately, resulting in duplicated and inconsistent security logic.

  • 🔴 Problem #1: Hard-to-Debug Failures

    Customers report orders failing randomly. Logs indicate errors in the Payment Service, but it’s unclear whether they stem from network glitches, load spikes, or faulty service updates.

  • 🔴 Problem #2: Security Risks

    Sensitive data, like payment details, is transmitted between services unencrypted. Without proper encryption, interception becomes a major risk.

  • 🔴 Problem #3: Load Balancing Issues

    The Order Service is overwhelmed while the Inventory Service remains underutilized. Efficient traffic distribution is missing.

  • 🔴 Problem #4: Slow Rollouts & Deployments

    Introducing a new version of the Payment Service poses a risk. Testing it on only 10% of users is ideal, but gradual rollouts without downtime are challenging.

This growing complexity makes managing your microservices painful. This is where a Service Mesh comes in! 🚀


1. Why Do You Need a Service Mesh?

Before service meshes, organizations manually managed service-to-service communication using:

  • Custom code in each microservice
  • Reverse proxies (e.g., NGINX, HAProxy)
  • Load balancers
  • API gateways

While this approach worked, it had serious limitations:

  • No standardized way to handle retries, timeouts, and failures.
  • Security risks due to unencrypted communication.
  • Limited observability, making it hard to trace requests across multiple services.

Real-World Example: Netflix

Netflix’s microservices faced issues like:

  • Slow response times because of ineffective traffic control.
  • DDoS threats due to the absence of a central security layer.
  • Authentication issues from each microservice handling its own login logic.

Netflix Implementation

https://netflixtechblog.com/zero-configuration-service-mesh-with-on-demand-cluster-discovery-ac6483b52a51


2. How Service Mesh Helps

Internal Working of Service Mesh

A Service Mesh is an infrastructure layer that manages service-to-service communication within a distributed microservices architecture. It abstracts away networking concerns, thereby enhancing:

  • Security: Through secure, encrypted communication (mTLS).
  • Traffic Management: Via intelligent routing, load balancing, and retries.
  • Observability: With integrated logging, tracing, and monitoring.
  • Resilience: By using circuit breakers and fault injection.

Key benefits include decoupling networking logic from application code and centralizing security policies.
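To make the traffic-management point concrete, here is a minimal sketch of how retries and a timeout could be declared in Istio instead of being coded into each service. It assumes a hypothetical host named payment-service (standing in for the Payment Service from the introduction); the numbers are illustrative, not recommendations.

```yaml
# Sketch only: "payment-service" is a hypothetical host name.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
  namespace: default
spec:
  hosts:
  - payment-service
  http:
  - route:
    - destination:
        host: payment-service
    timeout: 10s             # Overall deadline for the request, retries included
    retries:
      attempts: 3            # Retry a failed call up to three times
      perTryTimeout: 2s      # Deadline for each individual attempt
      retryOn: gateway-error,connect-failure
```

Because the policy lives in the mesh, the application code does not change when these values are tuned. Automatic retries are safest for idempotent operations, so a real payment endpoint would likely need more care than this sketch shows.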


3. Service Mesh Architecture

A service mesh consists of two primary components:

3.1 Data Plane (Sidecar Proxies)

Each microservice runs alongside a sidecar proxy that intercepts all incoming and outgoing traffic.

Example: Envoy Proxy

  • Responsibilities:
    • Service Discovery & Load Balancing: Efficiently routes traffic among service instances.
    • Retries & Circuit Breaking: Enhances resilience by managing failures.
    • Security Enforcement: Uses mTLS for zero-trust security.
    • Telemetry Collection: Gathers metrics, logs, and traces.
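As a rough illustration of the circuit-breaking responsibility, the sketch below shows how such limits might be expressed for the sidecars in Istio. The host inventory-service is a hypothetical name, and every threshold is an example value picked for readability.

```yaml
# Sketch only: "inventory-service" is a hypothetical host name.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: inventory-service
  namespace: default
spec:
  host: inventory-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100          # Cap on concurrent TCP connections
      http:
        http1MaxPendingRequests: 50  # Cap on requests queued for a connection
    outlierDetection:
      consecutive5xxErrors: 5        # Eject an instance after five 5xx responses in a row
      interval: 30s                  # How often instances are evaluated
      baseEjectionTime: 30s          # Minimum time an ejected instance stays out
      maxEjectionPercent: 50         # Never eject more than half of the instances
```

The proxies apply these limits on the caller's side, so a struggling instance is taken out of rotation without any change to the calling service.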

3.2 Control Plane

  • Responsibilities:
    • Manages and configures the sidecar proxies.
    • Applies traffic rules, security policies, and observability settings.

These components work together to enforce consistent communication policies across your microservices.
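In Istio's sidecar mode, one common way to bring workloads into the mesh is automatic sidecar injection, switched on with a namespace label. A minimal sketch, assuming a hypothetical namespace called shop:

```yaml
# Sketch only: "shop" is a hypothetical namespace name.
apiVersion: v1
kind: Namespace
metadata:
  name: shop
  labels:
    istio-injection: enabled   # New pods in this namespace get an Envoy sidecar injected
```

Pods created in the namespace after the label is set receive a proxy; pods that already exist need to be restarted to pick it up.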

How It Works: Without vs. With Service Mesh

Without Service Mesh:

  • Service A → Service B (custom retry logic, security, and logging embedded in application code)
  • Service B → Database (direct connection without security enforcement)

With Service Mesh:

  • Service A → Envoy Proxy (handles retries, security, monitoring) → Service B
  • Service B → Envoy Proxy → Database (all communications are secured and monitored)

4. Popular Service Mesh Implementations

| Service Mesh | Proxy Used | Best For |
| --- | --- | --- |
| Istio | Envoy | Kubernetes-native applications |
| Linkerd | Linkerd-proxy | Simplicity and lightweight setups |
| Consul | Envoy | Multi-cloud and hybrid environments |
| AWS App Mesh | Envoy | AWS-native applications |

For more details, see the Comparison of Service Meshes.


5. Benefits of Service Mesh

5.1 Security: Encrypted & Secure Communication (mTLS)

  • Problem: Unencrypted traffic is vulnerable to interception.
  • Example: A banking app transmitting user passwords in plain text.
  • Solution: Enforce mTLS to ensure all communication is secure and authenticated.
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT
EOF
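
Once mTLS gives every workload a verifiable identity, that identity can also drive authorization. The sketch below is one way this might look in Istio: an AuthorizationPolicy that only lets the Order Service call the Payment Service. The app: payment-service label, the order-service service account, and the /charge path are hypothetical names chosen for illustration.

```yaml
# Sketch only: the label, service account, and path are hypothetical.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-allow-orders
  namespace: default
spec:
  selector:
    matchLabels:
      app: payment-service      # Policy applies to the Payment Service pods
  action: ALLOW
  rules:
  - from:
    - source:
        # Identity taken from the caller's mTLS certificate
        principals: ["cluster.local/ns/default/sa/order-service"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/charge"]
```

With an ALLOW policy selecting a workload, requests that match no rule are denied, so other services can no longer reach the payment endpoint.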

5.2 Traffic Control

💡 Benefit: Manage how traffic flows between services, allowing canary deployments, fault injection, and load balancing.

🔀 A/B Testing with Traffic Splitting

📌 Route 80% of traffic to v1 and 20% to v2 of the reviews service


kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: default
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 80
    - destination:
        host: reviews
        subset: v2
      weight: 20
EOF


🔹 Now, when you access productpage, roughly 80% of requests are served by reviews v1, while about 20% go to v2.
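
The v1 and v2 subsets named in the VirtualService have to be defined somewhere, and in Istio that is the job of a DestinationRule. A minimal sketch, assuming the reviews pods carry a version label with the values v1 and v2 (an assumption about how the workloads are labeled):

```yaml
# Sketch only: assumes pods are labeled version=v1 and version=v2.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: default
spec:
  host: reviews
  subsets:
  - name: v1            # Referenced as "subset: v1" in the VirtualService
    labels:
      version: v1
  - name: v2            # Referenced as "subset: v2" in the VirtualService
    labels:
      version: v2
```

Without matching subsets, the weighted routes have no instances to send traffic to, so the two resources are normally applied together.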

  • Example: Netflix gradually releases a new video recommendation algorithm.

5.3 Observability: Debugging Made Easy

  • Problem: Locating the root cause of failures across multiple services is challenging.
  • Example: Uber’s thousands of microservices made debugging nearly impossible.
  • Solution: Integrate with Jaeger and Kiali for distributed tracing and visualization.

istioctl dashboard kiali  # Open the Kiali dashboard for service mesh visualization
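
Tracing every request can be expensive, so the sampling rate is usually tuned. One way this might be configured is with Istio's Telemetry resource, sketched below. The provider name jaeger is an assumption: it would have to match a tracing provider declared under extensionProviders in your mesh configuration.

```yaml
# Sketch only: "jaeger" is an assumed provider name that must already be
# declared under extensionProviders in the mesh configuration.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system        # Root namespace, so the setting applies mesh-wide
spec:
  tracing:
  - providers:
    - name: jaeger
    randomSamplingPercentage: 10.0   # Trace roughly one request in ten
```

A lower percentage reduces overhead at the cost of fewer traces to inspect; the right value depends on traffic volume.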

5.4 Load Balancing: Efficient Traffic Distribution

  • Problem: Imbalanced traffic can overwhelm some services while underutilizing others.
  • Example: Amazon’s checkout service receives millions of requests per minute.
  • Solution: Dynamically distribute requests using advanced load balancing algorithms.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: checkout-service
spec:
  host: checkout-service
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST  # Prefers instances with fewer active requests (replaces the deprecated LEAST_CONN)

6. Additional Capabilities

  • Traffic Management: Intelligent routing, load balancing, retries, and failover.
  • Security Enhancements: mTLS, role-based access control (RBAC), and centralised authentication/authorization.
  • Observability & Monitoring: Distributed tracing (Jaeger, Zipkin), logging/metrics collection (Prometheus, Grafana, Kiali), and dynamic service discovery.
  • Resilience & Fault Tolerance: Circuit breaking, rate limiting, and fault injection for chaos engineering.
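
As one illustration of fault injection, the following sketch delays a share of requests so you can watch how callers cope with a slow dependency. It assumes a hypothetical inventory-service host; the percentage and delay are arbitrary example values.

```yaml
# Sketch only: "inventory-service" is a hypothetical host name.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: inventory-service-delay
  namespace: default
spec:
  hosts:
  - inventory-service
  http:
  - fault:
      delay:
        percentage:
          value: 10.0      # Affect about 10% of requests
        fixedDelay: 5s     # Hold each affected request for five seconds
    route:
    - destination:
        host: inventory-service
```

Experiments like this are usually run in a staging environment first, and the resource is deleted once the test is over.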

7. When to Use a Service Mesh?

| Scenario | Use Service Mesh? |
| --- | --- |
| Small app with fewer than 5 services | ❌ No (Overhead is too high) |
| 10+ microservices | ✅ Yes (Provides standardized communication) |
| Multi-cloud or hybrid deployments | ✅ Yes (Simplifies cross-cloud networking) |
| High-security environments (banking, healthcare) | ✅ Yes (Enforces mTLS and centralized authentication) |

8. Real-World Benefits & Use Cases

✅ Case Study #1: Airbnb – Scaling Microservices

  • Problem:

    Airbnb had thousands of microservices, making debugging nearly impossible due to scattered logs.

  • Solution:

    They implemented Istio (Service Mesh) along with Jaeger for distributed tracing.

  • Outcome:

    Engineers could trace failures in seconds, reducing incident response time by 70%.


✅ Case Study #2: Stripe – Secure Payment Transactions

  • Problem:

    Stripe processes millions of financial transactions and required end-to-end encryption between microservices.

  • Solution:

    They deployed a Service Mesh with mTLS to encrypt all service-to-service communication automatically.

  • Outcome:

    • Zero unencrypted transactions
    • Full compliance with banking security standards

✅ Case Study #3: Netflix – Canary Deployments

  • Problem:

    Netflix needed to test new versions of its recommendation engine on a small subset of users to avoid downtime.

  • Solution:

    They used mesh-level traffic splitting to direct 10% of traffic to the new version while keeping 90% on the stable version.

  • Outcome:

    • Safer rollouts
    • No downtime during updates

Conclusion

A Service Mesh—using tools like Istio, Linkerd, or Consul—streamlines inter-service communication in a microservices architecture by abstracting networking, security, and observability away from application code. This approach lets developers focus on core business logic while ensuring robust, secure, and resilient interactions between services.

Feel free to reach out if you need further clarification or want to explore a live demo of these concepts in your environment!

References:
https://www.alibabacloud.com/blog/getting-started-with-service-mesh-origin-development-and-current-status_597241


Istio config for path-based routing

apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: core-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: core-virtualservice
  namespace: istio-system
spec:
  gateways:
  - core-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /cg
    route:
    - destination:
        host: stg-content-service-api.all-staging.svc.cluster.local
        port:
          number: 8085
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: stg-go-mobile-api.all-staging.svc.cluster.local
        port:
          number: 3333


Istio config for port-based routing

apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: core-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 30012
      name: content-port
      protocol: HTTP
    hosts:
    - '*'
  - port:
      number: 30013
      name: mobile-port
      protocol: HTTP
    hosts:
    - '*'
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: combined-virtualservice
  namespace: istio-system
spec:
  hosts:
  - '*'
  gateways:
  - core-gateway
  http:
  - match:
    - port: 30012
    route:
    - destination:
        host: stg-content-service-api.all-staging.svc.cluster.local
        port:
          number: 8085
  - match:
    - port: 30013
    route:
    - destination:
        host: stg-go-mobile-api.all-staging.svc.cluster.local
        port:
          number: 3333
