This is actually a more elaborate version of an old comment I wrote here
When doing microservices, a fairly crucial design point is: how should my services communicate with each other?
A common choice is to design a RESTful HTTP API that each service exposes, and then have the other services invoke it with an ordinary HTTP client.
This has some advantages, primarily making it easy to discover services by using DNS resolution and API Gateways, but it has many drawbacks too.
For example, what if the called service has crashed and cannot respond? Your client service has to implement some kind of reconnection or failover logic; otherwise, you risk losing requests and data. A cloud architecture should be resilient and recover gracefully from failures. Also, an HTTP request is synchronous by nature: making it asynchronous involves some tricky shenanigans on the client side.
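To make the problem concrete, here is a minimal sketch of the retry/backoff logic an HTTP client would have to carry itself (the `flaky_service` callable is a stand-in I made up to simulate a callee that is down for a while; a real client would be wrapping an actual HTTP call):

```python
import time

def call_with_retries(request, max_attempts=5, base_delay=0.01):
    """Call `request` (a zero-argument callable standing in for an HTTP
    call), retrying with exponential backoff on connection errors."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up: the request is lost unless we persist it ourselves
            time.sleep(base_delay * 2 ** attempt)

# Simulate a callee that is down for the first two attempts.
calls = {"n": 0}
def flaky_service():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"

print(call_with_retries(flaky_service))  # "ok", but only after two failed attempts
```

Every HTTP-based client ends up re-implementing some variant of this, and if the process itself dies mid-retry, the request is gone.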
Enter message queues
Using message queues is actually a fairly old solution when dealing with multiple intercommunicating services. Think about old approaches like Enterprise Service Buses or the Java JMS specification.
Let's start by introducing some terminology:
- Message: a package of information, usually composed of two parts: headers, key-value pairs containing metadata; and a body, a binary payload containing the actual content.
- Producer: whoever creates and sends a message
- Consumer: whoever receives and reads a message
- Queue: a communication channel that enqueues messages for later retrieval by one or more consumers
- Exchange: a queue aggregator that abstracts away message queues and routes each message to the appropriate queue based on some predefined logic.
A message queue architecture requires an additional service called a message broker that is tasked with gathering, routing and distributing your messages from senders to the right receivers.
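To make the vocabulary concrete, here is a toy, in-memory sketch of those pieces (plain Python, no real broker): an exchange routes messages, tagged with a routing key, into queues that consumers later drain. A real broker like RabbitMQ does the same across the network, with persistence and acknowledgements on top.

```python
from collections import defaultdict, deque

class Message:
    def __init__(self, body: bytes, **headers):
        self.headers = headers   # key-value metadata
        self.body = body         # opaque binary payload

class Exchange:
    """Routes each message to every queue bound to its routing key."""
    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> list of queues

    def bind(self, routing_key, queue):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        for queue in self.bindings[routing_key]:
            queue.append(message)

# Producer side
exchange = Exchange()
orders = deque()                 # the queue bound to the "orders" key
exchange.bind("orders", orders)
exchange.publish("orders", Message(b'{"item": "book"}', content_type="application/json"))

# Consumer side: pull the next message off the queue
msg = orders.popleft()
print(msg.headers["content_type"], msg.body)
```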
Think of it as a mail service. You (the producer) are sending a letter (your message) to someone (the consumer), and you do this by specifying the address (the routing logic for the message, such as the topic on which it is published) and by handing the letter to the local post office (the message broker). After you hand over the letter, it's no longer your job to ensure that it actually reaches your friend. The postal service will take care of that.
In a microservice environment, it's actually a really solid solution because it adds a layer of abstraction (the message broker itself) between loosely coupled services and allows for fully asynchronous communications.
Errors will occur. Let's deal with them instead of simply avoiding them
Remember, a cloud environment should be error-resilient: services must fail gracefully and not take down the whole application if one of them dies. With message queues, if a microservice dies unexpectedly the broker can still receive incoming messages, store them for later (so that the dead service can recover them when it comes back online), and optionally send a Last Will and Testament message to the other services saying that the receiver has died. Also, Push/Pull, Publish/Subscribe and Topic queue mechanisms give more flexibility and efficiency when designing inter-service communications, and the ability to send binary payloads allows for easy use of formats like Google Protocol Buffers or MessagePack instead of JSON, which are much more efficient in terms of bandwidth usage.
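The store-and-forward behavior is the heart of this resilience. A toy illustration (again in-memory only, with none of a real broker's persistence or acknowledgements):

```python
from collections import deque

class Broker:
    """Toy store-and-forward broker: holds messages until a consumer drains them."""
    def __init__(self):
        self.queue = deque()

    def publish(self, message):
        self.queue.append(message)  # accepted even when no consumer is online

    def drain(self):
        """Called by a consumer when it (re)connects."""
        while self.queue:
            yield self.queue.popleft()

broker = Broker()

# The consumer service is down, but producers keep publishing.
broker.publish("order #1")
broker.publish("order #2")

# The consumer comes back online and recovers everything it missed.
recovered = list(broker.drain())
print(recovered)  # ['order #1', 'order #2']
```

Contrast this with the HTTP retry sketch earlier: here the producer never has to care whether the consumer was up at publish time.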
Security is key
Advanced broker servers like RabbitMQ support multiple messaging protocols (AMQP, MQTT, STOMP...), have flexible authorization mechanisms (access control, virtual hosts and even third-party/custom auth services over many different transports such as HTTP or AMQP) and basically remove the burden of orchestrating request authorization from your services. Plug a custom microservice into Rabbit's auth backend and code your policies in there. The rest will be taken care of by the broker.
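As a sketch of what that custom auth service might look like: RabbitMQ's `rabbitmq_auth_backend_http` plugin calls your service with the username and password and expects a plain-text `allow` (optionally followed by tags) or `deny` in response. The credential store and the tag policy below are hypothetical stand-ins; a real service would check a database or an identity provider.

```python
# Hypothetical credential store standing in for a real identity provider.
USERS = {"billing-service": "s3cret", "monitoring": "changeme"}

def check_user(params: dict) -> str:
    """Handle the user_path check of rabbitmq_auth_backend_http:
    return "allow" (optionally with tags) or "deny" as plain text."""
    username = params.get("username")
    password = params.get("password")
    if USERS.get(username) == password:
        # Hypothetical policy: grant the monitoring user an extra tag.
        tags = " monitoring" if username == "monitoring" else ""
        return "allow" + tags
    return "deny"

print(check_user({"username": "billing-service", "password": "s3cret"}))  # allow
print(check_user({"username": "billing-service", "password": "wrong"}))   # deny
```

Wire a function like this behind any HTTP endpoint, point the plugin's `user_path` at it, and the broker enforces your policy on every connection attempt.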
Infrastructure at scale
Using message queues with brokers like Rabbit helps a lot with scalability too, which is another crucial aspect of microservices.
If one of the services is struggling under load, we want to be able to spin up more instances quickly and without having to reconfigure anything. With HTTP as our communication method we would normally need a self-registering service discovery server (like Netflix Eureka, or a container orchestration system like Kubernetes or Rancher) and some kind of load balancer integrated with it, so that traffic is distributed and routed to the various instances. With message queues, our broker IS the load balancer: if multiple consumers are listening on a topic at the same time, messages will be delivered following a configured strategy (more on RabbitMQ's QoS here).
So, for example, we can have four instances of our service processing messages in a round-robin fashion, but since the QoS parameter is customizable at runtime we could also switch to a fair dispatching strategy if one of the consumers is taking too long to complete its work and we need to balance the load out. Also, note that the client configuration is done entirely at runtime (and so are exchange, queue and topic declarations, by the way), so there is no need to tinker with the broker's configuration or restart anything.
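The difference between the two strategies can be sketched with a toy dispatcher (no real broker involved): round robin deals messages out in turn regardless of how busy each consumer is, while fair dispatch, which is what `basic_qos(prefetch_count=1)` achieves in RabbitMQ, only gives a consumer a new message once it has finished its previous one.

```python
from itertools import cycle

def round_robin(messages, consumers):
    """Deal messages out in turn, ignoring how busy each consumer is."""
    assignments = {c: [] for c in consumers}
    for consumer, message in zip(cycle(consumers), messages):
        assignments[consumer].append(message)
    return assignments

def fair_dispatch(messages, consumers, busy):
    """Give messages only to consumers with no unacked work (prefetch=1).
    `busy` marks consumers still processing a previous message."""
    assignments = {c: [] for c in consumers}
    free = [c for c in consumers if c not in busy]
    for consumer, message in zip(cycle(free), messages):
        assignments[consumer].append(message)
    return assignments

msgs = ["m1", "m2", "m3", "m4"]
print(round_robin(msgs, ["A", "B"]))                # A: m1, m3 / B: m2, m4
print(fair_dispatch(msgs, ["A", "B"], busy={"A"}))  # B gets everything while A is busy
```

With round robin, a slow consumer still accumulates a backlog of its "turns"; with fair dispatch, work flows to whoever is free.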
Async == Efficiency
We said at the beginning that one of the key features of messages compared to HTTP requests is that they allow for fully asynchronous communications. If I need to make a request to another service, I can send a message on its topic and then move on with my work. I can deal with the response when it comes in on my own topic, without having to wait for it. Correlation IDs are used to keep track of which response message refers to which request. This works in concert with what we said about resiliency.
In an HTTP scenario, if I make a request and the callee is down, the connection would be refused and I would have to try again and again until it comes back up.
With message systems, I can send my request and forget about it. If the callee cannot receive it, the broker will keep the message in the queue and deliver it when the consumer connects back. My response will come back when it can, and I won't have to block or wait for it.
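The correlation-ID flow can be sketched like this (an in-memory toy: `pending` maps each correlation ID to the request it belongs to, so responses can be matched back in whatever order they arrive; the publish step is elided):

```python
import uuid

pending = {}  # correlation_id -> original request, awaiting a response

def send_request(payload):
    """Tag a request with a fresh correlation ID, record it, and move on."""
    corr_id = str(uuid.uuid4())
    pending[corr_id] = payload
    # ... publish (corr_id, payload) on the callee's topic here ...
    return corr_id

def on_response(corr_id, result):
    """Match an incoming response back to its request, whenever it arrives."""
    request = pending.pop(corr_id)
    return request, result

# Fire two requests without blocking on either.
id_a = send_request({"op": "sum", "args": [1, 2]})
id_b = send_request({"op": "sum", "args": [3, 4]})

# Responses arrive out of order; correlation IDs keep them straight.
print(on_response(id_b, 7))  # ({'op': 'sum', 'args': [3, 4]}, 7)
print(on_response(id_a, 3))  # ({'op': 'sum', 'args': [1, 2]}, 3)
```

Note that `pending` here lives in process memory; making it survive crashes or span multiple instances is a separate design decision.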
In conclusion, messages are great. Show them some love
This is by no means a rant against HTTP (which is still great, especially for public, front-facing APIs) but rather an attempt to guide you towards a more efficient strategy for coordinating microservice communications. All of this can be achieved with open-source software, with open protocols such as AMQP that prevent vendor lock-in, and with a low (and scalable) infrastructure cost (see RabbitMQ's requirements).
I have used RabbitMQ for all my microservice projects and it has been a great experience. Let me know your thoughts in the comments! Have you used message queues before? What is your experience with them?

Hi Matteo, nice article! I have a question about correlation IDs. Where do you store them so you can keep track of them once you get a response on the response topic? I was wondering: if you store them in an in-memory cache, that won't be reliable if the producer crashes, and if you have multiple running instances of this component, do you then need a shared cache?! Can you share how you have dealt with this scenario? Thanks, Aldo