Apache Kafka® - Hot and Cold Retries
A demo project for elaborating how hot and cold retries can be applied in Apache Kafka®. This repository will be the source and reference codebase of my blog post.
It includes:
- Application configuration and configuration class to read it
- Consumer factory to create necessary listener container beans and dead letter queue forwarder
- Simple service layer which logs the upcoming messages and to create the erroneous cases
- Topic forwarding strategies and header utilization examples in those strategies
- Consumer examples for hot and cold retries
Utilities:
- Also, you can find a simple Kafka cluster to create your own playground before running the application
How to run
The repository comes with a simple Apache Kafka® cluster utilized by docker compose
. As a preliminary condition, you may need to run it if there is no connectable Kafka cluster available. Otherwise, you can configure host and port information from application.yml
$ cd kafka-hot-and-cold-retries
$ docker compose up # To make Kafka cluster up and running
$ gradle bootRun # To make the application up and running
Flow
As depicted in the below figure, our journey starts with our producer. It publishes the message on the target topic, and the consumer starts processing. There will not be any problem while consuming in the happy path, and the consumer will be able to process the message.
But, if the consumer encounters an exception, it will first check whether any hot retry is defined or not. If so, we will try the message consumption up until maximum hot retry attempt or it succeeds in consuming the message. If it fails and reaches the maximum threshold, then it checks whether the cold retry configuration is defined or not. In the worst-case scenario where we do not have any cold retry configured, the message goes to the dead letter queue (DLQ). Otherwise, we will forward the message to the cold retry topic, and make it be processed with some delay. That is why these kinds of topics are called delayed topics.
Note that the delayed topics do not differ from the normal topics in terms of topology but in terminology. You will see that they could also have a hot retry defined in them or even another cold retry topic chained next to another delay topic.
Now, we have forwarded the message to our cold retry topic. One of the main aims here is to chain delay topics with themselves or other ones without blocking the other messages in the same partition. To do that, we will implement the topic forwarding logic for cold retries. Hence, a failure on a cold retry topic will lead to publishing the same message into the same topic again and skipping the current offset. Also, to have a retry threshold, we put custom headers on the topics and check the attempt value in the next consumption. By doing so, the current offset’s message will not block the next offset, and the current failed message will get its new offset at the end of the same partition after incrementing the attempt count (Yay!). Still, it is possible that we can reach the maximum attempt count limit on the cold retry. As a result, this will cause either publishing the message to the next cold retry topic if defined or the dead letter queue as a sink point.
As you can see from the overall flow, the consumers will handle each forwarded message as a fresh message and apply the hot and cold retrials on them again and again in case of failure.