Some useful Replay patterns

himanshu Member, Employee Posts: 67 Solace Employee
edited July 2021 in Blogs & Tutorials #1

Hi all,

As some of you might know already, I am a Solutions Architect at Solace, working with our clients and helping them understand and design features based on Solace's PubSub+ Platform. In the last few years, we have released several new features which our clients have been keen to take advantage of for their custom use cases. One of the most popular new features is Replay.

Message Replay, as the name suggests, allows you to replay messages hours or even days after they were first received by the event broker. Without Replay, once a message has been consumed by all of its consumers, it is deleted from the underlying message spool. With Replay enabled, however, the message is kept in the replay log and the underlying spool even after all the consumers have consumed it.

Replay helps protect applications by letting them correct database issues caused by events such as misconfiguration, application crashes, or corruption: the broker redelivers the data to the applications. You can read more about these use cases here.

Currently, Message Replay can be initiated on a queue from one of the following starting points:
1. From the start of the replay log
2. From a specific date/time to now
3. After a specific message (specified by its replication group message ID (RGMI)) to now
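
To make the three starting points concrete, here is a minimal in-memory sketch. It does not use the Solace APIs; `LoggedMessage`, `select_replay`, and the `rgmi-N` strings are hypothetical names invented for illustration, and treating the date/time as inclusive is an assumption of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class LoggedMessage:
    rgmi: str            # stand-in for a replication group message ID
    received: datetime   # when the broker received the message
    topic: str


def select_replay(log: List[LoggedMessage],
                  from_time: Optional[datetime] = None,
                  after_rgmi: Optional[str] = None) -> List[LoggedMessage]:
    """Return the messages a replay would redeliver, oldest first."""
    if from_time is not None and after_rgmi is not None:
        raise ValueError("choose one start position, not both")
    if from_time is not None:
        # Option 2: from a date/time to now (inclusive bound is an assumption).
        return [m for m in log if m.received >= from_time]
    if after_rgmi is not None:
        # Option 3: everything after the given message.
        ids = [m.rgmi for m in log]
        if after_rgmi not in ids:
            raise LookupError(f"{after_rgmi} is not in the replay log")
        return log[ids.index(after_rgmi) + 1:]
    # Option 1: from the start of the replay log.
    return list(log)


log = [
    LoggedMessage("rgmi-1", datetime(2021, 7, 1, 8, 30), "marketdata/eq/us/nyse/aapl"),
    LoggedMessage("rgmi-2", datetime(2021, 7, 1, 9, 0), "marketdata/eq/us/nyse/ibm"),
    LoggedMessage("rgmi-3", datetime(2021, 7, 1, 9, 15), "marketdata/eq/us/nyse/aapl"),
]

print([m.rgmi for m in select_replay(log)])
print([m.rgmi for m in select_replay(log, from_time=datetime(2021, 7, 1, 9, 0))])
print([m.rgmi for m in select_replay(log, after_rgmi="rgmi-1")])
```

In every case the result runs up to the newest message in the log, which mirrors the "to now" part of each option.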

What happens when Replay is initiated?

When you initiate Replay on a queue, any existing enqueued messages are removed and older messages are replayed to that queue. Once all the older messages have been replayed and consumed, the consumer automatically starts consuming the live stream of messages. This relieves the consumer from the burden of monitoring the older messages, disconnecting, reconnecting, and then consuming the live stream of messages. With Solace's Message Replay feature, all of this is automated!

You might ask, why do the existing enqueued messages get deleted when Replay is initiated on a queue? That is done to preserve message order. There are many use cases for event-driven architecture. Some of them require message order while others do not. Solace takes pride in the fact that we guarantee message ordering, so naturally we made sure that Message Replay was no exception.
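
A toy model shows why clearing the queue matters for ordering. This is only a sketch: sequence numbers stand in for messages, and `start_replay` is a hypothetical helper, not a broker call.

```python
from collections import deque
from typing import Deque, List


def start_replay(queue: Deque[int], replay_log: List[int]) -> None:
    """Discard what is enqueued, then enqueue the logged messages in order."""
    queue.clear()
    queue.extend(replay_log)


# Messages 1-5 were already consumed; 6 and 7 are still waiting on the queue.
replay_log = [1, 2, 3, 4, 5, 6, 7]
queue = deque([6, 7])

start_replay(queue, replay_log)
queue.append(8)                # a live message arriving after the replay began
print(list(queue))             # [1, 2, 3, 4, 5, 6, 7, 8]
```

Had the queue not been cleared, messages 6 and 7 would sit ahead of message 1 and then appear a second time, so the consumer would see both duplicates and out-of-order delivery.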

The three ways of initiating Replay on an endpoint work for many use cases, but not all. As you can tell, they all replay from a specific time or checkpoint until now, which allows you to maintain message order. Sometimes, especially with our enterprise users, you need to replay only specific messages to a queue, inspect them, and then replay them to end consumers. This obviously breaks message order, but it is an integral part of the operational workflow at large enterprises.

So, how can we achieve this? I have broken down this use case into 3 key requirements:
1. Ability to replay specific messages
2. Ability to replay a range of messages
3. Ability to browse the messages before replaying to end consumers

Let's dive deeper!

Ability to Replay specific messages

If this is a key requirement for your use case, we recommend putting some sort of ID in the topic hierarchy. Why? Because Replay is initiated on an endpoint with topic subscriptions. For example, you can initiate Replay on a queue with specific topics mapped to it. If you want to replay all messages from 9am for AAPL's stock, you simply create a queue, map the marketdata/eq/us/nyse/aapl topic to it, and initiate replay. It's as simple as that.

The granularity with which you can select messages for replay depends on your topic hierarchy. If I wanted to replay messages for NYSE, I would simply map the topic marketdata/eq/us/nyse/> to my queue. If it were for multiple specific stocks, I would map topics such as marketdata/eq/us/nyse/ibm and marketdata/eq/us/nasdaq/fb.
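
To see how the topic hierarchy drives what lands on the queue, here is a simplified matcher. It is a sketch, not the broker's actual matching code: `topic_matches` is a hypothetical function that handles `>` as the final level (one or more remaining levels) and `*` as a wildcard at the end of a single level, and it ignores other edge cases.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Simplified check of a topic against a subscription with * and > wildcards."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub in enumerate(sub_levels):
        if sub == ">" and i == len(sub_levels) - 1:
            # '>' as the last level needs at least one more topic level.
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        if sub.endswith("*"):
            # 'abc*' matches any value at this level that starts with 'abc'.
            if not topic_levels[i].startswith(sub[:-1]):
                return False
        elif sub != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


published = [
    "marketdata/eq/us/nyse/aapl",
    "marketdata/eq/us/nyse/ibm",
    "marketdata/eq/us/nasdaq/fb",
]

nyse_only = [t for t in published if topic_matches("marketdata/eq/us/nyse/>", t)]
print(nyse_only)  # ['marketdata/eq/us/nyse/aapl', 'marketdata/eq/us/nyse/ibm']
```

Only messages whose topics match a subscription on the queue are replayed to it, so a broader subscription means a broader replay.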

So, if I would like to replay specific messages, I need to be able to identify those specific messages via topics. One way to do that is by having an ID in the topic. For example, I might have trade order data being published to the topic tradedata/us/eq/<venue>/<order_id>. So, if I wanted to replay specific messages such as IDs 5, 10, and 14 (you expected me to say 15, didn't you?), I would simply map the following topics to my queue: tradedata/us/eq/nyse/5, tradedata/us/eq/nyse/10, and tradedata/us/eq/nyse/14. Then, I would initiate replay from a time before message 5 was published to make sure it is covered.
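
The order ID example can be sketched as follows. The `order_topics` helper and the list standing in for the replay log are hypothetical; only the topic layout comes from the example above.

```python
from typing import Iterable, List, Tuple


def order_topics(venue: str, order_ids: Iterable[int]) -> List[str]:
    """Build one exact-match subscription per order ID."""
    return [f"tradedata/us/eq/{venue}/{order_id}" for order_id in order_ids]


subscriptions = set(order_topics("nyse", [5, 10, 14]))

# (topic, payload) pairs standing in for the replay log, oldest first.
replay_log: List[Tuple[str, str]] = [
    (f"tradedata/us/eq/nyse/{n}", f"order {n}") for n in range(1, 16)
]

# Only messages whose topic is mapped to the queue are replayed to it.
replayed = [payload for topic, payload in replay_log if topic in subscriptions]
print(replayed)  # ['order 5', 'order 10', 'order 14']
```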

And that's it! It all comes down to your use case and designing your topic taxonomy to take full advantage of it.

Ability to Replay a range of messages

Let's assume we have a consumer which has consumed some messages and would like to go back and replay some of them. The consumer is keeping track of the messages it has consumed and their Replication Group Message IDs (RGMIs), which were introduced in a recent SolOS release. Let's say our consumer has consumed messages up to RGMI 10 and would like to replay messages from RGMI 2 to 5 (i.e. messages 2, 3, 4, and 5) due to a crash in a downstream system or a database error.

To be able to replay this range of messages, here are the steps your consumer would need to follow:
1. Initiate Replay via the Replay after RGMI option and specify RGMI 1 (so it replays the messages after RGMI 1, which means message 2 onwards).
2. This will clear everything in the queue and start replaying older messages.
3. As your consumer is consuming these messages from the queue, it will be able to keep track of the messages using their RGMI.
4. With the out-of-the-box behavior, your consumer would continue consuming the messages that came after RGMI 5. Since we don't want that, the consumer can simply unbind from the queue to stop the flow of messages after RGMI 5.
5. When your consumer is ready to consume new messages and join the live stream, it will kick off another replay after RGMI 10. This will lead it to get all the messages that were received after RGMI 10. Eventually, your consumer will join the live stream.
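
The steps above can be sketched with a small simulation. It is an illustration under assumptions: the `rgmi-N` strings are simplified stand-ins for real RGMIs, and `replay_after` and `replay_until` are hypothetical helpers rather than Solace API calls.

```python
from typing import List


def replay_after(log: List[str], rgmi: str) -> List[str]:
    """Messages a 'replay after RGMI' request would deliver, oldest first."""
    return log[log.index(rgmi) + 1:]


def replay_until(log: List[str], after: str, last: str) -> List[str]:
    """Consume a replay started after `after`, stopping once `last` is handled."""
    handled = []
    for rgmi in replay_after(log, after):   # steps 1-2: replay after `after`
        handled.append(rgmi)                # step 3: track each message by RGMI
        if rgmi == last:
            break                           # step 4: unbind from the queue here
    return handled


# The consumer stopped at rgmi-10; rgmi-11 and rgmi-12 arrived afterwards.
log = [f"rgmi-{n}" for n in range(1, 13)]

recovered = replay_until(log, after="rgmi-1", last="rgmi-5")
resumed = replay_after(log, "rgmi-10")      # step 5: rejoin the live stream

print(recovered)  # ['rgmi-2', 'rgmi-3', 'rgmi-4', 'rgmi-5']
print(resumed)    # ['rgmi-11', 'rgmi-12']
```

The consumer needs to have recorded two RGMIs for this to work: the one just before the range it wants to recover, and the last one it processed before going back.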

Ability to browse the messages before replaying to end consumers

Currently, it is not feasible to browse the replay log, decide which messages you want to replay, and then replay those messages. But it is a popular use case at many large enterprises. So, here is what you can do if you need to browse the messages before replaying them:
1. Create an intermediary queue such as REPLAY_BROWSE_QUEUE
2. Add the relevant topics you would like to replay messages for
3. Initiate replay and use our APIs to browse the queue.
4. You can use selectors when browsing the queue, but a more efficient way to filter is to map only the appropriate topics in step 2 so that minimal filtering is required.
5. Once you have browsed and inspected the messages, you can consume or delete the messages from the queue.
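
The browse-then-decide workflow can be modelled roughly as below. `BrowseQueue` and `Message` are hypothetical in-memory stand-ins, not the Solace queue browser API; the point is only that browsing is non-destructive and removal is a separate, explicit step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Message:
    msg_id: int
    topic: str
    payload: str


class BrowseQueue:
    """In-memory stand-in for an intermediary queue such as REPLAY_BROWSE_QUEUE."""

    def __init__(self, messages: Iterable[Message]) -> None:
        self._messages: List[Message] = list(messages)

    def browse(self, keep: Callable[[Message], bool] = lambda m: True) -> List[Message]:
        """Return matching messages without removing them from the queue."""
        return [m for m in self._messages if keep(m)]

    def delete(self, msg_id: int) -> None:
        """Remove one message after it has been inspected."""
        self._messages = [m for m in self._messages if m.msg_id != msg_id]


queue = BrowseQueue([
    Message(1, "tradedata/us/eq/nyse/5", "order 5"),
    Message(2, "tradedata/us/eq/nyse/10", "order 10"),
    Message(3, "tradedata/us/eq/nyse/14", "order 14"),
])

inspected = queue.browse()                                  # step 3: look, don't consume
only_10 = queue.browse(lambda m: m.topic.endswith("/10"))   # step 4: selector-like filter
queue.delete(2)                                             # step 5: drop a message
remaining = [m.payload for m in queue.browse()]
print(remaining)  # ['order 5', 'order 14']
```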

That's it for today! There is a lot of interest in Solace's Message Replay feature, and if you have a use case you would like to discuss how to implement, reach out to us and we will be happy to help!

Comments

  • hong
    hong Guest Posts: 480 ✭✭✭✭✭

    Thank you for sharing this, @himanshu ! Nice summary!

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 920 admin

    Awesome, thanks @himanshu. Great replay examples!

    I'm going to shamelessly also drop a link for my post about Replay from Message ID in case anyone comes across this thread and would like to see an example of how to do that.