Sequence Convoy Pattern with Solace

amrosalah
amrosalah Member Posts: 3
edited March 14 in PubSub+ Event Broker #1
Hi Team,
In my landscape, 20+ applications generate events that are consumed by about 50 applications. Each consumer needs to receive all the messages from the 20+ producers in the order they were generated, per customerID. That is, for customer/123, all events from the 20+ producers should be convoyed in sequence to all consumers. Each consumer gets only a subset of each event.

I plan to use an integration platform to get the data from the producers and publish it to a Solace topic to be consumed by multiple consumers. I am trying to simplify the architecture and avoid adding many more components for complex transformation, data filtering, or sequence management. I was wondering if Solace can provide a mechanism within a topic where consumers can select only messages based on multiple attributes (e.g. messageType, producerName, etc.) while also making sure messages are grouped by customerID and processed sequentially, so that if anything goes wrong for one event, it blocks the rest of the events for that customer.

Best Answer

  • allmhhuran
    allmhhuran Member Posts: 44 ✭✭✭
    edited March 14 #2 Answer ✓

    Can you clarify your requirement slightly? I am not sure which interpretation is right. I will describe a couple of interpretations using an example scenario with 2 producer applications rather than 20 for simplicity, and I will use T=? to represent some point in time, where T=1 happens before T=2, etc.

    In application 1:
    T=1, event A11 occurs on customer 123
    T=4, event A12 occurs on customer 123

    In application 2:
    T=2, event A21 occurs on customer 123
    T=3, event A22 occurs on customer 123

    Interpretation 1:
    Subscriber must receive messages in the order A11, A21, A22, A12

    Interpretation 2:
    Subscriber must receive A11 before A12,
    and it must receive A21 before A22,
    but not necessarily with a total order of A11, A21, A22, A12

    If the requirement matches interpretation 1, then the desired result cannot be achieved unless the events in application 1 and application 2 are causally connected (in which case you can use vector clocks).
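
    To make the difference between the two interpretations concrete, here is a minimal Python sketch that checks a received sequence against each one. The function names `satisfies_total_order` and `satisfies_per_producer_order` are made up for illustration and have nothing to do with any Solace API.

```python
from typing import Dict, List, Sequence


def satisfies_total_order(received: Sequence[str], expected: Sequence[str]) -> bool:
    """Interpretation 1: the subscriber sees exactly one global sequence."""
    return list(received) == list(expected)


def satisfies_per_producer_order(received: Sequence[str],
                                 per_producer: Dict[str, List[str]]) -> bool:
    """Interpretation 2: each producer's events keep their own order;
    events from different producers may interleave freely."""
    for events in per_producer.values():
        wanted = set(events)
        seen = [e for e in received if e in wanted]
        if seen != list(events):
            return False
    return True


per_producer = {"app1": ["A11", "A12"], "app2": ["A21", "A22"]}
total_order = ["A11", "A21", "A22", "A12"]   # ordered by T=1..4
received = ["A21", "A11", "A22", "A12"]      # A21 overtook A11 in transit

print(satisfies_total_order(received, total_order))          # False
print(satisfies_per_producer_order(received, per_producer))  # True
```

    The same delivery passes under interpretation 2 and fails under interpretation 1, which is why the choice matters.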

Answers

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 954 admin

    Hi @amrosalah & @allmhhuran,

    allmhhuran is correct that you will need to decide which scenario you require. Solace brokers maintain order based on when the broker receives the messages. If you can't trust that messages will be published in the right order, and you need to look inside the message to resequence, that will need to be done outside of the broker. But if messages are published to the broker in the order they should be processed by the consumers, then you should be able to use a combination of a well-defined topic hierarchy and, if needed, partitioned queues to ensure your consumers can process the messages in order.
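
    As a rough illustration of how a topic hierarchy can do the filtering by messageType and producerName, here is a small Python sketch. The topic layout `stores/events/{messageType}/{producerName}/{customerID}` is only an assumed example, and `build_topic` and `matches` are hypothetical helpers that model a simplified version of Solace's wildcard rules (`*` for a single level, `>` as the last level for one or more levels). The real matching is done by the broker.

```python
def build_topic(message_type: str, producer: str, customer_id: str) -> str:
    """Assumed layout: stores/events/{messageType}/{producerName}/{customerID}."""
    return f"stores/events/{message_type}/{producer}/{customer_id}"


def matches(subscription: str, topic: str) -> bool:
    """Simplified model of level-based wildcard matching."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub in enumerate(sub_levels):
        if sub == ">" and i == len(sub_levels) - 1:
            # '>' as the final level needs at least one remaining topic level
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        if sub.endswith("*"):
            # 'abc*' matches any level starting with 'abc'; a bare '*' matches any level
            if not topic_levels[i].startswith(sub[:-1]):
                return False
        elif sub != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


topic = build_topic("purchase", "store1", "123")
print(topic)
print(matches("stores/events/purchase/*/123", topic))  # any producer, one customer
print(matches("stores/events/>", topic))               # everything under stores/events
print(matches("stores/events/register/>", topic))      # a different message type
```

    With a layout like this, each consumer's queue would subscribe only to the message types and producers it cares about, without any filtering component in between.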

    Hope that helps!

  • amrosalah
    amrosalah Member Posts: 3
    edited March 14 #5
    Thanks a lot @allmhhuran and @marc - in fact it's Interpretation 1. Hence, I need to merge all events into one topic. Generated events are guaranteed to be in sequence.

    Following questions
    1) Is sequencing supported on topics as well as queues, or do I need to put a queue in front of the topic?
    2) Say I have my topic /xyz/customers/customerID, and I let my client subscribe to /xyz/customers/* (all topics, one per customer). Does that mean the sequence will automatically be maintained on, for example, /xyz/customers/123 separately from /xyz/customers/456? (Meaning the client will listen to and process multiple topics in parallel, for all customers, but maintain sequence within each separately.)

    3) Say one event fails to be processed and I send a NACK for the topic /xyz/customers/123. What happens then? (Desired: stop processing this topic until the message is pushed.) Also, with manual intervention to fix or push this event, how best to resume this topic?

    I am new to Solace and appreciate your guidance on the best practices that simplify the design.

    Regards,
    Amr
  • allmhhuran
    allmhhuran Member Posts: 44 ✭✭✭
    edited March 15 #6

    The difficulty with interpretation 1 is one that is created by physical reality, not any particular technology (Solace or otherwise), so we need to be clear about what is being claimed here.

    Suppose app1 and app2 are independent processes running on different hosts.

    Both app1 and app2 can generate events.

    These events must be transmitted to app3, which is another independent system.

    The transmission time is necessarily non-zero and variable. This is a fact of reality imposed by the universe, not any particular technology.

    OK, suppose app1 produces events E1 and E2 (in that order), and app2 produces events E3 and E4 (in that order).

    We can know for sure that E1 happened before E2, and we can know for sure that E3 happened before E4, because the order of each pair is defined within the scope of a single application, and it is that application which defines the order.

    But how do you know for sure that, say, E1 happened before E3?

    If the systems are producing events completely independently then it is impossible to know this for sure. You can't just rely on, say, the system time for the hosts of app1 and app2, and compare something like a date time stamp on the event, because these host clocks could be slightly out of sync.

    You also can't assume that "if E1 arrives at app3 before E3 arrives at app3, then E1 happened before E3", because it could be that E3 happened before E1, but took longer to transmit to app3.

    The only way to have a "known order" across the systems is if the events are causally connected. Suppose, for example, that app1 produces event E1, which is transmitted to app2, and then as a result of receiving E1, app2 produces E3. Now there is a way to order these events - we can use the causal order. E1 is definitely causally antecedent to E3.

    Without that causal connection, the relative order of events from different systems is undefined. So… are your events causally connected?
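
    For anyone who wants to see what "causally connected" buys you, here is a minimal vector clock sketch in Python. It is not tied to Solace in any way, and the helper names (`tick`, `merge`, `happened_before`, `concurrent`) are just my own for illustration.

```python
from typing import Dict

Clock = Dict[str, int]


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of `clock` with `node`'s counter advanced (a local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: Clock, received: Clock, node: str) -> Clock:
    """On receipt: take the element-wise max, then count the receipt as a local event."""
    merged = {n: max(local.get(n, 0), received.get(n, 0))
              for n in set(local) | set(received)}
    return tick(merged, node)


def happened_before(a: Clock, b: Clock) -> bool:
    """True if a causally precedes b: a <= b on every node and a < b on at least one."""
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither event causally precedes the other (relative order undefined)."""
    nodes = set(a) | set(b)
    same = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not same and not happened_before(a, b) and not happened_before(b, a)


e1 = tick({}, "app1")               # app1 produces E1
e3_independent = tick({}, "app2")   # app2 produces E3 on its own
e3_caused = merge({}, e1, "app2")   # app2 produces E3 because it received E1

print(concurrent(e1, e3_independent))   # True: no defined order
print(happened_before(e1, e3_caused))   # True: E1 is causally antecedent to E3
```

    In the independent case the clocks are simply incomparable, which is the "undefined order" described above.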

  • amrosalah
    amrosalah Member Posts: 3
    edited March 15 #7

    @allmhhuran 100% agree! No, they are not causally connected. However, let me give an example.

    Say we have multiple stores and each store is at least 60 minutes away from the others. Say the events are generated by each store per customer. It's impossible for a customer to be physically present in two stores at once, generating events (register customer, purchase item, etc.) in both.

    Given that, say

    app 1 (Store 1) generated E1 and E2 per Customer ID

    app 2 (Store 2) generated E3 and E4 for the same Customer ID

    My Listener (Subscriber) will run on a schedule, say every 5 minutes, and pull the messages grouped by Customer ID

    Listener #1/Scheduler #1:

    1. Run #1: Pull messages E1 and E2, assign timestamps, then send them to topic T1 (for customer 123) → /stores/events/customers/123
    2. Run #2: could be irrelevant to this customer

    Listener #2/Scheduler #2:

    1. Run #1: Pull messages E3 and E4, assign timestamps, then send them to topic T1 (same customer 123) → /stores/events/customers/123
    2. Run #2: could be irrelevant to this customer

    In this case, we guarantee that this topic receives the events in order. Even in the case of scheduler delay, with strong monitoring and management we can guarantee that the scheduler won't be that late.

    However, those events are published and persisted with their timestamps. If they fail to be convoyed to the other side in order, due to an internet connectivity issue or latency, or because E1 and E2 hit a race condition in the middle of the flow that let E2 win, then we have the timestamps to put them back in sequence.
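
    Roughly what I have in mind for the timestamp-based re-ordering, as a Python sketch. The `Resequencer` class and the `hold_seconds` window are hypothetical names of my own, and the sketch assumes the timestamps from the different schedulers are comparable and that no event arrives later than the hold window.

```python
import heapq
from collections import defaultdict


class Resequencer:
    """Buffers events per customer and releases them in timestamp order
    once they are older than a hold window."""

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds
        self._buffers = defaultdict(list)  # customer_id -> heap of (timestamp, arrival, payload)
        self._arrivals = 0                 # tie-breaker for equal timestamps

    def add(self, customer_id: str, timestamp: float, payload: str) -> None:
        self._arrivals += 1
        heapq.heappush(self._buffers[customer_id],
                       (timestamp, self._arrivals, payload))

    def release(self, now: float) -> dict:
        """Return {customer_id: [payload, ...]} for events at or before now - hold_seconds."""
        cutoff = now - self.hold_seconds
        ready = {}
        for customer_id, heap in self._buffers.items():
            out = []
            while heap and heap[0][0] <= cutoff:
                out.append(heapq.heappop(heap)[2])
            if out:
                ready[customer_id] = out
        return ready


rs = Resequencer(hold_seconds=30)
rs.add("123", timestamp=20, payload="E2")    # E2 won the race and arrived first
rs.add("123", timestamp=10, payload="E1")
rs.add("123", timestamp=4000, payload="E3")  # next store visit, much later

print(rs.release(now=100))    # {'123': ['E1', 'E2']}
print(rs.release(now=5000))   # {'123': ['E3']}
```

    An event that shows up after its window has already been released cannot be put back in order by this approach, so the window has to be longer than the worst expected delay.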

    Does this make sense?

  • marc
    marc Member, Administrator, Moderator, Employee Posts: 954 admin

    Hi @amrosalah,

    Thanks for the clarification (and thank you @allmhhuran for the excellent advice!)

    A few answers to the Solace specific questions:

    Following questions

    1) Is sequencing supported on topics as well as queues, or do I need to put a queue in front of the topic?

    It sounds like you want to have guaranteed delivery to your consumers, so you will want your publishers to use Persistent messages and publish to topics. Each of your consumers will then want to have a queue which subscribes to the topics they need messages from.


    2) Say I have my topic /xyz/customers/customerID, and I let my client subscribe to /xyz/customers/* (all topics, one per customer). Does that mean the sequence will automatically be maintained on, for example, /xyz/customers/123 separately from /xyz/customers/456? (Meaning the client will listen to and process multiple topics in parallel, for all customers, but maintain sequence within each separately.)

    The broker will ensure that messages stay in order based on the order they are received from the publisher. So if publisher A sends a message on /xyz/customers/123 and then one on /xyz/customers/456, they will stay in that order. This order is maintained if those messages are persisted in a queue. Note that if you think you will need to scale your consumers, I would highly suggest you look at Partitioned Queues and have your publishers use the customer # as the Partition Key, so you will be able to scale up and maintain order based on customer # if that is what you need to ensure order on. You can still use 1 consumer at first if desired, but this gives you the option to scale up in the future if needed.
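
    To picture why a partition key preserves per-customer order while still allowing parallel consumers, here is a toy Python model. `partition_for` and `route` are hypothetical helpers, and CRC32 is only a stand-in: how the broker actually maps a partition key to a partition is internal to it.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(partition_key: str, partition_count: int) -> int:
    """Stable key -> partition mapping (CRC32 is an illustrative choice only)."""
    return zlib.crc32(partition_key.encode("utf-8")) % partition_count


def route(messages: Iterable[Tuple[str, str]],
          partition_count: int) -> Dict[int, List[Tuple[str, str]]]:
    """Append each (customer_id, payload) to its partition, keeping arrival order."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for customer_id, payload in messages:
        partitions[partition_for(customer_id, partition_count)].append(
            (customer_id, payload))
    return partitions


published = [("123", "E1"), ("456", "X1"), ("123", "E2"), ("456", "X2"), ("123", "E3")]
partitions = route(published, partition_count=4)

for number, contents in sorted(partitions.items()):
    print(number, contents)
```

    Every message for a given customer lands in the same partition in publish order, so one consumer per partition can work in parallel without reordering any single customer's events.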

    3) Say one event fails to be processed and I send a NACK for the topic /xyz/customers/123. What happens then? (Desired: stop processing this topic until the message is pushed.) Also, with manual intervention to fix or push this event, how best to resume this topic?

    The Solace APIs recently introduced enhanced NACK capabilities that add flexibility based on what each consumer needs (example with the Java API: https://solace.com/resources/videos/negative-acknowledgements-nacks-using-the-solace-messaging-api-for-java ). But yes, in general, as long as you haven't hit the max redeliveries of a message (which by default is infinite), message order will be maintained.
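
    If it helps to picture the "block this customer until the failed event is fixed" behaviour you described, here is a plain Python simulation of that logic on the consumer side. It does not use or represent the Solace API or the broker's redelivery behaviour; `ConvoyConsumer`, `on_message`, and `resume` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class ConvoyConsumer:
    """Processes events in order per customer; a failure holds back that
    customer's later events while other customers keep flowing."""

    def __init__(self, handler: Callable[[str, str], bool]):
        self.handler = handler                      # returns True on success
        self.blocked: Dict[str, Deque[str]] = {}    # customer_id -> held events
        self.processed: List[Tuple[str, str]] = []

    def on_message(self, customer_id: str, event: str) -> bool:
        held = self.blocked.get(customer_id)
        if held:                                    # already blocked: queue behind the failure
            held.append(event)
            return False
        if self.handler(customer_id, event):
            self.processed.append((customer_id, event))
            return True
        self.blocked[customer_id] = deque([event])  # first failure blocks the customer
        return False

    def resume(self, customer_id: str) -> bool:
        """Retry held events in order; stop again at the first failure."""
        held = self.blocked.get(customer_id, deque())
        while held:
            if not self.handler(customer_id, held[0]):
                return False
            self.processed.append((customer_id, held.popleft()))
        self.blocked.pop(customer_id, None)
        return True


broken = {"E2"}                                     # events that currently fail
consumer = ConvoyConsumer(lambda cid, event: event not in broken)
for cid, event in [("123", "E1"), ("123", "E2"), ("123", "E3"), ("456", "X1")]:
    consumer.on_message(cid, event)
print(consumer.processed)    # E2 and E3 are held; customer 456 is unaffected

broken.clear()               # manual intervention fixed the problem
consumer.resume("123")
print(consumer.processed)    # E2 then E3 follow, still in order
```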

    Outside of those Solace-specific details, you'll really want to consider your use case and requirements. Do you REALLY need order across publishers at different stores? If you do, things get really tricky if stores can go offline for extended periods of time. You'd likely have to monitor for that and stop consumers until the store is back online… but that is not ideal because one store being offline is now affecting real-time processing of events from other stores, which will likely negatively affect customer experience. Where possible you should try to embrace the concept of eventual consistency.