Partitioned Queues and XA transactions

vn01
vn01 Member Posts: 17 ✭✭✭
edited August 2024 in PubSub+ Event Broker #1

We noticed that we can not use XA transactions on a partitioned queues. Our primary use case is to process the messages in the order, and we wanted to use the "USER_ID" as the partition key and that would have taken care of our requirement of processing messages in order per user.

With this restriction of XA not supported on Partitioned Queues, is sticky load balancing as described here "https://solace.com/blog/consumer-groups-consumer-scaling-solace/" our only alternative to use XA transactions and set up a order processing of messages with parallel processing of the load

Answers

  • Tamimi
    Tamimi Member, Administrator, Employee Posts: 543 admin

    Hey @vn01 - you can check out the solace documentation on partition queues here: https://docs.solace.com/Messaging/Guaranteed-Msg/Queues.htm#pq-feature-interactions:~:text=XA%20transactions%20are%20not%20supported%20for%20either%20publish%20or%20subscribe%3A

    And in particular the Partitioned Queue Feature Interactions section on Transactions:

    Transactions

    Local transactions are supported for both publish and subscribe, with the caveat that the commit of a local publish transaction fails if the selected partition is deleted between the time the message is received and the time the transaction is committed (due to a configuration change in the interim). In this case, the transaction can simply be retried.

    XA transactions are not supported for either publish or subscribe

  • vn01
    vn01 Member Posts: 17 ✭✭✭

    @Tamimi Any road map to support XA transaction for partitioned queues.

  • Aaron
    Aaron Member, Administrator, Moderator, Employee Posts: 662 admin

    Hi @vn01, I wrote that blog you mentioned above. I don't believe this is on the roadmap for anytime soon. Can you explain your use case more? Why the need for XA transaction..? Definitely won't be as high-performance as an "eventually consistent" architecture approach. And/or, why do you need partitioned queues? A single exclusive queue (which guarantees order across any/all USER_IDs) can handle a LOT of traffic.

  • vn01
    vn01 Member Posts: 17 ✭✭✭

    Hi @Aaron We want to move away from a transactional outbox (https://microservices.io/patterns/data/transactional-outbox.html) to a traditional 2PC. Eventual Consistency is still our architectural approach. We want to move away from the CDC/Poller components in DB.outbox → CDC/OutboxPoller → MessageBroker. We are evaluating solace and it has most of the the features we need, including the flexible topic to queue routing/mapping and support for DMQ.

    Our transaction involves [Solace-Queue + DB + Solace-Topic/Queue], and this would need a an XA transaction which is working fine for non-partitioned (partition count zero) and exclusive queue.

    One of our key requirements is to partition our processing in FIFO order of the events per USER_ID.

    With exclusive queue, only single instance of our group of consumers will be utilized (other instances will be idle and will be like a stand by). Is there a way to use Exclusive Queues and have multiple of our consumer instances process ordered messages partitioned by key (USER_ID in our case) ?

  • vn01
    vn01 Member Posts: 17 ✭✭✭

    @Aaron do you have any further input here. As I mentioned most of the XA support is working well in the POC we did, only question that we are trying to answer is how do we setup a partitioned processing for services where we want to process events ordered by USER_ID, one option is the Exclusive Queue, which under utilizes multiple instances we have for the services to handle load. Other option is the sticky load balancing, but its quite a custom setup to build and manage. Ideally if partitioned queues supported XA transactions that would have been the best option.

  • Aaron
    Aaron Member, Administrator, Moderator, Employee Posts: 662 admin
    edited August 2024 #7

    Hey @vn01 … I've asked internally, but I'm pretty sure XA support for PQs is not on the near-term roadmap. So let's brainstorm a bit other possible patterns.

    You want XA because you'd like to have the DB commit, the outgoing message publish, and the ACK of the received message all handled as one "blob". Or possibly multiple messages? (batch) If multiple, how many?

    PQs do support regular Session transactions, so at least you could have some of it as a single operation.

    So: PQ with multiple consumers, key = USER_ID. A single consumer reads one (or a bunch?) of messages off the queue… then posts them into a DB. When the commit from the DB is successful, the app publishes a single (multiple?) notification messages on another topic (probably Guaranteed I'm guessing), and then ACK the messages back on the PQ. So the two Solace-related parts could be in a Session transaction, so you don't have to have additional logic that waits for the publish ACK before ACKing the PQ message(s).

    So the issue is: what happens if #1: the DB commit is successful, but then either the Solace notification publish fails (due to queue full maybe?) and therefore the Session transaction would fail… OR #2: if the consumer app crashes after the DB commit and the notification doesn't go out / PQ messages aren't ACKed. Right? Any other failure cases..?

    For #1, if the app is still up and doesn't lose state, it could just continue to try to resend the notification messages for some period of time. At least b/c of the PQ key, we know that further messages about this USER_ID won't be processed until this batch is sorted out, so we won't have any "gaps" in our notification messages, just a delay. If the app decides to eventually give up, then it would have to rollback/mark cancelled somehow the previous DB update. But again, because of PQ, no other apps should be dealing with this specific USER_ID. NOTE that the "max handoff time" on the PQ configuration is very important here… it must be at least the amount of time that the app will wait / try to deal with this message. This is because when consumer scaling up, a new consumer gets added to the PQ, the partition that this message came from could get yanked away and given to the new guy… and the original (blocking/waiting) app doesn't know… so we don't want the new consumer to get this message while the first app is still trying to deal with it.

    For #2, if the app crashes midway through, then the partition will eventually move over to a new consumer who will receive the same messages that weren't ACKed. The app can checked "redelivered flag" on each message to give a hint that this message might have been processed before. Obviously, the app would/should check the DB to make sure this message hadn't already been inserted in the DB. If not: carry on as normal; if so: attempt to send the notification message and ACK the received PQ message.

    So yeah… I think it could be done… just with some extra checks by the PQ consumer to make sure this message hadn't been inserted into the DB already. Might be a bit slower, but probably faster than an XA transaction / 2PC. Hopefully the messages received off the PQ, whoever is publishing them inserts some UUID or something (application message ID?) that the PQ consumers/DB inserter apps can use to verify this message has/hasn't been processed already.

  • vn01
    vn01 Member Posts: 17 ✭✭✭
    edited August 2024 #8

    @Aaron : Thanks for the write-up. This is similar to how we have our event processing today. So basically if we do the same with Solace - we will have 2 separate transactions

    1. solace.begin-tx (TX1)
    2. Receive the Message
    3. Insert/Update DB
    4. Put event to soalce.topic
    5. soalce-xa.prepare-tx (soalce.topic) (TX2)
    6. db-xa.prepare-tx (db) (TX2)
    7. db-xa.commit-tx (db) (TX2)
    8. soalce-xa.comit-tx (soalce.topic) (TX2)
    9. Delete the Message (ACK)
    10. solace.commit-tx (TX1)

    As you mentioned , we have to handle cases where TX1 fails, message is re-derivered