Potential missing message during failover in a HA group

rdesoju Member Posts: 66
edited February 2022 in PubSub+ Event Broker #1

Hi,
In a failover test where both the producer and the consumer use guaranteed message delivery, I am seeing the exception below in the producer's JCSMPStreamingPublishEventHandler.handleError() callback.

Exception message:
Transport exception occurred when message Id not available. JCSMPTransportException Error receiving data from underlying connection.

The producer.send() call returns successfully, but the handleError() callback then reports this message. Does that mean the message was lost before it was spooled? Could someone please clarify?
My assumption is that producer.send() is a blocking call (I'm using the standard client, not the non-blocking approach), so the call returns only once the message has been successfully spooled.

Thanks,
Raghu
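
For readers following along, here is a minimal sketch of the bookkeeping this question turns on. It deliberately does not use the JCSMP API: `PendingPublishes`, `beforeSend`, `onAck` and `onError` are hypothetical names, with `onAck` and `onError` standing in for the success and handleError() callbacks. It assumes, as the replies in this thread describe, that the outcome of a send arrives asynchronously after send() has returned. Treating every unacknowledged message as suspect when the error carries no message ID (as in "message Id not available") is also an assumption of the sketch, not documented broker behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JCSMP: remembers every payload until the
// acknowledgement for it has been processed.
class PendingPublishes {
    // Insertion-ordered so unacknowledged payloads come back in send order.
    private final Map<String, String> pending = new LinkedHashMap<>();

    // Call before handing the message to the producer.
    synchronized void beforeSend(String id, String payload) {
        pending.put(id, payload);
    }

    // Stand-in for the success callback: the message is now safe to forget.
    synchronized void onAck(String id) {
        pending.remove(id);
    }

    // Stand-in for the error callback. Returns the payloads that can no
    // longer be assumed spooled. Assumption: a null id means the failed
    // message is unknown, so every unacknowledged payload is returned.
    synchronized List<String> onError(String id) {
        List<String> suspect = new ArrayList<>();
        if (id == null) {
            suspect.addAll(pending.values());
        } else if (pending.containsKey(id)) {
            suspect.add(pending.get(id));
        }
        return suspect;
    }

    // Number of messages still waiting for an acknowledgement.
    synchronized int size() {
        return pending.size();
    }
}
```

In this sketch a payload leaves the store only in `onAck`, so an error that arrives after send() has returned still finds the data available for whatever the application decides to do next.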

Answers

  • TomF Member, Employee Posts: 406

    @rdesoju, further to what @aaron has said: you've said you're using non-blocking send. That means errors (for instance a NACK, which is the broker telling you it couldn't accept the message) are returned asynchronously, as you've seen through the event handler. More to the point, you can't assume the message has been sent until you've processed the acknowledgement. That is where responseReceived comes in: here you can mark the message as successfully sent. Until this has happened, you have to keep your data around in case you get an error. See the persistence tutorial for more details.

    So if you do get an error, it's up to you to decide what to do. Re-send? Stop execution? It's a business decision. But the message is only lost if you haven't kept it until you've been told it's been accepted by the broker.

  • rdesoju Member Posts: 66

    @Aaron and @TomF,
    Thank you both for your valuable insights into this observation. I have purposely disabled the default retry mechanism to observe the internal behavior of the broker during the failover, so I have the parameters below in my sender code:
    reconnectRetries=0
    connectRetries=0

    I am also closing the producer and session objects on each exception and recreating them, so I am using my own retry mechanism instead of the default one. Unfortunately, I am unable to share the code.

    My understanding is that, both with confirmed delivery and in normal scenarios (with guaranteed messaging), it is not safe to assume that the broker has persisted the message until the success callback comes back. Is my understanding correct?

    Also, does the publisher avoid this situation when I use the default retry mechanism provided by JCSMPSession?

    Please clarify.

    Regards,
    Raghu
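
Since the sender code itself could not be shared, the following is only a rough sketch of what a hand-rolled retry of the kind described above might look like. Nothing here is a JCSMP type: `BrokerLink` is a hypothetical stand-in for a session plus producer, `sendAndConfirm` is assumed to return only once the broker has confirmed the message, and `ReplayingSender` and `maxAttempts` are illustrative names. Backoff between attempts and failures while creating a new link are left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a session plus producer; not a JCSMP type.
interface BrokerLink {
    // Assumed to return only after the broker has confirmed the message,
    // and to throw if the outcome is an error or unknown.
    void sendAndConfirm(String payload) throws Exception;

    void close();
}

class ReplayingSender {
    private final Supplier<BrokerLink> connect; // creates a fresh link on each call
    private final int maxAttempts;              // consecutive failures tolerated per message

    ReplayingSender(Supplier<BrokerLink> connect, int maxAttempts) {
        this.connect = connect;
        this.maxAttempts = maxAttempts;
    }

    // Sends payloads in order and returns those still unconfirmed once the
    // attempts for one message have run out.
    Deque<String> sendAll(List<String> payloads) {
        Deque<String> remaining = new ArrayDeque<>(payloads);
        BrokerLink link = connect.get();
        int attempts = 0;
        while (!remaining.isEmpty() && attempts < maxAttempts) {
            try {
                link.sendAndConfirm(remaining.peekFirst());
                remaining.removeFirst(); // forget a payload only once it is confirmed
                attempts = 0;
            } catch (Exception e) {
                attempts++;
                link.close();            // tear down on error ...
                link = connect.get();    // ... and recreate before retrying
            }
        }
        link.close();
        return remaining;
    }
}
```

The point of the design is that a payload is dropped only after confirmation, so whatever `sendAll` returns is exactly the data the application still has to decide about. One caveat: a payload that is re-sent after an unknown outcome may already have been spooled, so a sketch like this can produce duplicates on the consumer side.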