Potential missing message during failover in an HA group
Hi,
In a failover test where both the producer and the consumer use guaranteed message delivery, I am seeing the exception below in the producer's JCSMPStreamingPublishEventHandler.handleError() callback.
Exception message: Transport exception occurred when message Id not available. JCSMPTransportException Error receiving data from underlying connection.
The producer.send() call returns successfully, but the handleError() callback prints this message. Does this mean the message was lost before being spooled? Could someone please clarify?
My assumption is that producer.send() is a blocking call (I'm using the standard client, not the non-blocking approach), and therefore the call returns only when the message has been successfully spooled.
Thanks,
Raghu
Best Answer
How many messages did you publish? How many messages ended up on the queue? That's an easy way to check for loss.
Can you share all your code? JCSMP persistent/Guaranteed publishing is non-blocking (to the broker). The send() call returns when the message is written to the socket. The confirmation of spooling on the broker comes asynchronously via the JCSMPStreamingPublishEventHandler.
If you are only doing failover testing, the API should have long enough reconnection parameters to allow it to reconnect without throwing a transport exception, so that looks a bit strange to me. The API should block until the reconnection happens and then allow the message to be published on the backup. I'm also wondering why the message ID is not available. Maybe this error is not related to a particular message, and is just about the connection?
And just a small point RE: terminology... the message is not "lost" if the broker returns a NACK (i.e. handleError())... the message is still inside the publisher application's memory. The message can only be lost if the broker successfully acknowledges the message and then loses it somehow. The Guarantee applies only once you get the ACK, not when the send() call returns.
Anyhow, share your code if you can for more help. Thanks!
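To make the "still inside the publisher application memory" point concrete, here is a minimal sketch of keeping each message until the broker confirms it. It deliberately does not use the JCSMP classes: PendingPublishes, track, onAck and onNack are hypothetical names, and the idea is only that onAck would be called from responseReceived and onNack from handleError. A null key is treated as "error not tied to one message", which seems to match the "message Id not available" case above, but that mapping is an assumption.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of JCSMP: remembers every message handed to
// send() until the broker confirms or rejects it.
class PendingPublishes {
    private final Map<String, String> inFlight = new ConcurrentHashMap<>();

    // Call just before send(): the payload stays in application memory.
    void track(String key, String payload) {
        inFlight.put(key, payload);
    }

    // Call from the success callback: only now is it safe to forget the message.
    void onAck(String key) {
        if (key != null) {
            inFlight.remove(key);
        }
    }

    // Call from the error callback: returns the rejected payload so the caller
    // can re-send or report it. A null key (error not tied to one message)
    // yields an empty result and leaves everything in flight untouched.
    Optional<String> onNack(String key) {
        if (key == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(inFlight.remove(key));
    }

    // Number of messages sent but not yet confirmed or rejected.
    int unconfirmed() {
        return inFlight.size();
    }

    public static void main(String[] args) {
        PendingPublishes pending = new PendingPublishes();
        pending.track("m1", "order-1");
        pending.track("m2", "order-2");
        pending.onAck("m1");
        pending.onNack("m2").ifPresent(p -> System.out.println("re-send or report: " + p));
        System.out.println("unconfirmed: " + pending.unconfirmed());
    }
}
```

After a connection-level error, whatever is still counted by unconfirmed() is in doubt and is a candidate for re-publishing once the session is back.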
Answers
@rdesoju, further to what @aaron has said, Guaranteed publishing is non-blocking with respect to the broker. That means errors (for instance a NACK, which is the broker telling you it couldn't accept the message) are returned asynchronously, as you've seen through the event handler. More to the point, you can't assume the message has been accepted until you've processed the acknowledgement, which is where responseReceived comes in: there you can mark the message as successfully sent. Until this has happened, you have to keep your data around in case you get an error. See the persistence tutorial for more details.
So if you do get an error, it's up to you to decide what to do. Re-send? Stop execution? It's a business decision. But the message is only lost if you haven't kept it until you've been told it's been accepted by the broker.
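As one possible way to encode that business decision, here is a small sketch of a bounded re-send policy. Nothing here comes from JCSMP: ResendPolicy, Decision and maxAttempts are hypothetical, and the limit itself is exactly the kind of choice that depends on your application.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical policy, not part of JCSMP: allow each message a fixed number
// of send attempts in total, then give up and let the application decide
// (log, alert, stop execution).
class ResendPolicy {
    enum Decision { RESEND, GIVE_UP }

    private final int maxAttempts;
    private final Map<String, Integer> failedAttempts = new HashMap<>();

    ResendPolicy(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // Call once per rejected send of the message identified by key.
    synchronized Decision onError(String key) {
        int failed = failedAttempts.merge(key, 1, Integer::sum);
        if (failed < maxAttempts) {
            return Decision.RESEND;
        }
        failedAttempts.remove(key); // budget used up, stop tracking
        return Decision.GIVE_UP;
    }

    // Call when the broker confirms the message: its counter is no longer needed.
    synchronized void onAck(String key) {
        failedAttempts.remove(key);
    }

    public static void main(String[] args) {
        ResendPolicy policy = new ResendPolicy(3);
        for (int i = 1; i <= 3; i++) {
            System.out.println("error " + i + " -> " + policy.onError("m1"));
        }
    }
}
```

With maxAttempts set to 3, the first two errors for a key ask for a re-send and the third one gives up, so the message is sent at most three times overall.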
@Aaron and @TomF,
Thank you both for your valuable insights. I have purposely disabled the default retry mechanism to observe the internal behavior of the broker during failover, so I have the parameters below in my sender code:
reconnectRetries=0
connectRetries=0
I also close the producer and session objects on an exception and recreate them each time, so I am using my own retry mechanism instead of the default one. Unfortunately, I am unable to share the code.
I understand that, both with confirmed delivery and in normal scenarios (with Guaranteed messaging), it is not safe to assume that the broker has persisted the message until the successful event callback arrives. Is my understanding correct?
Also, does the publisher avoid this situation when I use the default retry mechanism provided by JCSMPSession? Please clarify.
Regards,
Raghu
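If the goal is the blocking behavior originally assumed for send(), one option is to wait explicitly for the asynchronous confirmation. The sketch below uses only java.util.concurrent and no JCSMP classes: AckWaiter, register, onAck, onError and await are hypothetical names, with onAck and onError meant to be called from the success and error callbacks discussed earlier in the thread.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JCSMP: lets the publishing thread block
// until the asynchronous confirmation for a message arrives.
class AckWaiter {
    private final Map<String, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    // Call BEFORE send(), so a fast confirmation cannot be missed.
    CompletableFuture<Void> register(String key) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        pending.put(key, future);
        return future;
    }

    // Call from the success callback.
    void onAck(String key) {
        CompletableFuture<Void> future = pending.remove(key);
        if (future != null) {
            future.complete(null);
        }
    }

    // Call from the error callback.
    void onError(String key, Exception cause) {
        CompletableFuture<Void> future = pending.remove(key);
        if (future != null) {
            future.completeExceptionally(cause);
        }
    }

    // Blocks until the message is confirmed. Returns false if it was rejected,
    // if the timeout elapsed, or if the waiting thread was interrupted.
    boolean await(String key, CompletableFuture<Void> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pending.remove(key); // do not leak entries after a timeout
        }
    }

    public static void main(String[] args) {
        AckWaiter waiter = new AckWaiter();
        CompletableFuture<Void> future = waiter.register("m1");
        // A real publisher would call send() here; the ACK is simulated instead.
        waiter.onAck("m1");
        System.out.println("confirmed: " + waiter.await("m1", future, 1000));
    }
}
```

A false result only means the message is unconfirmed, not that it is lost: the payload is still in the application and can be re-sent. Waiting per message also costs throughput compared with streaming, so whether it is worth it depends on the use case.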