Pattern for handling RejectedMessageError events with guaranteed messages?

Thanks for providing such an extensive opinion on this. I will leave the question unanswered for now in the hope of attracting more opinions… or perhaps it should be turned into a discussion instead?

A “retry for a while with back-off” solution does seem to be a good first step no matter what. Part of your answer speaks to a different question I have on the forum, about wrapping the send and callback mechanics into an async method by setting a TaskCompletionSource as the CorrelationKey on the outgoing message header. Doing this does serialize the sends, but in a non-thread-blocking way. That eliminates the ordering problem without tying up a client thread, since you can respond to a nack’d message before sending any further messages, but as you described, it’s not great for throughput.
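To make the shape of that wrapper concrete, here is a minimal sketch. It is written in Python's asyncio rather than C# purely so it runs without the messaging API: `asyncio.Future` plays the role of TaskCompletionSource, and `FakeSession`, `on_ack`, `send_and_wait` and `publish_in_order` are names I made up for illustration, not anything from the real client library.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a messaging session: send() returns at once
    and the n/ack arrives later through the on_ack callback."""

    def __init__(self, on_ack, reject=()):
        self._on_ack = on_ack
        self._reject = set(reject)

    def send(self, payload, correlation_key):
        accepted = payload not in self._reject
        # Deliver the n/ack on a later event-loop iteration, the way a
        # broker callback would arrive some time after the send.
        asyncio.get_running_loop().call_soon(self._on_ack, correlation_key, accepted)


def on_ack(correlation_key, accepted):
    # The correlation key *is* the future, so the callback completes it directly.
    correlation_key.set_result(accepted)


async def send_and_wait(session, payload):
    future = asyncio.get_running_loop().create_future()
    session.send(payload, correlation_key=future)
    # Suspends this coroutine until the n/ack arrives; no thread is blocked.
    return await future


async def publish_in_order(session, payloads):
    results = []
    for payload in payloads:
        # The next send only happens once the previous n/ack is known.
        results.append((payload, await send_and_wait(session, payload)))
    return results


def demo():
    session = FakeSession(on_ack, reject={"b"})
    return asyncio.run(publish_in_order(session, ["a", "b", "c"]))
```

Calling `demo()` sends three messages one at a time and reports the second as rejected, which is the point at which a retry could be attempted before `"c"` is ever sent.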

I am personally using System.Threading.Tasks.Dataflow, so there’s a “natural” high-performance solution: have the n/acks come back totally independently of the sends (i.e., no async wrapper around send-ack), and merge the n/acks into the dataflow pipeline using a JoinBlock. Given your confirmation in my other question that the n/acks always come back in order, that’s an ideal use of the JoinBlock.
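The pairing idea looks roughly like this. Again a sketch in Python's asyncio rather than real Dataflow code: two `asyncio.Queue` objects stand in for the two inputs of a JoinBlock, and `join`, `run_pipeline`, `sender` and `broker_acks` are hypothetical names. It relies on the assumption above that n/acks arrive in send order.

```python
import asyncio


async def join(sent, acks, count):
    """Pair the i-th sent message with the i-th n/ack, which is only
    valid if n/acks arrive in the same order as the sends."""
    pairs = []
    for _ in range(count):
        message = await sent.get()
        accepted = await acks.get()
        pairs.append((message, accepted))
    return pairs


async def run_pipeline(payloads, rejected):
    sent, acks = asyncio.Queue(), asyncio.Queue()

    async def sender():
        for payload in payloads:
            # Publish without waiting for the n/ack of earlier messages.
            await sent.put(payload)

    async def broker_acks():
        # Simulated broker: n/acks arrive independently, but in send order.
        for payload in payloads:
            await asyncio.sleep(0)
            await acks.put(payload not in rejected)

    _, _, pairs = await asyncio.gather(
        sender(), broker_acks(), join(sent, acks, len(payloads))
    )
    return pairs
```

Running `asyncio.run(run_pipeline(["a", "b", "c"], {"b"}))` pairs each message with its outcome even though the sender never waited on an individual n/ack.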

Of course, that reintroduces the reordering problem if retransmission of nack’d messages is attempted. Hrm. I suppose this is an inherent problem when trying to combine high throughput with guaranteed delivery.

In my particular case, it is probably going to be easiest to err on the side of simplicity. My volume is not high on any one publication, but there might be many publications going out through one session (think many tables changing in one source database). So for me, tying up threads is much more costly than holding up messages on a single publication. Therefore I can probably get away with serialised async send-ack wrappers plus retry-with-back-off on RejectedMessageError events: further sends on that topic are blocked until the retry succeeds, and if too many retries fail, the publication pipeline is terminated with an alert. I’m thinking that’s how I will write the code initially at least.
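Roughly what I have in mind for the retry loop, as a sketch in Python's asyncio rather than C#. Everything here is hypothetical: `send_with_retry`, `publish_topic`, `PublicationFailed` and the test helper `make_flaky_sender` are illustrative names, the `send_and_wait` argument is assumed to be an async send-ack wrapper that returns True on ack and False on nack, and the attempt limit and delays are arbitrary placeholders.

```python
import asyncio


class PublicationFailed(Exception):
    """Raised when a message is still rejected after the last attempt."""


async def send_with_retry(send_and_wait, payload, max_attempts=5,
                          base_delay=0.01, max_delay=1.0):
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if await send_and_wait(payload):
            return attempt  # number of attempts it took
        if attempt < max_attempts:
            # Back off without blocking a thread, doubling up to a cap.
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
    raise PublicationFailed(f"{payload!r} rejected {max_attempts} times")


async def publish_topic(send_and_wait, payloads, **retry_options):
    attempts = []
    for payload in payloads:
        # Nothing further is sent on this topic until the current message
        # is acked; PublicationFailed propagates and stops the pipeline.
        attempts.append(await send_with_retry(send_and_wait, payload, **retry_options))
    return attempts


def make_flaky_sender(failures):
    """Hypothetical sender that rejects each payload failures[payload] times."""
    remaining = dict(failures)
    log = []

    async def send_and_wait(payload):
        log.append(payload)
        if remaining.get(payload, 0) > 0:
            remaining[payload] -= 1
            return False
        return True

    return send_and_wait, log
```

The caller that catches `PublicationFailed` is where the alert would be raised. Because each message is retried before the next one is sent, ordering within the topic is preserved at the cost of throughput.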

With this solution I would still have an ADWindowSize greater than 1, to allow multiple publishers to send “at the same time”, so to speak. Waiting (asynchronously) for acknowledgement before sending the next message (or retrying the current one) would be happening per-publisher, but not across the entire session.

But if performance needs to be bumped, I would move to the JoinBlock solution, and then be forced to deal with the ordering issue. I guess I’m kicking that can down the road for now, because it’s too damn hard to solve :smiley:

(Aside: It looks like edits to an already-edited post on the forum are being lost right now.)