This question follows on from this discussion:
The objective here is to…
- publish guaranteed messages, and
- be certain that the broker receives them in the order in which they were generated so that re-ordering does not have to be done on the subscriber side (for reasons summarised in the linked thread), and
- avoid introducing logic that may unnecessarily cause duplicate publication (i.e., try to achieve exactly-once delivery insofar as it is actually possible to do that).
To that end I have decided to use a transaction as a way of sending and then committing batches of messages, with retry performed at the level of the whole batch. In simplified pseudocode…
    while (messages to publish and batch is not full)
        add message to batch

    do
        foreach (message in batch) Send()
        Commit()
    while (commit failed)
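To make the shape of that loop concrete, here is a minimal C# sketch of the batching-and-retry logic. The ITransactedPublisher interface and its Send/TryCommit methods are just stand-ins for the Solace transacted-session calls (not the real API, and not the actual project code); the point is only the batch-level retry structure.

    using System.Collections.Generic;

    // Stand-in for the Solace transacted-session operations used below:
    // Send() stages a guaranteed message in the open transaction,
    // TryCommit() attempts to commit it and reports success or failure.
    public interface ITransactedPublisher
    {
        void Send(byte[] message);
        bool TryCommit();
    }

    public static class BatchPublisher
    {
        public static void PublishAll(Queue<byte[]> pending, ITransactedPublisher publisher, int batchSize)
        {
            var batch = new List<byte[]>(batchSize);

            while (pending.Count > 0)
            {
                // Fill the next batch (up to batchSize) from the pending messages.
                batch.Clear();
                while (pending.Count > 0 && batch.Count < batchSize)
                    batch.Add(pending.Dequeue());

                // Send the whole batch, then commit. If the commit fails, resend
                // the same batch so the ordering of the overall sequence is preserved.
                bool committed;
                do
                {
                    foreach (var message in batch)
                        publisher.Send(message);

                    committed = publisher.TryCommit(); // blocks until the broker responds
                }
                while (!committed);
            }
        }
    }

The blocking on TryCommit() before starting the next batch is exactly where the throughput ceiling described below comes from.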
The solution makes use of a last value queue and queue browser to handle edge-case commit failures, and a ring-buffer of pre-allocated messages, which accounts for the bulk of the code not directly related to the above pseudocode.
The .csproj can be found on GitHub.
What I have noticed is that the Commit() call takes quite a long time - between 100 and 130 milliseconds. Here's the send-and-commit block.
The overall throughput is heavily dependent on the duration of the Commit() operation, since I have to block on that call before starting the next batch (otherwise a retransmission of the batch would put messages out of order in the overall sequence).
Using this technique, one would expect the maximum message rate per second to be roughly

    commit_batch_size * 1000 / commit_wait_in_ms

which is 1600 messages per second if the batch size is 200 and the commit blocks for ~125 milliseconds. And that is indeed what I am seeing.
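Or, spelling that arithmetic out:

    using System;

    // Upper bound on throughput when each batch must wait for its commit to complete.
    int commitBatchSize = 200;     // messages per transaction
    double commitWaitMs = 125;     // typical observed Commit() duration

    double maxMessagesPerSecond = commitBatchSize * 1000.0 / commitWaitMs;
    Console.WriteLine(maxMessagesPerSecond); // 1600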
I have a separate project which I am using to measure the "ping time" to the broker (the time to send one message and get an ack or nack back), which for me averages around 20 ms. So the Commit() seems to have about 100 ms of overhead not accounted for by pure latency.
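The measurement itself is conceptually nothing more than the following; the SendGuaranteedMessage and WaitForAckOrNack helpers are just placeholders for the real Solace send call and session-event callback wiring, so treat this as a sketch rather than the project code.

    using System;
    using System.Diagnostics;

    public static class BrokerPing
    {
        // Placeholders for the real Solace send and ack/nack callback plumbing.
        static void SendGuaranteedMessage(byte[] payload) { /* session.Send(...) */ }
        static void WaitForAckOrNack() { /* block until the session event arrives */ }

        // Round-trip time for one guaranteed message: send it, then block until
        // the broker's ack (or nack) comes back.
        public static TimeSpan Ping(byte[] payload)
        {
            var stopwatch = Stopwatch.StartNew();
            SendGuaranteedMessage(payload);
            WaitForAckOrNack();
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }
    }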
Just to add a bit more to the complexity of this question…
I don't know whether this additional delay is due to actual commit processing time, or perhaps to some kind of response buffering. In the ping-publisher project (on GitHub) I provide the ability to set the maximum number of messages waiting for an ack/nack via the maxUnacked parameter, as well as to set the publish window size via the publishWindowSize parameter.
I have noticed that in the ping publisher - which is not using transactions, but rather just doing send/ack with configurable concurrency - there are certain thresholds that significantly decrease throughput. For example, calling Ping(count: 1000, maxUnacked: 1, publishWindowSize: 5); results in ~85 messages per second (completely reasonable, since every send is effectively serialised when maxUnacked is set to 1), but calling Ping(count: 1000, maxUnacked: 1, publishWindowSize: 6); results in almost exactly one message per second. I expect this is due to a 1-second-maximum window release timer (per https://solace.com/blog/understanding-guaranteed-message-publish-window-sizes-and-acknowledgement/ ).

Worth noting that the standard deviation is also enormous - the last few messages take a lot longer than the rest - again, presumably due to the Solace router waiting for a timer to elapse before acknowledging a window that is not yet complete (or not yet more than 1/3rd complete), which (for the last handful of messages) will depend on the result of total_message_count % window_size.
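If the acknowledgement threshold really is one third of the publish window rounded down (that rounding is my guess, inferred from the observed numbers rather than from any documentation), the 5-vs-6 boundary falls out of simple integer arithmetic:

    using System;

    // Assumed ack threshold: one third of the publish window, rounded down,
    // but never less than one message.
    static int AckThreshold(int publishWindowSize) => Math.Max(1, publishWindowSize / 3);

    Console.WriteLine(AckThreshold(5)); // 1 -> a single in-flight message triggers an ack immediately
    Console.WriteLine(AckThreshold(6)); // 2 -> with maxUnacked: 1 the threshold is never reached,
                                        //      so every ack waits for the ~1 second window timer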
Could it be that the Commit() response back to the Solace library is being buffered in a similar way to the acks/nacks?