Tuning pre-fetch for load balancing in Python
I have built a system composed of a number of workers, N (or application processing engines), fed by a single non-exclusive queue. Each job on the queue takes either between 20 seconds and a minute to complete or under 0.1 seconds (because the job requires no work).
Ideally, if there are 3 workers (so N = 3) and 3 jobs on the queue, each worker would get one job. This wasn't happening by default: the default MAX_DELIVERED_UNACKNOWLEDGED_MSGS_PER_FLOW of 10,000 meant that all the work was going to worker 1, while workers 2 and 3 were left twiddling their thumbs.
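To make the failure mode concrete, here is a toy model of the behaviour (this is not the Solace API; the function and the windowing logic are my own simplification). The broker fills each consumer flow with up to `window` unacknowledged messages before moving on to the next flow, so a huge window lets the first consumer grab everything:

```python
from collections import deque

def distribute(jobs, n_workers, window):
    """Toy model: push up to `window` unacked messages to each flow in turn."""
    queue = deque(jobs)
    assigned = [[] for _ in range(n_workers)]
    for w in range(n_workers):
        # Fill this flow's delivery window before considering the next flow.
        while queue and len(assigned[w]) < window:
            assigned[w].append(queue.popleft())
    return assigned

# Large window: worker 1 gets all three jobs, workers 2 and 3 get nothing.
print(distribute(["j1", "j2", "j3"], n_workers=3, window=10000))
# Window of 1: one job per worker.
print(distribute(["j1", "j2", "j3"], n_workers=3, window=1))
```

With `window=10000` the first list holds all three jobs; with `window=1` each worker gets exactly one, which matches the behaviour I observed.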
The Solace documentation ("Configuring Pre-Fetch for Optimized Load-Balancing") addresses this quite well.
Setting MAX_DELIVERED_UNACKNOWLEDGED_MSGS_PER_FLOW to 1 (which has to be done at the queue level, as the Python API can't set it) seemed to fix this: work was now being distributed correctly.
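For reference, I changed the queue-level setting from the Solace CLI with something along these lines (syntax from memory and it may differ by broker version; `default` and `myqueue` are placeholders for my VPN and queue names):

```text
solace> enable
solace# configure
solace(configure)# message-spool message-vpn default
solace(configure/message-spool)# queue myqueue
solace(configure/message-spool/queue)# max-delivered-unacked-msgs-per-flow 1
```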
However, it appears that after each job the worker pauses for about a second before the next message is picked up from the queue and processed, i.e. we have roughly 1 second of dead time per message. This means that if the queue holds, say, 10 of the small 0.1 s jobs and there is only 1 worker, processing takes about 10 seconds rather than 1 second. And this isn't worker compute time; it appears to be Solace fetch time, i.e. the time taken to fetch the next message from the queue.
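The arithmetic behind that claim, assuming a fixed ~1 s re-fetch delay after every acknowledged message (my hypothesis, not a measured Solace figure):

```python
def total_time(n_jobs, job_secs, fetch_secs):
    """Wall-clock time for one worker: each job pays its compute time
    plus a fixed per-message fetch delay."""
    return n_jobs * (job_secs + fetch_secs)

ideal = total_time(10, 0.1, 0.0)   # pure compute: 1.0 s
actual = total_time(10, 0.1, 1.0)  # with ~1 s dead time per message: ~11 s
```

So the per-message dead time, not the job compute, dominates the wall-clock total, which is consistent with the roughly tenfold slowdown I'm seeing.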
Is this because I have set MAX_DELIVERED_UNACKNOWLEDGED_MSGS_PER_FLOW to 1 but have not yet set FLOW_WINDOW_SIZE to 1 as well?
The system is built in Python, so it uses the Solace Python API.