On Fri, Aug 29, 2025 at 11:52 AM Tomas Vondra <tomas@vondra.me> wrote:
> True. But one worker did show up in top, using a fair amount of CPU, so
> why wouldn't the others (if they process the same stream)?
It deliberately concentrates wakeups into the lowest numbered workers
that are marked idle in a bitmap.
* higher numbered workers snooze and eventually time out (with the
patches for 19 that make the pool size dynamic)
* busy workers have a better chance of staying on CPU between one job
and the next
* minimised duplication of various caches and descriptors
Every other wakeup routing strategy I've tried so far performed worse
in both avg(latency) and stddev(latency).
I have wondered if we might want to consider per-NUMA-node IO worker
pools with their own submission queues. Not investigated, but I
suppose it might possibly help with the submission queue lock, cache
line ping pong for buffer headers that the worker touches on
completion, and inter-process interrupts. I don't know where to draw
the line with a potential optimisations to IO worker mode that would
realistically only help on Linux today, when the main performance plan
for Linux is io_uring.