Re: Automatically sizing the IO worker pool - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Automatically sizing the IO worker pool
Date
Msg-id CA+hUKGK=vELXFXNj2L=vTkof6s_EQzTjYXXrUVwOOW0rahEfVg@mail.gmail.com
In response to Re: Automatically sizing the IO worker pool  (Andres Freund <andres@anarazel.de>)
Responses Re: Automatically sizing the IO worker pool
Re: Automatically sizing the IO worker pool
List pgsql-hackers
On Wed, Apr 8, 2026 at 12:30 PM Andres Freund <andres@anarazel.de> wrote:
> On 2026-04-08 11:18:51 +1200, Thomas Munro wrote:
> > >                 /* Choose one worker to wake for this batch. */
> > >                 if (worker == -1)
> > >                         worker = pgaio_worker_choose_idle(-1);
> >
> > Well I didn't want to wake a worker if we'd failed to enqueue
> > anything.
>
> I think it's worth waking up workers if there are idle ones and the queue is
> full?

True, but I prefer to test nsync because there is another reason to break:

commit 29a0fb215779d10fae0cbeb8ce57805f244bad9b
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   Wed Mar 11 12:11:04 2026 +0100

    Conditional locking in pgaio_worker_submit_internal

I haven't finished digesting that commit, and will follow up shortly
on that topic once this patch is in.

> I suspect the primary reason is that pgaio_worker_request_grow() is triggered
> even when io_worker_control->nworkers is >= io_max_workers.

Yeah.  V6 already addressed that directly.

> I suspect there's also pingpong between submission not finding any workers
> idle, requesting growth, and workers being idle for a short period, then the
> same thing starting again.
>
> Seems like there should be two fields. One saying "notify postmaster again"
> and one "postmaster start a worker".  The former would only be cleared by
> postmaster after the timeout.

Good idea.  V7 has two tweaks:

* separate grow and grow_signal_sent flags, as you suggested
* it also applies the io_worker_launch_delay to cancelled grow requests

This seems to work pretty well for avoiding useless postmaster
wakeups.  You get a few due to cancelled grow requests, but not more
frequently than io_worker_launch_delay allows, while the pool is
vacillating during workload changes.  It soon makes its mind up and
stabilises on a good size.  To be clear, there is no change in overall
effect, only a reduction in useless wakeups.

I retested the value of request cancellation.  If you comment that
call out, we do tend to overshoot, so I think it's worth having.  But
you were quite right to complain about the postmaster wakeup rate it
produced.

> > Our goal is simple: process every IO immediately.  We have immediate
> > feedback that is simple: there's an IO in the queue and there is no
> > idle worker.  The only action we can take is simple: add one more
> > worker.  So we don't need to suffer through the maths required to
> > figure out the ideal k for our M/G/k queue system (I think that's what
> > we have?) or any of the inputs that would require*.  The problem is
> > that on its own, the test triggered far too easily because a worker
> > that is not marked idle might in fact be just about to pick up that IO
>
> Is that case really concerning? As long as you have some rate limiting about
> the start rate, starting another worker when there are no idle workers seems
> harmless?  Afaict it's fairly self limiting.

I retested without the depth test and I continue to think we need it.
Without it, the pool overshoots by quite a lot.  You should be able to
set io_max_workers=32 without fear of creating a ton of useless worker
processes no matter what your workload.

> > on the one hand, and because there might be rare
> > spikes/clustering on the other, so I cooled it off a bit by
> > additionally testing if the queue appears to be growing or spiking
> > beyond some threshold.  I think it's OK to let the queue grow a bit
> > before we are triggered anyway, so the precise value used doesn't seem
> > too critical.  Someone might be able to come up with a more defensible
> > value, but in the end I just wanted a value that isn't triggered by
> > the outliers I see in real systems that are keeping up.  We could tune
> > it lower and overshoot more, but this setting seems to work pretty
> > well.  It doesn't seem likely that a real system could achieve a
> > steady state that is introducing latency but isn't increasing over
> > time, and pool size adjustments are bound to lag anyway.
>
> Yea, I don't think the precise logic matters that much as long as we ramp up
> reasonably fast without being crazy and ramp down a bit slower.

Cool.

