Re: Managing IO workers in postmaster's state machine - Mailing list pgsql-hackers

From Cédric Villemain
Subject Re: Managing IO workers in postmaster's state machine
Date
Msg-id 1e16b1f8-c79c-4ce0-b516-7d51b88b9825@Data-Bene.io
In response to Managing IO workers in postmaster's state machine  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi Andres,

Thanks for sharing these detailed questions and insights. Below are some 
comments and additional thoughts that might help, particularly regarding 
bgworkers.


On 09/12/2024 22:42, Andres Freund wrote:

> Hi,
>
> For AIO, when using the worker process based "AIO", we need to manage a set of
> workers. These need to be started by postmaster, obviously.
>
> I have a few questions about the design details. They seemed mostly
> independent of the larger AIO thread, so I started this separately.
>
>
> 1) Where to start/stop workers after a config reload?
>
> The GUC that controls how many workers are allowed is PGC_SIGHUP. Therefore we
> need to adjust the number of workers after a config reload.
>
> Right now this is implemented (I think by Thomas, but it could have been me),
> in a GUC assign hook. But that seems rather iffy.
>
> For example, I think this means that, as we're in the middle of the GUC reload
> process, a child forked in that moment may have some updated GUCs, but not
> others.  But even besides that, it "feels wrong", in a way I can't fully
> articulate.
>
> However, it's not exactly obvious *where* workers should be started / stopped
> instead.
>
> One approach would be to put the adjustment in PostmasterStateMachine(), but
> currently a config reload request doesn't trigger an invocation of
> PostmasterStateMachine().
>
> We have a somewhat similar case for bgworkers, for those we have a dedicated
> call in ServerLoop().  I guess we could just copy that, but it doesn't seem
> exactly pretty.


I am unsure why background workers are managed directly in ServerLoop();
wouldn't it help if a "background worker launcher" existed instead?
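
For the IO worker pool, here is a rough sketch of what such a dedicated
call (rather than the GUC assign hook) could look like. Everything named
MaybeAdjustIoWorkers, n_io_worker_children, StartIoWorker or
SignalIoWorkerToExit below is hypothetical; only pmState and the PM_*
values are the existing postmaster symbols, and io_workers stands for
the SIGHUP GUC in question:

static void
MaybeAdjustIoWorkers(void)
{
    /* only manage the pool while the server is running normally */
    if (pmState != PM_RUN && pmState != PM_HOT_STANDBY)
        return;

    /* too few workers: start more */
    while (n_io_worker_children < io_workers)
    {
        if (!StartIoWorker())       /* hypothetical: fork one IO worker */
            break;                  /* fork failed, retry on next cycle */
        n_io_worker_children++;
    }

    /* too many workers: ask the newest ones to exit */
    while (n_io_worker_children > io_workers)
    {
        SignalIoWorkerToExit();     /* hypothetical: e.g. send SIGTERM */
        n_io_worker_children--;
    }
}

Calling something like that from ServerLoop() after a reload (and when a
worker exits) would at least keep worker management out of the middle of
the GUC reload.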

> 2) New state machine phase for AIO shutdown?
>
> During non-crash shutdown, we currently stop archiver and walsender last
> (except dead-end children, but those don't matter in this context). But at
> least walsender might use AIO, so we need to shut down AIO after walsender.
>
> We already have PM_SHUTDOWN, PM_SHUTDOWN_2. I didn't want to introduce
> PM_SHUTDOWN_3. For now there's a new PM_SHUTDOWN_IO.
>
> Is that the right approach?
>
> But somehow the number of shutdown phases seems to be getting out of control,
> besides PM_SHUTDOWN* we also have PM_STOP_BACKENDS, PM_WAIT_BACKENDS,
> PM_WAIT_DEAD_END and PM_NO_CHILDREN...


For startup there is a StartupStatusEnum. It serves a distinct purpose,
but maybe a ShutdownSequence/ShutdownStatus would be helpful here: it
would simply outline the order of operations for the shutdown state.
That would be closer to a chain of responsibility than the existing
state machine...
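
To illustrate the idea (the type and value names below are made up; only
the "stop IO workers after walsender" ordering comes from your
description):

typedef enum ShutdownSequence
{
    SHUTDOWN_NOT_STARTED = 0,
    SHUTDOWN_STOP_BACKENDS,         /* regular backends, bgworkers, ... */
    SHUTDOWN_WAIT_BACKENDS,
    SHUTDOWN_CHECKPOINT,            /* shutdown checkpoint by checkpointer */
    SHUTDOWN_STOP_WALSENDER_ARCHIVER,
    SHUTDOWN_STOP_IO_WORKERS,       /* last, walsender may still use AIO */
    SHUTDOWN_WAIT_DEAD_END,
    SHUTDOWN_NO_CHILDREN
} ShutdownSequence;

Each step would only know which children it has to stop and which step
comes next, instead of adding more PM_SHUTDOWN_* values.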

> 3) Do we need a new PM* phase at startup?
>
> Because the startup process - obviously - can perform IO, IO workers need to
> be started before the startup process is started.
>
> We already do this with checkpointer and bgwriter. For those we don't have a
> new phase, we just do so before forking the startup process. That means that
> the startup process might need to wait for checkpointer to finish starting up,
> but that seems fine to me.  The same seems true for IO workers.
>
> Therefore I don't see a need for a new phase?

So it would belong to PM_INIT? But then comes the next question:

> 4) Resetting pmState during crash-restart?
>
> For the IO worker handling we need to check the phase we currently are in to
> know whether to start new workers (so we don't start more workers while
> shutting down).
>
> That's easy during normal startup, the state is PM_INIT. But for a
> crash-restart (bottom of PostmasterStateMachine()), pmState will be
> PM_NO_CHILDREN - in which we would *NOT* want to start new IO workers normally
> (if e.g. a SIGHUP is processed while in that state). But we do need to start
> IO workers at that point in the crash-restart cycle.
>
> The AIO patchset has dealt with that by moving pmState = PM_STARTUP to a bit
> earlier. But that seems a *bit* weird, even though I don't see any negative
> consequences right now.
>
> Would it be better to reset pmState to PM_INIT? That doesn't seem quite right
> either, it's described as "postmaster starting".

The comments in postmaster.c say that crash recovery is like a shutdown
followed by a startup (not an init).
If I understood correctly, IO workers will need to be restarted in that
context, so they must be started after PM_INIT ...
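
If workers must be able to run from PM_STARTUP onward (and again during
a crash restart), maybe the "may we launch IO workers now" condition
could live in one explicit place rather than depending on where pmState
happens to be reset. A purely illustrative sketch (pmState and the PM_*
values are the existing ones; the helper is not):

static bool
IoWorkersAllowed(void)
{
    switch (pmState)
    {
        case PM_STARTUP:        /* the startup process may already do IO */
        case PM_RECOVERY:
        case PM_HOT_STANDBY:
        case PM_RUN:
            return true;
        default:
            /* PM_INIT, PM_SHUTDOWN*, PM_WAIT_*, PM_NO_CHILDREN: no */
            return false;
    }
}

The crash-restart path would then just need pmState to already be
PM_STARTUP before relaunching, which seems to be what the patchset does.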


> 5) btmask_all_except2() not used anymore, btmask_all_except3() needed
>
> The one place using btmask_all_except2() now also needs to allow IO workers,
> and thus btmask_all_except3(). Which triggers warnings about
> btmask_all_except2 being unused.
>
> We could remove btmask_all_except2(), ifdef it out or create a more generic
> version that works with arbitrary numbers of arguments.
>
> Something like
>
> static inline BackendTypeMask
> btmask_all_except_n(int nargs, BackendType *t)
> {
>      BackendTypeMask mask = BTYPE_MASK_ALL;
>
>      for (int i = 0; i < nargs; i++)
>          mask = btmask_del(mask, t[i]);
>      return mask;
> }
>
> #define btmask_all_except(...) \
>      btmask_all_except_n( \
>          lengthof(((BackendType[]){__VA_ARGS__})), \
>          (BackendType[]){__VA_ARGS__} \
>      )
>
> should do the trick, I think.
>
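
Just to illustrate the expansion of the proposed macro with made-up
arguments (B_IO_WORKER being the backend type the patchset presumably
adds; the other B_* values already exist):

/* a three-argument call site */
BackendTypeMask mask = btmask_all_except(B_LOGGER, B_ARCHIVER, B_IO_WORKER);

/* which expands to: the compound literal's lengthof() supplies nargs = 3 */
BackendTypeMask mask2 =
    btmask_all_except_n(3, (BackendType[]){B_LOGGER, B_ARCHIVER, B_IO_WORKER});

so the per-arity helpers would no longer be needed.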
>
> 6) Variable numbered aux processes
>
> The IO workers are aux processes - that seemed to be the most
> appropriate. However, if "real" AIO is configured (PGC_POSTMASTER), no IO
> workers can be started.
>
> The only change the patch had to make for that is:
>
> -#define NUM_AUXILIARY_PROCS     6
> +#define MAX_IO_WORKERS          32
> +#define NUM_AUXILIARY_PROCS     (6 + MAX_IO_WORKERS)
>
> Which means that we'll reserve some resources for IO workers even if no IO
> workers can be started. Do we consider that a problem?  There's some precedent
> - we reserve space for archiver even if not configured, despite archive_mode
> being PGC_POSTMASTER - but of course that's a different scale.


I have not checked the history, and I have only read some details on
this topic, so the reasoning for *why* this is hardcoded is not yet
clear to me. If there is a technical or historical reason for the
current design, it would be useful to revisit it in this context.
What prevents the number of background workers from being dynamically
sized today?
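
Something along these lines perhaps, assuming all users of the count can
go through a runtime value and that io_method is the PGC_POSTMASTER GUC
selecting the AIO implementation (IOMETHOD_WORKER and NumAuxiliaryProcs()
are made-up names; MAX_IO_WORKERS is from the quoted patch):

static int
NumAuxiliaryProcs(void)
{
    int     naux = 6;           /* the current fixed reservation */

    /* only the worker-based AIO implementation needs these slots */
    if (io_method == IOMETHOD_WORKER)
        naux += MAX_IO_WORKERS;

    return naux;
}

Since io_method cannot change after startup, the reservation would stay
stable for the life of the postmaster.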

---
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D



