Re: Managing IO workers in postmaster's state machine - Mailing list pgsql-hackers
From | Cédric Villemain |
---|---|
Subject | Re: Managing IO workers in postmaster's state machine |
Date | |
Msg-id | 1e16b1f8-c79c-4ce0-b516-7d51b88b9825@Data-Bene.io Whole thread Raw |
In response to | Managing IO workers in postmaster's state machine (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hi Andres, Thanks for sharing these detailed questions and insights. Below are some comments and additional thoughts that might help, particularly regarding bgworkers. On 09/12/2024 22:42, Andres Freund wrote: > Hi, > > For AIO, when using the worker process based "AIO", we need to manage a set of > workers. These need to be started by postmaster, obviously. > > I have a few questions about the design details. They seemed mostly > independent of the larger AIO thread, so I started this separately. > > > 1) Where to start/stop workers after a config reload? > > The GUC that controls how many workers are allowed is PGC_SIGHUP. Therefore we > need to adjust the number of workers after a config reload. > > Right now this is implemented (I think by Thomas, but it could have been me), > in a GUC assign hook. But that seems rather iffy. > > For example, I think this means that, as we're in the middle of the GUC reload > process, a child forked in that moment may have some updated GUCs, but not > others. But even besides that, it "feels wrong", in a way I can't fully > articulate. > > However, it's not exactly obvious *where* workers should be started / stopped > instead. > > One approach would be to put the adjustment in PostmasterStateMachine(), but > currently a config reload request doesn't trigger an invocation of > PostmasterStateMachine(). > > We have a somewhat similar case for bgworkers, for those we have a dedicated > call in ServerLoop(). I guess we could just copy that, but it doesn't seem > exactly pretty. I am unsure why background workers are managed directly in the ServerLoop(), wouldn't it help if a "background worker launcher" exist instead ? > 2) New state machine phase for AIO shutdown? > > During non-crash shutdown, we currently stop archiver and walsender last > (except dead-end children, but those don't matter in this context). But at > least walsender might use AIO, so we need to shut down AIO after walsender. > > We already have PM_SHUTDOWN, PM_SHUTDOWN_2. I didn't want to introduce > PM_SHUTDOWN_3. For now there's a new PM_SHUTDOWN_IO. > > Is that the right approach? > > But somehow the number of shutdown phases seems to be getting out of control, > besides PM_SHUTDOWN* we also have PM_STOP_BACKENDS, PM_WAIT_BACKENDS, > PM_WAIT_DEAD_END and PM_NO_CHILDREN... For the startup there is a StartupStatusEnum, it serves distinct purpose but maybe having ShutdownSequence/ShutdownStatus can be helpful: just outline order of operations for the shutdown state ? Maybe nearest from a chain of responsability than the existing state machine... > 3) Do we need an new PM* phase at startup? > > Because the startup process - obviously - can perform IO, IO workers need to > be started before the startup process is started. > > We already do this with checkpointer and bgwriter. For those we don't have a > new phase, we just do so before forking the startup process. That means that > the startup process might need to wait for checkpointer to finish starting up, > but that seems fine to me. The same seems true for IO workers. > > Therefore I don't see a need for a new phase? So it should belong to PM_INIT? but the next question: > 4) Resetting pmState during crash-restart? > > For the IO worker handling we need to check the phase we currently are in to > know whether to start new workers (so we don't start more workers while > shutting down). > > That's easy during normal startup, the state is PM_INIT. But for a > crash-restart (bottom of PostmasterStateMachine()), pmState will be > PM_NO_CHILDREN - in which we would *NOT* want to start new IO workers normally > (if e.g. a SIGHUP is processed while in that state). But we do need to start > IO workers at that point in the crash-restart cycle. > > The AIO patchset has dealt with that by moving pmState = PM_STARTUP to a bit > earlier. But that seems a *bit* weird, even though I don't see any negative > consequences right now. > > Would it be better to reset pmState to PM_INIT? That doesn't seem quite right > either, it's described as "postmaster starting". The comments in postmaster.c said that crash recovery is like shutdown followed by startup (not by init). If I understood correctly IO workers will need to be restarted in such context, so they must be after PM_INIT ... > 5) btmask_all_except2() not used anymore, btmask_all_except3() needed > > The one place using btmask_all_except2() now also needs to allow IO workers, > and thus btmask_all_except3(). Which triggers warnings about > btmask_all_except2 being unused. > > We could remove btmask_all_except2(), ifdef it out or create a more generic > version that works with arbitrary numbers of arguments. > > Something like > > static inline BackendTypeMask > btmask_all_except_n(int nargs, BackendType *t) > { > BackendTypeMask mask = BTYPE_MASK_ALL; > > for (int i = 0; i < nargs; i++) > mask = btmask_del(mask, t[i]); > return mask; > } > > #define btmask_all_except(...) \ > btmask_all_except_n( \ > lengthof(((BackendType[]){__VA_ARGS__})), \ > (BackendType[]){__VA_ARGS__} \ > ) > > should do the trick, I think. > > > 6) Variable numbered aux processes > > The IO workers are aux processes - that seemed to be the most > appropriate. However, if "real" AIO is configured (PGC_POSTMASTER), no IO > workers can be started. > > The only change the patch had to make for that is: > > -#define NUM_AUXILIARY_PROCS 6 > +#define MAX_IO_WORKERS 32 > +#define NUM_AUXILIARY_PROCS (6 + MAX_IO_WORKERS) > > Which means that we'll reserve some resources for IO workers even if no IO > workers can be started. Do we consider that a problem? There's some precedent > - we reserve space for archiver even if not configured, despite archive_mode > being PGC_POSTMASTER - but of course that's a different scale. I have not checked the history, I have read some details on this topic, but the reasoning is not yet very clear to me as **why** this is hardcoded. If there's a technical or historical reason for the current design, it would be useful to revisit that in this context. What prevent the number of background workers to be dynamically sized today ? --- Cédric Villemain +33 6 20 30 22 52 https://www.Data-Bene.io PostgreSQL Support, Expertise, Training, R&D
pgsql-hackers by date: