Managing IO workers in postmaster's state machine - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Managing IO workers in postmaster's state machine |
Date | |
Msg-id | w3z6w3g4aovivs735nk4pzjhmegntecesm3kktpebchegm5o53@aonnq2kn27xi Whole thread Raw |
Responses |
Re: Managing IO workers in postmaster's state machine
|
List | pgsql-hackers |
Hi, For AIO, when using the worker process based "AIO", we need to manage a set of workers. These need to be started by postmaster, obviously. I have a few questions about the design details. They seemed mostly independent of the larger AIO thread, so I started this separately. 1) Where to start/stop workers after a config reload? The GUC that controls how many workers are allowed is PGC_SIGHUP. Therefore we need to adjust the number of workers after a config reload. Right now this is implemented (I think by Thomas, but it could have been me), in a GUC assign hook. But that seems rather iffy. For example, I think this means that, as we're in the middle of the GUC reload process, a child forked in that moment may have some updated GUCs, but not others. But even besides that, it "feels wrong", in a way I can't fully articulate. However, it's not exactly obvious *where* workers should be started / stopped instead. One approach would be to put the adjustment in PostmasterStateMachine(), but currently a config reload request doesn't trigger an invocation of PostmasterStateMachine(). We have a somewhat similar case for bgworkers, for those we have a dedicated call in ServerLoop(). I guess we could just copy that, but it doesn't seem exactly pretty. 2) New state machine phase for AIO shutdown? During non-crash shutdown, we currently stop archiver and walsender last (except dead-end children, but those don't matter in this context). But at least walsender might use AIO, so we need to shut down AIO after walsender. We already have PM_SHUTDOWN, PM_SHUTDOWN_2. I didn't want to introduce PM_SHUTDOWN_3. For now there's a new PM_SHUTDOWN_IO. Is that the right approach? But somehow the number of shutdown phases seems to be getting out of control, besides PM_SHUTDOWN* we also have PM_STOP_BACKENDS, PM_WAIT_BACKENDS, PM_WAIT_DEAD_END and PM_NO_CHILDREN... 3) Do we need an new PM* phase at startup? Because the startup process - obviously - can perform IO, IO workers need to be started before the startup process is started. We already do this with checkpointer and bgwriter. For those we don't have a new phase, we just do so before forking the startup process. That means that the startup process might need to wait for checkpointer to finish starting up, but that seems fine to me. The same seems true for IO workers. Therefore I don't see a need for a new phase? 4) Resetting pmState during crash-restart? For the IO worker handling we need to check the phase we currently are in to know whether to start new workers (so we don't start more workers while shutting down). That's easy during normal startup, the state is PM_INIT. But for a crash-restart (bottom of PostmasterStateMachine()), pmState will be PM_NO_CHILDREN - in which we would *NOT* want to start new IO workers normally (if e.g. a SIGHUP is processed while in that state). But we do need to start IO workers at that point in the crash-restart cycle. The AIO patchset has dealt with that by moving pmState = PM_STARTUP to a bit earlier. But that seems a *bit* weird, even though I don't see any negative consequences right now. Would it be better to reset pmState to PM_INIT? That doesn't seem quite right either, it's described as "postmaster starting". 5) btmask_all_except2() not used anymore, btmask_all_except3() needed The one place using btmask_all_except2() now also needs to allow IO workers, and thus btmask_all_except3(). Which triggers warnings about btmask_all_except2 being unused. We could remove btmask_all_except2(), ifdef it out or create a more generic version that works with arbitrary numbers of arguments. Something like static inline BackendTypeMask btmask_all_except_n(int nargs, BackendType *t) { BackendTypeMask mask = BTYPE_MASK_ALL; for (int i = 0; i < nargs; i++) mask = btmask_del(mask, t[i]); return mask; } #define btmask_all_except(...) \ btmask_all_except_n( \ lengthof(((BackendType[]){__VA_ARGS__})), \ (BackendType[]){__VA_ARGS__} \ ) should do the trick, I think. 6) Variable numbered aux processes The IO workers are aux processes - that seemed to be the most appropriate. However, if "real" AIO is configured (PGC_POSTMASTER), no IO workers can be started. The only change the patch had to make for that is: -#define NUM_AUXILIARY_PROCS 6 +#define MAX_IO_WORKERS 32 +#define NUM_AUXILIARY_PROCS (6 + MAX_IO_WORKERS) Which means that we'll reserve some resources for IO workers even if no IO workers can be started. Do we consider that a problem? There's some precedent - we reserve space for archiver even if not configured, despite archive_mode being PGC_POSTMASTER - but of course that's a different scale. I guess this could be argued to better be in a separate thread... Greetings, Andres Freund
pgsql-hackers by date: