Managing IO workers in postmaster's state machine - Mailing list pgsql-hackers

From Andres Freund
Subject Managing IO workers in postmaster's state machine
Date
Msg-id w3z6w3g4aovivs735nk4pzjhmegntecesm3kktpebchegm5o53@aonnq2kn27xi
Whole thread Raw
Responses Re: Managing IO workers in postmaster's state machine
List pgsql-hackers
Hi,

For AIO, when using the worker process based "AIO", we need to manage a set of
workers. These need to be started by postmaster, obviously.

I have a few questions about the design details. They seemed mostly
independent of the larger AIO thread, so I started this separately.


1) Where to start/stop workers after a config reload?

The GUC that controls how many workers are allowed is PGC_SIGHUP. Therefore we
need to adjust the number of workers after a config reload.

Right now this is implemented (I think by Thomas, but it could have been me),
in a GUC assign hook. But that seems rather iffy.

For example, I think this means that, as we're in the middle of the GUC reload
process, a child forked in that moment may have some updated GUCs, but not
others.  But even besides that, it "feels wrong", in a way I can't fully
articulate.

However, it's not exactly obvious *where* workers should be started / stopped
instead.

One approach would be to put the adjustment in PostmasterStateMachine(), but
currently a config reload request doesn't trigger an invocation of
PostmasterStateMachine().

We have a somewhat similar case for bgworkers, for those we have a dedicated
call in ServerLoop().  I guess we could just copy that, but it doesn't seem
exactly pretty.


2) New state machine phase for AIO shutdown?

During non-crash shutdown, we currently stop archiver and walsender last
(except dead-end children, but those don't matter in this context). But at
least walsender might use AIO, so we need to shut down AIO after walsender.

We already have PM_SHUTDOWN, PM_SHUTDOWN_2. I didn't want to introduce
PM_SHUTDOWN_3. For now there's a new PM_SHUTDOWN_IO.

Is that the right approach?

But somehow the number of shutdown phases seems to be getting out of control,
besides PM_SHUTDOWN* we also have PM_STOP_BACKENDS, PM_WAIT_BACKENDS,
PM_WAIT_DEAD_END and PM_NO_CHILDREN...


3) Do we need an new PM* phase at startup?

Because the startup process - obviously - can perform IO, IO workers need to
be started before the startup process is started.

We already do this with checkpointer and bgwriter. For those we don't have a
new phase, we just do so before forking the startup process. That means that
the startup process might need to wait for checkpointer to finish starting up,
but that seems fine to me.  The same seems true for IO workers.

Therefore I don't see a need for a new phase?


4) Resetting pmState during crash-restart?

For the IO worker handling we need to check the phase we currently are in to
know whether to start new workers (so we don't start more workers while
shutting down).

That's easy during normal startup, the state is PM_INIT. But for a
crash-restart (bottom of PostmasterStateMachine()), pmState will be
PM_NO_CHILDREN - in which we would *NOT* want to start new IO workers normally
(if e.g. a SIGHUP is processed while in that state). But we do need to start
IO workers at that point in the crash-restart cycle.

The AIO patchset has dealt with that by moving pmState = PM_STARTUP to a bit
earlier. But that seems a *bit* weird, even though I don't see any negative
consequences right now.

Would it be better to reset pmState to PM_INIT? That doesn't seem quite right
either, it's described as "postmaster starting".


5) btmask_all_except2() not used anymore, btmask_all_except3() needed

The one place using btmask_all_except2() now also needs to allow IO workers,
and thus btmask_all_except3(). Which triggers warnings about
btmask_all_except2 being unused.

We could remove btmask_all_except2(), ifdef it out or create a more generic
version that works with arbitrary numbers of arguments.

Something like

static inline BackendTypeMask
btmask_all_except_n(int nargs, BackendType *t)
{
    BackendTypeMask mask = BTYPE_MASK_ALL;

    for (int i = 0; i < nargs; i++)
        mask = btmask_del(mask, t[i]);
    return mask;
}

#define btmask_all_except(...) \
    btmask_all_except_n( \
        lengthof(((BackendType[]){__VA_ARGS__})), \
        (BackendType[]){__VA_ARGS__} \
    )

should do the trick, I think.


6) Variable numbered aux processes

The IO workers are aux processes - that seemed to be the most
appropriate. However, if "real" AIO is configured (PGC_POSTMASTER), no IO
workers can be started.

The only change the patch had to make for that is:

-#define NUM_AUXILIARY_PROCS     6
+#define MAX_IO_WORKERS          32
+#define NUM_AUXILIARY_PROCS     (6 + MAX_IO_WORKERS)

Which means that we'll reserve some resources for IO workers even if no IO
workers can be started. Do we consider that a problem?  There's some precedent
- we reserve space for archiver even if not configured, despite archive_mode
being PGC_POSTMASTER - but of course that's a different scale.

I guess this could be argued to better be in a separate thread...

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Refactoring postmaster's code to cleanup after child exit
Next
From: "David G. Johnston"
Date:
Subject: Re: Document NULL