Re: dynamic background workers, round two - Mailing list pgsql-hackers

From:           Andres Freund
Subject:        Re: dynamic background workers, round two
Msg-id:         20130726093447.GH15081@alap2.anarazel.de
In response to: Re: dynamic background workers, round two (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: dynamic background workers, round two
List:           pgsql-hackers
On 2013-07-25 12:35:30 -0400, Robert Haas wrote:
> On Wed, Jul 24, 2013 at 5:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > This seems like a sensible idea to me. But, in the context of dynamic
> > query, don't we also need the reverse infrastructure of notifying a
> > bgworker that the client, that requested it to be started, has died?
> > Ending up with a dozen bgworkers running because the normal backend
> > FATALed doesn't seem nice.
>
> I think that will be desirable for many, although not all, uses of
> background workers.  For example, you might want to be able to do
> something like SELECT pg_background('CREATE INDEX CONCURRENTLY ...')
> and then exit your session and have the background worker continue to
> run.  Or maybe somebody writes a trigger that starts a background
> worker (if there's not one already working) and queues up work for the
> trigger, and the background worker continues processing the queue
> until it is empty, and then exits.
>
> But for parallel query specifically, yeah, we'll probably want the
> untimely demise of either the original backend or any of its workers
> to result in the death of all the workers and an abort of the parent's
> transaction.  However, there's no need for the postmaster to be
> involved in any of that.  For example, once the worker process is up
> and running and the parent has its PID, it can use an on_shmem_exit
> hook or a PG_ENSURE_ERROR_CLEANUP() block to send SIGTERM to any
> still-living workers.

It doesn't need to be the postmaster, but I think we need to provide
central infrastructure for that. I don't want this to end up being
redone poorly in multiple places.

I just wanted to mention it, it obviously doesn't need to be implemented
at the same time.

> >> +#define InvalidPid (-1)
> >> +
> >
> > That value strikes me as a rather bad idea. As a pid that represents
> > init's processgroup. I.e. kill(InvalidPid) will try to kill init... I'd
> > rather not spread that outside async.c.
> Well, if we ever try to kill(InvalidPid), that's a PostgreSQL bug.
> And if the postgres process has permission to send signals to init,
> that's an OS bug (or a PostgreSQL bug, since we refuse to run as
> root).  So I'm not very worried about it.

I don't think somebody mistakenly kill()ing somebody with -1 is all too
likely, but it's a valid value to pass. That's why I dislike using it as
a constant for an invalid value.

> I've got two functions that need distinguishable return values for the
> not-yet-started case and the already-dead case, neither of which can
> be real PIDs.  If we don't use 0 and -1, the alternative is presumably
> to complicate the calling signature with another out parameter.  That
> does not seem like a net improvement.

I actually think those cases would be better served by an extensible
error code mechanism.

> >> +/*
> >> + * Report the PID of a newly-launched background worker in shared memory.
> >> + *
> >> + * This function should only be called from the postmaster.
> >> + */
> >> +void
> >> +ReportBackgroundWorkerPID(RegisteredBgWorker *rw)
> >> +{
> >> +	BackgroundWorkerSlot *slot;
> >> +
> >> +	Assert(rw->rw_shmem_slot < max_worker_processes);
> >> +	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
> >> +	slot->pid = rw->rw_pid;
> >> +
> >> +	if (rw->rw_worker.bgw_notify_pid != 0)
> >> +		kill(rw->rw_worker.bgw_notify_pid, SIGUSR1);
> >> +}
> >
> > Any reason not to use SendProcSignal() properly here? Just that you
> > don't know the BackendId? It seems unclean to reuse SIGUSR1 without
> > using the infrastructure built for that.
>
> We're in the postmaster here.  The postmaster currently uses kill
> directly in all cases, rather than SendProcSignal(), and I'd rather
> not change that.  Any code that the postmaster calls has got to be
> safe against arbitrary shared memory corruption, and I'd rather keep
> that footprint as small as possible.
> For example, if we added bgw_backend_id, the postmaster would have to
> bounds check it to make sure it didn't seg fault.  The more of that
> kind of stuff we add, the more chances there are to destabilize the
> postmaster.  I'd like to keep it as minimal as possible.

I can see the point in the argument, but it still seems architecturally
grotty.

> >> +	for (;;)
> >> +	{
> >> +		pid = GetBackgroundWorkerPid(handle);
> >> +		if (pid != InvalidPid)
> >> +			break;
> >> +		rc = WaitLatch(&MyProc->procLatch,
> >> +					   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
> >> +		if (rc & WL_POSTMASTER_DEATH)
> >> +			return InvalidPid;
> >
> > This basically ignores postmaster death. With unix_latch.c that
> > shouldn't be a problem because afaics its implementation correctly
> > should report an error in the next WaitLatch without waiting as well.
> > I am not sure about the win32 implementation, is that ok there as
> > well?
>
> I don't understand what you mean by this.

Why does it ignore postmaster death? The other callers that check for
WL_POSTMASTER_DEATH actually perform drastic measures like exit()ing
upon postmaster death. Here you just return. With the unix latch
implementation it seems to be guaranteed that the next WaitLatch() done
somewhere else will also return WL_POSTMASTER_DEATH. I am not sure
whether the windows implementation might actually "consume" the death.

> >> +	set_latch_on_sigusr1 = false;
> >
> > So, the set_latch_on_sigusr1 flag exists so MyProc->procLatch doesn't
> > get SetLatch'ed unless we're in WaitForBackgroundWorkerStartup()?
>
> Right.  Noah suggested to me that we might do this unconditionally,
> which would certainly be simpler and cleaner.  I dunno if it'd risk
> too many spurious wake-ups in any scenario, though.  I kind of doubt
> it, but I don't know.

If we were using the procsignal mechanism, I wouldn't worry about doing
it unconditionally, but signals sent for the other reasons SIGUSR1 is
used might actually come in relatively frequently.
> >> +/*
> >> + * When a backend asks to be notified about worker state changes, we
> >> + * set a flag in its backend entry.  The background worker machinery
> >> + * needs to know when such backends exit.
> >> + */
> >> +bool
> >> +PostmasterMarkPIDForWorkerNotify(int pid)
> >> +{
> >> +	dlist_iter	iter;
> >> +	Backend    *bp;
> >> +
> >> +	dlist_foreach(iter, &BackendList)
> >> +	{
> >> +		bp = dlist_container(Backend, elem, iter.cur);
> >> +		if (bp->pid == pid)
> >> +		{
> >> +			bp->bgworker_notify = true;
> >> +			return true;
> >> +		}
> >> +	}
> >> +	return false;
> >> +}
> >
> > Maybe we should have MyBackend?
>
> How would that help?  This code is run in the postmaster, so that we
> only iterate through workers and clear bgw_notify_pid when the backend
> that has just exited has at some point appeared in a bgw_notify_pid.

Thinko.

> > Btw, you seem to want to support this in bgworkers started by a
> > bgworker. That's not going to work without some changes if the
> > "intermediate" bgworker is one without a backend since those don't
> > use procsignal_sigusr1_handler.
>
> Right.  I think it's OK for now to limit it to cases where the
> intermediate bgworker has a backend.  If someone else finds that
> restriction unacceptable, they can fix it.

I don't have a problem with the restriction, but I'd like to see a check
against it. Maybe check for MyBackendId != InvalidBackendId in
RegisterDynamicBackgroundWorker()? That would also prevent starting
further bgworkers before BackgroundWorkerInitializeConnection() is done
in a connected bgworker, which seems to be a good thing.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services