Re: background workers, round three - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: background workers, round three
Date
Msg-id CAB7nPqQqZ7XeBzrd_XLccYJKLxBwbz+=PSRpY4uOsRPgE7zwAQ@mail.gmail.com
Whole thread Raw
In response to background workers, round three  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: background workers, round three  (Robert Haas <robertmhaas@gmail.com>)
Re: background workers, round three  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

Finally I got the chance to put my hands on this code. Really sorry
for the late replay.

On Fri, Sep 13, 2013 at 10:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Last week, I attempted to write some code to perform a trivial
> operation in parallel by launching background workers.  Despite my
> earlier convictions that I'd built enough infrastructure to make this
> possible, the experiment turned out badly - so, patches!
>
> It's been pretty obvious to me from the beginning that any sort of
> parallel computation would need a way to make sure that if any worker
> dies, everybody dies.  Conversely, if the parent aborts, all the
> workers should die.  My thought was that this could be implemented
> using PG_TRY()/PG_CATCH() blocks over the existing infrastructure, but
> this turned out to be naive.  If the original backend encounters an
> error before the child manages to start up, there's no good recovery
> strategy.  The parent can't kill the child because it doesn't exist
> yet, and the parent can't stop the postmaster from starting the child
> later.  The parent could try waiting until the child starts up and
> THEN killing it, but that's probably unreliable and surely
> mind-numbingly lame.  The first attached patch,
> terminate-worker-v1.patch, rectifies this problem by providing a
> TerminateBackgroundWorker() function.  When this function is invoked,
> the postmaster will become unwilling to restart the worker and will
> send it a SIGTERM if it's already running.
And here are some comments after reading the first patch... The patch
looks good, creates no warnings and is not that invasive in the
existing code.

Few comments about the code:
1) In postmaster.c, what about adding a comment here telling that we
can forget about this bgworker as it has already been requested for a
termination:
+                       if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART
+                               || rw->rw_terminate)
2) Trying to think about this patch as an independent feature, I think
that it would have been nice to demonstrate the capacity of
TerminateBackgroundWorker directly with an example in worker_spi to
show a direct application of it, with for example the addition of a
function like worker_spi_terminate. However this needs:
- either an additional API using as input the PID, pid used to fetch a
bgworker handle terminating the worker afterwards. This is basically
what I did in the patch attached when playing with your patch. You
might find it useful... Or not! It introduces as well
worker_spi_terminate, a small API scanning the array of bgworker slots
. Not sure that this is much compatible with the concept of generation
though...
- return not only the PID of the worker started dynamically in
worker_spi_launch, but also the generation number of the worker to be
able to generate a bgworker handle that could be afterwards be used
with TerminateBackgroundWorker.
Note that this comment is based on my personal experience woth
bgworkers, and I think that it is important to provide examples of
what is implemented such as users are not off the track, even if
playing with bgworker is high-level hacking. TerminateBackgroundWorker
can offer a way to users to kill a bgworker more native than
pg_terminate_backend for example, especially if they implement a
bgworker structure of the type launcher/workers like what autovacuum
does.
3) Documentation is needed, following comment 2).

> It's important that the
> SIGTERM be sent by the postmaster, because if the original backend
> tries to send it, there's a race condition: the process might not be
> started at the time the original backend tries to send the signal, but
> the postmaster might start it before it sees the terminate request.)
That's interesting. Nothing more to say about that (perhaps because of
my lack of knowledge of the code in this area).

> By itself, this is useful, but painful.  The pain comes from the fact
> that all of the house-keeping is left to the programmer.
If this is necessary, so be it. But I think that newcomers to
background workers would appreciate at least an exampe in worker_spi
(using as a base what I provided in the patch attached) they could
refer to when beginning to implement a new feature. This is a thing
that people could, and for sure would refer to when implementing a
complex thing using this infrastructure.

This is all I have about the 1st patch... It is already late here, so
I'll have a look at the 2nd/3rd patch hopefully tomorrow or the day
after.
Regards,
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Next
From: Arturas Mazeika
Date:
Subject: #define ExclusiveLock in /usr/include/postgresql/9.1/server/storage/lock.h