Re: background workers, round three - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: background workers, round three |
Date | |
Msg-id | CAB7nPqQqZ7XeBzrd_XLccYJKLxBwbz+=PSRpY4uOsRPgE7zwAQ@mail.gmail.com Whole thread Raw |
In response to | background workers, round three (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: background workers, round three
Re: background workers, round three |
List | pgsql-hackers |
Hi, Finally I got the chance to put my hands on this code. Really sorry for the late replay. On Fri, Sep 13, 2013 at 10:42 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Last week, I attempted to write some code to perform a trivial > operation in parallel by launching background workers. Despite my > earlier convictions that I'd built enough infrastructure to make this > possible, the experiment turned out badly - so, patches! > > It's been pretty obvious to me from the beginning that any sort of > parallel computation would need a way to make sure that if any worker > dies, everybody dies. Conversely, if the parent aborts, all the > workers should die. My thought was that this could be implemented > using PG_TRY()/PG_CATCH() blocks over the existing infrastructure, but > this turned out to be naive. If the original backend encounters an > error before the child manages to start up, there's no good recovery > strategy. The parent can't kill the child because it doesn't exist > yet, and the parent can't stop the postmaster from starting the child > later. The parent could try waiting until the child starts up and > THEN killing it, but that's probably unreliable and surely > mind-numbingly lame. The first attached patch, > terminate-worker-v1.patch, rectifies this problem by providing a > TerminateBackgroundWorker() function. When this function is invoked, > the postmaster will become unwilling to restart the worker and will > send it a SIGTERM if it's already running. And here are some comments after reading the first patch... The patch looks good, creates no warnings and is not that invasive in the existing code. Few comments about the code: 1) In postmaster.c, what about adding a comment here telling that we can forget about this bgworker as it has already been requested for a termination: + if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART + || rw->rw_terminate) 2) Trying to think about this patch as an independent feature, I think that it would have been nice to demonstrate the capacity of TerminateBackgroundWorker directly with an example in worker_spi to show a direct application of it, with for example the addition of a function like worker_spi_terminate. However this needs: - either an additional API using as input the PID, pid used to fetch a bgworker handle terminating the worker afterwards. This is basically what I did in the patch attached when playing with your patch. You might find it useful... Or not! It introduces as well worker_spi_terminate, a small API scanning the array of bgworker slots . Not sure that this is much compatible with the concept of generation though... - return not only the PID of the worker started dynamically in worker_spi_launch, but also the generation number of the worker to be able to generate a bgworker handle that could be afterwards be used with TerminateBackgroundWorker. Note that this comment is based on my personal experience woth bgworkers, and I think that it is important to provide examples of what is implemented such as users are not off the track, even if playing with bgworker is high-level hacking. TerminateBackgroundWorker can offer a way to users to kill a bgworker more native than pg_terminate_backend for example, especially if they implement a bgworker structure of the type launcher/workers like what autovacuum does. 3) Documentation is needed, following comment 2). > It's important that the > SIGTERM be sent by the postmaster, because if the original backend > tries to send it, there's a race condition: the process might not be > started at the time the original backend tries to send the signal, but > the postmaster might start it before it sees the terminate request.) That's interesting. Nothing more to say about that (perhaps because of my lack of knowledge of the code in this area). > By itself, this is useful, but painful. The pain comes from the fact > that all of the house-keeping is left to the programmer. If this is necessary, so be it. But I think that newcomers to background workers would appreciate at least an exampe in worker_spi (using as a base what I provided in the patch attached) they could refer to when beginning to implement a new feature. This is a thing that people could, and for sure would refer to when implementing a complex thing using this infrastructure. This is all I have about the 1st patch... It is already late here, so I'll have a look at the 2nd/3rd patch hopefully tomorrow or the day after. Regards, -- Michael
Attachment
pgsql-hackers by date: