Re: Parallel worker hangs while handling errors. - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parallel worker hangs while handling errors. |
Date | |
Msg-id | CA+TgmoZAya0STWjtHdhr1FgeSV4EtLf69DkYFSG4ZjbHPLe7gw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel worker hangs while handling errors. (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Parallel worker hangs while handling errors.
Re: Parallel worker hangs while handling errors. |
List | pgsql-hackers |
On Tue, Jul 28, 2020 at 5:35 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > The v4 patch looks good to me. Hang is not seen, make check and make > check-world passes. I moved this to the committer for further review > in https://commitfest.postgresql.org/29/2636/. I don't think I agree with this approach. In particular, I don't understand the rationale for unblocking only SIGUSR1. Above, Vignesh says that he feels that unblocking only that signal would be the right approach, but no reason is given. I have two reasons why I suspect it's not the right approach. One, it doesn't seem to be what we do elsewhere; the only existing cases where we have special handling for particular signals are SIGQUIT and SIGPIPE, and those places have comments explaining the reason why they are handled in a special way. Two, SIGUSR1 is used for a LOT of things: look at all the different cases procsignal_sigusr1_handler() checks. If the intention is to only allow the things we know are safe, rather than all the signals there are, I think this coding utterly fails to achieve that - and for reasons that I don't think are really fixable. My first idea about how to fix this was just to call BackgroundWorkerUnblockSignals() before sigsetjmp(), but that doesn't really work, because ParallelWorkerMain() needs to set the handler for SIGTERM before unblocking signals. When you really look at it, the code that does sigsetjmp() in StartBackgroundWorker() is entirely bogus. The comment says "See notes in postgres.c about the design of this coding," but if you go read that comment, it says that the point of using sigsetjmp() is to make sure that signals are unblocked within the if-block that follows, but the use in bgworker.c actually achieves exactly the opposite, because signals have not yet been unblocked at this point. So, whereas the postgres.c code unblocks signals if they are blocked, this code blocks signals if they are unblocked. Given that, maybe the right thing to do is to just start the if-block with a call to BackgroundWorkerUnblockSignals(). Perhaps there's some reason that would be unsafe, if the failure occurs too early: postgres.c doesn't unblock signals until after BaseInit() and InitProcess() have been called, but here an error in those functions would unblock signals while it's being handled. Off-hand, I don't see why that would matter, though. In the postgres.c case, there wouldn't be a PG_exception_stack yet, so we'd end up in the long part of pg_re_throw(), which basically promotes the ERROR to FATAL but otherwise does pretty similar things to what this handler does anyway. So I'm not really sure there's any reason not to just go this way. Another approach would be to establish a new PG_exception_stack() with a free sigsetjmp() call and fresh local buffer, inside ParallelWorkerMain(). It would do the same thing as the existing handler, but because it would be established after unblocking signals, sigsetjmp() would behave as desired. This doesn't seem quite as good to me because I think this pattern might end up getting copied into many background workers, and it's a bunch of extra code for which I don't really see a clear need. So at the moment I think a one line fix in StartBackgroundWorker(), to just unblock signals while handling errors, looks better. Adding Alvaro to the CC line, since I think he wrote this code originally. Not sure if he or anyone else might have an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: