Thread: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
Hello,
There may possibly be a very small window for a double exit() self-deadlock during a forked backend process's ProcessStartupPacket returns and status is not STATUS_OK. The process will go into proc_exit and then a very well timed SIGQUIT will call startup_die for another proc_exit. If timed correctly, the two exit() calls can deadlock since exit() is not re-entrant. It seems extremely hard to hit the deadlock but I believe the opportunity is there.
Using gdb, I was able to create the window and get this stacktrace:
#0 startup_die (postgres_signal_arg=0) at postmaster.c:5090
#1 <signal handler called>
#2 proc_exit_prepare (code=0) at ipc.c:158
#3 0x00000000007c4135 in proc_exit (code=0) at ipc.c:102
#4 0x000000000076b736 in BackendInitialize (port=0x2c13740) at postmaster.c:4207
#5 0x000000000076b190 in BackendStartup (port=0x2c13740) at postmaster.c:3979
#6 0x0000000000767ad3 in ServerLoop () at postmaster.c:1722
#7 0x00000000007671df in PostmasterMain (argc=3, argv=0x2bebad0) at postmaster.c:1330
#8 0x00000000006b5df6 in main (argc=3, argv=0x2bebad0) at main.c:228
I got the stacktrace by doing the following:
- gdb attach to postmaster and run
set follow-fork child
andbreak postmaster.c:4206
(right after ProcessStartupPacket) and continue - In another terminal, open a psql session which should trigger the gdb follow
- In the gdb session,
set status=1
and step into proc_exit() - In another terminal,
kill -s QUIT <child pid>
to send SIGQUIT to the child process. Or run pg_ctl stop -M immediate. - In the gdb session, step to process the signal into
startup_die
and runbt
This was discovered while hacking on Greenplum Database (currently based off of Postgres 8.3) where we recently started encountering the self-deadlock intermittently in our testing environment.
In that pull request, we fix the issue by checking for proc_exit_inprogress. Is there a reason why startup_die should not check for proc_exit_inprogress?
In the above pull request, Heikki also mentions that a similar scenario can happen during palloc() as well... which is similar to what we saw in Greenplum a couple years back for a deadlock in a malloc() call where we responded by changing exit() to _exit() in quickdie as a fix. That could possibly be applicable to latest Postgres as well.
Regards,In the above pull request, Heikki also mentions that a similar scenario can happen during palloc() as well... which is similar to what we saw in Greenplum a couple years back for a deadlock in a malloc() call where we responded by changing exit() to _exit() in quickdie as a fix. That could possibly be applicable to latest Postgres as well.
On Thu, Feb 2, 2017 at 3:18 PM, Jimmy Yih <jyih@pivotal.io> wrote: > In that pull request, we fix the issue by checking for proc_exit_inprogress. > Is there a reason why startup_die should not check for proc_exit_inprogress? startup_die() is just calling proc_exit(), so it seems like it might be better to fix it by putting the check into proc_exit itself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017-02-02 12:18:22 -0800, Jimmy Yih wrote: > In the above pull request, Heikki also mentions that a similar scenario can > happen during palloc() as well... which is similar to what we saw in > Greenplum a couple years back for a deadlock in a malloc() call where we > responded by changing exit() to _exit() in quickdie as a fix. That could > possibly be applicable to latest Postgres as well. Isn't the quickdie() issue that we palloc/malloc in the first place, rather than the exit call? There's some risk for exit() too, but we reset our own atexit handlers before exiting, so the risk seems fairly small. In my opinion the fix here would be to do it right and never emit error messages from signal handlers via elog.c - we've progressed a lot towards the goal and do a lot less in signal handlers these days - but quickdie() is one glaring exception to that. I think a reasonable fix here would be to use write() of a statically allocated message, rather then elog proper, and not to send the message to the client. Libpq, and I presume other clients, synthethize a message upon unexpected socket closure anyway, and it's completely unclear whether we can send a message anyway. Greetings, Andres Freund
On 2017-06-22 10:41:41 -0700, Andres Freund wrote: > On 2017-02-02 12:18:22 -0800, Jimmy Yih wrote: > > In the above pull request, Heikki also mentions that a similar scenario can > > happen during palloc() as well... which is similar to what we saw in > > Greenplum a couple years back for a deadlock in a malloc() call where we > > responded by changing exit() to _exit() in quickdie as a fix. That could > > possibly be applicable to latest Postgres as well. > > Isn't the quickdie() issue that we palloc/malloc in the first place, > rather than the exit call? There's some risk for exit() too, but we > reset our own atexit handlers before exiting, so the risk seems fairly > small. > > > In my opinion the fix here would be to do it right and never emit error > messages from signal handlers via elog.c - we've progressed a lot > towards the goal and do a lot less in signal handlers these days - but > quickdie() is one glaring exception to that. I think a reasonable fix > here would be to use write() of a statically allocated message, rather > then elog proper, and not to send the message to the client. Libpq, and > I presume other clients, synthethize a message upon unexpected socket > closure anyway, and it's completely unclear whether we can send a > message anyway. Or, probably more robust: Simply _exit(2) without further ado, and rely on postmaster to output an appropriate error message. Arguably it's not actually useful to see hundreds of "WARNING: terminating connection because of crash of another server process" messages in the log anyway. - Andres
Andres Freund <andres@anarazel.de> writes: > Or, probably more robust: Simply _exit(2) without further ado, and rely > on postmaster to output an appropriate error message. Arguably it's not > actually useful to see hundreds of "WARNING: terminating connection because of > crash of another server process" messages in the log anyway. At that point you might as well skip the entire mechanism and go straight to SIGKILL. IMO the only reason quickdie() exists at all is to try to send a helpful message to the client. And it is helpful --- your attempt to claim that it isn't sounds very much like wishful thinking. regards, tom lane
On Thu, Jun 22, 2017 at 10:50 AM, Andres Freund <andres@anarazel.de> wrote: > > Or, probably more robust: Simply _exit(2) without further ado, and rely > on postmaster to output an appropriate error message. Arguably it's not > actually useful to see hundreds of "WARNING: terminating connection because of > crash of another server process" messages in the log anyway. > To support using _exit(2) in *quickdie() handlers, I would like to share another stack trace indicating self-deadlock. In this case, WAL writer process got SIGQUIT while it was already handling a SIGQUIT, leading to self-deadlock. #0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 #1 0x00007f0bf04db2bd in _int_free (av=0x7f0bf081fb20 <main_arena>, p=0x1557e60, have_lock=0) at malloc.c:3962 #2 0x00007f0bf04df53c in __GI___libc_free (mem=mem@entry=0x1557e70) at malloc.c:2968 #3 0x00007f0bf0495025 in __run_exit_handlers (status=2, listp=0x7f0bf081f5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:91 #4 0x00007f0bf0495045 in __GI_exit (status=<optimized out>) at exit.c:104 #5 0x0000000000843994 in wal_quickdie () #6 <signal handler called> #7 0x00007f0bf04db014 in _int_free (av=0x7f0bf081fb20 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4014 #8 0x00007f0bf04df53c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968 #9 0x00007f0bebf8b2ba in ?? () from /usr/lib/x86_64-linux-gnu/libtasn1.so.6 #10 0x00007f0bebf8c4ba in asn1_delete_structure2 () from /usr/lib/x86_64-linux-gnu/libtasn1.so.6 #11 0x00007f0beec24738 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30 #12 0x00007f0bf3bb6de7 in _dl_fini () at dl-fini.c:235 #13 0x00007f0bf0494ff8 in __run_exit_handlers (status=2, listp=0x7f0bf081f5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82 #14 0x00007f0bf0495045 in __GI_exit (status=<optimized out>) at exit.c:104 #15 0x0000000000843994 in wal_quickdie () #16 <signal handler called> #17 0x00007f0bf05585b3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84 #18 0x0000000000b7c5da in pg_usleep () #19 0x0000000000843c4a in WalWriterMain () #20 0x000000000059ac47 in AuxiliaryProcessMain ()
Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
From
Heikki Linnakangas
Date:
On 19/07/18 03:26, Asim R P wrote: > On Thu, Jun 22, 2017 at 10:50 AM, Andres Freund <andres@anarazel.de> wrote: >> >> Or, probably more robust: Simply _exit(2) without further ado, and rely >> on postmaster to output an appropriate error message. Arguably it's not >> actually useful to see hundreds of "WARNING: terminating connection because of >> crash of another server process" messages in the log anyway. >> > > To support using _exit(2) in *quickdie() handlers, I would like to > share another stack trace indicating self-deadlock. In this case, WAL > writer process got SIGQUIT while it was already handling a SIGQUIT, > leading to self-deadlock. Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I agree we should just _exit(2). All we want to do is to exit the process immediately. bgworker_quickdie() makes some effort to block SIGQUIT during the exit() processing, but that doesn't solve the whole problem. The process could've been in the middle of a malloc/free when the signal arrived, for example. exit() is simply not safe to call from a signal handler. The regular backend's quickdie() function is more tricky. It should also call _exit(2) rather than exit(2). But it also tries to ereport a WARNING, and that is quite useful. ereport() is generally not safe in a signal handler. There is some protection for specific failures: see socket_putmessage(), for example, which has a mechanism to prevent a message from being sent to the client in the middle of another message. But at the end of the day, ereport() is complicated enough that it will surely do a lot of unsafe things. I don't have any great ideas on how to make the ereport(WARNING) safer, but I agree we should at least change all the exit(2) calls in quickdie handlers to _exit(2). BTW, if postmaster is still alive, it will send a SIGKILL to any child process that doesn't terminate in 5 seconds. So a deadlock in SIGQUIT handler isn't that bad. Some other nasty things could happen, however; all bets are off if you call unsafe functions from a signal handler. - Heikki
Hi, On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote: > On 19/07/18 03:26, Asim R P wrote: > > On Thu, Jun 22, 2017 at 10:50 AM, Andres Freund <andres@anarazel.de> wrote: > > > > > > Or, probably more robust: Simply _exit(2) without further ado, and rely > > > on postmaster to output an appropriate error message. Arguably it's not > > > actually useful to see hundreds of "WARNING: terminating connection because of > > > crash of another server process" messages in the log anyway. > > > > > > > To support using _exit(2) in *quickdie() handlers, I would like to > > share another stack trace indicating self-deadlock. In this case, WAL > > writer process got SIGQUIT while it was already handling a SIGQUIT, > > leading to self-deadlock. > > Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I > agree we should just _exit(2). All we want to do is to exit the process > immediately. Indeed. > bgworker_quickdie() makes some effort to block SIGQUIT during the exit() > processing, but that doesn't solve the whole problem. The process could've > been in the middle of a malloc/free when the signal arrived, for example. > exit() is simply not safe to call from a signal handler. Yea. I really don't understand why we have't been able to agree on this for *years* now. > The regular backend's quickdie() function is more tricky. It should also > call _exit(2) rather than exit(2). But it also tries to ereport a WARNING, > and that is quite useful. Is that actually true? Clients like libpq create the same error message (which has its own issues, because it'll sometimes mis-interpret things). The message doesn't actually have useful content, no? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote: >> The regular backend's quickdie() function is more tricky. It should also >> call _exit(2) rather than exit(2). But it also tries to ereport a WARNING, >> and that is quite useful. There's already an on_exit_reset in there; why do we need more than that? > Is that actually true? Clients like libpq create the same error message > (which has its own issues, because it'll sometimes mis-interpret > things). The message doesn't actually have useful content, no? Yes, it does: it lets users tell the difference between exit due to a SIGQUIT and a crash of their own backend. Admittedly, if we crash trying to send the message, then we're not better off. But since that happens only very rarely, I do not think it's a reasonable tradeoff to never send it at all. regards, tom lane
On Thu, Jul 19, 2018 at 12:20:53PM -0700, Andres Freund wrote: > On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote: > > Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I > > agree we should just _exit(2). All we want to do is to exit the process > > immediately. > > Indeed. > > > bgworker_quickdie() makes some effort to block SIGQUIT during the exit() > > processing, but that doesn't solve the whole problem. The process could've > > been in the middle of a malloc/free when the signal arrived, for example. > > exit() is simply not safe to call from a signal handler. > > Yea. I really don't understand why we have't been able to agree on this > for *years* now. I mean, only calling async-signal-safe functions from signal handlers is a critical POSIX requirement, and exit(3) is NOT async-signal-safe. > > The regular backend's quickdie() function is more tricky. It should also > > call _exit(2) rather than exit(2). But it also tries to ereport a WARNING, > > and that is quite useful. > > Is that actually true? Clients like libpq create the same error message > (which has its own issues, because it'll sometimes mis-interpret > things). The message doesn't actually have useful content, no? Is ereport() async-signal-safe? No. It could be, but it uses stdio to write to stderr/console, and stdio is not async-signal-safe. Nico --
On Thu, Jul 19, 2018 at 03:49:35PM -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote: > >> The regular backend's quickdie() function is more tricky. It should also > >> call _exit(2) rather than exit(2). But it also tries to ereport a WARNING, > >> and that is quite useful. > > There's already an on_exit_reset in there; why do we need more than that? You're not allowed to call exit(3) in signal handlers. exit(3) is not async-signal-safe (for example, it calls atexit handlers, flushes stdio, and who knows what else). > > Is that actually true? Clients like libpq create the same error message > > (which has its own issues, because it'll sometimes mis-interpret > > things). The message doesn't actually have useful content, no? > > Yes, it does: it lets users tell the difference between exit due to a > SIGQUIT and a crash of their own backend. ereport() calls (via errstart() and write_stderr()) vfprintf() and fflush(stderr). That's absolutely not async-signal-safe. The WIN32 code in write_stderr() does use vsnprintf() instead, and that could get written to stderr with write(2), which is async-signal-safe. > Admittedly, if we crash trying to send the message, then we're not > better off. But since that happens only very rarely, I do not think > it's a reasonable tradeoff to never send it at all. If sending it means using OpenSSL or what have you, that's probably even more async-signal-safe code. Now, ereport() could all be made safe enough for use in signal handlers, but it isn't at this time. What I'd do is have a volatile sig_atomic_t in_signal_handler_context variable to indicate that we're dying, and then when that is non-zero, ereport() and friends could use all-async-signal-safe codepaths. Nico --
Nico Williams <nico@cryptonector.com> writes: > What I'd do is have a volatile sig_atomic_t in_signal_handler_context > variable to indicate that we're dying, and then when that is non-zero, > ereport() and friends could use all-async-signal-safe codepaths. I eagerly await your patch with an async-safe implementation of ereport... regards, tom lane
On Thu, Jun 22, 2017 at 03:10:31PM -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Or, probably more robust: Simply _exit(2) without further ado, and rely > > on postmaster to output an appropriate error message. Arguably it's not > > actually useful to see hundreds of "WARNING: terminating connection because of > > crash of another server process" messages in the log anyway. > > At that point you might as well skip the entire mechanism and go straight > to SIGKILL. IMO the only reason quickdie() exists at all is to try to > send a helpful message to the client. And it is helpful --- your attempt > to claim that it isn't sounds very much like wishful thinking. I dunno if it is or isn't helpful. But I do know that this must be done in an async-signal-safe way. Besides making ereport() async-signal-safe, which is tricky, you could write(2) the arguments to a pipe that another thread in the same process is reading from and which will then call ereport() and exit(3). This would be less work if you're willing to use a thread for that (the thread would only block in read(2) on that pipe, and would only provide this one service). Nico --
On Thu, Jul 19, 2018 at 04:04:01PM -0400, Tom Lane wrote: > Nico Williams <nico@cryptonector.com> writes: > > What I'd do is have a volatile sig_atomic_t in_signal_handler_context > > variable to indicate that we're dying, and then when that is non-zero, > > ereport() and friends could use all-async-signal-safe codepaths. > > I eagerly await your patch with an async-safe implementation of ereport... Once my ALWAYS DEFERRED patch is done, sure :) Also, see my other post just now where I propose a thread-based mechanism for making quickdie() async-signal-safe. Basically, there would be a pipe and a thread that only blocks in read(2) on that pipe, and quickdie() would write to the pipe the relevant details, then that thread would call ereport() and then exit(3). This would be much much simpler to implement than making ereport() async-signal-safe. Nico --
Hi, On 2018-07-19 15:49:35 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote: > >> The regular backend's quickdie() function is more tricky. It should also > >> call _exit(2) rather than exit(2). But it also tries to ereport a WARNING, > >> and that is quite useful. > > There's already an on_exit_reset in there; why do we need more than that? Because there's plenty things, many of which are not signal safe, running atexit handlers, not necessarily just ours. exit() simply isn't signal safe. In Asim's case that's a gnutls (where did that come from) atexit(). There's plenty other libraries with things like that, not to speak of PLs. > > Is that actually true? Clients like libpq create the same error message > > (which has its own issues, because it'll sometimes mis-interpret > > things). The message doesn't actually have useful content, no? > > Yes, it does: it lets users tell the difference between exit due to a > SIGQUIT and a crash of their own backend. Not really reliably, though. Given that we can deadlock, the message might not arrive, the absence doesn't allow to infer that it was a crash. > Admittedly, if we crash trying to send the message, then we're not > better off. But since that happens only very rarely, I do not think > it's a reasonable tradeoff to never send it at all. Failovers that randomly take longer aren't all that harmless. Greetings, Andres Freund
On 2018-07-19 15:04:15 -0500, Nico Williams wrote: > Besides making ereport() async-signal-safe, which is tricky, you could > write(2) the arguments to a pipe that another thread in the same process > is reading from and which will then call ereport() and exit(3). This > would be less work if you're willing to use a thread for that (the > thread would only block in read(2) on that pipe, and would only provide > this one service). It'd also increase memory usage noticably (we'd have twice the process count in the kernel, would have a lot of additional stacks etc), would tie us to supporting threading in the backend, ... This is a DOA approach imo. Greetings, Andres Freund
On 2018-07-19 14:54:52 -0500, Nico Williams wrote: > On Thu, Jul 19, 2018 at 12:20:53PM -0700, Andres Freund wrote: > > On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote: > > > Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I > > > agree we should just _exit(2). All we want to do is to exit the process > > > immediately. > > > > Indeed. > > > > > bgworker_quickdie() makes some effort to block SIGQUIT during the exit() > > > processing, but that doesn't solve the whole problem. The process could've > > > been in the middle of a malloc/free when the signal arrived, for example. > > > exit() is simply not safe to call from a signal handler. > > > > Yea. I really don't understand why we have't been able to agree on this > > for *years* now. > > I mean, only calling async-signal-safe functions from signal handlers is > a critical POSIX requirement, and exit(3) is NOT async-signal-safe. There's a few standard requirements that aren't particularly relevant in practice (including some functions not being listed as signal safe). Just arguing from that isn't really helpful. But there's plenty hard evidence, including a few messages upthread!, of it being practically problematic. Just looking at the reasoning at why exit (and malloc) aren't signal safe however, makes it clear that this particular restriction should be adhered to, however. > > > The regular backend's quickdie() function is more tricky. It should also > > > call _exit(2) rather than exit(2). But it also tries to ereport a WARNING, > > > and that is quite useful. > > > > Is that actually true? Clients like libpq create the same error message > > (which has its own issues, because it'll sometimes mis-interpret > > things). The message doesn't actually have useful content, no? > > Is ereport() async-signal-safe? No. It could be, but it uses stdio to > write to stderr/console, and stdio is not async-signal-safe. Im not sure what you're trying to tell me here? What you quoted doesn't advocate doing anything with ereport()? Greetings, Andres Freund
Nico Williams <nico@cryptonector.com> writes: > I dunno if it is or isn't helpful. But I do know that this must be done > in an async-signal-safe way. I haven't actually heard a convincing reason why that's true. As per the previous discussion, if we happen to service the SIGQUIT at an unfortunate moment, we might get a deadlock or crash in the backend process, and thereby fail to send the message. But we're no worse off in such cases than if we'd not tried to send it at all. The only likely penalty is that, in the deadlock case, a few seconds will elapse before the postmaster runs out of patience and sends SIGKILL. Yeah, it'd be nicer if we weren't taking such risks, but the amount of effort required to get there seems very far out of proportion to the benefit. > Besides making ereport() async-signal-safe, which is tricky, you could > write(2) the arguments to a pipe that another thread in the same process > is reading from and which will then call ereport() and exit(3). We have not yet accepted any patch that introduces threading into the core backend, and this seems like a particularly unlikely place to start. Color me dubious, as well, that thread 2 calling exit() (never mind ereport()) while the main thread continues to do $whatever offers any actual gain in safety. regards, tom lane
On Thu, Jul 19, 2018 at 01:10:14PM -0700, Andres Freund wrote: > On 2018-07-19 15:04:15 -0500, Nico Williams wrote: > > Besides making ereport() async-signal-safe, which is tricky, you could > > write(2) the arguments to a pipe that another thread in the same process > > is reading from and which will then call ereport() and exit(3). This > > would be less work if you're willing to use a thread for that (the > > thread would only block in read(2) on that pipe, and would only provide > > this one service). > > It'd also increase memory usage noticably (we'd have twice the process > count in the kernel, would have a lot of additional stacks etc), would > tie us to supporting threading in the backend, ... This is a DOA > approach imo. You can create that thread with a really small stack given that its only purpose is to do this error reporting and exit. Running a thread that does only this does not impact the rest of the code in the backend at all -- it's not "threading" the backend. When it gets invoked, the caller would be blocking / sleeping, waiting for the coming exit, while this helper thread would block until invoked. It's really not a big deal. I use this technique in some of my programs (unfortunately none in my github repos). Usually I use it for detection of parent process death (so that if the parent dies silently, the children die too). In that case the child-side of fork() closes the write end of a pipe and starts a thread that blocks in read(2) on the read end of the pipe, and exit()s when the read returns anything other than EINTR. Nico --
On Thu, Jul 19, 2018 at 04:16:31PM -0400, Tom Lane wrote: > Nico Williams <nico@cryptonector.com> writes: > > I dunno if it is or isn't helpful. But I do know that this must be done > > in an async-signal-safe way. > > I haven't actually heard a convincing reason why that's true. As per It's undefined behavior. The system is free to do all sorts of terrible things. In practice deadlock or crash are the things you'll see, as you say. > the previous discussion, if we happen to service the SIGQUIT at an > unfortunate moment, we might get a deadlock or crash in the backend > process, and thereby fail to send the message. But we're no worse > off in such cases than if we'd not tried to send it at all. The only > likely penalty is that, in the deadlock case, a few seconds will elapse > before the postmaster runs out of patience and sends SIGKILL. > > Yeah, it'd be nicer if we weren't taking such risks, but the amount > of effort required to get there seems very far out of proportion to > the benefit. > > > Besides making ereport() async-signal-safe, which is tricky, you could > > write(2) the arguments to a pipe that another thread in the same process > > is reading from and which will then call ereport() and exit(3). > > We have not yet accepted any patch that introduces threading into the > core backend, and this seems like a particularly unlikely place to > start. Color me dubious, as well, that thread 2 calling exit() (never > mind ereport()) while the main thread continues to do $whatever offers > any actual gain in safety. No, the other thread does NOT continue to do whatever -- it blocks/sleeps forever waiting for the coming exit(3). I.e., quickdie() would look like this: /* Marshall info in to an automatic */ struct quickdie_info info, *infop; info.this = that; ... infop = &info; /* Tell the death thread to report and exit */ /* XXX actually something like net_write(), to handle EINTR */ write(quickdie_pipe[1], infop, sizeof(infop)); /* Wait forever */ for (;;) usleep(FOREVER); and the thread would basically do: struct quickdie_info *info; /* Wait forever for a quickdie() to happen on the main thread */ /* XXX Actually something like net_read(), to handle EINTR */ read(quickdie_pipe[0], &infop, sizeof(infop)); ereport(infop->...); exit(1); This use of threads does not require any changes to the rest of the codebase. Nico --
On 2018-07-19 15:17:26 -0500, Nico Williams wrote: > On Thu, Jul 19, 2018 at 01:10:14PM -0700, Andres Freund wrote: > > On 2018-07-19 15:04:15 -0500, Nico Williams wrote: > > > Besides making ereport() async-signal-safe, which is tricky, you could > > > write(2) the arguments to a pipe that another thread in the same process > > > is reading from and which will then call ereport() and exit(3). This > > > would be less work if you're willing to use a thread for that (the > > > thread would only block in read(2) on that pipe, and would only provide > > > this one service). > > > > It'd also increase memory usage noticably (we'd have twice the process > > count in the kernel, would have a lot of additional stacks etc), would > > tie us to supporting threading in the backend, ... This is a DOA > > approach imo. > > You can create that thread with a really small stack given that its only > purpose is to do this error reporting and exit. You still have a full kernel process backing it, which is *FAR* from free. And we'd enough infrastructure to setup threads with small stacks on a number of platforms. > Running a thread that does only this does not impact the rest of the > code in the backend at all -- it's not "threading" the backend. It actually does. Without passing thread related flags to the compiler, which you need to do for correctness, the compiler and libraries are free to use faster non-threadsafe implementations. Greetings, Andres Freund
Hi, On 2018-07-19 16:16:31 -0400, Tom Lane wrote: > Nico Williams <nico@cryptonector.com> writes: > > I dunno if it is or isn't helpful. But I do know that this must be done > > in an async-signal-safe way. > > I haven't actually heard a convincing reason why that's true. As per > the previous discussion, if we happen to service the SIGQUIT at an > unfortunate moment, we might get a deadlock or crash in the backend > process, and thereby fail to send the message. That crash could very well be exploitable. Corrupting internal management state is far from guaranteed to only deadlock or crash cleanly. > But we're no worse off in such cases than if we'd not tried to send it > at all. The only likely penalty is that, in the deadlock case, a few > seconds will elapse before the postmaster runs out of patience and > sends SIGKILL. Which for deterministic failover *IS* a problem. Greetings, Andres Freund
On Thu, Jul 19, 2018 at 01:35:02PM -0700, Andres Freund wrote: > On 2018-07-19 15:17:26 -0500, Nico Williams wrote: > > You can create that thread with a really small stack given that its only > > purpose is to do this error reporting and exit. > > You still have a full kernel process backing it, which is *FAR* from > free. [...] It's not that big. Its stack is small. > [...] And we'd enough infrastructure to setup threads with small stacks > on a number of platforms. Portability is actually the problem here, yes. But you could enable the async-signal-safe thread path in some platforms and not others, and you'd still be improving things. > > Running a thread that does only this does not impact the rest of the > > code in the backend at all -- it's not "threading" the backend. > > It actually does. Without passing thread related flags to the compiler, > which you need to do for correctness, the compiler and libraries are > free to use faster non-threadsafe implementations. That's build goop, but there's no need to add mutexes or whatever elsewhere. Nico --
Hi, On 2018-07-19 15:27:06 -0500, Nico Williams wrote: > No, the other thread does NOT continue to do whatever -- it > blocks/sleeps forever waiting for the coming exit(3). > > I.e., quickdie() would look like this: > > /* Marshall info in to an automatic */ > struct quickdie_info info, *infop; > > info.this = that; > ... > infop = &info; > > /* Tell the death thread to report and exit */ > /* XXX actually something like net_write(), to handle EINTR */ > write(quickdie_pipe[1], infop, sizeof(infop)); > > /* Wait forever */ > for (;;) usleep(FOREVER); > > and the thread would basically do: > > struct quickdie_info *info; > > /* Wait forever for a quickdie() to happen on the main thread */ > /* XXX Actually something like net_read(), to handle EINTR */ > read(quickdie_pipe[0], &infop, sizeof(infop)); > > ereport(infop->...); > > exit(1); > > This use of threads does not require any changes to the rest of the > codebase. Uhm, this'd already require a fair bit of threadsafety. Like at least all of the memory allocator / context code. Nor is having threads around unproblematic for subprocesses that are forked off. Nor does this account for the portability work. Greetings, Andres Freund
On Thu, Jul 19, 2018 at 01:38:52PM -0700, Andres Freund wrote: > On 2018-07-19 15:27:06 -0500, Nico Williams wrote: > > No, the other thread does NOT continue to do whatever -- it > > blocks/sleeps forever waiting for the coming exit(3). > > > > I.e., quickdie() would look like this: > > > > [...] > > > > and the thread would basically do: > > > > [...] > > > > This use of threads does not require any changes to the rest of the > > codebase. > > Uhm, this'd already require a fair bit of threadsafety. Like at least > all of the memory allocator / context code. Nor is having threads > around unproblematic for subprocesses that are forked off. Nor does > this account for the portability work. Yes, but that's in libc. None of that is in the PG code itself.
On Thu, Jul 19, 2018 at 03:42:46PM -0500, Nico Williams wrote: > On Thu, Jul 19, 2018 at 01:38:52PM -0700, Andres Freund wrote: > > Uhm, this'd already require a fair bit of threadsafety. Like at least > > all of the memory allocator / context code. Nor is having threads > > around unproblematic for subprocesses that are forked off. Nor does > > this account for the portability work. > > Yes, but that's in libc. None of that is in the PG code itself. Hmm, it would have perf impact, yes. Could the postmaster keep a pipe to all the backend processes and do reporting for them?
On 2018-07-19 15:42:46 -0500, Nico Williams wrote: > On Thu, Jul 19, 2018 at 01:38:52PM -0700, Andres Freund wrote: > > On 2018-07-19 15:27:06 -0500, Nico Williams wrote: > > > No, the other thread does NOT continue to do whatever -- it > > > blocks/sleeps forever waiting for the coming exit(3). > > > > > > I.e., quickdie() would look like this: > > > > > > [...] > > > > > > and the thread would basically do: > > > > > > [...] > > > > > > This use of threads does not require any changes to the rest of the > > > codebase. > > > > Uhm, this'd already require a fair bit of threadsafety. Like at least > > all of the memory allocator / context code. Nor is having threads > > around unproblematic for subprocesses that are forked off. Nor does > > this account for the portability work. > > Yes, but that's in libc. None of that is in the PG code itself. That's simply entirely completely wrong. PG has a good chunk of memory management layers (facilitating memory contexts) ontop of malloc. And that's stateful. Greetings, Andres Freund
On 2018-07-19 15:44:23 -0500, Nico Williams wrote: > On Thu, Jul 19, 2018 at 03:42:46PM -0500, Nico Williams wrote: > > On Thu, Jul 19, 2018 at 01:38:52PM -0700, Andres Freund wrote: > > > Uhm, this'd already require a fair bit of threadsafety. Like at least > > > all of the memory allocator / context code. Nor is having threads > > > around unproblematic for subprocesses that are forked off. Nor does > > > this account for the portability work. > > > > Yes, but that's in libc. None of that is in the PG code itself. > > Hmm, it would have perf impact, yes. Could the postmaster keep a pipe > to all the backend processes and do reporting for them? No, postmaster doesn't have sockets open to the client.
On Thu, Jul 19, 2018 at 01:46:12PM -0700, Andres Freund wrote: > On 2018-07-19 15:42:46 -0500, Nico Williams wrote: > > Yes, but that's in libc. None of that is in the PG code itself. > > That's simply entirely completely wrong. PG has a good chunk of memory > management layers (facilitating memory contexts) ontop of malloc. And > that's stateful. Ah, Ok, I see. Fine.
Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
From
Heikki Linnakangas
Date:
On 19/07/18 23:04, Nico Williams wrote: > On Thu, Jun 22, 2017 at 03:10:31PM -0400, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> Or, probably more robust: Simply _exit(2) without further ado, and rely >>> on postmaster to output an appropriate error message. Arguably it's not >>> actually useful to see hundreds of "WARNING: terminating connection because of >>> crash of another server process" messages in the log anyway. >> >> At that point you might as well skip the entire mechanism and go straight >> to SIGKILL. IMO the only reason quickdie() exists at all is to try to >> send a helpful message to the client. And it is helpful --- your attempt >> to claim that it isn't sounds very much like wishful thinking. > > I dunno if it is or isn't helpful. But I do know that this must be done > in an async-signal-safe way. > > Besides making ereport() async-signal-safe, which is tricky, you could > write(2) the arguments to a pipe that another thread in the same process > is reading from and which will then call ereport() and exit(3). I don't see how that helps. It still wouldn't be safe for the other thread to call ereport(), because the main thread might be in the middle of calling ereport() itself. - Heikki
Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
From
Heikki Linnakangas
Date:
On 19/07/18 23:13, Andres Freund wrote: > On 2018-07-19 14:54:52 -0500, Nico Williams wrote: >> On Thu, Jul 19, 2018 at 12:20:53PM -0700, Andres Freund wrote: >>> On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote: >>>> Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I >>>> agree we should just _exit(2). All we want to do is to exit the process >>>> immediately. >>> >>> Indeed. >>> >>>> bgworker_quickdie() makes some effort to block SIGQUIT during the exit() >>>> processing, but that doesn't solve the whole problem. The process could've >>>> been in the middle of a malloc/free when the signal arrived, for example. >>>> exit() is simply not safe to call from a signal handler. >>> >>> Yea. I really don't understand why we have't been able to agree on this >>> for *years* now. >> >> I mean, only calling async-signal-safe functions from signal handlers is >> a critical POSIX requirement, and exit(3) is NOT async-signal-safe. > > There's a few standard requirements that aren't particularly relevant in > practice (including some functions not being listed as signal > safe). Just arguing from that isn't really helpful. But there's plenty > hard evidence, including a few messages upthread!, of it being > practically problematic. Just looking at the reasoning at why exit (and > malloc) aren't signal safe however, makes it clear that this particular > restriction should be adhered to, however. ISTM that no-one has any great ideas on what to do about the ereport() in quickdie(). But I think we have consensus on replacing the exit(2) calls with _exit(2). If we do just that, it would be better than the status quo, even if it doesn't completely fix the problem. This would prevent the case that Asim reported, for starters. Any objections to the attached? With _exit(), I think we wouldn't really need the "PG_SETMASK(&BlockSig);" calls in the aux process *_quickdie handlers that don't do anything else than call _exit(). But I didn't dare to remove them yet. - Heikki
Attachment
On 2018-07-20 11:53:32 +0300, Heikki Linnakangas wrote: > ISTM that no-one has any great ideas on what to do about the ereport() in > quickdie(). I think we actually have some decent ideas how to make that less problematic in a number of cases. Avoid translation (thereby avoiding direct malloc()) and rely on the fact that the error message evaluation happens in a memory context with pre-allocated memory. That still leaves openssl, but is better than nothing. If we wanted, we additionally could force ssl allocations to happen through another context with preallocated memory. > But I think we have consensus on replacing the exit(2) calls > with _exit(2). If we do just that, it would be better than the status quo, > even if it doesn't completely fix the problem. This would prevent the case > that Asim reported, for starters. Right. You're planning to backpatch? > With _exit(), I think we wouldn't really need the "PG_SETMASK(&BlockSig);" > calls in the aux process *_quickdie handlers that don't do anything else > than call _exit(). But I didn't dare to remove them yet. I think we should remove it at least in master. > diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c > index 960d3de204..010d53a6ce 100644 > --- a/src/backend/postmaster/bgwriter.c > +++ b/src/backend/postmaster/bgwriter.c > @@ -404,22 +404,21 @@ bg_quickdie(SIGNAL_ARGS) > /* > * We DO NOT want to run proc_exit() callbacks -- we're here because > * shared memory may be corrupted, so we don't want to try to clean up our > - * transaction. Just nail the windows shut and get out of town. Now that > - * there's an atexit callback to prevent third-party code from breaking > - * things by calling exit() directly, we have to reset the callbacks > - * explicitly to make this work as intended. > - */ > - on_exit_reset(); > - > - /* > - * Note we do exit(2) not exit(0). This is to force the postmaster into a > - * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random > - * backend. This is necessary precisely because we don't clean up our > - * shared memory state. (The "dead man switch" mechanism in pmsignal.c > - * should ensure the postmaster sees this as a crash, too, but no harm in > - * being doubly sure.) > + * transaction. Just nail the windows shut and get out of town. > + * > + * There's an atexit callback to prevent third-party code from breaking > + * things by calling exit() directly. We don't want to trigger that, so > + * we use _exit(), which doesn't run atexit callbacks, rather than exit(). > + * And exit() wouldn't be safe to run from a signal handler, anyway. It's much less the exit() that's unsafe, than the callbacks themselves, right? Perhaps just restate that we wouldn't want to trigger atexit processing due to signal safety? > + * Note we use _exit(2) not _exit(0). This is to force the postmaster > + * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a > + * random backend. This is necessary precisely because we don't clean up > + * our shared memory state. (The "dead man switch" mechanism in > + * pmsignal.c should ensure the postmaster sees this as a crash, too, but > + * no harm in being doubly sure.) > */ > - exit(2); > + _exit(2); > } Greetings, Andres Freund
Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
From
Ashwin Agrawal
Date:
On Fri, Jul 20, 2018 at 1:54 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 19/07/18 23:13, Andres Freund wrote:
> On 2018-07-19 14:54:52 -0500, Nico Williams wrote:
>> On Thu, Jul 19, 2018 at 12:20:53PM -0700, Andres Freund wrote:
>>> On 2018-07-19 11:57:25 +0300, Heikki Linnakangas wrote:
>>>> Ugh. Yeah, in wal_quickdie, and other aux process *_quickdie() handlers, I
>>>> agree we should just _exit(2). All we want to do is to exit the process
>>>> immediately.
>>>
>>> Indeed.
>>>
>>>> bgworker_quickdie() makes some effort to block SIGQUIT during the exit()
>>>> processing, but that doesn't solve the whole problem. The process could've
>>>> been in the middle of a malloc/free when the signal arrived, for example.
>>>> exit() is simply not safe to call from a signal handler.
>>>
>>> Yea. I really don't understand why we have't been able to agree on this
>>> for *years* now.
>>
>> I mean, only calling async-signal-safe functions from signal handlers is
>> a critical POSIX requirement, and exit(3) is NOT async-signal-safe.
>
> There's a few standard requirements that aren't particularly relevant in
> practice (including some functions not being listed as signal
> safe). Just arguing from that isn't really helpful. But there's plenty
> hard evidence, including a few messages upthread!, of it being
> practically problematic. Just looking at the reasoning at why exit (and
> malloc) aren't signal safe however, makes it clear that this particular
> restriction should be adhered to, however.
ISTM that no-one has any great ideas on what to do about the ereport()
in quickdie(). But I think we have consensus on replacing the exit(2)
calls with _exit(2). If we do just that, it would be better than the
status quo, even if it doesn't completely fix the problem. This would
prevent the case that Asim reported, for starters.
Any objections to the attached?
With _exit(), I think we wouldn't really need the
"PG_SETMASK(&BlockSig);" calls in the aux process *_quickdie handlers
that don't do anything else than call _exit(). But I didn't dare to
remove them yet.
Seems there is opportunity to actuall consolidate all those quickdies as well. As I see no difference between the auxiliary process quickdie implementations. Only backend `quickdie()` has special code for ereport and all.
On Fri, Jul 20, 2018 at 4:53 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > ISTM that no-one has any great ideas on what to do about the ereport() in > quickdie(). But I think we have consensus on replacing the exit(2) calls > with _exit(2). If we do just that, it would be better than the status quo, > even if it doesn't completely fix the problem. This would prevent the case > that Asim reported, for starters. +1 for trying to improve this by using _exit rather than exit, and for not letting the perfect be the enemy of the good. But -1 for copying the language "if some idiot DBA sends a manual SIGQUIT to a random backend". I think that phrase could be deleted from this comment -- and all of the other places where this comment appears already today -- without losing any useful informational content. Or it could be rephrased to "if this process receives a SIGQUIT". It's just not necessary to call somebody an idiot to communicate the point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
From
Heikki Linnakangas
Date:
On 20/07/18 18:03, Andres Freund wrote: > On 2018-07-20 11:53:32 +0300, Heikki Linnakangas wrote: >> But I think we have consensus on replacing the exit(2) calls >> with _exit(2). If we do just that, it would be better than the status quo, >> even if it doesn't completely fix the problem. This would prevent the case >> that Asim reported, for starters. > > Right. You're planning to backpatch? Yeah. Committed and back-patched. >> With _exit(), I think we wouldn't really need the "PG_SETMASK(&BlockSig);" >> calls in the aux process *_quickdie handlers that don't do anything else >> than call _exit(). But I didn't dare to remove them yet. > > I think we should remove it at least in master. Removed. > It's much less the exit() that's unsafe, than the callbacks themselves, > right? Perhaps just restate that we wouldn't want to trigger atexit > processing due to signal safety? Well, in principle exit() is unsafe too, although you're right that in practice it's more likely the callbacks that would cause trouble. I reworded the comment to put more emphasis on the callbacks. On 23/07/18 17:34, Robert Haas wrote: > +1 for trying to improve this by using _exit rather than exit, and for > not letting the perfect be the enemy of the good. > > But -1 for copying the language "if some idiot DBA sends a manual > SIGQUIT to a random backend". I think that phrase could be deleted > from this comment -- and all of the other places where this comment > appears already today -- without losing any useful informational > content. Or it could be rephrased to "if this process receives a > SIGQUIT". It's just not necessary to call somebody an idiot to > communicate the point. Rephrased to be less offensive. So, with this commit, the various SIGQUIT quickdie handlers are in a better shape. The regular backend's quickdie() handler still calls ereport(), which is not safe, but it's a step in the right direction. And we haven't addressed the original complaint, which was actually about startup_die(), not quickdie(). What should we do about startup_die()? Ideas? - Heikki
On Wed, Aug 08, 2018 at 07:19:34PM +0300, Heikki Linnakangas wrote: > On 20/07/18 18:03, Andres Freund wrote: > >It's much less the exit() that's unsafe, than the callbacks themselves, > >right? Perhaps just restate that we wouldn't want to trigger atexit > >processing due to signal safety? > > Well, in principle exit() is unsafe too, although you're right that in > practice it's more likely the callbacks that would cause trouble. I reworded > the comment to put more emphasis on the callbacks. It's unsafe because it will: - flush stdio buffers - call atexit handlers (which might, e.g., free()) - who knows what else It's easy enough to decide to exit() or _exit(). If you tell can't which to call from lexical context, then use a global volatile sig_atomic_t variable to indicate that we're exiting from a signal handler and check that variable in the quickdie() functions. > So, with this commit, the various SIGQUIT quickdie handlers are in a better > shape. The regular backend's quickdie() handler still calls ereport(), which > is not safe, but it's a step in the right direction. And we haven't > addressed the original complaint, which was actually about startup_die(), > not quickdie(). Yes. Would that snprintf() and vsnprintf() were async-signal-safe -- they can be, and some implementations probably are, but they aren't required to be, then making ereport() safe would be easier. Nico --
On Wed, Aug 08, 2018 at 11:47:34AM -0500, Nico Williams wrote: > Yes. Would that snprintf() and vsnprintf() were async-signal-safe -- > they can be, and some implementations probably are, but they aren't > required to be, then making ereport() safe would be easier. So, I took a look at glibc's implementation for giggles. It uses malloc() (and free()) only in three cases: a) when printing floating point numbers with very large/small exponents, b) when formatting long wide strings into multi-byte strings, c) when formatting specifiers have width asterisks. Assuming locales are not lazily loaded, I think that's probably all the reasonable cases where vsnprintf() could call async-signal-unsafe functions or do async-signal-unsafe things. Considering that PostgreSQL already has async-signal-unsafe signal handlers, might as well assume that vsnprintf() is async-signal-safe and just format strings into alloca()'ed buffers or even into fixed-sized automatic char arrays, and be done. Perhaps when a global volatile sig_atomic_t is set denoting we're handling a signal, then use vsnprintf() with a fixed-sized automatic buffer, otherwise malloc() one. Nico --
Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
From
Heikki Linnakangas
Date:
On 08/08/18 19:19, Heikki Linnakangas wrote: > So, with this commit, the various SIGQUIT quickdie handlers are in a > better shape. The regular backend's quickdie() handler still calls > ereport(), which is not safe, but it's a step in the right direction. > And we haven't addressed the original complaint, which was actually > about startup_die(), not quickdie(). > > What should we do about startup_die()? Ideas? To recap: If a backend process receives a SIGTERM, or if the authentication timeout expires, while the backend is reading the startup packet, we call proc_exit(0) from the signal handler. proc_exit(0) is clearly not async-safe, so if the process was busy modifying backend state, for example doing a memory allocation, bad things can happen. In the beginning of this thread, Jimmy demonstrated a self-deadlock, when the backend was in the middle of proc_exit(), when the signal arrived, and a nested call to proc_exit was made. I think the proper fix here is to stop calling proc_exit(0) directly from the startup_die() signal handler. Instead, let's install the regular die() signal handler earlier, before reading the startup process, and rely on the usual CHECK_FOR_INTERRUPTS() mechanism. One annoying thing is the pg_getnameinfo_all() call, in BackendInitialize(). If 'log_hostname' is enabled, it can perform a DNS lookup, which can take a very long time. It would be nice to still be able to interrupt that, but if we stop calling proc_exit() from the signal handler, we won't. I don't think it's safe to interrupt it, like we do today, but I wonder if the cure is worse than the disease. We revamped the signal handling in backend processes in PostgreSQL 9.5, so I'm inclined to only backpatch this to 9.5. In 9.3 and 9.4, let's just add a '!proc_exit_in_progress' check to the signal handler, to prevent the nested proc_eixt() call that Jimmy ran into from happening. It won't fix the general problem, but since we haven't heard any other reports about this, I think that's the right amount of effort for 9.3 and 9.4. - Heikki