Thread: stopgap fix for signal handling during restore_command
On Wed, Feb 22, 2023 at 09:48:10PM +1300, Thomas Munro wrote: > On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> I'm happy to create a new thread if needed, but I can't tell if there is >> any interest in this stopgap/back-branch fix. Perhaps we should just jump >> straight to the long-term fix that Thomas is looking into. > > Unfortunately the latch-friendly subprocess module proposal I was > talking about would be for 17. I may post a thread fairly soon with > design ideas + list of problems and decision points as I see them, and > hopefully some sketch code, but it won't be a proposal for [/me checks > calendar] next week's commitfest and probably wouldn't be appropriate > in a final commitfest anyway, and I also have some other existing > stuff to clear first. So please do continue with the stopgap ideas. Okay, here is a new thread... Since v8.4, the startup process will proc_exit() immediately within its SIGTERM handler while the restore_command executes via system(). Some recent changes added unsafe code to the section where this behavior is enabled [0]. The long-term fix likely includes moving away from system() completely, but we may want to have a stopgap/back-branch fix while that is under development. I've attached a patch set for a proposed stopgap fix. 0001 simply moves the extra code outside of the Pre/PostRestoreCommand() block so that only system() is executed while the SIGTERM handler might proc_exit(). This restores the behavior that was in place from v8.4 to v14, so I don't expect it to be too controversial. 0002 adds code to startup's SIGTERM handler to call _exit() instead of proc_exit() if we are in a forked process from system(), etc. It also adds assertions to ensure proc_exit(), ProcKill(), and AuxiliaryProcKill() are not called within such forked processes. Thoughts? [0] https://postgr.es/m/20230201105514.rsjl4bnhb65giyvo%40alap3.anarazel.de -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Fri, Feb 24, 2023 at 12:15 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > Thoughts? I think you should have a trailing \n when writing to stderr. Here's that reproducer I speculated about (sorry I confused SIGQUIT and SIGTERM in my earlier email, ENOCOFFEE). Seems to do the job, and I tested on a Linux box for good measure. If you comment out the kill(), "check PROVE_TESTS=t/002_archiving.pl" works fine (demonstrating that that definition of system() works fine). With the kill(), it reliably reaches 'TRAP: failed Assert("latch->owner_pid == MyProcPid")' without your patch, and with your patch it avoids it. (I believe glibc's system() could reach it too with the right timing, but I didn't try, my point being that the use of the OpenBSD system() here is only because it's easier to grok and to wrangle.)
Attachment
On Fri, Feb 24, 2023 at 1:25 PM Thomas Munro <thomas.munro@gmail.com> wrote: > ENOCOFFEE Erm, I realised after sending that I'd accidentally sent a version that uses fork() anyway, and now if I change it back to vfork() it doesn't fail the way I wanted to demonstrate, at least on Linux. I don't have time or desire to dig into how Linux vfork() really works so I'll leave it at that... but the patch as posted does seem to be a useful tool for understanding this failure... please just ignore the confused comments about fork() vs vfork() therein.
On Fri, Feb 24, 2023 at 01:25:01PM +1300, Thomas Munro wrote: > I think you should have a trailing \n when writing to stderr. Oops. I added that in v7. > Here's that reproducer I speculated about (sorry I confused SIGQUIT > and SIGTERM in my earlier email, ENOCOFFEE). Seems to do the job, and > I tested on a Linux box for good measure. If you comment out the > kill(), "check PROVE_TESTS=t/002_archiving.pl" works fine > (demonstrating that that definition of system() works fine). With the > kill(), it reliably reaches 'TRAP: failed Assert("latch->owner_pid == > MyProcPid")' without your patch, and with your patch it avoids it. (I > believe glibc's system() could reach it too with the right timing, but > I didn't try, my point being that the use of the OpenBSD system() here > is only because it's easier to grok and to wrangle.) Thanks for providing the reproducer! I am seeing the behavior that you described on my Linux machine. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-23 20:33:23 -0800, Nathan Bossart wrote:> > if (in_restore_command) > - proc_exit(1); > + { > + /* > + * If we are in a child process (e.g., forked by system() in > + * RestoreArchivedFile()), we don't want to call any exit callbacks. > + * The parent will take care of that. > + */ > + if (MyProcPid == (int) getpid()) > + proc_exit(1); > + else > + { > + const char msg[] = "StartupProcShutdownHandler() called in child process\n"; > + int rc pg_attribute_unused(); > + > + rc = write(STDERR_FILENO, msg, sizeof(msg)); > + _exit(1); > + } > + } Why do we need that rc variable? Don't we normally get away with (void) write(...)? > diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c > index 22b4278610..e3da0622d7 100644 > --- a/src/backend/storage/lmgr/proc.c > +++ b/src/backend/storage/lmgr/proc.c > @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg) > dlist_head *procgloballist; > > Assert(MyProc != NULL); > + Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */ > > /* Make sure we're out of the sync rep lists */ > SyncRepCleanupAtProcExit(); > @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg) > PGPROC *proc; > > Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS); > + Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */ > > auxproc = &AuxiliaryProcs[proctype]; > > -- > 2.25.1 I think the much more interesting assertion here would be to check that MyProc->pid equals the current pid. Greetings, Andres Freund
On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote: > On 2023-02-23 20:33:23 -0800, Nathan Bossart wrote:> >> if (in_restore_command) >> - proc_exit(1); >> + { >> + /* >> + * If we are in a child process (e.g., forked by system() in >> + * RestoreArchivedFile()), we don't want to call any exit callbacks. >> + * The parent will take care of that. >> + */ >> + if (MyProcPid == (int) getpid()) >> + proc_exit(1); >> + else >> + { >> + const char msg[] = "StartupProcShutdownHandler() called in child process\n"; >> + int rc pg_attribute_unused(); >> + >> + rc = write(STDERR_FILENO, msg, sizeof(msg)); >> + _exit(1); >> + } >> + } > > Why do we need that rc variable? Don't we normally get away with (void) > write(...)? My compiler complains about that. :/ ../postgresql/src/backend/postmaster/startup.c: In function ‘StartupProcShutdownHandler’: ../postgresql/src/backend/postmaster/startup.c:139:11: error: ignoring return value of ‘write’, declared with attributewarn_unused_result [-Werror=unused-result] 139 | (void) write(STDERR_FILENO, msg, sizeof(msg)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors >> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c >> index 22b4278610..e3da0622d7 100644 >> --- a/src/backend/storage/lmgr/proc.c >> +++ b/src/backend/storage/lmgr/proc.c >> @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg) >> dlist_head *procgloballist; >> >> Assert(MyProc != NULL); >> + Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */ >> >> /* Make sure we're out of the sync rep lists */ >> SyncRepCleanupAtProcExit(); >> @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg) >> PGPROC *proc; >> >> Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS); >> + Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */ >> >> auxproc = &AuxiliaryProcs[proctype]; >> >> -- >> 2.25.1 > > I think the much more interesting assertion here would be to check that > MyProc->pid equals the current pid. I don't mind changing this, but why is this a more interesting assertion? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Sat, Feb 25, 2023 at 11:28:25AM -0800, Nathan Bossart wrote: > On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote: >> I think the much more interesting assertion here would be to check that >> MyProc->pid equals the current pid. > > I don't mind changing this, but why is this a more interesting assertion? Here is a new patch set with this change. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-25 11:28:25 -0800, Nathan Bossart wrote: > On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote: > > Why do we need that rc variable? Don't we normally get away with (void) > > write(...)? > > My compiler complains about that. :/ > > ../postgresql/src/backend/postmaster/startup.c: In function ‘StartupProcShutdownHandler’: > ../postgresql/src/backend/postmaster/startup.c:139:11: error: ignoring return value of ‘write’, declared with attributewarn_unused_result [-Werror=unused-result] > 139 | (void) write(STDERR_FILENO, msg, sizeof(msg)); > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > cc1: all warnings being treated as errors Ick. I guess we've already encountered this, because we've apparently removed all the (void) write cases. Which I am certain we had at some point. We still do it for a bunch of other functions though. Ah, yes: aa90e148ca7, 27314d32a88, 6c72a28e5ce etc. I think I opined on this before, but we really ought to have a function to do some minimal signal safe output. Implemented centrally, instead of being open coded in a bunch of places. > >> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c > >> index 22b4278610..e3da0622d7 100644 > >> --- a/src/backend/storage/lmgr/proc.c > >> +++ b/src/backend/storage/lmgr/proc.c > >> @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg) > >> dlist_head *procgloballist; > >> > >> Assert(MyProc != NULL); > >> + Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */ > >> > >> /* Make sure we're out of the sync rep lists */ > >> SyncRepCleanupAtProcExit(); > >> @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg) > >> PGPROC *proc; > >> > >> Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS); > >> + Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */ > >> > >> auxproc = &AuxiliaryProcs[proctype]; > >> > >> -- > >> 2.25.1 > > > > I think the much more interesting assertion here would be to check that > > MyProc->pid equals the current pid. > > I don't mind changing this, but why is this a more interesting assertion? Because we so far have little to no protection against some state corruption leading to releasing PGPROC that's not ours. I didn't actually mean that we shouldn't check that MyProcPid == (int) getpid(), just that the much more interesting case to check is that MyProc->pid matches, because that protect against multiple releases, releasing the wrong slot, etc. Greetings, Andres Freund
On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote: > I think I opined on this before, but we really ought to have a function to do > some minimal signal safe output. Implemented centrally, instead of being open > coded in a bunch of places. While looking around for the right place to put this, I noticed that there's a write_stderr() function in elog.c that we might be able to use. I used that in v9. WDYT? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote: > On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote: > > I think I opined on this before, but we really ought to have a function to do > > some minimal signal safe output. Implemented centrally, instead of being open > > coded in a bunch of places. > > While looking around for the right place to put this, I noticed that > there's a write_stderr() function in elog.c that we might be able to use. > I used that in v9. WDYT? write_stderr() isn't signal safe, from what I can tell.
On Sun, Feb 26, 2023 at 10:00:29AM -0800, Andres Freund wrote: > On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote: >> On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote: >> > I think I opined on this before, but we really ought to have a function to do >> > some minimal signal safe output. Implemented centrally, instead of being open >> > coded in a bunch of places. >> >> While looking around for the right place to put this, I noticed that >> there's a write_stderr() function in elog.c that we might be able to use. >> I used that in v9. WDYT? > > write_stderr() isn't signal safe, from what I can tell. *facepalm* Sorry. What precisely did you have in mind? AFAICT you are asking for a wrapper around write(). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-02-26 11:39:00 -0800, Nathan Bossart wrote: > On Sun, Feb 26, 2023 at 10:00:29AM -0800, Andres Freund wrote: > > On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote: > >> On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote: > >> > I think I opined on this before, but we really ought to have a function to do > >> > some minimal signal safe output. Implemented centrally, instead of being open > >> > coded in a bunch of places. > >> > >> While looking around for the right place to put this, I noticed that > >> there's a write_stderr() function in elog.c that we might be able to use. > >> I used that in v9. WDYT? > > > > write_stderr() isn't signal safe, from what I can tell. > > *facepalm* Sorry. > > What precisely did you have in mind? AFAICT you are asking for a wrapper > around write(). Partially I just want something that can easily be searched for, that can have comments attached to it documenting why what it is doing is safe. It'd not be a huge amount of work to have a slow and restricted string interpolation support, to make it easier to write messages. Converting floats is probably too hard to do safely, and I'm not sure %m can safely be supported. But basic things like %d would be pretty simple. Basically a loop around the format string that directly writes to stderr using write(), and only supports a signal safe subset of normal format strings. Greetings, Andres Freund
On Sun, Feb 26, 2023 at 12:12:27PM -0800, Andres Freund wrote: > On 2023-02-26 11:39:00 -0800, Nathan Bossart wrote: >> What precisely did you have in mind? AFAICT you are asking for a wrapper >> around write(). > > Partially I just want something that can easily be searched for, that can have > comments attached to it documenting why what it is doing is safe. > > It'd not be a huge amount of work to have a slow and restricted string > interpolation support, to make it easier to write messages. Converting floats > is probably too hard to do safely, and I'm not sure %m can safely be > supported. But basic things like %d would be pretty simple. > > Basically a loop around the format string that directly writes to stderr using > write(), and only supports a signal safe subset of normal format strings. Got it, thanks. I will try to put something together along these lines, although I don't know if I'll pick up the interpolation support in this thread. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Feb 28, 2023 at 08:36:03PM -0800, Nathan Bossart wrote: > On Sun, Feb 26, 2023 at 12:12:27PM -0800, Andres Freund wrote: >> Partially I just want something that can easily be searched for, that can have >> comments attached to it documenting why what it is doing is safe. >> >> It'd not be a huge amount of work to have a slow and restricted string >> interpolation support, to make it easier to write messages. Converting floats >> is probably too hard to do safely, and I'm not sure %m can safely be >> supported. But basic things like %d would be pretty simple. >> >> Basically a loop around the format string that directly writes to stderr using >> write(), and only supports a signal safe subset of normal format strings. > > Got it, thanks. I will try to put something together along these lines, > although I don't know if I'll pick up the interpolation support in this > thread. Here is an attempt at adding a signal safe function for writing to STDERR. I didn't add support for format strings, but looking ahead, I think one challenge will be avoiding va_start() and friends. In any case, IMO format string support probably deserves its own thread. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote: > On Tue, Feb 28, 2023 at 08:36:03PM -0800, Nathan Bossart wrote: > > On Sun, Feb 26, 2023 at 12:12:27PM -0800, Andres Freund wrote: > >> Partially I just want something that can easily be searched for, that can have > >> comments attached to it documenting why what it is doing is safe. > >> > >> It'd not be a huge amount of work to have a slow and restricted string > >> interpolation support, to make it easier to write messages. Converting floats > >> is probably too hard to do safely, and I'm not sure %m can safely be > >> supported. But basic things like %d would be pretty simple. > >> > >> Basically a loop around the format string that directly writes to stderr using > >> write(), and only supports a signal safe subset of normal format strings. > > > > Got it, thanks. I will try to put something together along these lines, > > although I don't know if I'll pick up the interpolation support in this > > thread. > > Here is an attempt at adding a signal safe function for writing to STDERR. Cool. > I didn't add support for format strings, but looking ahead, I think one > challenge will be avoiding va_start() and friends. In any case, IMO format > string support probably deserves its own thread. Makes sense to split that off. FWIW, I think we could rely on va_start() et al to be signal safe. The standardese isn't super clear about this, because they aren't functions, and posix only talks about functions being async signal safe... Greetings, Andres Freund
On Wed, Mar 01, 2023 at 03:13:04PM -0800, Andres Freund wrote: > FWIW, I think we could rely on va_start() et al to be signal safe. The > standardese isn't super clear about this, because they aren't functions, and > posix only talks about functions being async signal safe... Good to know. I couldn't tell whether that was a safe assumption from briefly reading around. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Wed, Mar 01, 2023 at 03:13:04PM -0800, Andres Freund wrote: > On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote: >> Here is an attempt at adding a signal safe function for writing to STDERR. > > Cool. I'm gently bumping this thread to see if anyone had additional feedback on the proposed patches [0]. The intent was to back-patch these as needed and to pursue a long-term fix in v17. Are there any concerns with that? [0] https://postgr.es/m/20230301224751.GA1823946%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On 22.04.23 01:07, Nathan Bossart wrote: > On Wed, Mar 01, 2023 at 03:13:04PM -0800, Andres Freund wrote: >> On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote: >>> Here is an attempt at adding a signal safe function for writing to STDERR. >> >> Cool. > > I'm gently bumping this thread to see if anyone had additional feedback on > the proposed patches [0]. The intent was to back-patch these as needed and > to pursue a long-term fix in v17. Are there any concerns with that? > > [0] https://postgr.es/m/20230301224751.GA1823946%40nathanxps13 Is this still being contemplated? What is the status of this?
On Sun, Oct 01, 2023 at 08:50:15PM +0200, Peter Eisentraut wrote: > Is this still being contemplated? What is the status of this? I'll plan on committing this in the next couple of days. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote: > Subject: [PATCH v10 1/2] Move extra code out of the Pre/PostRestoreCommand() > block. LGTM > From fb6957da01f11b75d1a1966f32b00e2e77c789a0 Mon Sep 17 00:00:00 2001 > From: Nathan Bossart <nathandbossart@gmail.com> > Date: Tue, 14 Feb 2023 09:44:53 -0800 > Subject: [PATCH v10 2/2] Don't proc_exit() in startup's SIGTERM handler if > forked by system(). > > Instead, emit a message to STDERR and _exit() in this case. This > change also adds assertions to proc_exit(), ProcKill(), and > AuxiliaryProcKill() to verify that these functions are not called > by a process forked by system(), etc. > --- > src/backend/postmaster/startup.c | 17 ++++++++++++++++- > src/backend/storage/ipc/ipc.c | 3 +++ > src/backend/storage/lmgr/proc.c | 2 ++ > src/backend/utils/error/elog.c | 28 ++++++++++++++++++++++++++++ > src/include/utils/elog.h | 6 +----- > 5 files changed, 50 insertions(+), 6 deletions(-) > diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c > index 22b4278610..b9e2c3aafe 100644 > --- a/src/backend/storage/lmgr/proc.c > +++ b/src/backend/storage/lmgr/proc.c > @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg) > dlist_head *procgloballist; > > Assert(MyProc != NULL); > + Assert(MyProc->pid == (int) getpid()); /* not safe if forked by system(), etc. */ > > /* Make sure we're out of the sync rep lists */ > SyncRepCleanupAtProcExit(); > @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg) > PGPROC *proc; > > Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS); > + Assert(MyProc->pid == (int) getpid()); /* not safe if forked by system(), etc. */ > > auxproc = &AuxiliaryProcs[proctype]; > I'd make these elog(PANIC), I think. The paths are not performance critical enough that a single branch hurts, so the overhead of the check is irrelevant, and the consequences of calling ProcKill() twice for the same process are very severe. > +/* > + * Write a message to STDERR using only async-signal-safe functions. This can > + * be used to safely emit a message from a signal handler. > + * > + * TODO: It is likely possible to safely do a limited amount of string > + * interpolation (e.g., %s and %d), but that is not presently supported. > + */ > +void > +write_stderr_signal_safe(const char *fmt) As is, this isn't a format, so I'd probably just name it s or str :) > -/* > - * Write errors to stderr (or by equal means when stderr is > - * not available). Used before ereport/elog can be used > - * safely (memory context, GUC load etc) > - */ > extern void write_stderr(const char *fmt,...) pg_attribute_printf(1, 2); > +extern void write_stderr_signal_safe(const char *fmt); Not sure why you removed the comment? Greetings, Andres Freund
On Tue, Oct 10, 2023 at 04:40:28PM -0700, Andres Freund wrote: > On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote: >> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c >> index 22b4278610..b9e2c3aafe 100644 >> --- a/src/backend/storage/lmgr/proc.c >> +++ b/src/backend/storage/lmgr/proc.c >> @@ -805,6 +805,7 @@ ProcKill(int code, Datum arg) >> dlist_head *procgloballist; >> >> Assert(MyProc != NULL); >> + Assert(MyProc->pid == (int) getpid()); /* not safe if forked by system(), etc. */ >> >> /* Make sure we're out of the sync rep lists */ >> SyncRepCleanupAtProcExit(); >> @@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg) >> PGPROC *proc; >> >> Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS); >> + Assert(MyProc->pid == (int) getpid()); /* not safe if forked by system(), etc. */ >> >> auxproc = &AuxiliaryProcs[proctype]; >> > > I'd make these elog(PANIC), I think. The paths are not performance critical > enough that a single branch hurts, so the overhead of the check is irrelevant, > and the consequences of calling ProcKill() twice for the same process are very > severe. Right. Should we write_stderr_signal_safe() and then abort() to keep these paths async-signal-safe? >> +/* >> + * Write a message to STDERR using only async-signal-safe functions. This can >> + * be used to safely emit a message from a signal handler. >> + * >> + * TODO: It is likely possible to safely do a limited amount of string >> + * interpolation (e.g., %s and %d), but that is not presently supported. >> + */ >> +void >> +write_stderr_signal_safe(const char *fmt) > > As is, this isn't a format, so I'd probably just name it s or str :) Yup. >> -/* >> - * Write errors to stderr (or by equal means when stderr is >> - * not available). Used before ereport/elog can be used >> - * safely (memory context, GUC load etc) >> - */ >> extern void write_stderr(const char *fmt,...) pg_attribute_printf(1, 2); >> +extern void write_stderr_signal_safe(const char *fmt); > > Not sure why you removed the comment? I think it was because it's an exact copy of the comment above the function in elog.c, and I didn't want to give the impression that it applied to the signal-safe one, too. I added it back along with a new comment for write_stderr_signal_safe(). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Oct 10, 2023 at 09:54:18PM -0500, Nathan Bossart wrote: > On Tue, Oct 10, 2023 at 04:40:28PM -0700, Andres Freund wrote: >> I'd make these elog(PANIC), I think. The paths are not performance critical >> enough that a single branch hurts, so the overhead of the check is irrelevant, >> and the consequences of calling ProcKill() twice for the same process are very >> severe. > > Right. Should we write_stderr_signal_safe() and then abort() to keep these > paths async-signal-safe? Hm. I see that elog() is called elsewhere in proc_exit(), and it does not appear to be async-signal-safe. Am I missing something? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-10-10 22:29:34 -0500, Nathan Bossart wrote: > On Tue, Oct 10, 2023 at 09:54:18PM -0500, Nathan Bossart wrote: > > On Tue, Oct 10, 2023 at 04:40:28PM -0700, Andres Freund wrote: > >> I'd make these elog(PANIC), I think. The paths are not performance critical > >> enough that a single branch hurts, so the overhead of the check is irrelevant, > >> and the consequences of calling ProcKill() twice for the same process are very > >> severe. > > > > Right. Should we write_stderr_signal_safe() and then abort() to keep these > > paths async-signal-safe? > > Hm. I see that elog() is called elsewhere in proc_exit(), and it does not > appear to be async-signal-safe. Am I missing something? We shouldn't call proc_exit() in a signal handler. We perhaps have a few remaining calls left, but we should (and I think in some cases are) working on removing those. Greetings, Andres Freund
On Tue, Oct 10, 2023 at 08:39:29PM -0700, Andres Freund wrote: > We shouldn't call proc_exit() in a signal handler. We perhaps have a few > remaining calls left, but we should (and I think in some cases are) working on > removing those. Hmm. I don't recall anything remaining, even after a quick check. FWIW, I was under the impression that Thomas' work done in 0da096d78e1e4 has cleaned up the last bits of that. -- Michael
Attachment
On Wed, Oct 11, 2023 at 01:02:14PM +0900, Michael Paquier wrote: > On Tue, Oct 10, 2023 at 08:39:29PM -0700, Andres Freund wrote: >> We shouldn't call proc_exit() in a signal handler. We perhaps have a few >> remaining calls left, but we should (and I think in some cases are) working on >> removing those. Got it. > Hmm. I don't recall anything remaining, even after a quick check. > FWIW, I was under the impression that Thomas' work done in > 0da096d78e1e4 has cleaned up the last bits of that. StartupProcShutdownHandler() remains, at least. Of the other items in Tom's list from 2020 [0], bgworker_die() and FloatExceptionHandler() are also still unsafe. RecoveryConflictInterrupt() should be fixed by 0da096d, and StandbyDeadLockHandler() and StandbyTimeoutHandler() should be fixed by 8900b5a and 8f1537d, respectively. [0] https://postgr.es/m/148145.1599703626%40sss.pgh.pa.us -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Committed and back-patched. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Oct 17, 2023 at 10:46:47AM -0500, Nathan Bossart wrote: > Committed and back-patched. ... and it looks like some of the back-branches are failing for Windows. I'm assuming this is because c290e79 was only back-patched to v15. My first instinct is just to back-patch that one all the way to v11, but maybe there's an alternative involving #ifdef WIN32. Are there any concerns with back-patching c290e79? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > ... and it looks like some of the back-branches are failing for Windows. > I'm assuming this is because c290e79 was only back-patched to v15. My > first instinct is just to back-patch that one all the way to v11, but maybe > there's an alternative involving #ifdef WIN32. Are there any concerns with > back-patching c290e79? Sounds fine to me. regards, tom lane
On Tue, Oct 17, 2023 at 12:47:29PM -0400, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> ... and it looks like some of the back-branches are failing for Windows. >> I'm assuming this is because c290e79 was only back-patched to v15. My >> first instinct is just to back-patch that one all the way to v11, but maybe >> there's an alternative involving #ifdef WIN32. Are there any concerns with >> back-patching c290e79? > > Sounds fine to me. Thanks, done. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com