Re: common signal handler protection - Mailing list pgsql-hackers

From Andres Freund
Subject Re: common signal handler protection
Date
Msg-id 20231122225945.3kgclsgz5lqmtnan@awork3.anarazel.de
Whole thread Raw
In response to Re: common signal handler protection  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: common signal handler protection
List pgsql-hackers
Hi,

On 2023-11-22 15:59:44 -0600, Nathan Bossart wrote:
> Subject: [PATCH v2 1/3] Check that MyProcPid == getpid() in all signal
>  handlers.
> 
> In commit 97550c0711, we added a similar check to the SIGTERM
> handler for the startup process.  This commit adds this check to
> all signal handlers installed with pqsignal().  This is done by
> using a wrapper function that performs the check before calling the
> actual handler.
> 
> The hope is that this will offer more general protection against
> grandchildren processes inadvertently modifying shared memory due
> to inherited signal handlers.

It's a bit unclear here what grandchildren refers to - it's presumably in
reference to postmaster, but the preceding text doesn't even mention
postmaster. I'd probably just say "child processes of the current process.


> +
> +#ifdef PG_SIGNAL_COUNT            /* Windows */
> +#define PG_NSIG (PG_SIGNAL_COUNT)
> +#elif defined(NSIG)
> +#define PG_NSIG (NSIG)
> +#else
> +#define PG_NSIG (64)            /* XXX: wild guess */
> +#endif

Perhaps worth adding a static assert for at least a few common types of
signals being below that value? That way we'd see a potential issue without
needing to reach the code path.


> +/*
> + * Except when called with SIG_IGN or SIG_DFL, pqsignal() sets up this function
> + * as the handler for all signals.  This wrapper handler function checks that
> + * it is called within a process that the server knows about, and not a
> + * grandchild process forked by system(3), etc.

Similar comment to earlier - the grandchildren bit seems like a dangling
reference. And also too specific - I think we could encounter this in single
user mode as well?

Perhaps it could be reframed to "postgres processes, as determined by having
called InitProcessGlobals()"?


>This check ensures that such
> + * grandchildren processes do not modify shared memory, which could be
> + * detrimental.

"could be" seems a bit euphemistic :)


> From b77da9c54590a71eb9921d6f16bf4ffb0543e87e Mon Sep 17 00:00:00 2001
> From: Nathan Bossart <nathan@postgresql.org>
> Date: Fri, 17 Nov 2023 14:00:12 -0600
> Subject: [PATCH v2 2/3] Centralize logic for restoring errno in signal
>  handlers.
> 
> Presently, we rely on each individual signal handler to save the
> initial value of errno and then restore it before returning if
> needed.  This is easily forgotten and, if missed, often goes
> undetected for a long time.

It's also just verbose :)


> From 5734e0cf5f00bbd74504b45934f68e1c2c1290bd Mon Sep 17 00:00:00 2001
> From: Nathan Bossart <nathan@postgresql.org>
> Date: Fri, 17 Nov 2023 22:09:24 -0600
> Subject: [PATCH v2 3/3] Revert "Avoid calling proc_exit() in processes forked
>  by system()."
> 
> Thanks to commit XXXXXXXXXX, this check in the SIGTERM handler for
> the startup process is now obsolete and can be removed.  Instead of
> leaving around the elog(PANIC, ...) calls that are now unlikely to
> be triggered and the dead function write_stderr_signal_safe(), I've
> opted to just remove them for now.  Thanks to modern version
> control software, it will be trivial to dig those up if they are
> ever needed in the future.
> 
> This reverts commit 97550c0711972a9856b5db751539bbaf2f88884c.
> 
> Reviewed-by: ???
> Discussion: ???
> ---
>  src/backend/postmaster/startup.c | 17 +----------------
>  src/backend/storage/ipc/ipc.c    |  4 ----
>  src/backend/storage/lmgr/proc.c  |  8 --------
>  src/backend/utils/error/elog.c   | 28 ----------------------------
>  src/include/utils/elog.h         |  6 ------
>  5 files changed, 1 insertion(+), 62 deletions(-)
> 
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index 83dbed86b9..f40acd20ff 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -19,8 +19,6 @@
>   */
>  #include "postgres.h"
>  
> -#include <unistd.h>
> -
>  #include "access/xlog.h"
>  #include "access/xlogrecovery.h"
>  #include "access/xlogutils.h"
> @@ -113,20 +111,7 @@ static void
>  StartupProcShutdownHandler(SIGNAL_ARGS)
>  {
>      if (in_restore_command)
> -    {
> -        /*
> -         * If we are in a child process (e.g., forked by system() in
> -         * RestoreArchivedFile()), we don't want to call any exit callbacks.
> -         * The parent will take care of that.
> -         */
> -        if (MyProcPid == (int) getpid())
> -            proc_exit(1);
> -        else
> -        {
> -            write_stderr_signal_safe("StartupProcShutdownHandler() called in child process\n");
> -            _exit(1);
> -        }
> -    }
> +        proc_exit(1);
>      else
>          shutdown_requested = true;
>      WakeupRecovery();

Hm. I wonder if this indicates an issue.  Do the preceding changes perhaps
make it more likely that a child process of the startup process could hang
around for longer, because the signal we're delivering doesn't terminate child
processes, because we'd just reset the signal handler?  I think it's fine for
the startup process, because we ask the startup process to shut down with
SIGTERM, which defaults to exiting.

But we do have a few processes that we do ask to shut down with other
signals, wich do not trigger an exit by default, e.g. Checkpointer, archiver,
walsender are asked to shut down using SIGUSR2 IIRC.  The only one where that
could be an issue is archiver, I guess?


> diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
> index 6591b5d6a8..1904d21795 100644
> --- a/src/backend/storage/ipc/ipc.c
> +++ b/src/backend/storage/ipc/ipc.c
> @@ -103,10 +103,6 @@ static int    on_proc_exit_index,
>  void
>  proc_exit(int code)
>  {
> -    /* not safe if forked by system(), etc. */
> -    if (MyProcPid != (int) getpid())
> -        elog(PANIC, "proc_exit() called in child process");
> -
>      /* Clean up everything that must be cleaned up */
>      proc_exit_prepare(code);

> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
> index e9e445bb21..5b663a2997 100644
> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -806,10 +806,6 @@ ProcKill(int code, Datum arg)
>  
>      Assert(MyProc != NULL);
>  
> -    /* not safe if forked by system(), etc. */
> -    if (MyProc->pid != (int) getpid())
> -        elog(PANIC, "ProcKill() called in child process");
> -
>      /* Make sure we're out of the sync rep lists */
>      SyncRepCleanupAtProcExit();
>  
> @@ -930,10 +926,6 @@ AuxiliaryProcKill(int code, Datum arg)
>  
>      Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
>  
> -    /* not safe if forked by system(), etc. */
> -    if (MyProc->pid != (int) getpid())
> -        elog(PANIC, "AuxiliaryProcKill() called in child process");
> -
>      auxproc = &AuxiliaryProcs[proctype];
>  
>      Assert(MyProc == auxproc);

I think we should leave these checks. It's awful to debug situations where a
proc gets reused and the cost of the checks is irrelevant. The checks don't
just protect against child processes, they also protect e.g. against calling
those functions twice (we IIRC had cases of that).

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Changing references of password encryption to hashing
Next
From: Peter Geoghegan
Date:
Subject: Re: Optionally using a better backtrace library?