Re: Use of signal-unsafe functions from signal handlers - Mailing list pgsql-bugs

From Mats Kindahl
Subject Re: Use of signal-unsafe functions from signal handlers
Date
Msg-id CA+14426Zpdi6=jr9Z+C_tzS36pVa_+BbCHSeriqd=sLw4X6_iw@mail.gmail.com
Whole thread Raw
In response to Re: Use of signal-unsafe functions from signal handlers  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Use of signal-unsafe functions from signal handlers
List pgsql-bugs

On Tue, May 24, 2022 at 12:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

On Tue, May 24, 2022 at 11:42:35AM +0200, Mats Kindahl wrote:
>
> Typically, signal-unsafe functions should not be called from signal
> handlers. In particular, calling malloc() directly or indirectly can cause
> deadlocks, making PostgreSQL unresponsive to signals.
>
> Unless I am missing something, bgworker_die uses ereport, which indirectly
> calls printf-like functions, which are not signal-safe since they use
> malloc(). In rare cases, this can lead to deadlocks with stacks that look
> like this (from https://github.com/timescale/timescaledb/issues/4200):
>
> #0  0x00007f0e4d1040eb in __lll_lock_wait_private () from
> target:/lib/x86_64-linux-gnu/libc.so.6
> [...]
> #3  malloc (size=53)
> [...]
> #7  0x000055b9212235b1 in errmsg ()
> #8  0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
> /build/timescaledb/src/bgw/scheduler.c:841
> #9  <signal handler called>
> [...]
> #13 free (ptr=<optimized out>)
> #14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
> target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
> [...]

As far as I can see the problem comes from your handle_sigterm:

static void handle_sigterm(SIGNAL_ARGS)
{
        /*
         * do not use a level >= ERROR because we don't want to exit here but
         * rather only during CHECK_FOR_INTERRUPTS
         */
        ereport(LOG,
                        (errcode(ERRCODE_ADMIN_SHUTDOWN),
                         errmsg("terminating TimescaleDB job scheduler due to administrator command")));
        die(postgres_signal_arg);
}

As mentioned in the topmost comment of elog.c, there's an escape hatch for
out of memory situations, to make sure that a small enough message can be
displayed without trying to allocate memory.  But this is only for ERROR or
higher levels.  Using an ereport(LOG, ...) level in a sigterm handler indeed
isn't safe.

Yes, and we have already fixed this in the TimescaleDB code, so this is not an issue for us (https://github.com/timescale/timescaledb/pull/4323). (We actually replace the bgworker_die with just die as signal handler.)

However, bgworker_die()---which is part of PostgreSQL and is the default signal handler for background workers---is using ereport() and I think this should be a problem for all error levels since the problem is in the usage of the formatting function to format the error message, not where you write it. This would mean that any existing background workers would have a window for triggering this issue on shutdown.

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #17494: High demand for displacement sort
Next
From: Julien Rouhaud
Date:
Subject: Re: Use of signal-unsafe functions from signal handlers