Re: Use of signal-unsafe functions from signal handlers - Mailing list pgsql-bugs

From Julien Rouhaud
Subject Re: Use of signal-unsafe functions from signal handlers
Date
Msg-id 20220524101418.7pp65powa362vzln@jrouhaud
In response to Use of signal-unsafe functions from signal handlers  (Mats Kindahl <mats@timescale.com>)
Responses Re: Use of signal-unsafe functions from signal handlers
List pgsql-bugs
Hi,

On Tue, May 24, 2022 at 11:42:35AM +0200, Mats Kindahl wrote:
> 
> Typically, signal-unsafe functions should not be called from signal
> handlers. In particular, calling malloc() directly or indirectly can cause
> deadlocks, making PostgreSQL unresponsive to signals.
> 
> Unless I am missing something, bgworker_die uses ereport, which indirectly
> calls printf-like functions, which are not signal-safe since they use
> malloc(). In rare cases, this can lead to deadlocks with stacks that look
> like this (from https://github.com/timescale/timescaledb/issues/4200):
> 
> #0  0x00007f0e4d1040eb in __lll_lock_wait_private () from
> target:/lib/x86_64-linux-gnu/libc.so.6
> [...]
> #3  malloc (size=53)
> [...]
> #7  0x000055b9212235b1 in errmsg ()
> #8  0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
> /build/timescaledb/src/bgw/scheduler.c:841
> #9  <signal handler called>
> [...]
> #13 free (ptr=<optimized out>)
> #14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
> target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
> [...]

As far as I can see, the problem comes from your handle_sigterm:

static void handle_sigterm(SIGNAL_ARGS)
{
    /*
     * do not use a level >= ERROR because we don't want to exit here but
     * rather only during CHECK_FOR_INTERRUPTS
     */
    ereport(LOG,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating TimescaleDB job scheduler due to administrator command")));
    die(postgres_signal_arg);
}

As mentioned in the topmost comment of elog.c, there is an escape hatch for
out-of-memory situations, which makes sure that a small enough message can be
displayed without trying to allocate memory.  But it only applies to ERROR and
higher levels.  Calling ereport(LOG, ...) from a SIGTERM handler can therefore
allocate memory, so it indeed isn't safe.
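
For illustration, here is a minimal sketch of the usual way to avoid this: the
handler only sets a flag, and the message is logged later from normal code.
This is a standalone POSIX example, not PostgreSQL or TimescaleDB code; the
names shutdown_requested, handle_sigterm_flag_only, install_sigterm_handler
and check_for_shutdown are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical flag: the only state the handler touches.  Writing to a
 * volatile sig_atomic_t is safe from a signal handler.
 */
static volatile sig_atomic_t shutdown_requested = 0;

/* Handler does no logging and no allocation; it only records the request. */
static void
handle_sigterm_flag_only(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* Install the handler for SIGTERM; returns 0 on success, -1 on failure. */
static int
install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigterm_flag_only;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    return sigaction(SIGTERM, &sa, NULL);
}

/*
 * Called from the main loop, outside of any signal handler, where
 * functions that may call malloc() (such as fprintf) are safe to use.
 * Returns 1 if a shutdown was requested, 0 otherwise.
 */
static int
check_for_shutdown(void)
{
    if (!shutdown_requested)
        return 0;

    fprintf(stderr, "terminating job scheduler due to administrator command\n");
    return 1;
}
```

The main loop would call check_for_shutdown() at points where it is safe to
stop, which is roughly the role CHECK_FOR_INTERRUPTS plays in the comment of
the handler quoted above.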


