Hi,
On Tue, May 24, 2022 at 11:42:35AM +0200, Mats Kindahl wrote:
>
> Typically, signal-unsafe functions should not be called from signal
> handlers. In particular, calling malloc() directly or indirectly can cause
> deadlocks, making PostgreSQL unresponsive to signals.
>
> Unless I am missing something, bgworker_die uses ereport, which indirectly
> calls printf-like functions, which are not signal-safe since they use
> malloc(). In rare cases, this can lead to deadlocks with stacks that look
> like this (from https://github.com/timescale/timescaledb/issues/4200):
>
> #0 0x00007f0e4d1040eb in __lll_lock_wait_private () from
> target:/lib/x86_64-linux-gnu/libc.so.6
> [...]
> #3 malloc (size=53)
> [...]
> #7 0x000055b9212235b1 in errmsg ()
> #8 0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
> /build/timescaledb/src/bgw/scheduler.c:841
> #9 <signal handler called>
> [...]
> #13 free (ptr=<optimized out>)
> #14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
> target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
> [...]
As far as I can see the problem comes from your handle_sigterm:
static void handle_sigterm(SIGNAL_ARGS)
{
/*
* do not use a level >= ERROR because we don't want to exit here but
* rather only during CHECK_FOR_INTERRUPTS
*/
ereport(LOG,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating TimescaleDB job scheduler due to administrator command")));
die(postgres_signal_arg);
}
As mentioned in the topmost comment of elog.c, there's an escape hatch for
out of memory situations, to make sure that a small enough message can be
displayed without trying to allocate memory. But this is only for ERROR or
higher levels. Using an ereport(LOG, ...) level in a sigterm handler indeed
isn't safe.