Home > mailing lists

Re: Async-unsafe functions in signal handlers - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Async-unsafe functions in signal handlers
Date	August 27, 2021 21:05:47
Msg-id	20210827210547.nntbtdlhqbxbo4wu@alap3.anarazel.de Whole thread Raw
In response to	Re: Async-unsafe functions in signal handlers (Denis Smirnov <sd@arenadata.io>)
Responses	Re: Async-unsafe functions in signal handlers
List	pgsql-hackers

Tree view

Hi,

On 2021-08-27 23:51:27 +0300, Denis Smirnov wrote:
> > 26 авг. 2021 г., в 23:39, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
> > 
> > (BTW, I think it's pretty silly to imagine that adding backtrace()
> > calls inside ereport is making things any more dangerous.  ereport
> > has pretty much always carried a likelihood of calling malloc(),
> > for example.)
> 
> I have taken a look through the signal handlers and found out that many of them use malloc() via ereport() and
elog().Here is the list:

> 
> SIGUSR1
> - procsignal_sigusr1_handler(): autoprewarm, autovacuum, bgworker, bgwriter, checkpointer, pgarch, startup,
walwriter,walreciever, walsender

There shouldn't be meaningful uses of elog/ereport() inside
procsignal_sigusr1_handler(). The exception I found was an elog(FATAL) for
unreachable code.

> - sigusr1_handler(): postmaster
> SIGHUP:
> - SIGHUP_handler(): postmaster
> SIGCHLD:
> - reaper(): postmaster

I think these runs in a very controlled set of circumstances because most of
postmaster runs with signals masked.

> SIGFPE:
> - FloatExceptionHandler(): autovacuum, bgworker, postgres, plperl

Yep, although as discussed this might not be a "real" problem because it
should only run during an instruction triggering an FPE.

> SIGQUIT:
> - quickdie(): postgres

Yes, this is an issue. I've previously argued for handling this via write()
and _exit(), instead of the full ereport() machinery. However, we have a
bandaid that deals with possible hangs, by SIGKILLing when processes don't
shut down (at that point things have already gone quite south, so that's not
an issue).

> SIGTERM:
> - bgworker_die(): bgworker

Bad.

> SIGALRM:
> - handle_sig_alarm(): autovacuum, bgworker, postmaster, startup, walsender, postgres

I don't think there reachable elogs in there. I'm not concerned about e.g.
        elog(FATAL, "timeout index %d out of range 0..%d", index,
             num_active_timeouts - 1);
because that's not something that should ever be reachable in a production
scenario. If it is, there's bigger problems.

Perhaps we ought to have a version of Assert() that's enabled in production
builds as well, and that outputs the error messages via write() and then
_exit()s?

Greetings,

Andres Freund

pgsql-hackers by date:

From: Denis Smirnov
Date: 27 August 2021, 20:51:27
Subject: Re: Async-unsafe functions in signal handlers

From: Stephen Frost
Date: 27 August 2021, 22:33:33
Subject: Re: New predefined roles- 'pg_read/write_all_data'

Re: Async-unsafe functions in signal handlers - Mailing list pgsql-hackers

Previous

Next