Re: add assertion for palloc in signal handlers - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: add assertion for palloc in signal handlers
Date
Msg-id b3914af0-1e94-4b7a-941f-eed0c7fd5fc5@iki.fi
Whole thread Raw
In response to Re: add assertion for palloc in signal handlers  (Kirill Reshke <reshkekirill@gmail.com>)
Responses Re: add assertion for palloc in signal handlers
Re: add assertion for palloc in signal handlers
List pgsql-hackers
On 19/02/2026 09:21, Kirill Reshke wrote:
> On Thu, 19 Feb 2026 at 02:50, Andres Freund <andres@anarazel.de> wrote:
>> I don't think die() should call ereport() without being in single user mode. I
>> guess there's a corner case though, which is that the signal handler is
>> executed during exit processing, when we already reset whereToSendOutput.  I
>> think we probably should make sure this stuff is only reached in actual single
>> user mode.

+1

> Hi! Thank you for your input.
> Stupid question from me: in singleuser mode, how should this work? We
> still should not ereport inside signal handlers, are we?
> If singleuser mode is somehow safe with die & ereport then... I don't
> understand how.

It's not safe to ereport() from die() in single-user mode either. We've 
just accepted it as the lesser evil:

>     /*
>      * If we're in single user mode, we want to quit immediately - we can't
>      * rely on interrupts as they wouldn't work when stdin/stdout is a file.
>      * Rather ugly, but it's unlikely to be worthwhile to invest much more
>      * effort just for the benefit of single user mode.
>      */
>     if (DoingCommandRead && whereToSendOutput != DestRemote)
>         ProcessTerminateInterrupt();

I don't quite understand that though. I understand that a read/write on 
a file might be uninterruptible. But so what? It presumably won't take 
very long, so it's fine to wait for the read/write to finish. Are we 
worried about getting stuck on an NFS read when the network is down, or 
something like that?

This was the discussion back then 
[https://www.postgresql.org/message-id/20150203202811.GA12566%40awork2.anarazel.de]:

>> Could you use WaitLatchOrSocket to sleep on stdin?
> 
> Unfortunately I don't think so. poll() etc only works properly on
> network handles, pipes etc - but stdin can be a file :(. And I think
> what exactly happens if it's a file fd isn't super well defined. On
> linux the file is always marked as ready, which would probably actually
> work...

I think that's actually wrong. POSIX says 
[https://pubs.opengroup.org/onlinepubs/9799919799/functions/poll.html]:

> The poll() and ppoll() functions shall support regular files,
> terminal and pseudo-terminal devices, FIFOs, pipes, and sockets. The
> behavior of poll() and ppoll() on elements of fds that refer to
> other types of file is unspecified.
> 
> Regular files shall always poll TRUE for reading and writing.

So that is pretty well-defined, and we could use poll() on stdin too.

>> I'm somewhat confused by
>>
>>          /* Don't joggle the elbow of proc_exit */
>>          if (!proc_exit_inprogress)
>>          {
>>                  InterruptPending = true;
>>                  ProcDiePending = true;
>>          }
>>
>> because the only thing that avoids is actually harmless stuff, we're not going
>> to notice that , whereas the ProcessInterrupts() done further down is actually
>> certainly "joggling the elbow of proc_exit().
>>
>> <digs>
>>
>> I think the comment hasn't quite kept up with the time, the block used to do
>> more. Some dude named Andres didn't remove the comment in commit 2505ce0be0b.
>>
>> And another dude, confusingly also named Andres, in commit 4f85fde8eb86,
>> didn't think about the scenario that whereToSendOutput could already be reset
>> to DestNone in die()
>>
>> Ooops.

Yeah, I've been wondering about that "joggle the elbow" comment too. We 
don't have that check in any other signal handlers that set 
InterruptPending. I believe we could just remove the if(...) and set 
InterruptPending and ProcDiePending unconditionally here.

>> quickdie() obviously does does reach the ereport().  I think it's a bad idea,
>> with a bad justification: For one, libpq makes up the same error reason if the
>> client just vanishes.  For another, because it just uses
>> WARNING[_CLIENT_ONLY], it'll not even reach clients if they use a higher
>> client_min_messages.
>>
>> Just hoping that the client communication, including openssl, is in a state
>> that makes it somewhat safe to send messages from a signal handler, is
>> bonkers, IMNSHO.  Yes, we have the "safety" mechanism of postmaster SIGKILLing
>> processes, but that only protects against self deadlocks, not against
>> corrupted datastructures etc.

PqCommBusy gives some protection from messing with openssl state. But 
yes, it's generally unsafe.

>> It's one thing to ereport() in a signal handler during a crash triggered
>> quickdie(), the shit already has hit the fan after all, but doing it as part
>> of an intended immediate shutdown is a seriously bad idea.

I don't see much difference between a crash and an "intended immediate 
shutdown". In both cases, you don't want to corrupt datastructures etc. 
more than you already have, but on the other hand the system is going 
down anyway.

> So, after all, we need to remove ereport from quick die completely, no
> backup plan?
> This will make whole system less verbose about its shutdown reasons...
> Not sure how vital that is

What's the proposed alternative to ereporting() from quickdie? If we 
just remove the ereport(), yeah, clients will miss out on the messages. 
Are we OK with that?

One alternative is to add more checks to quickdie(), and refrain from 
ereporting() if we're in the middle of an existing ereport(), for 
example, or some other such state. We probably cannot make it 100% safe 
with that approach, but maybe that's the right tradeoff.

Another alternative is to try to handle SIGQUIT more like SIGTERM, and 
delay the exiting until next CHECK_FOR_INTERRUPTS(). Maybe with a short 
timer to just SIGKILL if it the interrupt isn't processed quickly. It 
increases the risk that the process won't exit as quickly as we'd like, 
but maybe that's the right tradeoff.

- Heikki




pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3
Next
From: Ashutosh Bapat
Date:
Subject: Re: SQL Property Graph Queries (SQL/PGQ)