Re: VM corruption on standby - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: VM corruption on standby
Date
Msg-id CA+hUKGKBUaWrJWSJCLtWqVvP4_aDPnPJ8LFv17wj-ViQQi4ouw@mail.gmail.com
In response to Re: VM corruption on standby  (Andres Freund <andres@anarazel.de>)
Responses Re: VM corruption on standby
List pgsql-hackers
On Wed, Aug 20, 2025 at 1:56 AM Andres Freund <andres@anarazel.de> wrote:
> On 2025-08-19 02:13:43 -0400, Tom Lane wrote:
> > > Then wouldn't backends blocked in LWLockAcquire(x) hang forever, after
> > > someone who holds x calls _exit()?
> >
> > If someone who holds x is killed by (say) the OOM killer, how do
> > we get out of that?

If a backend is killed by the OOM killer, the postmaster will of
course send SIGQUIT/SIGKILL to all backends.  If the postmaster itself
is killed, then surviving backends will notice at their next
WaitEventSetWait() and exit, but if any are blocked in sem_wait(),
they will only make progress because other exiting backends release
their LWLocks on their way out.  So if we change that to _exit(), I
assume such backends would linger forever in sem_wait() after the
postmaster dies.  I do agree that it seems quite weird to release all
locks as if this is a "normal" exit though, which is why Kirill and I
both wondered about other ways to boot them out of sem_wait()...
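
To make the sem_wait() problem concrete, here is a minimal standalone
sketch, not PostgreSQL code.  It assumes a handler installed without
SA_RESTART, so that a signal makes sem_wait() fail with EINTR, and a
retry loop that gives up once a flag is set.  The names
parent_maybe_dead, note_parent_death(), sem_wait_unless_parent_dead()
and demo_interrupted_wait() are all hypothetical, and SIGALRM from a
timer merely stands in for whatever event would signal postmaster
death.

```c
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical flag standing in for "the postmaster may be gone". */
static volatile sig_atomic_t parent_maybe_dead = 0;

/* The handler does nothing except set the flag. */
static void
note_parent_death(int signo)
{
    (void) signo;
    parent_maybe_dead = 1;
}

/*
 * Wait on sem, retrying after EINTR, but abandon the wait once the flag
 * is set.  Returns true if the semaphore was acquired, false otherwise.
 *
 * Caveat: a signal that arrives after the flag check but before
 * sem_wait() blocks is missed, so this is only an illustration of the
 * idea, not a race-free design.
 */
static bool
sem_wait_unless_parent_dead(sem_t *sem)
{
    for (;;)
    {
        if (parent_maybe_dead)
            return false;
        if (sem_wait(sem) == 0)
            return true;
        if (errno != EINTR)
            return false;
    }
}

/*
 * Block on a semaphore that nobody will ever post, and let a 100 ms
 * timer signal interrupt the wait.  Returns 1 if the wait was
 * abandoned, 0 if the semaphore was acquired, -1 on setup failure.
 */
static int
demo_interrupted_wait(void)
{
    sem_t       sem;
    struct sigaction sa;
    struct itimerval timer;
    bool        acquired;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = note_parent_death;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART: sem_wait() gets EINTR */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    if (sem_init(&sem, 0, 0) != 0)
        return -1;

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 100000;    /* one-shot, 100 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    acquired = sem_wait_unless_parent_dead(&sem);
    sem_destroy(&sem);
    return acquired ? 0 : 1;
}
```

Without the flag check, the loop would go straight back into
sem_wait() after EINTR and stay there, which is the lingering described
above.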

> On Linux - the primary OS with OOM killer troubles - I'm pretty sure lwlock
> waiters would get killed due to the postmaster death signal we've configured
> (c.f. PostmasterDeathSignalInit()).

No, that has a handler that just sets a global variable.  That was
done because recovery used to try to read() from the postmaster pipe
after replaying every record.  Also we currently have some places that
don't want to be summarily killed (off the top of my head, syncrep
wants to send a special error message, and the logger wants to survive
longer than everyone else to catch as much output as possible, things
I've been thinking about in the context of threads).
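
For readers who haven't looked at this mechanism, a rough Linux-only
sketch of a flag-only parent death signal follows.  It is not the
PostgreSQL source: the choice of SIGUSR1 and the names
parent_possibly_dead, parent_death_handler(),
request_parent_death_flag() and demo_parent_death_flag() are
assumptions for illustration.  The only real API relied on is
prctl(PR_SET_PDEATHSIG).

```c
#define _GNU_SOURCE

#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag; the handler sets it and does nothing else. */
static volatile sig_atomic_t parent_possibly_dead = 0;

static void
parent_death_handler(int signo)
{
    (void) signo;
    parent_possibly_dead = 1;
}

/* Ask the kernel to send SIGUSR1 to this process when its parent exits. */
static int
request_parent_death_flag(void)
{
    struct sigaction sa;

    sa.sa_handler = parent_death_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    return prctl(PR_SET_PDEATHSIG, SIGUSR1);
}

/*
 * Fork a "middle" process that forks a grandchild and then exits.
 * Returns 1 if the grandchild saw the flag and survived to report it,
 * 0 if it never saw the flag, -1 on error.
 */
static int
demo_parent_death_flag(void)
{
    int         ready[2];
    int         result[2];
    pid_t       middle;
    char        seen = 0;
    int         rc;

    if (pipe(ready) != 0 || pipe(result) != 0)
        return -1;

    middle = fork();
    if (middle < 0)
        return -1;
    if (middle == 0)
    {
        char        byte = 'r';
        pid_t       child = fork();

        if (child < 0)
            _exit(1);
        if (child == 0)
        {
            /* Grandchild: arm the signal, then tell the parent to exit. */
            if (request_parent_death_flag() != 0)
                _exit(1);
            if (write(ready[1], &byte, 1) != 1)
                _exit(1);
            /* Poll the flag for up to about 3 seconds. */
            for (int i = 0; i < 300 && !parent_possibly_dead; i++)
                usleep(10000);
            byte = parent_possibly_dead ? 1 : 0;
            if (write(result[1], &byte, 1) != 1)
                _exit(1);
            _exit(0);
        }
        /* Middle: wait until the grandchild is armed, then die. */
        close(ready[1]);
        if (read(ready[0], &byte, 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Original process: collect the grandchild's report. */
    close(ready[1]);
    close(result[1]);
    waitpid(middle, NULL, 0);
    rc = (read(result[0], &seen, 1) == 1) ? (seen == 1) : -1;
    close(ready[0]);
    close(result[0]);
    return rc;
}
```

The grandchild is not killed by the signal.  It keeps running and
decides for itself what to do once it notices the flag, which is the
property the places mentioned above depend on.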