Re: VM corruption on standby - Mailing list pgsql-hackers

From Kirill Reshke
Subject Re: VM corruption on standby
Date
Msg-id CALdSSPisWpkL+-_vS7B7vonX1XTC8aVkPhj3BBc2wtmuZ_a7cQ@mail.gmail.com
In response to Re: VM corruption on standby  (Yura Sokolov <y.sokolov@postgrespro.ru>)
List pgsql-hackers
On Tue, 19 Aug 2025 at 21:16, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

>
> That is not true.
> elog(PANIC) doesn't clear LWLocks. And XLogWrite, which could be called
> from AdvanceXLInsertBuffer, may call elog(PANIC) from several places.
>
> It doesn't lead to any error, because usually the postmaster is alive and
> will kill -9 all its children if any one of them dies in a critical section.
>
> So the problem is that the postmaster has already been killed with SIGKILL,
> by definition of the issue.
>
> Documentation says [0]:
> > If at all possible, do not use SIGKILL to kill the main postgres server.
> > Doing so will prevent postgres from freeing the system resources (e.g.,
> > shared memory and semaphores) that it holds before terminating.
>
> Therefore, if the postmaster is SIGKILL-ed, the administrator already has to
> take some action.
>

There are surely many cases where a system reaches a state that can
only be fixed by admin action.
An elog(PANIC) inside a critical section is very rare (and very
probably means there is corruption already).
A simpler example is to kill -9 the postmaster and then immediately
kill -9 a process that holds an LWLock.
The problem in PostgreSQL 18 is that the probability of reaching this
state is much higher due to the aforementioned commit. It can happen
with almost any OOM on a highly loaded system.
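
To illustrate the simpler example outside of PostgreSQL, here is a
minimal sketch in Python. It is not PostgreSQL code: a
multiprocessing.Lock stands in for an LWLock, the parent process stands
in for whoever still needs that lock, and the function names
(hold_lock_forever, lock_survives_sigkill) are made up for this
illustration. It only shows the general mechanism, namely that SIGKILL
gives the holder no chance to release a lock kept in shared state.

```python
import multiprocessing as mp
import os
import signal
import time


def hold_lock_forever(lock, acquired):
    """Hypothetical stand-in for a process that holds a lock when it is killed."""
    lock.acquire()
    acquired.set()      # tell the parent that the lock is now held
    time.sleep(60)      # SIGKILL arrives while the lock is still held


def lock_survives_sigkill(timeout=1.0):
    """Return True if the lock is still held after its holder was SIGKILLed."""
    ctx = mp.get_context("fork")    # assumes a Unix-like system
    lock = ctx.Lock()
    acquired = ctx.Event()

    holder = ctx.Process(target=hold_lock_forever, args=(lock, acquired))
    holder.start()
    acquired.wait()

    # SIGKILL cannot be caught, so no cleanup code runs in the holder.
    os.kill(holder.pid, signal.SIGKILL)
    holder.join()

    # Nobody is left to release the lock, so this attempt should time out.
    got_it = lock.acquire(timeout=timeout)
    if got_it:
        lock.release()
    return not got_it


if __name__ == "__main__":
    print("lock still held after SIGKILL:", lock_survives_sigkill())
```

In this sketch the parent gives up after a timeout. A waiter that blocks
without a timeout would simply hang, which is the kind of state that
needs admin action once nothing is left to clean up after the killed
process.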

-- 
Best regards,
Kirill Reshke


