Re: VM corruption on standby - Mailing list pgsql-hackers

From Kirill Reshke
Subject Re: VM corruption on standby
Date
Msg-id CALdSSPhO4zJAPEKu6wuxg362d0uHv1Qr8f83q-o4T54c0J4GgA@mail.gmail.com
In response to Re: VM corruption on standby  (Kirill Reshke <reshkekirill@gmail.com>)
Responses Re: VM corruption on standby
List pgsql-hackers
On Wed, 13 Aug 2025 at 16:15, I wrote:
> I did not find any doc or other piece of information indicating
> whether calling WaitEventSetWait inside a critical section is allowed.
> But I do think this is bad, because we do not process interrupts during
> critical sections, so it is unclear to me why we should handle
> postmaster death any differently.


Maybe I'm very wrong about this, but I currently suspect there is a
corruption scenario involving CHECKPOINT, a process in a critical
section, and kill -9.

The scenario I am trying to reproduce is the following:

1) Some process p1 locks some buffer (call it buf1), enters a critical
section, calls MarkBufferDirty, and hangs inside XLogInsert on a CondVar
(in GetXLogBuffer -> AdvanceXLInsertBuffer).
2) CHECKPOINT (p2) starts and tries to flush dirty buffers, waiting for
the lock on buf1.
3) The postmaster is killed with kill -9.
4) The postmaster-death signal is delivered to p1; it wakes up in
WaitLatch/WaitEventSetWaitBlock, checks whether the postmaster is
alive, and exits, releasing all locks.
5) p2 acquires the lock on buf1 and flushes it to disk.
6) The postmaster-death signal is delivered to p2, and p2 exits.

We now have a case where the buffer has been flushed to disk, while the
xlog record that describes this change never makes it to disk. This is
very bad.
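
To make the ordering in the steps above concrete, here is a toy model in
plain C. It is not PostgreSQL code: ToyState and every toy_* function are
hypothetical names invented for illustration, and the flag
exit_inside_critical_section stands in for "p1 exits on postmaster death
while still inside the critical section". It only encodes the sequence of
events as I understand it, under the assumption that p1's locks are
released when it exits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef struct
{
    bool buf_locked;          /* p1 holds the lock on buf1 */
    bool buf_dirty;           /* buf1 was modified in shared memory */
    bool wal_record_on_disk;  /* the record describing the change is durable */
    bool page_on_disk;        /* the modified buf1 was written to disk */
} ToyState;

/* Step 1: p1 locks buf1, marks it dirty, then blocks before its record is inserted. */
static void
toy_p1_modify_and_block(ToyState *s)
{
    assert(s != NULL);
    s->buf_locked = true;
    s->buf_dirty = true;
    /* wal_record_on_disk stays false: p1 is stuck waiting inside the insert. */
}

/* Step 4: p1 notices postmaster death; if it exits, its locks are released. */
static void
toy_p1_on_postmaster_death(ToyState *s, bool exit_inside_critical_section)
{
    if (exit_inside_critical_section)
        s->buf_locked = false;
}

/* Steps 2 and 5: p2 can flush buf1 only when it is dirty and not locked. */
static bool
toy_p2_try_flush(ToyState *s)
{
    if (s->buf_locked || !s->buf_dirty)
        return false;
    s->page_on_disk = true;
    return true;
}

/* The bad state: the page reached disk but the record describing it did not. */
static bool
toy_is_corrupted(const ToyState *s)
{
    return s->page_on_disk && !s->wal_record_on_disk;
}

/* Runs steps 1-6 in order and reports whether the bad state was reached. */
static bool
toy_run_scenario(bool exit_inside_critical_section)
{
    ToyState s = {false, false, false, false};

    toy_p1_modify_and_block(&s);                                   /* step 1 */
    (void) toy_p2_try_flush(&s);                                   /* step 2: blocked */
    toy_p1_on_postmaster_death(&s, exit_inside_critical_section);  /* steps 3-4 */
    (void) toy_p2_try_flush(&s);                                   /* step 5 */
    return toy_is_corrupted(&s);                                   /* p2 then exits */
}
```

In this model, toy_run_scenario(true) reaches the bad state, while
toy_run_scenario(false), where p1 does not exit and so keeps the lock,
does not. That says nothing about whether the real code behaves this
way; it only restates the suspected ordering.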

To be clear, I am trying to reproduce the corruption without using
injection points. I have not succeeded yet, though.

-- 
Best regards,
Kirill Reshke


