Re: VM corruption on standby - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: VM corruption on standby
Date
Msg-id aKZWZHjSMOhmqfn5@paquier.xyz
In response to Re: VM corruption on standby  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Aug 20, 2025 at 09:14:04AM -0400, Andres Freund wrote:
> On 2025-08-19 23:47:21 -0400, Tom Lane wrote:
>> Hm.  It still makes me mighty uncomfortable, because the point of a
>> critical section is "crash the database if anything goes wrong during
>> this bit".  Waiting for another process --- or thread --- greatly
>> increases the scope of ways for things to go wrong.  So I'm not
>> exactly convinced that this aspect of the AIO architecture is
>> well-thought-out.
>
> I don't see the alternative:
>
> 1) Some IO is done in critical sections (e.g. WAL writes / flushes)
>
> 2) Sometimes we need to wait for already started IO in critical sections
>    (also WAL)
>
> 3) With some ways of doing AIO the IO is offloaded to other processes, and
>    thus waiting for the IO to complete always requires waiting for another
>    process
>
> How could we avoid the need to wait for another process in critical sections
> given these points?

Yes, it comes down to the point that for some code paths we just cannot
accept a soft failure: some IOs are critical enough that if they fail
the only thing we can and should do is to recover and replay based
on the past IOs that we know did succeed and made it durably to disk.
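
To make the "no soft failure" point concrete, here is a toy model of the
rule, not PostgreSQL source. All toy_* names are hypothetical. It only
mirrors the general idea that, as far as I recall, START_CRIT_SECTION()
bumps a counter and error reporting promotes an ERROR to a PANIC while
that counter is non-zero, which then forces crash recovery and WAL replay.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a PostgreSQL API. */
typedef enum
{
	TOY_OK,		/* IO succeeded, nothing to report */
	TOY_ERROR,	/* soft failure: the caller could abort and retry */
	TOY_PANIC	/* hard failure: crash, then recover by replaying WAL */
} ToyOutcome;

/* Nesting depth of critical sections currently entered. */
static int	toy_crit_section_count = 0;

static void
toy_start_crit_section(void)
{
	toy_crit_section_count++;
}

static void
toy_end_crit_section(void)
{
	/* Leaving a section that was never entered would be a bug. */
	assert(toy_crit_section_count > 0);
	toy_crit_section_count--;
}

/*
 * Classify the result of an IO.  Outside a critical section a failed IO
 * is a soft error; inside one it is escalated, because shared state may
 * already be half-modified and cannot be rolled back safely.
 */
static ToyOutcome
toy_report_io_result(bool io_succeeded)
{
	if (io_succeeded)
		return TOY_OK;
	return (toy_crit_section_count > 0) ? TOY_PANIC : TOY_ERROR;
}
```

In this sketch, the same failed IO yields TOY_ERROR when reported outside
the section and TOY_PANIC between toy_start_crit_section() and
toy_end_crit_section(). That is why anything waited on inside such a
section, including another process, widens the set of failures that end
in a crash.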

Having primitives for event broadcasting and waits that qualify as
safe and efficient to use in a critical section would be really,
really nice.
--
Michael
