Re: IPC/MultixactCreation on the Standby server - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: IPC/MultixactCreation on the Standby server
Date
Msg-id A638A5AC-B5F7-4E3E-BD07-EC0D56A14B41@yandex-team.ru
Whole thread Raw
In response to Re: IPC/MultixactCreation on the Standby server  (Álvaro Herrera <alvherre@kurilemu.de>)
Responses Re: IPC/MultixactCreation on the Standby server
List pgsql-hackers

> On 26 Jul 2025, at 22:44, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Jul-25, Andrey Borodin wrote:
>
>> Also I've discovered one more serious problem.
>> If a backend crashes just before WAL-logging multi, any heap tuple
>> that uses this multi will become unreadable. Any attempt to read it
>> will hang forever.
>>
>> I've reproduced the problem and now I'm working on scripting this
>> scenario. Basically, I modify code to hang forever after assigning
>> multi number 2.
>
> It took me a minute to understand this, and I think your description is
> slightly incorrect: you mean that the heap tuple that uses the PREVIOUS
> multixact cannot be read (at least, that's what I understand from your
> reproducer script).
Yes, I explained a bit incorrectly, but you got the problem correctly.

>
> Looking at this,
>
>        /*
>         * We want to avoid edge case 2 in redo, because we cannot wait for
>         * startup process in GetMultiXactIdMembers() without risk of a
>         * deadlock.
>         */
>        MultiXactId next = multi + 1;
>        int         next_pageno;
>
>        /* Handle wraparound as GetMultiXactIdMembers() does it. */
>        if (multi < FirstMultiXactId)
>            multi = FirstMultiXactId;
>
> Don't you mean to test and change the value 'next' rather than 'multi'
> here?
Yup, that was typo.

>
> In this bit,
>
>                 * We do not need to handle race conditions, because this code
>                 * is only executed in redo and we hold
>                 * MultiXactOffsetSLRULock.
>
> I think it'd be good to have an
> Assert(LWLockHeldByMeInMode(MultiXactOffsetSLRULock, LW_EXCLUSIVE));
> just for peace of mind.
Ugh, that uncovered 17+ problem: now we have a couple of locks simultaneously. I'll post a version with this a later.

> Also, commit c61678551699 removed
> ZeroMultiXactOffsetPage(), but since you have 'false' as the second
> argument, then SimpleLruZeroPage() is enough.  (I wondered why isn't
> WAL-logging necessary ... until I remember that we're in a standby.  I
> think a simple comment here like "no WAL-logging because we're a
> standby" should suffice.)

Agree.


I've made a test [0] and discovered another problem. Adding this checkpoint breaks the test[1] even after a fix[2].
I suspect that excluding "edge case 2" on standby is simply not enough... we have to do this "next offset" dance on
Primarytoo. I'll think more about other options. 


Best regards, Andrey Borodin.
[0] https://github.com/x4m/postgres_g/commit/eafcaec7aafde064b0da5d2ba4041ed2fb134f07
[1] https://github.com/x4m/postgres_g/commit/da762c7cac56eff1988ea9126171ca0a6d2665e9
[2] https://github.com/x4m/postgres_g/commit/d64c17d697d082856e5fe8bd52abafc0585973af

Timeline of this commits can be seen here https://github.com/x4m/postgres_g/commits/mx19/


pgsql-hackers by date:

Previous
From: Frédéric Yhuel
Date:
Subject: Re: vacuumdb changes for stats import/export
Next
From: Andy Fan
Date:
Subject: Re: Returning nbtree posting list TIDs in DESC order during backwards scans