
Re: IPC/MultixactCreation on the Standby server - Mailing list pgsql-hackers

From	Álvaro Herrera
Subject	Re: IPC/MultixactCreation on the Standby server
Date	July 26, 2025 20:44:58
Msg-id	202507261744.iuczyt2mynb3@alvherre.pgsql
In response to	Re: IPC/MultixactCreation on the Standby server (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses	Re: IPC/MultixactCreation on the Standby server
List	pgsql-hackers


On 2025-Jul-25, Andrey Borodin wrote:

> Also I've discovered one more serious problem.
> If a backend crashes just before WAL-logging multi, any heap tuple
> that uses this multi will become unreadable. Any attempt to read it
> will hang forever.
> 
> I've reproduced the problem and now I'm working on scripting this
> scenario. Basically, I modify code to hang forever after assigning
> multi number 2.

It took me a minute to understand this, and I think your description is
slightly incorrect: you mean that the heap tuple that uses the PREVIOUS
multixact cannot be read (at least, that's what I understand from your
reproducer script).  I agree it's a pretty ugly bug!  I think it's
essentially the same bug as the other problem, so the proposed fix
should solve both.
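To make the "previous multixact" point concrete: as far as the GetMultiXactIdMembers() comments describe it, the member count of a multixact is derived from the starting offset of the *next* multixact. A zero offset for that next multixact means it was assigned but its offset entry was never written ("edge case 2"), and the reader then sleeps on the IPC/MultixactCreation wait event until it is filled in. The toy C model below is only a sketch under that assumption. The offsets array, `toy_member_count()` and `MembersStatus` are hypothetical, not PostgreSQL code. Wraparound and the "newest multixact" case are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's types (assumption, not the real headers). */
typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

typedef enum
{
    MEMBERS_OK,         /* next offset is known: member count computed */
    MEMBERS_MUST_WAIT   /* edge case 2: next offset is still zero */
} MembersStatus;

/*
 * Hypothetical toy model: offsets[m] is the starting member offset of
 * multixact m, and a zero entry means "assigned, offset not yet written".
 * The member count of 'multi' is offsets[multi + 1] - offsets[multi].
 */
static MembersStatus
toy_member_count(const MultiXactOffset *offsets, size_t noffsets,
                 MultiXactId multi, int *nmembers)
{
    MultiXactId next = multi + 1;

    assert(next < noffsets);            /* toy array bound, not real logic */

    if (offsets[next] == 0)
        return MEMBERS_MUST_WAIT;       /* real code sleeps and rechecks */

    *nmembers = (int) (offsets[next] - offsets[multi]);
    return MEMBERS_OK;
}
```

With offsets {0, 1, 3, 0}, multixact 1 has two members, but multixact 2 can only report MEMBERS_MUST_WAIT. If the backend that was assigned multixact 3 crashed before writing offsets[3], that entry stays zero forever, so it is the tuple using multixact 2, not 3, whose read hangs.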

Thanks for working on this!

Looking at this,

        /*
         * We want to avoid edge case 2 in redo, because we cannot wait for
         * startup process in GetMultiXactIdMembers() without risk of a
         * deadlock.
         */
        MultiXactId next = multi + 1;
        int         next_pageno;

        /* Handle wraparound as GetMultiXactIdMembers() does it. */
        if (multi < FirstMultiXactId)
            multi = FirstMultiXactId;

Don't you mean to test and change the value 'next' rather than 'multi'
here?
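A minimal sketch of the intended wraparound handling, assuming MultiXactId is an unsigned 32-bit counter and FirstMultiXactId is 1 (stand-ins below, not the real headers). The clamp is applied to 'next', the value used to compute the page, the same way GetMultiXactIdMembers() clamps its own successor value. `next_multixact()`, `toy_offset_page()` and the 2048-entries-per-page constant (8 kB pages holding 4-byte offsets) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins, assumed to match PostgreSQL's definitions. */
typedef uint32_t MultiXactId;
#define FirstMultiXactId ((MultiXactId) 1)

/* Assumption: 8 kB pages of 4-byte offsets, i.e. 2048 entries per page. */
#define TOY_OFFSETS_PER_PAGE 2048

/*
 * Successor of 'multi', skipping InvalidMultiXactId (0) when the 32-bit
 * counter wraps around.  The clamp is applied to 'next', not to 'multi'.
 */
static MultiXactId
next_multixact(MultiXactId multi)
{
    MultiXactId next = multi + 1;   /* unsigned arithmetic wraps to 0 */

    if (next < FirstMultiXactId)
        next = FirstMultiXactId;
    return next;
}

/* Offsets page holding 'multi' (hypothetical helper). */
static int64_t
toy_offset_page(MultiXactId multi)
{
    return (int64_t) (multi / TOY_OFFSETS_PER_PAGE);
}
```

For example, the successor of UINT32_MAX comes out as FirstMultiXactId rather than 0, and the successor of 2047 lands on page 1.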

In this bit,

                 * We do not need to handle race conditions, because this code
                 * is only executed in redo and we hold
                 * MultiXactOffsetSLRULock.

I think it'd be good to have an
Assert(LWLockHeldByMeInMode(MultiXactOffsetSLRULock, LW_EXCLUSIVE));
just for peace of mind.  Also, commit c61678551699 removed
ZeroMultiXactOffsetPage(), but since you pass 'false' as the second
argument, SimpleLruZeroPage() is enough.  (I wondered why WAL-logging
isn't necessary ... until I remembered that we're in a standby.  I
think a simple comment here like "no WAL-logging because we're a
standby" should suffice.)
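The patch code around this comment is not quoted in full, so the following is only a toy model of the shape the review suggests for the redo path: assert that the offsets lock is held exclusively, then zero the page that will hold the next multixact's offset, in memory only and with no WAL record, because this runs on a standby. `ToyOffsetsSlru`, `toy_zero_page()`, `toy_redo_ensure_next_page()` and the page constants are all hypothetical; the real code would use LWLockHeldByMeInMode() and SimpleLruZeroPage() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t MultiXactId;

#define TOY_OFFSETS_PER_PAGE 2048   /* assumption: 8 kB pages, 4-byte offsets */
#define TOY_NUM_PAGES 2             /* tiny hypothetical SLRU */

/* Hypothetical stand-in for the offsets SLRU plus its lock state. */
typedef struct
{
    uint32_t pages[TOY_NUM_PAGES][TOY_OFFSETS_PER_PAGE];
    bool     page_present[TOY_NUM_PAGES];
    bool     lock_held_exclusive;   /* models holding the lock in LW_EXCLUSIVE */
} ToyOffsetsSlru;

/* Zero a page in memory only; nothing is WAL-logged on a standby. */
static void
toy_zero_page(ToyOffsetsSlru *slru, int pageno)
{
    memset(slru->pages[pageno], 0, sizeof(slru->pages[pageno]));
    slru->page_present[pageno] = true;
}

/*
 * Redo-only step: make sure the page that will hold the offset of 'next'
 * exists.  'next' is assumed to be already clamped past wraparound.
 */
static void
toy_redo_ensure_next_page(ToyOffsetsSlru *slru, MultiXactId next)
{
    int next_pageno = (int) (next / TOY_OFFSETS_PER_PAGE);

    /* Peace-of-mind check, analogous to the suggested Assert(). */
    assert(slru->lock_held_exclusive);
    assert(next_pageno < TOY_NUM_PAGES);

    /* No race handling: only the startup process runs this, under the lock. */
    if (!slru->page_present[next_pageno])
        toy_zero_page(slru, next_pageno);   /* no WAL-logging: we're a standby */
}
```

A page that is already present is left alone, so calling this once per replayed multixact only zeroes a page the first time a multixact crosses onto it.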

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
