Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae - Mailing list pgsql-bugs

From Melanie Plageman
Subject Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Msg-id CAAKRu_b=7e56UNog-Z=N2--bp42r0exckPAPfhruZBVRJHY9nQ@mail.gmail.com
In response to Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Noah Misch <noah@leadboat.com>)
Responses Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
List pgsql-bugs
On Tue, Jun 25, 2024 at 3:37 PM Noah Misch <noah@leadboat.com> wrote:
>
> On Thu, Jun 20, 2024 at 11:49:50AM -0400, Melanie Plageman wrote:
> > On Tue, Jun 18, 2024 at 6:51 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> > > I ended up manually backporting the logic from 1ccc1e05ae as opposed
> > > to cherry-picking because it relied on a struct introduced in
> > > 4e9fc3a9762065.
>
> > Attached is the backport and repros for 15 and 16.

I think we are going with the fix proposed for master [1], which
compares dead_after to OldestXmin before consulting GlobalVisState.
Backporting 1ccc1e05ae doesn't actually fix the problem: we just end
up erroring out when attempting to freeze the tuple we didn't remove.
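
To illustrate the ordering of checks described above, here is a
simplified, self-contained sketch. It is not the patch and not
PostgreSQL source: the names xid_precedes, VisHorizon,
horizon_is_removable and classify_recently_dead are hypothetical
stand-ins, and the single maybe_needed field is only an assumed model of
what GlobalVisState tracks. The point is that a RECENTLY_DEAD tuple whose
dead_after precedes OldestXmin is treated as dead without asking the
horizon at all, so a horizon that lags behind OldestXmin cannot leave the
tuple in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are reserved; the sketch handles normal XIDs only. */
#define FIRST_NORMAL_XID ((TransactionId) 3)

typedef enum { TUPLE_RECENTLY_DEAD, TUPLE_DEAD } TupleStatus;

/* Circular (modulo 2^32) comparison: true if a is logically older than b. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    assert(a >= FIRST_NORMAL_XID && b >= FIRST_NORMAL_XID);
    return (int32_t) (a - b) < 0;
}

/* Hypothetical, much-reduced stand-in for GlobalVisState. */
typedef struct
{
    TransactionId maybe_needed; /* XIDs older than this are removable */
} VisHorizon;

static bool
horizon_is_removable(const VisHorizon *h, TransactionId xid)
{
    return xid_precedes(xid, h->maybe_needed);
}

/*
 * Decide the fate of a tuple already classified as RECENTLY_DEAD.
 * OldestXmin is checked first; the horizon is consulted only if
 * dead_after does not precede OldestXmin.
 */
static TupleStatus
classify_recently_dead(TransactionId dead_after,
                       TransactionId oldest_xmin,
                       const VisHorizon *h)
{
    if (xid_precedes(dead_after, oldest_xmin))
        return TUPLE_DEAD;
    if (horizon_is_removable(h, dead_after))
        return TUPLE_DEAD;
    return TUPLE_RECENTLY_DEAD;
}
```

With a horizon of 100 and OldestXmin of 200, a tuple with dead_after 150
is classified as dead by the first check even though the horizon alone
would have kept it.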

As such, attached is my proposed fix for affected stable branches. It
is based on the fix proposed in [1] but differs slightly on each
branch because of changes in the surrounding code.

The test I added passes locally and on Linux and Windows in CI (on 15+,
which have CI). I don't have enough Cirrus credits to run the tests on
macOS. I am nervous about the test flaking on the buildfarm, but I did
the best I could to make it stable. I think keeping it as a
separate commit should be easiest in case we have to revert it?

Thanks to Heikki for backporting BackgroundPsql -- this made my life
much easier!!

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_Z4PybtZ0i_NKOr-vbrFW5p1ZdfEfUqaeU8fLPhszpP_g%40mail.gmail.com
