Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae - Mailing list pgsql-bugs

From Noah Misch
Subject Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Msg-id 20240322194323.8a.nmisch@google.com
In response to Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-bugs
On Fri, Mar 22, 2024 at 02:41:25PM -0400, Melanie Plageman wrote:
> On Fri, Mar 22, 2024 at 8:22 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Mar 21, 2024 at 1:22 PM Matthias van de Meent
> > <boekewurm+postgres@gmail.com> wrote:
> > > > So it seems like Matthias, Peter, and Andres all agree that
> > > > GlobalVisState->maybe_needed going backward is bad and causes this
> > > > problem. Unfortunately, I don't understand the mechanism.
> > >
> > > There are 2 mechanisms I know of which allow this value to go backwards:
> >
> > I actually wasn't asking about the mechanism by which
> > GlobalVisState->maybe_needed could go backwards. I was asking about
> > the mechanism by which that could cause bad things to happen.
> >
> > > 1. Replication slots that connect may set their backend's xmin to an
> > > xmin < GlobalXmin.
> > > This is known and has been documented, and was considered OK when this
> > > was discussed on the list previously.
> >
> > Right, OK.
> >
> > > 2. The commit abort path has a short window in which the backend's
> > > xmin is unset and does not mirror the xmin of registered snapshots.
> > > This is what I described in [0], and may be the worst (?) offender.
> > >
> > > [0] https://www.postgresql.org/message-id/CAEze2Wj%2BV0kTx86xB_YbyaqTr5hnE_igdWAwuhSyjXBYscf5-Q%40mail.gmail.com
> >
> > So, what I would say is that this sounds inadvertent and so perhaps we
> > should do something about it, but also, it seems wrong to me that it
> > causes any serious problem. As far as I know, we've always treated the
> > result of an xmin calculation going backward as a rare but expected
> > case with which everything that depends on xmin calculations must
> > cope.
> 
> I'm still catching up here, so forgive me if this is a dumb question:
> Does using GlobalVisState instead of VacuumCutoffs->OldestXmin when
> freezing and determining relfrozenxid not solve the problem?

One could fix it along those lines.  If GlobalVisState moves forward during
VACUUM, that's fine, but relfrozenxid needs to reflect the overall outcome,
not just the final GlobalVisState.  Suppose we remove XIDs <100 at page 1, <99
at page 2, and <101 at page 3.  relfrozenxid needs the value it would get if
we had removed <99 at every page.  I think GlobalVisState doesn't track that
today, but it could.  The 2024-03-14 commit e85662d added
GetStrictOldestNonRemovableTransactionId(), which targets a similar problem.
I've not reviewed it, but I suggest checking it for relevance to $SUBJECT.
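
To make the page-by-page example concrete, here is a minimal sketch in Python. It is not PostgreSQL code: the names RemovalCutoffTracker, note_page, and safe_relfrozenxid_bound are hypothetical, and XIDs are treated as plain integers, ignoring the modulo-2^32 wraparound comparison that real TransactionId values need. It only illustrates the idea that the bound for relfrozenxid is the oldest removal cutoff applied at any page, not the last one seen.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RemovalCutoffTracker:
    """Hypothetical helper: remembers the oldest removal cutoff used so far."""

    oldest_cutoff: Optional[int] = None

    def note_page(self, cutoff: int) -> None:
        # Assumption: plain integer comparison. Real XIDs wrap around and
        # would need a wraparound-aware comparison instead of "<".
        if self.oldest_cutoff is None or cutoff < self.oldest_cutoff:
            self.oldest_cutoff = cutoff


def safe_relfrozenxid_bound(per_page_cutoffs: Iterable[int]) -> Optional[int]:
    """Return the cutoff we would have used had every page used the oldest one."""
    tracker = RemovalCutoffTracker()
    for cutoff in per_page_cutoffs:
        tracker.note_page(cutoff)
    return tracker.oldest_cutoff


# Pages 1, 2, 3 removed XIDs <100, <99, <101 respectively.
page_cutoffs = [100, 99, 101]
bound = safe_relfrozenxid_bound(page_cutoffs)
print(bound)  # prints 99
```

Taking only the final cutoff (101) would overstate what was removed: page 2 may still hold XIDs 99 and 100, so a relfrozenxid above 99 would disagree with rows still on disk.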


