relfrozenxid may disagree with row XIDs after 1ccc1e05ae - Mailing list pgsql-bugs

From Noah Misch
Subject relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date
Msg-id 20240304000717.1e@rfd.leadboat.com
Whole thread Raw
In response to Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Noah Misch <noah@leadboat.com>)
Responses Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
List pgsql-bugs
On Sat, Jan 06, 2024 at 02:44:48PM -0800, Noah Misch wrote:
> On Sat, Jan 06, 2024 at 01:30:40PM -0800, Peter Geoghegan wrote:
> > On Sat, Jan 6, 2024 at 12:24 PM Noah Misch <noah@leadboat.com> wrote:
> > > Fair enough.  While I agree there's a decent chance back-patching would be
> > > okay, I think there's also a decent chance that 1ccc1e05ae creates the problem
> > > Matthias theorized.  Something like: we update relfrozenxid based on
> > > OldestXmin, even though GlobalVisState caused us to retain a tuple older than
> > > OldestXmin.  Then relfrozenxid disagrees with table contents.

I figure Matthias's upthread theory is more likely than not to hold.  If it
does hold, commit 1ccc1e05ae created a new corruption route.  Hence, I'm
adding a v17 open item for commit 1ccc1e05ae.

> > Either every relevant code path has the same OldestXmin to work off
> > of, or the whole NewRelfrozenXid/relfrozenxid-tracking thing can't be
> > expected to work as designed. I find it a bit odd that
> > pruneheap.c/GlobalVisState has no direct understanding of this
> > dependency (none that I can discern, at least). Wouldn't it at least
> > be more natural if pruneheap.c could access OldestXmin when run inside
> > VACUUM? (Could just be used by defensive hardening code.)
> 
> Tied to that decision is the choice of semantics when the xmin horizon moves
> backward during one VACUUM, e.g. when a new walsender xmin does so.  Options:
> 
> 1. Continue to remove tuples based on the OldestXmin from VACUUM's start.  We
>    could have already removed some of those tuples, so the walsender xmin
>    won't achieve a guarantee anyway.  (VACUUM would want ratchet-like behavior
>    in GlobalVisState, possibly by sharing OldestXmin with pruneheap like you
>    say.)
> 
> 2. Move OldestXmin backward, to reflect the latest xmin horizon.  (Perhaps
>    VACUUM would just pass GlobalVisState to a function that returns the
>    compatible OldestXmin.)
> 
> Which way is better?



pgsql-bugs by date:

Previous
From: jian he
Date:
Subject: Re: Record returning function accept not matched columns declaration
Next
From: Thomas Munro
Date:
Subject: Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker