Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() - Mailing list pgsql-bugs
From | Melanie Plageman |
---|---|
Subject | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
Date | |
Msg-id | CAAKRu_ai8PMW5cqCFhu-U46CWLmgP2d_FnpLOqCSvMxY-UQ9xw@mail.gmail.com |
In response to | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() (Andres Freund <andres@anarazel.de>) |
Responses | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
List | pgsql-bugs |
On Mon, Apr 15, 2024 at 1:39 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I've tried a couple times to catch up with this thread. But always kinda felt
> I must be missing something. It might be that this is one part of the
> confusion:
>
> On 2024-01-06 12:24:13 -0800, Noah Misch wrote:
> > Fair enough. While I agree there's a decent chance back-patching would be
> > okay, I think there's also a decent chance that 1ccc1e05ae creates the problem
> > Matthias theorized. Something like: we update relfrozenxid based on
> > OldestXmin, even though GlobalVisState caused us to retain a tuple older than
> > OldestXmin. Then relfrozenxid disagrees with table contents.
>
> Looking at the state as of 1ccc1e05ae, I don't see how - in lazy_scan_prune(),
> if heap_page_prune() spuriously didn't prune a tuple, because the horizon went
> backwards, we'd encounter the tuple in the loop below and call
> heap_prepare_freeze_tuple(), which would error out with one of
>
>     /*
>      * Process xmin, while keeping track of whether it's already frozen, or
>      * will become frozen iff our freeze plan is executed by caller (could be
>      * neither).
>      */
>     xid = HeapTupleHeaderGetXmin(tuple);
>     if (!TransactionIdIsNormal(xid))
>         xmin_already_frozen = true;
>     else
>     {
>         if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
>             ereport(ERROR,
>                     (errcode(ERRCODE_DATA_CORRUPTED),
>                      errmsg_internal("found xmin %u from before relfrozenxid %u",
>                                      xid, cutoffs->relfrozenxid)));
>
> or
>         if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
>             ereport(ERROR,
>                     (errcode(ERRCODE_DATA_CORRUPTED),
>                      errmsg_internal("multixact %u contains update XID %u from before relfrozenxid%u",
>                                      multi, update_xact,
>                                      cutoffs->relfrozenxid)));
> or
>         /* Raw xmax is normal XID */
>         if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
>             ereport(ERROR,
>                     (errcode(ERRCODE_DATA_CORRUPTED),
>                      errmsg_internal("found xmax %u from before relfrozenxid %u",
>                                      xid, cutoffs->relfrozenxid)));
>
>
> I'm not saying that spuriously erroring out would be ok. But I guess I just
> don't understand the data corruption theory in this subthread, because we'd
> error out if we encountered a tuple that should have been frozen but wasn't?

I have a more basic question. How could GlobalVisState->maybe_needed going
backwards cause a problem with relfrozenxid? Yes, if maybe_needed goes
backwards, we may not remove a tuple whose xmin/xmax are older than
VacuumCutoffs->OldestXmin. But, if that tuple's xmin/xmax are older than
OldestXmin, then wouldn't we freeze it? If we freeze it, there isn't an issue.
And if the tuple's xids are not newer than OldestXmin, then how could we end up
advancing relfrozenxid to a value greater than the tuple's xids?

- Melanie
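
To make the reasoning in the question above concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions, not PostgreSQL code: the names `Xid`, `SketchCutoffs`, `SketchResult`, `sketch_precedes()` and `sketch_freeze_xmin()` are hypothetical, XID wraparound is ignored (plain integer comparison stands in for TransactionIdPrecedes()), and every xmin older than OldestXmin is assumed to be frozen, which is the premise the question asks about rather than a confirmed property of the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId; wraparound is ignored. */
typedef uint32_t Xid;

/* Hypothetical, reduced version of the cutoffs discussed in the thread. */
typedef struct SketchCutoffs
{
    Xid relfrozenxid;   /* relfrozenxid before this VACUUM */
    Xid OldestXmin;     /* cutoff fixed when this VACUUM started */
} SketchCutoffs;

typedef enum SketchResult
{
    SKETCH_ERROR,       /* xmin precedes relfrozenxid: corruption error */
    SKETCH_FROZEN,      /* xmin precedes OldestXmin: assumed to be frozen */
    SKETCH_KEPT         /* xmin stays unfrozen and bounds new relfrozenxid */
} SketchResult;

/* Simplified comparison; the real TransactionIdPrecedes() is modular. */
static bool
sketch_precedes(Xid a, Xid b)
{
    return a < b;
}

/*
 * Classify one tuple's xmin. An xmin that stays unfrozen pulls the
 * candidate new relfrozenxid down to at most that xmin, so the candidate
 * never ends up greater than an XID that remains in the table.
 */
static SketchResult
sketch_freeze_xmin(const SketchCutoffs *cutoffs, Xid xmin,
                   Xid *new_relfrozenxid)
{
    assert(cutoffs != NULL && new_relfrozenxid != NULL);

    if (sketch_precedes(xmin, cutoffs->relfrozenxid))
        return SKETCH_ERROR;
    if (sketch_precedes(xmin, cutoffs->OldestXmin))
        return SKETCH_FROZEN;
    if (sketch_precedes(xmin, *new_relfrozenxid))
        *new_relfrozenxid = xmin;
    return SKETCH_KEPT;
}
```

In this model a tuple that pruning failed to remove falls into exactly one of three cases: it raises the error Andres quoted, it is frozen, or its xmin caps the candidate relfrozenxid. None of the three leaves an unfrozen XID older than the new relfrozenxid, which is the point the question is making.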