Re: Incorrect assumption in heap_prepare_freeze_tuple - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Incorrect assumption in heap_prepare_freeze_tuple |
Date | |
Msg-id | 20201002183458.t6embzp2ghgsw2qw@alap3.anarazel.de |
In response to | Incorrect assumption in heap_prepare_freeze_tuple (Kuntal Ghosh <kuntalghosh.2007@gmail.com>) |
Responses | Re: Incorrect assumption in heap_prepare_freeze_tuple |
List | pgsql-hackers |
Hi,

On 2020-10-02 23:26:05 +0530, Kuntal Ghosh wrote:
> In heap_prepare_freeze_tuple, we make the following assumption:
>
>  * It is assumed that the caller has checked the tuple with
>  * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
>  * (else we should be removing the tuple, not freezing it).
>
> Thus, when we see a committed xmax that precedes the cutoff_xid, we throw
> the following data corruption error:
> errmsg_internal("cannot freeze committed xmax %u", xid)
>
> However, in the caller (lazy_scan_heap), HeapTupleSatisfiesVacuum may
> return HEAPTUPLE_DEAD for an updated/deleted tuple that got modified by a
> transaction older than OldestXmin. And, if the tuple is HOT-updated, it
> should only be removed by a hot-chain prune operation. So, we treat the
> tuple as RECENTLY_DEAD and don't remove the tuple.

This code is so terrible :(

We really should just merge the HOT pruning and lazy_scan_heap()
removal/freeze operations. That'd avoid this corner case and *also*
would significantly reduce the WAL volume of VACUUM. And save a good
bit of CPU time.

> So, it may lead to an incorrect data corruption error. IIUC, following will
> be the exact scenario where the error may happen,
>
> An updated/deleted tuple whose xmax is in between cutoff_xid and
> OldestXmin. Since cutoff_xid depends on vacuum_freeze_min_age and
> autovacuum_freeze_max_age, it'll not be encountered easily. But, I think
> it can be reproduced with some xid burner patch.

I don't think this case is possible (*). By definition, there cannot be
any transactions needing tuples from before OldestXmin. Which means that
the heap_page_prune() earlier in lazy_scan_heap() would have pruned away
a DEAD tuple version that is part of a hot chain.

The HEAPTUPLE_DEAD branch you're referring to really can only be hit for
tuples that are *newer* than OldestXmin but become DEAD (instead of
RECENTLY_DEAD) because the inserting transaction aborted.

(*) with the exception of temp tables due to some recent changes; I am
currently working on a fix for that.

> I think the fix should be something like following:
>          if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
> -            TransactionIdDidCommit(xid))
> +            TransactionIdDidCommit(xid) &&
> +            !HeapTupleHeaderIsHotUpdated(tuple))
>              ereport(ERROR,
>                      (errcode(ERRCODE_DATA_CORRUPTED),
>                       errmsg_internal("cannot freeze committed xmax %u",
>                                       xid)));
> -        freeze_xmax = true;
> +
> +        freeze_xmax = HeapTupleHeaderIsHotUpdated(tuple) ? false : true;

I don't think that would be correct - we'd end up with an xmax that's
older than cutoff_xid left in the table. Breaking relfrozenxid /
creating wraparound and clog lookup dangers. This branch is only entered
when xmax precedes cutoff_xid - which is what we may set relfrozenxid
to.

What made you look at this? Did you hit the error?

Greetings,

Andres Freund
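
To illustrate the objection to the proposed fix, here is a minimal standalone sketch in C; it is not taken from the PostgreSQL tree. It assumes that XIDs are 32-bit counters compared modulo 2^32, in the style of TransactionIdPrecedes(), and it ignores the special XIDs (invalid, bootstrap, frozen). The names xid_precedes and xmax_breaks_relfrozenxid are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Wraparound-aware "a is older than b" for normal XIDs.
 * Simplified assumption: special XIDs are not handled here.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	/* The signed difference is negative when a is logically older than b. */
	return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: if VACUUM advances relfrozenxid to
 * new_relfrozenxid, does an xmax left unfrozen on a tuple violate the
 * invariant that no normal XID in the table is older than relfrozenxid?
 */
static bool
xmax_breaks_relfrozenxid(TransactionId xmax, TransactionId new_relfrozenxid)
{
	return xid_precedes(xmax, new_relfrozenxid);
}
```

Under these assumptions, if cutoff_xid is 1000 and relfrozenxid is advanced to it, an xmax of 900 that is left in place (freeze_xmax = false) makes xmax_breaks_relfrozenxid(900, 1000) return true, which is the situation described above: an XID older than relfrozenxid remains in the table.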