Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE |
Date | |
Msg-id | 20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de Whole thread Raw |
In response to | Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE (Michail Nikolaev <michail.nikolaev@gmail.com>) |
List | pgsql-hackers |
Hi, Thomas, CCing you because of the 64bit xid representation aspect. On 2023-01-06 00:39:44 +0300, Michail Nikolaev wrote: > I apologize for the direct ping, but I think your snapshot scalability > work in PG14 could be related to the issue. Good call! > The TransactionIdRetreatedBy implementation looks correct... But with > txid_current=212195 I see errors like "could not access status of > transaction 58643736"... > So, maybe vacuum_defer_cleanup_age just highlights some special case > (something with "previous" xids on the left side of zero?).... I think the bug is close to TransactionIdRetreatedBy(). Arguably in FullXidRelativeTo(). Or a more fundamental data representation issue with 64bit xids. To explain, here's a trace to the bottom of GetSnapshotData() leading to the problem: In the case I'm looking at here we end up with 720: oldestfxid = FullXidRelativeTo(latest_completed, oldestxid); and xmin is 255271, both correct. Then in TransactionIdRetreatedBy: /* apply vacuum_defer_cleanup_age */ def_vis_xid_data = TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age); things start to be iffy. Because we retreat by vacuum_defer_cleanup_age, which was set to txid_current() in scheme.sql, and the xmin above is that xid, TransactionIdRetreatedBy() first ends up with 0. It then backtracks further to the highest 32 xid (i.e. 4294967295). So far so good. We could obviously end up with values further in the past as well, if vacuum_defer_cleanup_age were larger. Things start to seriously go off the rails when we convert that 32bit xid to 64 bit with: def_vis_fxid = FullXidRelativeTo(latest_completed, def_vis_xid); which returns {value = 18446744073709551615}, which 0-1 in 64bit. However, as 64bit xids are not supposed to wrap around, we're in trouble - it's an xid *very* far into the future. Allowing things to be pruned that shouldn't, because everything is below that. I don't quite know how to best fix this. 4294967295 is the correct result for TransactionIdRetreatedBy() in this case, and it'd not cause problems for FullXidRelativeTo() if we actually had wrapped around the xid counter before (since then we'd just represent it as a fxid "of the proper epoch"). Basically, in contrast to 32bit xids, 64bit xids cannot represent xids from before the start of the universe, whereas with the modulo arithmetic of 32bit that's not a real problem. It's probably not too hard to fix specifically in this one place - we could just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but it strikes me as as a somewhat larger issue for the 64it xid infrastructure. I suspect this might not be the only place running into problems with such "before the universe" xids. For a bit I was thinking that we should introduce the notion that a FullTransactionId can be from the past. Specifically that when the upper 32bit are all set, we treat the lower 32bit as a value from before xid 0 using the normal 32bit xid arithmetic. But it sucks to add overhead for that everywhere. It might be a bit more palatable to designate one individual value, e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far before the start of the universe an xid point to... FullXidRelativeTo() did assert that the input 32bit xid is in a reasonable range, but unfortunately didn't do a similar check for the 64bit case. Greetings, Andres Freund
pgsql-hackers by date: