Thread: HOT chain bug in latestRemovedXid calculation
ISTM that heap_compute_xid_horizon_for_tuples() calculates latestRemovedXid for index deletion callers without sufficient care. The function only follows line pointer redirects, which is necessary but not sufficient to visit all relevant heap tuple headers -- it also needs to traverse HOT chains, but that doesn't happen. AFAICT heap_compute_xid_horizon_for_tuples() might therefore fail to produce a sufficiently recent latestRemovedXid value for the index deletion operation as a whole. This might in turn lead to the REDO routine (e.g. btree_xlog_delete()) doing conflict processing incorrectly during hot standby. Attached is an instrumentation patch. If I run "make check" with the patch applied, I get test output failures that can be used to get a general sense of the problem: $ cat /code/postgresql/patch/build/src/test/regress/regression.diffs | grep "works okay this time" | wc -l 382 $ cat /code/postgresql/patch/build/src/test/regress/regression.diffs | grep "hot chain bug" +WARNING: hot chain bug, latestRemovedXid: 2307, latestRemovedXidWithHotChain: 2316 +WARNING: hot chain bug, latestRemovedXid: 4468, latestRemovedXidWithHotChain: 4538 +WARNING: hot chain bug, latestRemovedXid: 4756, latestRemovedXidWithHotChain: 4809 +WARNING: hot chain bug, latestRemovedXid: 5000, latestRemovedXidWithHotChain: 5001 +WARNING: hot chain bug, latestRemovedXid: 7683, latestRemovedXidWithHotChain: 7995 +WARNING: hot chain bug, latestRemovedXid: 13450, latestRemovedXidWithHotChain: 13453 +WARNING: hot chain bug, latestRemovedXid: 10040, latestRemovedXidWithHotChain: 10041 So out of 389 calls, we see 7 failures on this occasion, which is typical. Heap pruning usually saves us in practice (since it is highly correlated with setting LP_DEAD bits on index pages in the first place), and even when it doesn't it's not particularly likely that the issue will make the crucial difference for the deletion operation as a whole. The code that is now heap_compute_xid_horizon_for_tuples() ran in REDO routines directly prior to Postgres 12. heap_compute_xid_horizon_for_tuples() is a descendant of code added by Simon’s commit a760893d in 2010 -- pretty close to HOT’s initial introduction. So this has been around for a long time. -- Peter Geoghegan
Attachment
On Tue, Dec 22, 2020 at 9:52 AM Peter Geoghegan <pg@bowt.ie> wrote: > ISTM that heap_compute_xid_horizon_for_tuples() calculates > latestRemovedXid for index deletion callers without sufficient care. > The function only follows line pointer redirects, which is necessary > but not sufficient to visit all relevant heap tuple headers -- it also > needs to traverse HOT chains, but that doesn't happen. AFAICT > heap_compute_xid_horizon_for_tuples() might therefore fail to produce > a sufficiently recent latestRemovedXid value for the index deletion > operation as a whole. This might in turn lead to the REDO routine > (e.g. btree_xlog_delete()) doing conflict processing incorrectly > during hot standby. I attach a concrete fix for this bug. My basic approach is to restructure the code so that it follows both LP_REDIRECT redirects as well as HOT chain t_ctid page offset numbers in the same loop. This is loosely based on similar loops in places like heap_hot_search_buffer() and heap_prune_chain(). I also replaced the old "conjecture" comments about why it is that our handling of LP_DEAD line pointers is correct. These comments match what you'll see in the original 2010 commit (commit a760893d), which is inappropriate. At the time Simon wrote that comment, a latestRemovedXid return value of InvalidTransactionId had roughly the opposite meaning. The meaning changed significantly just a few months after a760893d, in commit 52010027efc. The old "conjecture" comments were intended to convey something along the lines of "here is why it is currently thought necessary to take this conservative approach with LP_DEAD line pointers". But the comment should say almost the opposite thing now -- something close to "here is why it's okay that we take the seemingly lax approach of skipping LP_DEAD line pointers -- that's actually safe". The patch has new comments that explain the issue by comparing it to the approach taken by index AMs such as nbtree during VACUUM proper/bulk deletion. Index vacuuming can rely on heap pruning records having generated latestRemovedXid values that obviate any need for nbtree VACUUM records to explicitly log their own latestRemovedXid value (which is why nbtree VACUUM cannot include extra "recently dead" index tuples). This makes it obvious, I think -- LP_DEAD line pointers in heap pages come from pruning, and pruning generates its own latestRemovedXid at precisely the point that line pointers become LP_DEAD. I would like to commit this patch to v12, the first version that did this process during original execution rather than in REDO routines. It seems worth keeping the back branches in sync here. I suspect that the old approach used prior to Postgres 12 has subtle buglets caused by inconsistencies during Hot Standby (I have heard rumors). I'd rather just not go there given the lack of field reports about this problem. -- Peter Geoghegan
Attachment
On Mon, Dec 28, 2020 at 9:49 PM Peter Geoghegan <pg@bowt.ie> wrote: > I would like to commit this patch to v12, the first version that did > this process during original execution rather than in REDO routines. > It seems worth keeping the back branches in sync here. I suspect that > the old approach used prior to Postgres 12 has subtle buglets caused > by inconsistencies during Hot Standby (I have heard rumors). I'd > rather just not go there given the lack of field reports about this > problem. Pushed that a moment ago. -- Peter Geoghegan