Thread: HOT chain bug in latestRemovedXid calculation

HOT chain bug in latestRemovedXid calculation

From
Peter Geoghegan
Date:
ISTM that heap_compute_xid_horizon_for_tuples() calculates
latestRemovedXid for index deletion callers without sufficient care.
The function only follows line pointer redirects, which is necessary
but not sufficient to visit all relevant heap tuple headers -- it also
needs to traverse HOT chains, but that doesn't happen. AFAICT
heap_compute_xid_horizon_for_tuples() might therefore fail to produce
a sufficiently recent latestRemovedXid value for the index deletion
operation as a whole. This might in turn lead to the REDO routine
(e.g. btree_xlog_delete()) doing conflict processing incorrectly
during hot standby.

Attached is an instrumentation patch. If I run "make check" with the
patch applied, I get test output failures that can be used to get a
general sense of the problem:

$ cat /code/postgresql/patch/build/src/test/regress/regression.diffs |
grep "works okay this time" | wc -l
382

$ cat /code/postgresql/patch/build/src/test/regress/regression.diffs |
grep "hot chain bug"
+WARNING:  hot chain bug, latestRemovedXid: 2307,
latestRemovedXidWithHotChain: 2316
+WARNING:  hot chain bug, latestRemovedXid: 4468,
latestRemovedXidWithHotChain: 4538
+WARNING:  hot chain bug, latestRemovedXid: 4756,
latestRemovedXidWithHotChain: 4809
+WARNING:  hot chain bug, latestRemovedXid: 5000,
latestRemovedXidWithHotChain: 5001
+WARNING:  hot chain bug, latestRemovedXid: 7683,
latestRemovedXidWithHotChain: 7995
+WARNING:  hot chain bug, latestRemovedXid: 13450,
latestRemovedXidWithHotChain: 13453
+WARNING:  hot chain bug, latestRemovedXid: 10040,
latestRemovedXidWithHotChain: 10041

So out of 389 calls, we see 7 failures on this occasion, which is
typical. Heap pruning usually saves us in practice (since it is highly
correlated with setting LP_DEAD bits on index pages in the first
place), and even when it doesn't it's not particularly likely that the
issue will make the crucial difference for the deletion operation as a
whole.

The code that is now heap_compute_xid_horizon_for_tuples() ran in REDO
routines directly prior to Postgres 12.
heap_compute_xid_horizon_for_tuples() is a descendant of code added by
Simon’s commit a760893d in 2010 -- pretty close to HOT’s initial
introduction. So this has been around for a long time.

--
Peter Geoghegan

Attachment

Re: HOT chain bug in latestRemovedXid calculation

From
Peter Geoghegan
Date:
On Tue, Dec 22, 2020 at 9:52 AM Peter Geoghegan <pg@bowt.ie> wrote:
> ISTM that heap_compute_xid_horizon_for_tuples() calculates
> latestRemovedXid for index deletion callers without sufficient care.
> The function only follows line pointer redirects, which is necessary
> but not sufficient to visit all relevant heap tuple headers -- it also
> needs to traverse HOT chains, but that doesn't happen. AFAICT
> heap_compute_xid_horizon_for_tuples() might therefore fail to produce
> a sufficiently recent latestRemovedXid value for the index deletion
> operation as a whole. This might in turn lead to the REDO routine
> (e.g. btree_xlog_delete()) doing conflict processing incorrectly
> during hot standby.

I attach a concrete fix for this bug. My basic approach is to
restructure the code so that it follows both LP_REDIRECT redirects as
well as HOT chain t_ctid page offset numbers in the same loop. This is
loosely based on similar loops in places like heap_hot_search_buffer()
and heap_prune_chain().

I also replaced the old "conjecture" comments about why it is that our
handling of LP_DEAD line pointers is correct. These comments match
what you'll see in the original 2010 commit (commit a760893d), which
is inappropriate. At the time Simon wrote that comment, a
latestRemovedXid return value of InvalidTransactionId had roughly the
opposite meaning. The meaning changed significantly just a few months
after a760893d, in commit 52010027efc. The old "conjecture" comments
were intended to convey something along the lines of "here is why it
is currently thought necessary to take this conservative approach with
LP_DEAD line pointers". But the comment should say almost the opposite
thing now -- something close to "here is why it's okay that we take
the seemingly lax approach of skipping LP_DEAD line pointers -- that's
actually safe".

The patch has new comments that explain the issue by comparing it to
the approach taken by index AMs such as nbtree during VACUUM
proper/bulk deletion. Index vacuuming can rely on heap pruning records
having generated latestRemovedXid values that obviate any need for
nbtree VACUUM records to explicitly log their own latestRemovedXid
value (which is why nbtree VACUUM cannot include extra "recently dead"
index tuples). This makes it obvious, I think -- LP_DEAD line pointers
in heap pages come from pruning, and pruning generates its own
latestRemovedXid at precisely the point that line pointers become
LP_DEAD.

I would like to commit this patch to v12, the first version that did
this process during original execution rather than in REDO routines.
It seems worth keeping the back branches in sync here. I suspect that
the old approach used prior to Postgres 12 has subtle buglets caused
by inconsistencies during Hot Standby (I have heard rumors). I'd
rather just not go there given the lack of field reports about this
problem.

-- 
Peter Geoghegan

Attachment

Re: HOT chain bug in latestRemovedXid calculation

From
Peter Geoghegan
Date:
On Mon, Dec 28, 2020 at 9:49 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I would like to commit this patch to v12, the first version that did
> this process during original execution rather than in REDO routines.
> It seems worth keeping the back branches in sync here. I suspect that
> the old approach used prior to Postgres 12 has subtle buglets caused
> by inconsistencies during Hot Standby (I have heard rumors). I'd
> rather just not go there given the lack of field reports about this
> problem.

Pushed that a moment ago.

-- 
Peter Geoghegan