Re: HOT chain bug in latestRemovedXid calculation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: HOT chain bug in latestRemovedXid calculation
Date
Msg-id CAH2-WzmS11O=Y3q9wmUmwUUazXnpqKxAorNuySZDLJdD_3yxNA@mail.gmail.com
Whole thread Raw
In response to HOT chain bug in latestRemovedXid calculation  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: HOT chain bug in latestRemovedXid calculation
List pgsql-hackers
On Tue, Dec 22, 2020 at 9:52 AM Peter Geoghegan <pg@bowt.ie> wrote:
> ISTM that heap_compute_xid_horizon_for_tuples() calculates
> latestRemovedXid for index deletion callers without sufficient care.
> The function only follows line pointer redirects, which is necessary
> but not sufficient to visit all relevant heap tuple headers -- it also
> needs to traverse HOT chains, but that doesn't happen. AFAICT
> heap_compute_xid_horizon_for_tuples() might therefore fail to produce
> a sufficiently recent latestRemovedXid value for the index deletion
> operation as a whole. This might in turn lead to the REDO routine
> (e.g. btree_xlog_delete()) doing conflict processing incorrectly
> during hot standby.

I attach a concrete fix for this bug. My basic approach is to
restructure the code so that it follows both LP_REDIRECT redirects as
well as HOT chain t_ctid page offset numbers in the same loop. This is
loosely based on similar loops in places like heap_hot_search_buffer()
and heap_prune_chain().

I also replaced the old "conjecture" comments about why it is that our
handling of LP_DEAD line pointers is correct. These comments match
what you'll see in the original 2010 commit (commit a760893d), which
is inappropriate. At the time Simon wrote that comment, a
latestRemovedXid return value of InvalidTransactionId had roughly the
opposite meaning. The meaning changed significantly just a few months
after a760893d, in commit 52010027efc. The old "conjecture" comments
were intended to convey something along the lines of "here is why it
is currently thought necessary to take this conservative approach with
LP_DEAD line pointers". But the comment should say almost the opposite
thing now -- something close to "here is why it's okay that we take
the seemingly lax approach of skipping LP_DEAD line pointers -- that's
actually safe".

The patch has new comments that explain the issue by comparing it to
the approach taken by index AMs such as nbtree during VACUUM
proper/bulk deletion. Index vacuuming can rely on heap pruning records
having generated latestRemovedXid values that obviate any need for
nbtree VACUUM records to explicitly log their own latestRemovedXid
value (which is why nbtree VACUUM cannot include extra "recently dead"
index tuples). This makes it obvious, I think -- LP_DEAD line pointers
in heap pages come from pruning, and pruning generates its own
latestRemovedXid at precisely the point that line pointers become
LP_DEAD.

I would like to commit this patch to v12, the first version that did
this process during original execution rather than in REDO routines.
It seems worth keeping the back branches in sync here. I suspect that
the old approach used prior to Postgres 12 has subtle buglets caused
by inconsistencies during Hot Standby (I have heard rumors). I'd
rather just not go there given the lack of field reports about this
problem.

-- 
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Let's start using setenv()
Next
From: Masahiko Sawada
Date:
Subject: Re: New IndexAM API controlling index vacuum strategies