Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
Date
Msg-id CAH2-Wz=xb_7cSHeqG7h8-h=941Yq7oPx7v8Rojv=sxJ=_-yBjQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
List pgsql-hackers
On Thu, Oct 19, 2017 at 7:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The commit message for a5736bf7 doesn't say anything about a race; it
> just claims that it is fixing traversal of half-frozen update chains,
> without making any reference to how such a thing as a half-frozen
> update chain came to exist in the first place.  I think the commit
> that talks about that race condition is 20b65522.

Well, the race was that the wrong tuple is pruned, because of naivety
in the heap_prune_chain() update chain handling, which is what started
work that became commit a5736bf7. That's by far the worst consequence
of the traversal of half-frozen update chain business that a5736bf7's
commit message describes, since we know it leads to wrong answers to
index scans (and REINDEX fails with "failed to find parent tuple").
It's the only consequence that's been directly observed, but probably
not the only one that's possible. Most other places that suffer the
problem are in obscure-to-users paths like EvalPlanQual().

> /me studies the problem for a while.
>
> What's bothering me about this is: how is cutoff_xid managing to be a
> new enough transaction ID for this to happen in the first place?  The
> cutoff XID should certainly be older than anything any current
> snapshot can see, because it's computed using GetOldestXmin() -- which
> means that XIDs older than the cutoff can't still be running (except
> maybe if old_snapshot_threshold is set to a non-default value).

The problem that we directly observed during pruning, which caused
"failed to find parent tuple" even after the first freeze-the-dead
commit (commit 20b65522), was that the mixture of frozen and unfrozen
tuples in a hot chain caused heap_prune_chain() to make an incorrect
conclusion about the continuity of a HOT chain. We'd prune the wrong
tuples -- it failed to find the visible tuple at the end of the HOT
chain, and respect that that meant it could not prune down to a stub
ItemId (blow away what it *incorrectly* thought was an entire HOT
chain).

Now that the problem is fixed by a5736bf7, this test case will prune
and get an LP_REDIRECT ItemId (not an LP_DEAD one), as we're no longer
confused about the continuity of HOT chains within heap_prune_chain().

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
Next
From: Andres Freund
Date:
Subject: Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA