Re: pg_amcheck contrib application - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_amcheck contrib application
Date
Msg-id CA+TgmobA_x0A1jh+V=vD1mHB+pzFaR2wTzng4yoKzgr_hvNOcg@mail.gmail.com
Whole thread Raw
In response to Re: pg_amcheck contrib application  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_amcheck contrib application
List pgsql-hackers
On Wed, Mar 24, 2021 at 9:12 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
> > The visibility rules fix is different in v11, relying on a visibility check which more closely follows the
implementationof HeapTupleSatisfiesVacuumHorizon.
 
>
> Hmm. The header comment you wrote says "If a tuple might not be
> visible to any running transaction, then we must not check it." But, I
> don't find that statement very clear: does it mean "if there could be
> even one transaction to which this tuple is not visible, we must not
> check it"? Or does it mean "if the number of transactions that can see
> this tuple could potentially be zero, then we must not check it"? I
> don't think either of those is actually what we care about. I think
> what we should be saying is "if the tuple could have been inserted by
> a transaction that also added a column to the table, but which
> ultimately did not commit, then the table's current TupleDesc might
> differ from the one used to construct this tuple, so we must not check
> it."

Hit send too soon. And I was wrong, too. Wahoo. Thinking about the
buildfarm failure, I realized that there's a second danger here,
unrelated to the possibility of different TupleDescs, which we talked
about before: if the tuple is dead, we can't safely follow any TOAST
pointers, because the TOAST chunks might disappear at any time. So
technically we could split the return value up into something
three-way: if the inserted is known to have committed, we can check
the tuple itself, because the TupleDesc has to be reasonable. And, if
the tuple is known not to be dead already, and known not to be in a
state where it could become dead while we're doing stuff, we can
follow the TOAST pointer. I'm not sure whether it's worth trying to be
that fancy or not.

If we were only concerned about the mismatched-TupleDesc problem, this
function could return true in a lot more cases. Once we get to the
comment that says "Okay, the inserter committed..." we could just
return true. Similarly, the HEAP_MOVED_IN and HEAP_MOVED_OFF cases
could just skip all the interior test and return true, because if the
tuple is being moved, the original inserter has to have committed.
Conversely, however, the !HeapTupleHeaderXminCommitted ->
TransactionIdIsCurrentTransactionId case probably ought to always
return false. One could argue otherwise: if we're the inserter, then
the only in-progress transaction that might have changed the TupleDesc
is us, so we could just consider this case to be a true return value
also, regardless of what's going on with xmax. In that case, we're not
asking "did the inserter definitely commit?" but "are the inserter's
possible DDL changes definitely visible to us?" which might be an OK
definition too.

However, the could-the-TOAST-data-disappear problem is another story.
I don't see how we can answer that question correctly with the logic
you've got here, because you have no XID threshold. Consider the case
where we reach this code:

+       if (!(tuphdr->t_infomask & HEAP_XMAX_COMMITTED))
+       {
+               if
(TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+                       return true;            /*
HEAPTUPLE_DELETE_IN_PROGRESS */
+               else if
(!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuphdr)))
+
+                       /*
+                        * Not in Progress, Not Committed, so either
Aborted or crashed
+                        */
+                       return true;            /* HEAPTUPLE_LIVE */
+
+               /* At this point the xmax is known committed */
+       }

If we reach the case where the code comment says
HEAPTUPLE_DELETE_IN_PROGRESS, we know that the tuple isn't dead right
now, and so the TOAST tuples aren't dead either. But, by the time we
go try to look at the TOAST tuples, they might have become dead and
been pruned away, because the deleting transaction can commit at any
time, and after that pruning can happen at any time. Our only
guarantee that that won't happen is if the deleting XID is new enough
that it's invisible to some snapshot that our backend has registered.
That's approximately why HeapTupleSatisfiesVacuumHorizon needs to set
*dead_after in this case and one other, and I think you have the same
requirement.

I just noticed that this whole thing has another, related problem:
check_tuple_header_and_visibilty() and check_tuple_attribute() are
called from within check_tuple(), which is called while we hold a
buffer lock on the heap page. We should not be going and doing complex
operations that might take their own buffer locks - like TOAST index
checks - while we're holding an lwlock. That's going to have to be
changed so that the TOAST pointer checking happens after
UnlockReleaseBuffer(); in other words, we'll need to remember the
TOAST pointers to go look up and actually look them up after
UnlockReleaseBuffer(). But, when we do that, then the HEAPTUPLE_LIVE
case above has the same race condition that is already present in the
HEAPTUPLE_DELETE_IN_PROGRESS case: after we release the buffer pin,
some other transaction might delete the tuple and the associated TOAST
tuples, and they might then commit, and the tuple might become dead
and get pruned away before we check the TOAST table.

On a related note, I notice that your latest patch removes all the
logic that complains about XIDs being out of bounds. I don't think
that's good. Those seem like important checks. They're important for
finding problems with the relation,  and I think we probably also need
them because of the XID-horizon issue mentioned above. One possible
way of looking at it is to say that the XID_BOUNDS_OK case has two
sub-cases: either the XID is within bounds and is one that cannot
become all-visible concurrently because it's not visible to all of our
backend's registered snapshots, or it's within bounds but does have
the possibility of becoming all-visible. In the former case, if it
appears as XMAX we can safely follow TOAST pointers found within the
tuple; in the latter case, we can't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Next
From: Tomas Vondra
Date:
Subject: Re: PoC/WIP: Extended statistics on expressions