Re: [HACKERS] A design for amcheck heapam verification - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] A design for amcheck heapam verification
Date
Msg-id CABOikdPNVBWh1tDWK_T1ZLCyjftpj5tDiFmqqokucjk3tjMfxw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] A design for amcheck heapam verification  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] A design for amcheck heapam verification  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers


On Wed, Mar 28, 2018 at 2:48 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Tue, Mar 27, 2018 at 6:48 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> + * When index-to-heap verification is requested, a Bloom filter is used to
> + * fingerprint all tuples in the target index, as the index is traversed to
> + * verify its structure.  A heap scan later verifies the presence in the
> heap
> + * of all index tuples fingerprinted within the Bloom filter.
> + *
>
> Is that correct? Aren't we actually verifying the presence in the index of
> all
> heap tuples?

I think that you could describe it either way. We're verifying the
presence of heap tuples in the heap that ought to have been in the
index (that is, most of those that were fingerprinted).

Hmm Ok. It still sounds backwards to me, but then English is not my first or even second language. "A heap scan later verifies the presence in the heap of all index tuples fingerprinted" sounds as if we expect to find all fingerprinted index tuples in the heap. But in reality, we check if all heap tuples have a fingerprinted index tuple.
 

You're right - there is a narrow window for REPEATABLE READ and
SERIALIZABLE transactions. This is a regression in v6, the version
removed the TransactionXmin test.

I am tempted to fix this by calling GetLatestSnapshot() instead of
GetTransactionSnapshot(). However, that has a problem of its own -- it
won't work in parallel mode, and we're dealing with parallel
restricted functions, not parallel unsafe functions. I don't want to
change that to fix such a narrow issue. IMV, a better fix is to treat
this as a serialization failure. Attached patch, which applies on top
of v7, shows what I mean.

Ok. I am happy with that. 
 

I think that this bug is practically impossible to hit, because we use
the xmin from the pg_index tuple during "is index safe to use?"
indcheckxmin/TransactionXmin checks (the patch that I attached adds a
similar though distinct check), which raises another question for a
REPEATABLE READ xact. That question is: How is a REPEATABLE READ
transaction supposed to see the pg_index entry to get the index
relation's oid to call a verification function in the first place?

Well pg_index entry will be visible and should be visible. Otherwise how do you even maintain a newly created index. I am not sure, but I guess we take fresh MVCC snapshots for catalog searches, even in RR transactions.
 
My
point is that there is no need for a more complicated solution than
what I propose.

I agree on that.
 

I don't think so. The way we compute OldestXmin for
IndexBuildHeapRangeScan() is rather like a snapshot acquisition --
GetOldestXmin() locks the proc array in shared mode. As you pointed
out, the fact that it comes after everything else (fingerprinting)
means that it must be equal to or later than what index scans saw,
that allowed them to do the kill_prior_tuple() stuff (set LP_DEAD
bits).


That's probably true. 
 

> Are there any interesting cases around INSERT_IN_PROGRESS/DELETE_IN_PROGRESS
> tuples, especially if those tuples were inserted/deleted by our own
> transaction? It probably worth thinking.

Anything here that you would like to check? I understand that you may see such tuples only for catalog tables.
 

IndexBuildHeapRangeScan() doesn't mention anything about CIC's heap
ShareUpdateExclusiveLock (it just says SharedLock), because that lock
strength doesn't have anything to do with IndexBuildHeapRangeScan()
when it operates with an MVCC snapshot. I think that this means that
this patch doesn't need to update comments within
IndexBuildHeapRangeScan(). Perhaps that's a good idea, but it seems
independent.

Ok, I agree. But note that we are now invoking that code with AccessShareLock() whereas the existing callers either use ShareLock or ShareUpdateExclusiveLock. That's probably does not matter, but it's a change worth noting.
 

> Is
> there anything we can do to lessen that burden like telling other backends
> to
> ignore our xmin while computing OldestXmin (like vacuum does)?

I don't think so. The transaction involved is still an ordinary user
transaction.

While that's true, I am thinking if there is anything we can do to stop this a consistency-checking utility to create other non-trivial side effects on the state of the database, even if that means we can not check all heap tuples. For example, can there be a way by which we allow concurrent vacuum or HOT prune to continue to prune away dead tuples, even if that means running bt_check_index() is some specialised way (such as not allowing in a transaction block) and the heap scan  might miss out some tuples? I don't know answer to that, but given that sometimes bt_check_index() may take hours if not days to finish, it seems worth thinking or at least documenting the side-effects somewhere.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Yugo Nagata
Date:
Subject: Re: [HACKERS] [PATCH] Lockable views
Next
From: Etsuro Fujita
Date:
Subject: Re: Oddity in COPY FROM handling of check constraints on partitiontables