Re: [HACKERS] A design for amcheck heapam verification - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] A design for amcheck heapam verification
Date
Msg-id CABOikdMDr4r_MWwRSKhSZxNuQKi0JTHX6ANGWtAZMkZEDKBkEw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] A design for amcheck heapam verification  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] A design for amcheck heapam verification  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers


On Tue, Mar 27, 2018 at 8:50 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Mar 23, 2018 at 7:13 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I've just flipped patch to WoA. But if above issues will be fixed I think that patch is ready for committer.

Attached is v7, which has the small tweaks that you suggested.

Thank you for the review. I hope that this can be committed shortly.


Sorry for coming a bit too late on this thread, but I started looking at 0002 patch. 

  *
+ * When index-to-heap verification is requested, a Bloom filter is used to
+ * fingerprint all tuples in the target index, as the index is traversed to
+ * verify its structure.  A heap scan later verifies the presence in the heap
+ * of all index tuples fingerprinted within the Bloom filter.
+ *

Is that correct? Aren't we actually verifying the presence in the index of all
heap tuples?

@@ -116,37 +140,47 @@ static inline bool invariant_leq_nontarget_offset(BtreeCheckState *state,
 static Page palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum);
 
 /*
- * bt_index_check(index regclass)
+ * bt_index_check(index regclass, heapallindexed boolean)

Can we come up with a better name for heapallindexed? May be "check_heapindexed"?

+
+ /*
+ * Register our own snapshot in !readonly case, rather than asking
+ * IndexBuildHeapScan() to do this for us later.  This needs to happen
+ * before index fingerprinting begins, so we can later be certain that
+ * index fingerprinting should have reached all tuples returned by
+ * IndexBuildHeapScan().
+ */
+ if (!state->readonly)
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ }
+

So this looks safe. In !readonly mode, we take the snapshot *before*
fingerprinting the index. Since we're using MVCC snapshot, any tuple which is
visible to heapscan must be reachable via indexscan too. So we must find the
index entry for every heap tuple returned by the scan.

What I am not sure about is whether we can really examine an index which is
valid but whose indcheckxmin hasn't crossed our xmin horizon? Notice that this
amcheck might be running in a transaction block, probably in a repeatable-read
isolation level and hence GetTransactionSnapshot() might actually return a
snapshot which can't yet read the index consistently. In practice, this is
quite unlikely, but I think we should check for that case if we agree that it
could be a problem.

The case with readonly mode is also interesting. Since we're scanning heap with
SnapshotAny, heapscan may return us tuples which are RECENTLY_DEAD. So the
question is: can this happen?

- some concurrent index scan sees a heaptuple as DEAD and marks the index
  entry as LP_DEAD
- our index fingerprinting sees index tuple as LP_DEAD
- our heap scan still sees the heaptuple as RECENTLY_DEAD

Now that looks improbable given that we compute OldestXmin after our index
fingerprinting was done i.e between step 2 and 3 and hence if a tuple looked
DEAD to some OldestXmin/RecentGlobalXmin computed before we computed our
OldestXmin, then surely our OldestXmin should also see the tuple DEAD. Or is
there a corner case that we're missing?

Are there any interesting cases around INSERT_IN_PROGRESS/DELETE_IN_PROGRESS
tuples, especially if those tuples were inserted/deleted by our own
transaction? It probably worth thinking.
 
Apart from that, comments in IndexBuildHeapRangeScan() claim that the function
is called only with ShareLock and above, which is no longer true. We should
check if that has any side-effects. I can't think of any, but better to verify
and update the comments to reflect new reality,

The partial indexes look fine since the non-interesting tuples never get called
back.

One thing that worth documenting/noting is the fact that a !readonly check will
run with a long-duration registered snapshot, thus holding OldestXmin back. Is
there anything we can do to lessen that burden like telling other backends to
ignore our xmin while computing OldestXmin (like vacuum does)?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: PATCH: Exclude temp relations from base backup
Next
From: "Jonathan S. Katz"
Date:
Subject: PostgreSQL 11 Release Management Team & Feature Freeze