Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Date
Msg-id CAH2-WzmSJw2uZnkcFDAaG1HoX07nG5uTcU3anH8Og9AumMj+Yw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.  (Noah Misch <noah@leadboat.com>)
Responses Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
List pgsql-hackers
On Sun, Mar 24, 2024 at 10:03 PM Noah Misch <noah@leadboat.com> wrote:
> > You're going to have to "couple" buffer locks in the style of
> > _bt_check_unique() (as well as keeping a buffer lock on "the first
> > leaf page a duplicate might be on" throughout) if you need the test to
> > work reliably.
>
> The amcheck feature has no lock coupling at its "first key on the next page"
> check.  I think that's fine, because amcheck takes one snapshot at the
> beginning and looks for pairs of visible-to-that-snapshot heap tuples with the
> same scan key.  _bt_check_unique(), unlike amcheck, must catch concurrent
> inserts.  If amcheck "checkunique" wanted to detect duplicates that would
> appear when all transactions commit, it would need lock coupling.  (I'm not
> suggesting it do that.)  Do you see a problem with the lack of lock coupling
> at "first key on the next page"?

Practically speaking, no, I see no problems.

> I agree, but perhaps the "first key on the next page" code is more complex
> than general-case code would be.  If the lack of lock coupling is fine, then I
> think memory context lifecycle is the only obstacle making index page
> boundaries special.  Are there factors beyond that?

I believe that my concern back in 2021 was that the general complexity
of cross-page checking was unlikely to be worth it. Note that
nbtsplitloc.c is *maximally* aggressive about avoiding split points
that fall within some group of duplicates, so with a unique index it
should be very rare.

Admittedly, I was probably thinking about the complexity of adding a
bunch of code just to be able to check uniqueness across page
boundaries. I did mention lock coupling by name, but that was more of
a catch-all term for the problems in this area.

> We already have
> state->lowkey kept across pages via MemoryContextAlloc().  Similar lines of
> code could preserve the scan key for checkunique, making the "first key on the
> next page" code unnecessary.

I suspect that I was overly focussed on the index structure itself
back when I made these remarks. I might not have considered that just
using an MVCC snapshot for the TIDs makes the whole process safe,
though that now seems quite obvious.

Separately, I now see that the committed patch just reuses the code
that has long been used to check that things are in the correct order
across page boundaries: this is the bt_right_page_check_scankey check,
which existed in the very earliest versions of amcheck. So while I
agree that we could just keep the original scan key (from the last
item on every leaf page), and then make the check at the start of the
next page instead (as opposed to making it at the end of the previous
leaf page, which is how it works now), it's not obvious that that
would be a good trade-off, all things considered.

It might still be a little better that way around, overall, but you're
not just talking about changing the recently committed checkunique
patch (I think). You're also talking about restructuring the long
established bt_right_page_check_scankey check (otherwise, what's the
point?). I'm not categorically opposed to that, but it's not as if
it'll allow you to throw out a bunch of code -- AFAICT that proposal
doesn't have that clear advantage going for it. The race condition
that is described at great length in bt_right_page_check_scankey isn't
ever going to be a problem for the recently committed checkunique
patch (as you more or less pointed out yourself), but obviously it is
still a concern for the cross-page order check.

In summary, the old bt_right_page_check_scankey check is strictly
concerned with the consistency of a physical data structure (the index
itself), whereas the new checkunique check makes sure that the logical
content of the database is consistent (the index, the heap, and all
associated transaction status metadata have to be consistent). That
means that the concerns that are described at length in
bt_right_page_check_scankey (nor anything like those concerns) don't
apply to the new checkunique check. We agree on all that, I think. But
it's less clear that that presents us with an opportunity to simplify
this patch.

> Adding checkunique raised runtime from 58s to 276s, because it checks
> visibility for every heap tuple.  It could do the heap fetch and visibility
> check lazily, when the index yields two heap TIDs for one scan key.  That
> should give zero visibility checks for this particular test case, and it
> doesn't add visibility checks to bloated-table cases.

The added runtime that you report seems quite excessive to me. I'm
really surprised that the code doesn't manage to avoid visibility
checks in the absence of duplicates that might both have TIDs
considered visible. Lazy visibility checking seems almost essential,
and not just a nice-to-have optimization.

It seems like the implication of everything that you said about
refactoring/moving the check was that doing so would enable this
optimization (at least an implementation along the lines of your
pseudo code). If that was what you intended, then it's not obvious to
me why it is relevant. What, if anything, does it have to do with
making the new checkunique visibility checks happen lazily?

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: documentation structure
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: speed up a logical replica setup