Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

From: Peter Geoghegan
Subject: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Msg-id: CAH2-WzmSJw2uZnkcFDAaG1HoX07nG5uTcU3anH8Og9AumMj+Yw@mail.gmail.com
In response to: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index. (Noah Misch <noah@leadboat.com>)
Responses: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
List: pgsql-hackers
On Sun, Mar 24, 2024 at 10:03 PM Noah Misch <noah@leadboat.com> wrote:
> > You're going to have to "couple" buffer locks in the style of
> > _bt_check_unique() (as well as keeping a buffer lock on "the first
> > leaf page a duplicate might be on" throughout) if you need the test to
> > work reliably.
>
> The amcheck feature has no lock coupling at its "first key on the next
> page" check. I think that's fine, because amcheck takes one snapshot at
> the beginning and looks for pairs of visible-to-that-snapshot heap
> tuples with the same scan key. _bt_check_unique(), unlike amcheck, must
> catch concurrent inserts. If amcheck "checkunique" wanted to detect
> duplicates that would appear when all transactions commit, it would need
> lock coupling. (I'm not suggesting it do that.) Do you see a problem
> with the lack of lock coupling at "first key on the next page"?

Practically speaking, no, I see no problems.

> I agree, but perhaps the "first key on the next page" code is more
> complex than general-case code would be. If the lack of lock coupling is
> fine, then I think memory context lifecycle is the only obstacle making
> index page boundaries special. Are there factors beyond that?

I believe that my concern back in 2021 was that the general complexity of
cross-page checking was unlikely to be worth it. Note that nbtsplitloc.c
is *maximally* aggressive about avoiding split points that fall within
some group of duplicates, so with a unique index it should be very rare.
Admittedly, I was probably thinking about the complexity of adding a
bunch of code just to be able to check uniqueness across page boundaries.
I did mention lock coupling by name, but that was more of a catch-all
term for the problems in this area.

> We already have state->lowkey kept across pages via
> MemoryContextAlloc(). Similar lines of code could preserve the scan key
> for checkunique, making the "first key on the next page" code
> unnecessary.

I suspect that I was overly focussed on the index structure itself back
when I made these remarks. I might not have considered that just using an
MVCC snapshot for the TIDs makes the whole process safe, though that now
seems quite obvious.

Separately, I now see that the committed patch just reuses the code that
has long been used to check that things are in the correct order across
page boundaries: this is the bt_right_page_check_scankey check, which
existed in the very earliest versions of amcheck.

So while I agree that we could just keep the original scan key (from the
last item on every leaf page), and then make the check at the start of
the next page instead (as opposed to making it at the end of the previous
leaf page, which is how it works now), it's not obvious that that would
be a good trade-off, all things considered. It might still be a little
better that way around, overall, but you're not just talking about
changing the recently committed checkunique patch (I think). You're also
talking about restructuring the long established
bt_right_page_check_scankey check (otherwise, what's the point?). I'm not
categorically opposed to that, but it's not as if it'll allow you to
throw out a bunch of code -- AFAICT that proposal doesn't have that clear
advantage going for it.

The race condition that is described at great length in
bt_right_page_check_scankey isn't ever going to be a problem for the
recently committed checkunique patch (as you more or less pointed out
yourself), but obviously it is still a concern for the cross-page order
check.
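To make "lock coupling" concrete, here's a toy standalone model of the
technique (the ToyPage struct, the lock array, and everything else here
are invented for illustration -- this is not nbtree code). The point is
only that the lock on the current page is held until the lock on its
right sibling has been acquired, so the two locks always overlap:

/*
 * Toy model of nbtree-style buffer lock coupling (illustration only;
 * the types and lock array are invented, this is not PostgreSQL code).
 */
#include <pthread.h>
#include <stdio.h>

#define NPAGES 4

typedef struct ToyPage
{
    int     rightlink;      /* array index of right sibling; -1 at end */
} ToyPage;

static ToyPage          pages[NPAGES];
static pthread_rwlock_t pagelocks[NPAGES];

/*
 * Walk right across the "index".  The lock on the current page is only
 * released after the lock on its right sibling has been acquired, so no
 * concurrent writer can change the chain in the window between leaving
 * one page and arriving at the next.
 */
static void
walk_right_coupled(int start)
{
    int     cur = start;

    pthread_rwlock_rdlock(&pagelocks[cur]);
    while (pages[cur].rightlink != -1)
    {
        int     next = pages[cur].rightlink;

        pthread_rwlock_rdlock(&pagelocks[next]);    /* acquire next... */
        pthread_rwlock_unlock(&pagelocks[cur]);     /* ...then drop cur */
        printf("crossed from page %d to page %d\n", cur, next);
        cur = next;
    }
    pthread_rwlock_unlock(&pagelocks[cur]);
}

int
main(void)
{
    for (int i = 0; i < NPAGES; i++)
    {
        pthread_rwlock_init(&pagelocks[i], NULL);
        pages[i].rightlink = (i == NPAGES - 1) ? -1 : i + 1;
    }
    walk_right_coupled(0);
    return 0;
}

The cross-page order check deliberately doesn't do this -- it lets go of
the lock before looking at the sibling -- which is exactly why the race
in bt_right_page_check_scankey has to be reasoned about at such length.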
In summary, the old bt_right_page_check_scankey check is strictly
concerned with the consistency of a physical data structure (the index
itself), whereas the new checkunique check makes sure that the logical
content of the database is consistent (the index, the heap, and all
associated transaction status metadata have to be consistent). That means
that the concerns that are described at length in
bt_right_page_check_scankey (or anything like those concerns) don't apply
to the new checkunique check. We agree on all that, I think. But it's
less clear that that presents us with an opportunity to simplify this
patch.

> Adding checkunique raised runtime from 58s to 276s, because it checks
> visibility for every heap tuple. It could do the heap fetch and
> visibility check lazily, when the index yields two heap TIDs for one
> scan key. That should give zero visibility checks for this particular
> test case, and it doesn't add visibility checks to bloated-table cases.

The added runtime that you report seems quite excessive to me. I'm really
surprised that the code doesn't manage to avoid visibility checks in the
absence of duplicates that might both have TIDs considered visible. Lazy
visibility checking seems almost essential, and not just a nice-to-have
optimization.

It seems like the implication of everything that you said about
refactoring/moving the check was that doing so would enable this
optimization (at least an implementation along the lines of your pseudo
code). If that was what you intended, then it's not obvious to me why it
is relevant. What, if anything, does it have to do with making the new
checkunique visibility checks happen lazily?

-- 
Peter Geoghegan
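PS: to sketch what "lazily" could look like (a standalone toy, assuming a
stream of index entries already in scan key order; the IndexEntry type
and the tid_visible_to_snapshot()/report_duplicate() helpers are
stand-ins invented for the sketch, not amcheck's real interfaces). Heap
fetches only happen once a scan key has yielded a second TID, so an index
with no duplicate keys incurs zero visibility checks:

/*
 * Sketch of lazy visibility checking for a uniqueness check.
 * tid_visible_to_snapshot() models a heap fetch plus a visibility test
 * against the snapshot taken at the start of the whole check.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct IndexEntry
{
    const char *scankey;        /* stand-in for a real scan key */
    int         tid;            /* stand-in for an ItemPointer */
} IndexEntry;

static bool
tid_visible_to_snapshot(int tid)
{
    printf("visibility check for TID %d\n", tid);
    return true;                /* pretend every heap tuple is visible */
}

static void
report_duplicate(int tid1, int tid2)
{
    printf("duplicate: TIDs %d and %d share a scan key\n", tid1, tid2);
}

static void
check_unique_lazy(const IndexEntry *entries, size_t n)
{
    size_t      group_start = 0;
    bool        first_checked = false;  /* deferred check done yet? */
    int         visible_tid = -1;       /* first visible TID in group */

    for (size_t i = 0; i < n; i++)
    {
        if (i > group_start &&
            strcmp(entries[group_start].scankey, entries[i].scankey) != 0)
        {
            /* key changed: new group, no heap work has been done yet */
            group_start = i;
            first_checked = false;
            visible_tid = -1;
        }

        if (i == group_start)
            continue;           /* lone entry (so far): defer everything */

        /*
         * The index yielded a second TID for this scan key, so now (and
         * only now) pay for heap fetches, starting with the deferred
         * check of the group's first entry.
         */
        if (!first_checked)
        {
            if (tid_visible_to_snapshot(entries[group_start].tid))
                visible_tid = entries[group_start].tid;
            first_checked = true;
        }
        if (tid_visible_to_snapshot(entries[i].tid))
        {
            if (visible_tid != -1)
                report_duplicate(visible_tid, entries[i].tid);
            else
                visible_tid = entries[i].tid;
        }
    }
}

int
main(void)
{
    /* "a" is unique, "b" has two TIDs: only the "b" pair is fetched */
    const IndexEntry entries[] = {
        {"a", 1}, {"b", 2}, {"b", 3},
    };

    check_unique_lazy(entries, sizeof(entries) / sizeof(entries[0]));
    return 0;
}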