Re: Amcheck: do rightlink verification with lock coupling - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Amcheck: do rightlink verification with lock coupling
Date
Msg-id CAH2-WzntedoYDfGFK_TwEOVuLQHhzOuubV996dHKmWkzA101WQ@mail.gmail.com
In response to Re: Amcheck: do rightlink verification with lock coupling  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Amcheck: do rightlink verification with lock coupling  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Amcheck: do rightlink verification with lock coupling  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
On Fri, Jan 10, 2020 at 5:45 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Peter, any opinion on this proposed amcheck patch? In the other thread
> [1] you seemed to agree this is worth checking, and Alvaro's proposal to
> make this check optional seems like a reasonable compromise with respect
> to the locking.

It's a good idea, and it probably doesn't even need to be made
optional -- lock coupling to the right is safe on a primary, and
should also be safe on standbys (though I should triple check the REDO
routines to be sure). The patch only does lock coupling when it proves
necessary, which ought to only happen when there is a concurrent page
split, which ought to be infrequent. Maybe there is no need to
compromise.
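
A minimal sketch of the "couple only when necessary" idea described above, written as a toy Python model and not as amcheck's C code. The Page class, the dict of pages, and the function names are all hypothetical. The assumption is that a first pass reads the target and its right sibling one lock at a time, and only a mismatch (which a concurrent page split could explain) triggers a recheck that holds the target's lock while taking the sibling's.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Page:
    """Toy stand-in for a B-tree page: block number plus sibling links."""
    blkno: int
    left: Optional[int]
    right: Optional[int]
    lock: threading.Lock = field(default_factory=threading.Lock)


def read_links(page: Page):
    """Read both sibling links while holding only this page's lock."""
    with page.lock:
        return page.left, page.right


def rightlink_is_consistent(pages: Dict[int, Page], target_blkno: int) -> bool:
    """Check that the right sibling's left link points back at the target."""
    target = pages[target_blkno]

    # Optimistic pass: never hold two locks at once.
    _, right_blkno = read_links(target)
    if right_blkno is None:
        return True  # rightmost page, nothing to verify
    left_of_right, _ = read_links(pages[right_blkno])
    if left_of_right == target_blkno:
        return True

    # Mismatch: it may be a concurrent split rather than corruption.
    # Recheck with lock coupling, always locking left to right.
    with target.lock:
        right_blkno = target.right
        if right_blkno is None:
            return True
        if right_blkno == target_blkno:
            return False  # page links to itself; also avoids self-deadlock
        right = pages[right_blkno]
        with right.lock:
            return right.left == target_blkno
```

For example, with pages 1 <-> 2 <-> 3 linked correctly, rightlink_is_consistent(pages, 1) returns True without ever holding two locks. If page 2's left link is then damaged to point elsewhere, the same call falls through to the coupled recheck and returns False.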

I'm curious why Andrey's corruption problems were not detected by the
cross-page amcheck test, though. We compare the first non-pivot tuple
on the right sibling leaf page with the last one on the target page,
towards the end of bt_target_page_check() -- isn't that almost as good
as what you have here in practice? I probably would have added
something like this myself earlier, if I had reason to think that
verification would be a lot more effective that way.
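
For reference, the cross-page comparison mentioned above can be modelled roughly as follows. This is a toy Python sketch, not the code in bt_target_page_check(): the function name is hypothetical, each page is modelled as a sorted list of its non-pivot keys, and the use of <= (so that duplicates may span a page boundary) is an assumption.

```python
from typing import List


def cross_page_order_ok(target_items: List[int],
                        right_sibling_items: List[int]) -> bool:
    """Return True if the last item on the target page does not sort after
    the first non-pivot item on its right sibling."""
    if not target_items or not right_sibling_items:
        return True  # nothing to compare in this toy model
    return target_items[-1] <= right_sibling_items[0]
```

Note that this compares keys only. It says nothing about whether the right sibling's left link points back at the target page.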

To be clear, I believe that Andrey wrote this patch for a reason -- I
assume that it makes a noticeable and consistent difference. I would
like to gain a better understanding of why that was for my own
benefit, though. For example, it might be that page deletion was a
factor that made the test I mentioned less effective. I care about the
specifics.

-- 
Peter Geoghegan


