Thread: Amcheck: do rightlink verification with lock coupling

Amcheck: do rightlink verification with lock coupling

From
Andrey Borodin
Date:
Hi!

This is a thread to discuss amcheck feature started in other thread[0].

Currently amcheck is scanning every B-tree level. If verification is done with ShareLock - amcheck will test that each
pageleftlink is pointing to page with rightlink backwards. 
This is important invariant, in our experience it proved to be good at detecting various corruptions.
But this invariant can be detected only if both pages are not modified (e.g. split concurrently).

PFA patch, that in case of suspicion locks two pages and retests that invariant. This allows detection of several
corruptionson standby. 

This patch violates one of amcheck design principles: current code does not ever take more than one page lock. I do not
know:should we hold this rule or should we use more deep check? 

Thanks!

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/DA9B33AC-53CB-4643-96D4-7A0BBC037FA1@yandex-team.ru

Attachment

Re: Amcheck: do rightlink verification with lock coupling

From
Alvaro Herrera
Date:
On 2019-Sep-12, Andrey Borodin wrote:

> This patch violates one of amcheck design principles: current code
> does not ever take more than one page lock. I do not know: should we
> hold this rule or should we use more deep check?

The check does seem worthwhile to me.

As far as I know, in btree you can lock the right sibling of a page
while keeping lock on the page itself, without risk of deadlock.  So I'm
not sure what's the issue with that.  This is in the README:

  In most cases we release our lock and pin on a page before attempting
  to acquire pin and lock on the page we are moving to.  In a few places
  it is necessary to lock the next page before releasing the current one.
  This is safe when moving right or up, but not when moving left or down
  (else we'd create the possibility of deadlocks).

I suppose Peter was concerned about being able to run amcheck without
causing any trouble at all for concurrent operation; maybe we can retain
that property by making this check optional.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Amcheck: do rightlink verification with lock coupling

From
Tomas Vondra
Date:
On Thu, Sep 12, 2019 at 10:16:12AM -0300, Alvaro Herrera wrote:
>On 2019-Sep-12, Andrey Borodin wrote:
>
>> This patch violates one of amcheck design principles: current code
>> does not ever take more than one page lock. I do not know: should we
>> hold this rule or should we use more deep check?
>
>The check does seem worthwhile to me.
>
>As far as I know, in btree you can lock the right sibling of a page
>while keeping lock on the page itself, without risk of deadlock.  So I'm
>not sure what's the issue with that.  This is in the README:
>
>  In most cases we release our lock and pin on a page before attempting
>  to acquire pin and lock on the page we are moving to.  In a few places
>  it is necessary to lock the next page before releasing the current one.
>  This is safe when moving right or up, but not when moving left or down
>  (else we'd create the possibility of deadlocks).
>
>I suppose Peter was concerned about being able to run amcheck without
>causing any trouble at all for concurrent operation; maybe we can retain
>that property by making this check optional.
>

Peter, any opinion on this proposed amcheck patch? In the other thread
[1] you seemed to agree this is worth checking, and Alvaro's proposal to
make this check optional seems like a reasonable compromise with respect
to the locking.

[1] https://www.postgresql.org/message-id/flat/DA9B33AC-53CB-4643-96D4-7A0BBC037FA1@yandex-team.ru

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Fri, Jan 10, 2020 at 5:45 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Peter, any opinion on this proposed amcheck patch? In the other thread
> [1] you seemed to agree this is worth checking, and Alvaro's proposal to
> make this check optional seems like a reasonable compromise with respect
> to the locking.

It's a good idea, and it probably doesn't even need to be made
optional -- lock coupling to the right is safe on a primary, and
should also be safe on standbys (though I should triple check the REDO
routines to be sure). The patch only does lock coupling when it proves
necessary, which ought to only happen when there is a concurrent page
split, which ought to be infrequent. Maybe there is no need to
compromise.

I'm curious why Andrey's corruption problems were not detected by the
cross-page amcheck test, though. We compare the first non-pivot tuple
on the right sibling leaf page with the last one on the target page,
towards the end of bt_target_page_check() -- isn't that almost as good
as what you have here in practice? I probably would have added
something like this myself earlier, if I had reason to think that
verification would be a lot more effective that way.

To be clear, I believe that Andrey wrote this patch for a reason -- I
assume that it makes a noticeable and consistent difference. I would
like to gain a better understanding of why that was for my own
benefit, though. For example, it might be that page deletion was a
factor that made the test I mentioned less effective. I care about the
specifics.

-- 
Peter Geoghegan



Re: Amcheck: do rightlink verification with lock coupling

From
Tomas Vondra
Date:
On Fri, Jan 10, 2020 at 06:49:33PM -0800, Peter Geoghegan wrote:
>On Fri, Jan 10, 2020 at 5:45 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> Peter, any opinion on this proposed amcheck patch? In the other thread
>> [1] you seemed to agree this is worth checking, and Alvaro's proposal to
>> make this check optional seems like a reasonable compromise with respect
>> to the locking.
>
>It's a good idea, and it probably doesn't even need to be made
>optional -- lock coupling to the right is safe on a primary, and
>should also be safe on standbys (though I should triple check the REDO
>routines to be sure). The patch only does lock coupling when it proves
>necessary, which ought to only happen when there is a concurrent page
>split, which ought to be infrequent. Maybe there is no need to
>compromise.
>

OK, that makes sense.

>I'm curious why Andrey's corruption problems were not detected by the
>cross-page amcheck test, though. We compare the first non-pivot tuple
>on the right sibling leaf page with the last one on the target page,
>towards the end of bt_target_page_check() -- isn't that almost as good
>as what you have here in practice? I probably would have added
>something like this myself earlier, if I had reason to think that
>verification would be a lot more effective that way.
>
>To be clear, I believe that Andrey wrote this patch for a reason -- I
>assume that it makes a noticeable and consistent difference. I would
>like to gain a better understanding of why that was for my own
>benefit, though. For example, it might be that page deletion was a
>factor that made the test I mentioned less effective. I care about the
>specifics.
>

Understood. Is that a reason to not commit of this patch now, though?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Understood. Is that a reason to not commit of this patch now, though?

It could use some polishing. Are you interested in committing it?

-- 
Peter Geoghegan



Re: Amcheck: do rightlink verification with lock coupling

From
Tomas Vondra
Date:
On Mon, Jan 13, 2020 at 03:49:40PM -0800, Peter Geoghegan wrote:
>On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> Understood. Is that a reason to not commit of this patch now, though?
>
>It could use some polishing. Are you interested in committing it?
>

Not really - as a CFM I was trying to revive patches that seem in good
shape but not moving.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Amcheck: do rightlink verification with lock coupling

From
Andrey Borodin
Date:
Hi Peter! Sorry for answering so long.

> 11 янв. 2020 г., в 7:49, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> I'm curious why Andrey's corruption problems were not detected by the
> cross-page amcheck test, though. We compare the first non-pivot tuple
> on the right sibling leaf page with the last one on the target page,
> towards the end of bt_target_page_check() -- isn't that almost as good
> as what you have here in practice? I probably would have added
> something like this myself earlier, if I had reason to think that
> verification would be a lot more effective that way.

We were dealing with corruption caused by lost page update. Consider two pages:
A->B
If A is split into A` and C we get:
A`->C->B
But if update of A is lost we still have
A->B, but B backward pointers points to C.
B's smallest key is bigger than hikey of A`, but this do not violate
cross-pages invariant.

Page updates may be lost due to bug in backup software with incremental
backups, bug in storage layer of Aurora-style system, bug in page cache, incorrect
fsync error handling, bug in ssd firmware etc. And our data checksums do not
detect this kind of corruption. BTW I think that it would be better if our
checksums were not stored on a page itseft, they could detect this kind of faults.

We were able to timely detect all those problems on primaries in our testing
environment. But much later found out that some standbys were corrupted,
the problem appeared only when they were promoted.
Also, in nearby thread Grygory Rylov (g0djan) is trying to enable one more
invariant in standby checks.


Thanks!

Best regards, Andrey Borodin.


Re: Amcheck: do rightlink verification with lock coupling

From
Andrey Borodin
Date:

> 14 янв. 2020 г., в 9:47, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> Page updates may be lost due to bug in backup software with incremental
> backups, bug in storage layer of Aurora-style system, bug in page cache, incorrect
> fsync error handling, bug in ssd firmware etc. And our data checksums do not
> detect this kind of corruption. BTW I think that it would be better if our
> checksums were not stored on a page itseft, they could detect this kind of faults.

Observed it just now.
There is one HA cluster where a node was marked dead. This node was disconnected from cluster, but due to human error
therewas postgres running. 
Node managed to install block-level incremental backup to the chain. And backup software did not detect that backup
stepwas taken from part of timeline that was not in actual timeline's history. 
Result of restoration is:

man-w%/%db R # select bt_index_check('%.pk_%');
 bt_index_check
----------------

(1 row)

Time: 1411.065 ms (00:01.411)
man-w%/%db R # select patched_index_check('%.pk_%');
ERROR:  XX002: left link/right link pair in index "pk_labels" not in agreement
DETAIL:  Block=42705 left block=42707 left link from block=45495.
LOCATION:  bt_recheck_block_rightlink, verify_nbtree.c:621
Time: 671.336 ms

('%' is replacing removed chars)

I understand that this corruption was not introduced by postgres itself, but by combination of bug in two 3rd party
toolsand human error. 
But I can imagine similar corruptions with different root causes.

Best regards, Andrey Borodin.


Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Mon, Jan 13, 2020 at 8:47 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > 11 янв. 2020 г., в 7:49, Peter Geoghegan <pg@bowt.ie> написал(а):
> > I'm curious why Andrey's corruption problems were not detected by the
> > cross-page amcheck test, though. We compare the first non-pivot tuple
> > on the right sibling leaf page with the last one on the target page,
> > towards the end of bt_target_page_check() -- isn't that almost as good
> > as what you have here in practice? I probably would have added
> > something like this myself earlier, if I had reason to think that
> > verification would be a lot more effective that way.
>
> We were dealing with corruption caused by lost page update. Consider two pages:
> A->B
> If A is split into A` and C we get:
> A`->C->B
> But if update of A is lost we still have
> A->B, but B backward pointers points to C.
> B's smallest key is bigger than hikey of A`, but this do not violate
> cross-pages invariant.
>
> Page updates may be lost due to bug in backup software with incremental
> backups, bug in storage layer of Aurora-style system, bug in page cache, incorrect
> fsync error handling, bug in ssd firmware etc. And our data checksums do not
> detect this kind of corruption. BTW I think that it would be better if our
> checksums were not stored on a page itseft, they could detect this kind of faults.

I find this argument convincing. I'll try to get this committed soon.

While you could have used bt_index_parent_check() or heapallindexed to
detect the issue, those two options are a lot more expensive (plus the
former option won't work on a standby). Relaxing the principle that
says that we shouldn't hold multiple buffer locks concurrently doesn't
seem like that much to ask for to get such a clear benefit.

I think that this is safe, but page deletion/half-dead pages need more
thought. In general, the target page could have become "ignorable"
when rechecked.

> We were able to timely detect all those problems on primaries in our testing
> environment. But much later found out that some standbys were corrupted,
> the problem appeared only when they were promoted.
> Also, in nearby thread Grygory Rylov (g0djan) is trying to enable one more
> invariant in standby checks.

I looked at that thread just now, but Grygory didn't say why this true
root check was particularly important, so I can't see much upside.
Plus that seems riskier than what you have in mind here.

Does it have something to do with the true root looking like a deleted
page? The details matter.

--
Peter Geoghegan



Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Thu, Jan 16, 2020 at 5:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I find this argument convincing. I'll try to get this committed soon.
>
> While you could have used bt_index_parent_check() or heapallindexed to
> detect the issue, those two options are a lot more expensive (plus the
> former option won't work on a standby). Relaxing the principle that
> says that we shouldn't hold multiple buffer locks concurrently doesn't
> seem like that much to ask for to get such a clear benefit.

Having looked into it some more, I now have my doubts about this
patch. REDO routine code like btree_xlog_split() and
btree_xlog_unlink_page() feel entitled to only lock one page at a
time, which invalidates the assumption that we can couple locks on the
leaf level to verify mutual agreement in left and right sibling links
(with only an AccessShareLock on bt_index_check()'s target index
relation). It would definitely be safe for bt_index_check() to so this
were it not running in recovery mode, but that doesn't seem very
useful on its own.

I tried to come up with a specific example of how this could be
unsafe, but my explanation was all over the place (this could have had
something to do with it being Friday evening). Even still, it's up to
the patch to justify why it's safe, and that seems even more
difficult.

--
Peter Geoghegan



Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Fri, Jan 17, 2020 at 5:43 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I tried to come up with a specific example of how this could be
> unsafe, but my explanation was all over the place (this could have had
> something to do with it being Friday evening). Even still, it's up to
> the patch to justify why it's safe, and that seems even more
> difficult.

I can't see a way around this problem, so I'm marking the patch rejected.

Thanks
-- 
Peter Geoghegan



Re: Amcheck: do rightlink verification with lock coupling

From
"Andrey M. Borodin"
Date:
Hi!

> 23 янв. 2020 г., в 00:59, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Fri, Jan 17, 2020 at 5:43 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> I tried to come up with a specific example of how this could be
>> unsafe, but my explanation was all over the place (this could have had
>> something to do with it being Friday evening). Even still, it's up to
>> the patch to justify why it's safe, and that seems even more
>> difficult.
>
> I can't see a way around this problem, so I'm marking the patch rejected.

In this thread [0] we decided that lock coupling is necessary for btree_xlog_unlink_page().
So, maybe let's revive this patch?

Thanks!

Best regards, Andrey Borodin.

[0]
https://www.postgresql.org/message-id/flat/CAH2-Wzm7T_O%2BVUeo7%3D_NGPncs08z3JEybEwVLZGaASnbfg5vDA%40mail.gmail.com#a4ef597251fed0eb5c2896937bdbd0cc


Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Mon, Jul 20, 2020 at 11:46 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> In this thread [0] we decided that lock coupling is necessary for btree_xlog_unlink_page().
> So, maybe let's revive this patch?

Yes, let's do that. Can you resubmit it, please?


Peter Geoghegan



Re: Amcheck: do rightlink verification with lock coupling

From
"Andrey M. Borodin"
Date:

> 4 авг. 2020 г., в 03:44, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Mon, Jul 20, 2020 at 11:46 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> In this thread [0] we decided that lock coupling is necessary for btree_xlog_unlink_page().
>> So, maybe let's revive this patch?
>
> Yes, let's do that. Can you resubmit it, please?

PFA v3.
Changes: fixed few typos in comments.

BTW, reviewing this patch again I cannot understand why we verify link coherence only on leaf level but not for
internalpages? 
I do not see any actual problems here.

Corruption detection power of leftlink-rightlinks on internal pages is diminishingly small compared to leaf pages. But
thereseems to be no reason to exclude internal pages? 

Thanks!

Best regards, Andrey Borodin.

Attachment

Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Tue, Aug 4, 2020 at 9:33 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> BTW, reviewing this patch again I cannot understand why we verify link coherence only on leaf level but not for
internalpages?
 
> I do not see any actual problems here.

Well, I thought that it might be a good idea to limit it to the leaf
level, based on the theory that we rarely couple locks on internal
page levels in general. But yeah, that's probably not a good enough
reason to avoid lock coupling on internal pages. It's probably better
to do it everywhere than to explain why we don't do it on the internal
level -- the explanation will probably be confusing. And even if there
was a performance issue, it could only happen when there are
concurrent internal page splits -- but those are supposed to be rare.

Attached is v4, which now checks internal pages (just like leaf
pages). The main other change in this revised version is that we make
the error raised by bt_index_check() match the error used in the old
bt_index_parent_check() case -- we always want to blame the current
target page when amcheck complains (technically the page we blame when
the invariant fails isn't strictly guaranteed to be quite the same
thing as the target, but it's close enough to not really matter in
reality). Other adjustments:

* Added _bt_checkpage() calls for buffers, as is standard practice in nbtree.

* Added protection against locking the same page a second time in the
event of a faulty sibling link -- we should avoid a self-deadlock in
the event of a page that is corrupt in just the wrong way.

* Updated obsolescent comments that claimed that we never couple
buffer locks in amcheck.

I would like to commit something like this in the next day or two.

Thoughts?

--
Peter Geoghegan

Attachment

Re: Amcheck: do rightlink verification with lock coupling

From
"Andrey M. Borodin"
Date:

> 6 авг. 2020 г., в 04:25, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> * Added _bt_checkpage() calls for buffers, as is standard practice in nbtree.
>
> * Added protection against locking the same page a second time in the
> event of a faulty sibling link -- we should avoid a self-deadlock in
> the event of a page that is corrupt in just the wrong way.
>
> * Updated obsolescent comments that claimed that we never couple
> buffer locks in amcheck.
Cool, thanks!
There's mintor typo: missing space in "of_bt_check_unique".

>
> I would like to commit something like this in the next day or two.
>
> Thoughts?

Sounds great! Thanks!

Best regards, Andrey Borodin.




Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Wed, Aug 5, 2020 at 9:50 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Sounds great! Thanks!

I'm afraid that there is another problem, this time with
btree_xlog_split(). It's possible to get false positives when running
the new test continually on a standby. You can see this by running
verification on a standby continually, while the primary runs with a
workload that gets many page splits.

It's easy to see if you apply this patch:

--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -435,6 +435,9 @@ btree_xlog_split(bool newitemonleft,
XLogReaderState *record)
  UnlockReleaseBuffer(lbuf);
  UnlockReleaseBuffer(rbuf);

+ /* trick */
+ pg_usleep(10 * 1000L);
+

The only thing that we can do is adjust the locking in
btree_xlog_split() to match the primary (kind of like commit 9a9db08a,
except with page splits instead of page deletion). Attached is a
revised version of the patch, along with the changes that we'd need to
REDO to make the amcheck patch really work.

I'm not sure if this change to the REDO routine is worth the overhead
or trouble, though. I have to think about it some more.

BTW, the first patch in the series now has a new check for page
deletion -- that was missing from v4.

-- 
Peter Geoghegan

Attachment

Re: Amcheck: do rightlink verification with lock coupling

From
"Andrey M. Borodin"
Date:

> 6 авг. 2020 г., в 21:38, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Wed, Aug 5, 2020 at 9:50 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Sounds great! Thanks!
>
> I'm afraid that there is another problem, this time with
> btree_xlog_split(). It's possible to get false positives when running
> the new test continually on a standby. You can see this by running
> verification on a standby continually, while the primary runs with a
> workload that gets many page splits.
Yes, I see the problem...
>
> The only thing that we can do is adjust the locking in
> btree_xlog_split() to match the primary (kind of like commit 9a9db08a,
> except with page splits instead of page deletion). Attached is a
> revised version of the patch, along with the changes that we'd need to
> REDO to make the amcheck patch really work.
>
> I'm not sure if this change to the REDO routine is worth the overhead
> or trouble, though. I have to think about it some more.
If we want to check relations between pages we must either apply them together (under locks) or tolerate some fraction
offalse positives. I understand that mitigating and tolerating false positives is nonsense in mathematica sense, but
frompractical point of view it's just OK. 

But having complete solution with no false positives seems much better.

>
> BTW, the first patch in the series now has a new check for page
> deletion -- that was missing from v4.
Yes, seems like that was a bug..

Thanks!

Best regards, Andrey Borodin.




Re: Amcheck: do rightlink verification with lock coupling

From
Peter Geoghegan
Date:
On Thu, Aug 6, 2020 at 10:59 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> But having complete solution with no false positives seems much better.

Agreed. I know that you didn't pursue this for no reason -- having the
check available makes bt_check_index() a lot more valuable in
practice. It detects what is actually a classic example of subtle
B-Tree corruption (left link corruption), which appears in Modern
B-Tree techniques in its discussion of corruption detection. It's
actually the canonical example of how B-Tree corruption can be very
subtle in the real world.

I pushed a cleaned up version of this patch just now. I added some
commentary about this canonical example in header comments for the new
function.

Thanks
--
Peter Geoghegan



Re: Amcheck: do rightlink verification with lock coupling

From
"Andrey M. Borodin"
Date:

> 8 авг. 2020 г., в 23:14, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> I pushed a cleaned up version of this patch just now. I added some
> commentary about this canonical example in header comments for the new
> function.

Thanks for working on this!

Best regards, Andrey Borodin.