Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index. - Mailing list pgsql-hackers

From Pavel Borisov
Subject Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Date
Msg-id CALT9ZEGfH_+5Z057KEGpnKE6FE7gssOMRbSvQOkN3G-bYVcmhA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
List pgsql-hackers
Hi, Alexander!

On Mon, 13 May 2024 at 05:42, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Sat, May 11, 2024 at 4:13 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
> > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > The only bt_target_page_check() caller is
> > > bt_check_level_from_leftmost(), which overrides state->target in the
> > > next iteration anyway.  I think the patch is just refactoring to
> > > eliminate the confusion pointer by Peter Geoghegan upthread.
> >
> > I find your argument unconvincing.
> >
> > After bt_target_page_check() returns at line 919, and before bt_check_level_from_leftmost() overrides state->target in the next iteration, bt_check_level_from_leftmost() conditionally fetches an item from the page referenced by state->target.  See line 963.
> >
> > I'm left with four possibilities:
> >
> >
> > 1)  bt_target_page_check() never gets to the code that uses "rightpage" rather than "state->target" in the same iteration where bt_check_level_from_leftmost() conditionally fetches an item from state->target, so the change you're making doesn't matter.
> >
> > 2)  The code prior to v2-0003 was wrong, having changed state->target in an inappropriate way, causing the wrong thing to happen at what is now line 963.  The patch fixes the bug, because state->target no longer gets overwritten where you are now using "rightpage" for the value.
> >
> > 3)  The code used to work, having set up state->target correctly in the place where you are now using "rightpage", but v2-0003 has broken that.
> >
> > 4)  It's been broken all along and your patch just changes from wrong to wrong.
> >
> >
> > If you believe (1) is true, then I'm complaining that you are relying far to much on action at a distance, and that you are not documenting it.  Even with documentation of this interrelationship, I'd be unhappy with how brittle the code is.  I cannot easily discern that the two don't ever happen in the same iteration, and I'm not at all convinced one way or the other.  I tried to set up some Asserts about that, but none of the test cases actually reach the new code, so adding Asserts doesn't help to investigate the question.
> >
> > If (2) is true, then I'm complaining that the commit message doesn't mention the fact that this is a bug fix.  Bug fixes should be clearly documented as such, otherwise future work might assume the commit can be reverted with only stylistic consequences.
> >
> > If (3) is true, then I'm complaining that the patch is flat busted.
> >
> > If (4) is true, then maybe we should revert the entire feature, or have a discussion of mitigation efforts that are needed.
> >
> > Regardless of which of 1..4 you pick, I think it could all do with more regression test coverage.
> >
> >
> > For reference, I said something similar earlier today in another email to this thread:
> >
> > This patch introduces a change that stores a new page into variable "rightpage" rather than overwriting "state->target", which the old implementation most certainly did.  That means that after returning from bt_target_page_check() into the calling function bt_check_level_from_leftmost() the value in state->target is not what it would have been prior to this patch.  Now, that'd be irrelevant if nobody goes on to consult that value, but just 44 lines further down in bt_check_level_from_leftmost() state->target is clearly used.  So the behavior at that point is changing between the old and new versions of the code, and I think I'm within reason to ask if it was wrong before the patch, wrong after the patch, or something else?  Is this a bug being introduced, being fixed, or ... ?
>
> Thank you for your analysis.  I'm inclined to believe in 2, but not
> yet completely sure.  It's really pity that our tests don't cover
> this.  I'm investigating this area.

It seems that I got to the bottom of this.  Changing
BtreeCheckState.target for a cross-page unique constraint check is
wrong, but that happens only for leaf pages.  After that
BtreeCheckState.target is only used for setting the low key.  The low
key is only used for non-leaf pages.  So, that didn't lead to any
visible bug.  I've revised the commit message to reflect this.
 
I agree with your analysis regarding state->target:
- when the unique check is on, state->target was reassigned only for the leaf pages (under P_ISLEAF(topaque) in bt_target_page_check).
- in this level (leaf) in bt_check_level_from_leftmost() this value of state->target was used to get state->lowkey. Then it was reset (in the next iteration of do loop in in bt_check_level_from_leftmost()
- state->lowkey lives until the end of pages level (leaf) iteration cycle. Then, low-key is reset (state->lowkey = NULL in the end of  bt_check_level_from_leftmost())
- state->lowkey is used only in bt_child_check/bt_child_highkey_check. Both are called only from non-leaf pages iteration cycles (under P_ISLEAF(topaque))
- Also there is a check (rightblock_number != P_NONE) in before getting rightpage into state->target in bt_target_page_check() that ensures us that rightpage indeed exists and getting this (unused) lowkey in bt_check_level_from_leftmost will not invoke any page reading errors.

I'm pretty sure that there was no bug in this, not just the bug was hidden.

Indeed re-assigning state->target in leaf page iteration for cross-page unique check was not beautiful, and Peter pointed out this. In my opinion the patch 0003 is a pure code refactoring. 

As for the cross-page check regression/TAP testing, this test had problems since the btree page layout is not fixed (especially it's different on 32-bit arch). I had a variant for testing cross-page check when the test was yet regression one upthread for both 32/64 bit architectures. I remember it was decided not to include it due to complications and low impact for testing the corner case of very rare cross-page duplicates. (There were also suggestions to drop cross-page duplicates check at all, which I didn't agree 2 years ago, but still it can make sense)

Separately, I propose to avoid getting state->lowkey for leaf pages at all as it's unused. PFA is a simple patch for this. (I don't add it to the current patch set as I believe it has nothing to do with UNIQUE constraint check, rather it improves the previous btree amcheck code)

Best regards,
Pavel Borisov,
Supabase

Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Direct SSL connection with ALPN and HBA rules
Next
From: Étienne BERSAC
Date:
Subject: Re: Allowing additional commas between columns, and at the end of the SELECT clause