Re: BUG #16582: Logical index corruption leading to apparent index scan infinite loop - Mailing list pgsql-bugs

From James Lucas
Subject Re: BUG #16582: Logical index corruption leading to apparent index scan infinite loop
Msg-id CAAFmbbOnCtds-Q5vOAmTMBm5sAvBpQhc474zq+LMCidSjgt11A@mail.gmail.com
In response to Re: BUG #16582: Logical index corruption leading to apparent index scan infinite loop  (James Lucas <jlucasdba@gmail.com>)
List pgsql-bugs
Forgot to say, I don't think I can run bt_index_parent_check() right
now due to the broader locks required.  I will try to get a run in if
I get an opportunity.

Thanks,
James

On Mon, Aug 17, 2020 at 10:51 AM James Lucas <jlucasdba@gmail.com> wrote:
>
> Hi Peter,
>
> I re-ran with DEBUG2 messages enabled. Got a bunch of output, but the
> last few lines are like this for each index:
>
> DEBUG: level 965868789 leftmost page of index "xxxxx" was found
> deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> DEBUG: level 966240004 leftmost page of index "xxxxx" was found
> deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> ERROR: cross page item order invariant violated for index "xxxxx"
> DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
>
> DEBUG: level 967745369 leftmost page of index "xxxxx" was found
> deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> DEBUG: level 967746918 leftmost page of index "xxxxx" was found
> deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> ERROR: cross page item order invariant violated for index "xxxxx"
> DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
>
>
> Might pageinspect be able to tell us anything else useful?
> I'd like to find the root cause of the corruption if possible, so this
> doesn't happen in other databases.
>
> Also wanted to see if it might be a good idea to add a
> CHECK_FOR_INTERRUPTS call to _bt_moveright() so if this does happen
> again, at least the session would be killable.  I don't have enough
> background in the code to know where it's safe to add, or I'd submit a
> patch.
>
> Thanks,
> James
>
> On Fri, Aug 14, 2020 at 4:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Fri, Aug 14, 2020 at 2:03 PM PG Bug reporting form
> > <noreply@postgresql.org> wrote:
> > > The table has two indexes, so I decided to scan both indexes on all
> > > partitions with the bt_index_check function from the amcheck extension.  I
> > > identified one partition where both indexes throw the following result:
> > > ERROR: cross page item order invariant violated for index "xxxxx"
> > > DETAIL: Last item on page tid(xx,xx) page lsn=xxxxxxxxxx
> >
> > This sounds very much like an index with sibling pages that are in the
> > wrong order relative to each other. That's totally consistent with
> > what you describe with _bt_moveright() -- circular sibling links can
> > cause it to just keep going.
> >
> > It's possible that you'll get a better error with
> > bt_index_parent_check(), which might be worth trying. But it probably
> > won't give you any additional information.
> >
> > Note that there is DEBUG1 and DEBUG2 output from amcheck, which might
> > give you a few more details. You can run
> > "set client_min_messages = 'debug2'" in an interactive session that
> > runs bt_index_check() to see some additional context. Again, this is
> > unlikely to make all that much difference.
> >
> > --
> > Peter Geoghegan
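A toy model of the failure Peter describes, assuming a simplified leaf level where each page holds sorted integer keys and a single right-link. The names Page and check_leaf_chain are hypothetical; they are not PostgreSQL or amcheck APIs. The walk follows right-links from the leftmost page. It flags any page whose last key is greater than the first key on its right sibling, which is a rough analogue of amcheck's "cross page item order invariant". It also records a revisited block instead of circling forever. Real nbtree pages also carry high keys, heap TIDs and deleted/half-dead states, and amcheck does its checks in C with scankeys, so treat this only as an illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    """Hypothetical toy leaf page: sorted keys plus its right sibling's block number."""
    keys: List[int]
    right: Optional[int]  # None stands in for "rightmost page" (no right-link)


def check_leaf_chain(pages: Dict[int, Page], leftmost: int) -> List[str]:
    """Walk right-links from `leftmost` and report problems instead of looping.

    An empty result means the chain looks consistent under this toy model.
    """
    problems: List[str] = []
    visited = set()
    blkno: Optional[int] = leftmost
    while blkno is not None:
        if blkno in visited:
            # A naive walk would spin here forever; stop at the first revisit.
            problems.append(f"right-link cycle: block {blkno} visited twice")
            break
        visited.add(blkno)
        page = pages[blkno]
        nxt = page.right
        if nxt is not None and page.keys and pages[nxt].keys:
            last, first = page.keys[-1], pages[nxt].keys[0]
            if last > first:
                # Rough analogue of amcheck's cross-page ordering check.
                problems.append(
                    f"cross page order violated: block {blkno} last={last} "
                    f"> block {nxt} first={first}"
                )
        blkno = nxt
    return problems


if __name__ == "__main__":
    healthy = {1: Page([1, 2, 3], 2), 2: Page([4, 5], 3), 3: Page([6, 7], None)}
    print(check_leaf_chain(healthy, 1))  # []

    # Block 3 links back to block 1 and holds keys that sort before block 2's.
    corrupt = {1: Page([1, 2, 3], 2), 2: Page([4, 5], 3), 3: Page([0, 1], 1)}
    for problem in check_leaf_chain(corrupt, 1):
        print(problem)
```

Without the visited set, the loop would never terminate on the second example, which roughly mirrors the _bt_moveright() behaviour described above. In the backend, the escape hatch James asks about would be a CHECK_FOR_INTERRUPTS() call. This sketch says nothing about where such a call is safe to place.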


