Hi Peter,
I re-ran with DEBUG2 messages enabled. There was a lot of output, but the
last few lines for each index look like this:
DEBUG: level 965868789 leftmost page of index "xxxxx" was found deleted or half dead
DETAIL: Deleted page found when building scankey from right sibling.
DEBUG: level 966240004 leftmost page of index "xxxxx" was found deleted or half dead
DETAIL: Deleted page found when building scankey from right sibling.
ERROR: cross page item order invariant violated for index "xxxxx"
DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
DEBUG: level 967745369 leftmost page of index "xxxxx" was found deleted or half dead
DETAIL: Deleted page found when building scankey from right sibling.
DEBUG: level 967746918 leftmost page of index "xxxxx" was found deleted or half dead
DETAIL: Deleted page found when building scankey from right sibling.
ERROR: cross page item order invariant violated for index "xxxxx"
DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
Could pageinspect tell us anything else useful? I'd like to find the
root cause of the corruption if possible, so that this doesn't happen
in other databases.
I also wanted to ask whether it would be a good idea to add a
CHECK_FOR_INTERRUPTS() call to _bt_moveright(), so that if this happens
again the session would at least be killable. I don't know the code
well enough to tell where it's safe to add one, or I'd submit a patch.
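
To make the idea concrete, here is a toy Python model of what I have in
mind. It is not PostgreSQL code and does not reflect how _bt_moveright()
is really written: the page dictionaries, the move_right() function and
the interrupt_pending hook are all made up for illustration. It only
shows that a walk along right-sibling links never ends when the links
are circular, and that polling for a cancel request once per page would
let it stop.

```python
# Toy model only: NOT PostgreSQL code. Each "page" is a dict with a
# hypothetical high key and a right-sibling link (None = rightmost).


class Interrupted(Exception):
    """Stands in for a cancel request being honoured."""


def move_right(pages, start, key, interrupt_pending=lambda steps: False):
    """Follow right links from `start` until a page can hold `key`.

    `interrupt_pending` is a hypothetical hook playing the role of an
    interrupt check; it is polled once per page visited and receives
    the number of right-moves made so far.
    """
    current = start
    steps = 0
    while True:
        if interrupt_pending(steps):
            raise Interrupted(f"cancelled after {steps} right-moves")
        page = pages[current]
        # Stop on the rightmost page, or when the key fits on this page.
        if page["right"] is None or key <= page["high_key"]:
            return current
        current = page["right"]
        steps += 1


# A healthy chain: 1 -> 2 -> 3 -> end.
HEALTHY = {
    1: {"high_key": 10, "right": 2},
    2: {"high_key": 20, "right": 3},
    3: {"high_key": 30, "right": None},
}

# A corrupted chain with circular sibling links: 1 -> 2 -> 1 -> ...
CIRCULAR = {
    1: {"high_key": 10, "right": 2},
    2: {"high_key": 20, "right": 1},
}
```

In this model, move_right(HEALTHY, 1, 25) stops on page 3, while
move_right(CIRCULAR, 1, 99) with the default hook never returns.
Passing a hook such as lambda steps: steps >= 1000 makes the circular
case raise Interrupted instead of spinning forever.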
Thanks,
James
On Fri, Aug 14, 2020 at 4:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Aug 14, 2020 at 2:03 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> > The table has two indexes, so I decided to scan both indexes on all
> > partitions with the bt_index_check function from the amcheck extension. I
> > identified one partition where both indexes throw the following result:
> > ERROR: cross page item order invariant violated for index "xxxxx"
> > DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
>
> This sounds very much like an index with sibling pages that are in the
> wrong order relative to each other. That's totally consistent with
> what you describe with _bt_moveright() -- circular sibling links can
> cause it to just keep going.
>
> It's possible that you'll get a better error with
> bt_index_parent_check(), which might be worth trying. But it probably
> won't give you any additional information.
>
> Note that there is DEBUG1 and DEBUG2 output from amcheck, which might
> give you a few more details. You can "set client_min_messages =
> 'debug2'" in an interactive session that runs bt_index_check() to see
> some additional context. Again, this is unlikely to make all that much
> difference.
>
> --
> Peter Geoghegan