Thread: BUG #16582: Logical index corruption leading to apparent index scan infinite loop

BUG #16582: Logical index corruption leading to apparent index scan infinite loop

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16582
Logged by:          James Lucas
Email address:      jlucasdba@gmail.com
PostgreSQL version: 11.5
Operating system:   Centos 7
Description:

I have an 11.5 database with a declaratively partitioned table.  The table
has around 40 partitions.  We began observing a number of long running
sessions (over a week), all executing the same select query against this
table. Attempting to kill the sessions with pg_terminate_backend has no
effect.

I pulled several stack traces against the backend processes executing this
query and traced the issue back to the _bt_moveright function in
src/backend/access/nbtree/nbtsearch.c.  This function appears to be
implemented as an infinite loop, which breaks when it finds a page that
matches what it's looking for.  So my working theory is that I have a
corrupt index where that condition is never properly satisfied, resulting in
an endless loop.

I found a thread in the mailing list archive about a similar issue in one of
the other functions in this file

(https://www.postgresql.org/message-id/flat/CAM3SWZT7dCes%3DuOA3NAHYBA1kth%3Db4pXkszNLMPVtNAAYUp_wg%40mail.gmail.com#e9bf163da8f50f0f2845b59250da6ab5).
 The _bt_moveright function does not have CHECK_FOR_INTERRUPTS in its loop,
which would explain why the sessions don't respond to
pg_terminate_backend.

The table has two indexes, so I decided to scan both indexes on all
partitions with the bt_index_check function from the amcheck extension.  I
identified one partition where both indexes throw the following result:
ERROR: cross page item order invariant violated for index "xxxxx"
DETAIL: Last item on page tid(xx,xx) page lsn=xxxxxxxxxx

Since both indexes appear to have the same issue, it suggests to me
something about this table/data may be triggering a bug.  I have not yet
tried reindexing.  I did pg_dump/pg_restore the entire table (all
partitions) to another system (also 11.5), and the indexes built there test
okay.

Other possibly relevant information:  The table is range partitioned on an
insert timestamp (timestamp(0) without timezone).  The two indexes involved
are both composite indexes, over columns of types (numeric, numeric,
numeric) and (numeric, timestamp(6) without time zone).  The partition key
is not covered by either index.  There is also no primary key.

I realize it may not be possible to remote diagnose the root cause of the
index issues, but figured it was worth reporting.  My next steps would
probably be to try reindexing the affected partition.  If that doesn't work,
it seems like either a vacuum full or pg_dump/pg_restore would be the next
step.  Any other suggestions appreciated.

Thanks,
James


On Fri, Aug 14, 2020 at 2:03 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
> The table has two indexes, so I decided to scan both indexes on all
> partitions with the bt_index_check function from the amcheck extension.  I
> identified one partition where both indexes throw the following result:
> ERROR: cross page item order invariant violated for index "xxxxx"
> DETAIL: Last item on page tid(xx,xx) page lsn=xxxxxxxxxx

This sounds very much like an index with sibling pages that are in the
wrong order relative to each other. That's totally consistent with
what you describe with _bt_moveright() -- circular sibling links can
cause it to just keep going.

It's possible that you'll get a better error with
bt_index_parent_check(), which might be worth trying. But it probably
won't give you any additional information.

Note that there is DEBUG1 and DEBUG2 output from amcheck, which might
give you a few more details. You can "set client_min_messages =
'debug2'" in an interactive session that runs bt_index_check() to see
some additional context. Again, this is unlikely to make all that much
difference.

-- 
Peter Geoghegan



Hi Peter,

I re-ran with DEBUG2 messages enabled. Got a bunch of output, but the
last few lines are like this for each index:

DEBUG: level 965868789 leftmost page of index "xxxxx" was found
deleted or half dead
DETAIL: Deleted page found when building scankey from right sibling.
DEBUG: level 966240004 leftmost page of index "xxxxx" was found
deleted or half dead
DETAIL: Deleted page found when building scankey from right sibling.
ERROR: cross page item order invariant violated for index "xxxxx"
DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx

DEBUG: level 967745369 leftmost page of index "xxxxx" was found
deleted or half dead
DETAIL: Deleted page found when building scankey from right sibling.
DEBUG: level 967746918 leftmost page of index "xxxxx" was found
deleted or half dead
DETAIL: Deleted page found when building scankey from right sibling.
ERROR: cross page item order invariant violated for index "xxxxx"
DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx


Not sure if pageinspect might be able to tell anything else useful?
I'd like to find the root cause of the corruption if possible, so this
doesn't happen in other databases.

Also wanted to see if it might be a good idea to add a
CHECK_FOR_INTERRUPTS call to _bt_moveright() so if this does happen
again, at least the session would be killable.  I don't have enough
background in the code to know where it's safe to add, or I'd submit a
patch.

Thanks,
James

On Fri, Aug 14, 2020 at 4:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Aug 14, 2020 at 2:03 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> > The table has two indexes, so I decided to scan both indexes on all
> > partitions with the bt_index_check function from the amcheck extension.  I
> > identified one partition where both indexes throw the following result:
> > ERROR: cross page item order invariant violated for index "xxxxx"
> > DETAIL: Last item on page tid(xx,xx) page lsn=xxxxxxxxxx
>
> This sounds very much like an index with sibling pages that are in the
> wrong order relative to each other. That's totally consistent with
> what you describe with _bt_moveright() -- circular sibling links can
> cause it to just keep going.
>
> It's possible that you'll get a better error with
> bt_index_parent_check(), which might be worth trying. But it probably
> won't give you any additional information.
>
> Note that there is DEBUG1 and DEBUG2 output from amcheck, which might
> give you a few more details. You can "set client_min_messages =
> 'debug2'" in an interactive session that runs bt_index_check() to see
> some additional context. Again, this is unlikely to make all that much
> difference.
>
> --
> Peter Geoghegan



Forgot to say, I don't think I can run bt_index_parent_check() right
now due to the broader locks required.  I will try to get a run in if
I get an opportunity.

Thanks,
James

On Mon, Aug 17, 2020 at 10:51 AM James Lucas <jlucasdba@gmail.com> wrote:
>
> Hi Peter,
>
> I re-ran with DEBUG2 messages enabled. Got a bunch of output, but the
> last few lines are like this for each index:
>
> DEBUG: level 965868789 leftmost page of index "xxxxx" was found
> deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> DEBUG: level 966240004 leftmost page of index "xxxxx" was found
> deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> ERROR: cross page item order invariant violated for index "xxxxx"
> DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
>
> DEBUG: level 967745369 leftmost page of index "xxxxx" was found
> deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> DEBUG: level 967746918 leftmost page of index "xxxxx" was found
> deleted or half dead
> DETAIL: Deleted page found when building scankey from right sibling.
> ERROR: cross page item order invariant violated for index "xxxxx"
> DETAIL: Last item on page tid=(xx,xx) page lsn=xxxxxxxxxx
>
>
> Not sure if pageinspect might be able to tell anything else useful?
> I'd like to find the root cause of the corruption if possible, so this
> doesn't happen in other databases.
>
> Also wanted to see if it might be a good idea to add a
> CHECK_FOR_INTERRUPTS call to _bt_moveright() so if this does happen
> again, at least the session would be killable.  I don't have enough
> background in the code to know where it's safe to add, or I'd submit a
> patch.
>
> Thanks,
> James
>
> On Fri, Aug 14, 2020 at 4:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Fri, Aug 14, 2020 at 2:03 PM PG Bug reporting form
> > <noreply@postgresql.org> wrote:
> > > The table has two indexes, so I decided to scan both indexes on all
> > > partitions with the bt_index_check function from the amcheck extension.  I
> > > identified one partition where both indexes throw the following result:
> > > ERROR: cross page item order invariant violated for index "xxxxx"
> > > DETAIL: Last item on page tid(xx,xx) page lsn=xxxxxxxxxx
> >
> > This sounds very much like an index with sibling pages that are in the
> > wrong order relative to each other. That's totally consistent with
> > what you describe with _bt_moveright() -- circular sibling links can
> > cause it to just keep going.
> >
> > It's possible that you'll get a better error with
> > bt_index_parent_check(), which might be worth trying. But it probably
> > won't give you any additional information.
> >
> > Note that there is DEBUG1 and DEBUG2 output from amcheck, which might
> > give you a few more details. You can "set client_min_messages =
> > 'debug2'" in an interactive session that runs bt_index_check() to see
> > some additional context. Again, this is unlikely to make all that much
> > difference.
> >
> > --
> > Peter Geoghegan