Thread: ERROR: posting list tuple with 2 items cannot be split at offset 17

ERROR: posting list tuple with 2 items cannot be split at offset 17

From
Paul McGarry
Date:
I have three databases, two of databases where I am experiencing the issue below.

The first database was created from a dump in Feb 2022 (a few weeks after the time period for which I seem to have problematic indexes, maybe).
The second database was then cloned from the first (ie filesystem level copy) soon after.
Since then all databases have undergone a number of minor version upgrades, including to 13.9 and an OS update last week for the two problem databases (the other is still on 13.8).

Now, a process which does clears some data > 13 months old is getting an error when trying to do that update.

My suspicion is that either:
- there was probably an issue with the index 12 months ago and that problem was copied when I cloned the database, and is just becoming apparent now a script is accessing 13 month olf data.
- something in our recent upgrade has caused the problem.

The third database is still on 13.8, and with some OS updates pending, and is not experiencing the problem.

The problem emerges as:

====
UPDATE widget SET description=NULL WHERE (time>='2022-01-07 17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08 17:40:05.780573+00' AND description IS NOT NULL);
ERROR:  posting list tuple with 2 items cannot be split at offset 17
====

A select on the same data works fine, so presumably a problem updating the index, not accessing it or the corresponding table):

====
db=> select count(*) from widget where (time>='2022-01-07 17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08 17:40:05.780573+00' AND description IS NOT NULL);
 count
--------
 797943
====


The index used as per explain:
====
 explain  UPDATE widget SET description=NULL WHERE (time>='2022-01-07 17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08 17:40:05.780573+00' AND description IS NOT NULL);
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on widget  (cost=0.57..2921626.80 rows=205910 width=1066)
   ->  Index Scan using widget_time_client_idx on widget  (cost=0.57..2921626.80 rows=205910 width=1066)
         Index Cond: (("time" >= '2022-01-03 17:40:05.140287'::timestamp without time zone) AND ("time" < '2022-01-08 17:40:05.780573+00'::timestamp with time zone))
         Filter: (description IS NOT NULL)
(4 rows)
====

amcheck attempted on that index, but doesn't seem to identify any issues:

====
db=> SELECT bt_index_check('widget_time_client_idx',true);
DEBUG:  verifying consistency of tree structure for index "widget_time_client_idx"
DEBUG:  verifying level 3 (true root level)
DEBUG:  verifying level 2
DEBUG:  verifying level 1
DEBUG:  verifying level 0 (leaf level)
DEBUG:  verifying that tuples from index "widget_time_client_idx" are present in "widget"
DEBUG:  finished verifying presence of 639872196 tuples from table "widget" with bitset 25.94% set
 bt_index_check
----------------

(1 row)

db=> SELECT bt_index_parent_check('widget_time_client_idx',true,true);
DEBUG:  verifying consistency of tree structure for index "widget_time_client_idx" with cross-level checks
DEBUG:  verifying level 3 (true root level)
DEBUG:  verifying level 2
DEBUG:  verifying level 1
DEBUG:  verifying level 0 (leaf level)

DEBUG:  verifying that tuples from index "widget_time_client_idx" are present in "widget"
DEBUG:  finished verifying presence of 639874864 tuples from table "widget" with bitset 25.94% set
 bt_index_parent_check
-----------------------

(1 row)

====

We recreated that index and deleted the old index, and the update then worked.
I've done that on one of the databases so far.

Despite that working, it then occurred to me that the problem might be due to problems updating a different index on the same table.

I then found on the two problem DBs (but not the 13.8 one):

====
db=> select bt_index_check('widget_name_idx');
ERROR:  item order invariant violated for index "widget_name_idx"
DETAIL:  Lower index tid=(682201,85) (points to index tid=(682278,4097)) higher index tid=(682201,86) (points to index tid=(716079,1)) page lsn=580/E554A858.
====

====
db=> select bt_index_check('widget_name_idx');
ERROR:  item order invariant violated for index "widget_name_idx"
DETAIL:  Lower index tid=(682201,49) (points to index tid=(682245,1)) higher index tid=(682201,50) (points to index tid=(734398,1)) page lsn=566/E67C5FF0.
====

which as a text field, seems more likely to be the result of a collation change problem that might accompany an OS update.

But if it is the problem, why did the update start working after I recreated the other index?

I think I should now:
- recreate the widget_name_idx on the problem servers
- run bt_index_check across all other indexes

Any suggestions on what else I should look into, in particular anything I should check before upgrading the remaining 13.8 DB to 13.9?

Thanks for any help,

Paul

Re: ERROR: posting list tuple with 2 items cannot be split at offset 17

From
Adrian Klaver
Date:
On 2/8/23 23:53, Paul McGarry wrote:
> I have three databases, two of databases where I am experiencing the 
> issue below.
> 
> The first database was created from a dump in Feb 2022 (a few weeks 
> after the time period for which I seem to have problematic indexes, maybe).
> The second database was then cloned from the first (ie filesystem level 
> copy) soon after.

What where the commands for the dump/restore?

Second database, same machine or different one?

> Since then all databases have undergone a number of minor version 
> upgrades, including to 13.9 and an OS update last week for the two 
> problem databases (the other is still on 13.8).

The OS's and the update to?

> 
> Now, a process which does clears some data > 13 months old is getting an 
> error when trying to do that update.
> 
> My suspicion is that either:
> - there was probably an issue with the index 12 months ago and that 
> problem was copied when I cloned the database, and is just becoming 
> apparent now a script is accessing 13 month olf data.
> - something in our recent upgrade has caused the problem.

A dump/restore would recreate the index at the restore.

> 
> The third database is still on 13.8, and with some OS updates pending, 
> and is not experiencing the problem.

I'm betting OS change as culprit.


> 
> Any suggestions on what else I should look into, in particular anything 
> I should check before upgrading the remaining 13.8 DB to 13.9?
> 
> Thanks for any help,
> 
> Paul

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: ERROR: posting list tuple with 2 items cannot be split at offset 17

From
Peter Geoghegan
Date:
On Wed, Feb 8, 2023 at 11:54 PM Paul McGarry <paul@paulmcgarry.com> wrote:
> But if it is the problem, why did the update start working after I recreated the other index?

There is no reason why reindexing another index ought to have had that
effect. The likely explanation is that subsequent updates used a
successor version heap TID that didn't overlap with some existing
posting list in whatever way. If you repeat the update again and
again, and get an error each time, the incoming TID will differ each
time. Eventually you won't get an error, because at some point there
won't be a posting-list-TID range overlap for some new successor TID
that leads to the insert/posting list split code detecting a problem.

It's also possible that a concurrent autovacuum "fixed" the issue.

The amcheck error shows a problem in an internal page, which cannot
have posting list tuples -- which suggests broad corruption. An issue
with collation instability due to an OS update does seem likely.

Note that the hardening/defensive checks in this area have increased.
I added an additional defensive check to 13.4, and followed up with
another similar check in 13.5. It looks like the error you've seen
("ERROR:  posting list tuple with 2 items cannot be split at offset
17") comes from the initial 13.4 hardening, since I'd expect the
additional 13.5 hardening to catch the same issue sooner, with a
different error message (something like "table tid from new index
tuple (%u,%u) cannot find insert offset between offsets %u and %u of
block %u in index \"%s\"").

> I think I should now:
> - recreate the widget_name_idx on the problem servers
> - run bt_index_check across all other indexes
>
> Any suggestions on what else I should look into, in particular anything I should check before upgrading the remaining
13.8DB to 13.9?
 

I recommend running amcheck on all indexes, or at least all
possibly-affected text indexes.

-- 
Peter Geoghegan



Re: ERROR: posting list tuple with 2 items cannot be split at offset 17

From
Paul McGarry
Date:
Hi Peter,

Thanks for your input, I'm feeling more comfortable that I correctly understand the problem now and it's "just" a collation related issue.


I recommend running amcheck on all indexes, or at least all
possibly-affected text indexes.


Will the amcheck reliably identify all issues that may arise from a collation change?

I don't enough about the details of B-tree index to know whether the problems are only "obvious" if they happen to interact with some boundary, or whether it will always be evident with the basic amcheck, ie

bt_index_check(index regclass, false)

and therefore I can just use that to identify bad indexes and recreate them, or should I recreate all btree indexes involving text fields?

We planned to do a dump/restore upgrade to PG14 in a month or so (already done in dev).
I am wondering whether it will be less work overall to bring that forward than rebuild these indexes...

Thanks again for your advice.

Paul
 

Re: ERROR: posting list tuple with 2 items cannot be split at offset 17

From
Peter Geoghegan
Date:
On Thu, Feb 9, 2023 at 3:55 PM Paul McGarry <paul@paulmcgarry.com> wrote:
> Will the amcheck reliably identify all issues that may arise from a collation change?

Theoretically it might not. In practice I'd be very surprised if it
ever failed to detect such an inconsistency. If you want to be extra
careful, and can afford to block concurrent writes, then I suggest
using bt_index_parent_check instead of bt_index_check.

> and therefore I can just use that to identify bad indexes and recreate them, or should I recreate all btree indexes
involvingtext fields?
 

It might be easier to just reindex them all. Hard to say.

> We planned to do a dump/restore upgrade to PG14 in a month or so (already done in dev).
> I am wondering whether it will be less work overall to bring that forward than rebuild these indexes...

pg_amcheck is available on 14. It offers a much simpler interface for
running amcheck routine, so maybe look into that once you upgrade.

-- 
Peter Geoghegan