Thread: Pg14 possible index corruption after reindex concurrently

Pg14 possible index corruption after reindex concurrently

From
Aleš Zelený
Date:
Hello,

we have a problem with an index on a database we recently upgraded from PG13 to Pg14.3 using pg_upgrade. After all the upgrade steps including analyze in stages, we run  "vacuumdb -Fvaz -j 8" and the user workload was started afterward.
In order to get one of the Pg14 benefits (b-tree deduplication), we decided to rebuild all indexes:

-- CONCURRENTLY can not be used for REINDEX DATABASE and system catalog was brand new on the upgraded database, so no need to reindex system.
SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname)
FROM pg_catalog.pg_namespace n
WHERE n.nspname !~ '^pg_' AND n.nspname <> 'information_schema'
ORDER BY 1;
\gexec


PG Version: PostgreSQL 14.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
OS: CentOS 7
Indexed columns are BIGINT (so no collation issues).

A few days later we realized, that some queries did not return the expected data. By disabling indexscan and bitmapindex scan, expected row was found using sequential scan.

Now we have a table (80GB, 314.108.951 rows) with a reproducible testcase to demonstrate that using the primary key, requested data are returned while using another index no rows are returned.

Testcase:

START TRANSACTION;
EXPLAIN ANALYZE SELECT * FROM opportunities.tab_odds WHERE id_odds = 1652734429;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_tabodds_idodds on tab_odds  (cost=0.57..2.59 rows=1 width=229) (actual time=0.076..0.078 rows=1 loops=1)
   Index Cond: (id_odds = 1652734429)
 Planning Time: 0.152 ms
 Execution Time: 0.119 ms
(4 rows)

ALTER TABLE opportunities.tab_odds DROP CONSTRAINT pk_tabodds_idodds CASCADE;

EXPLAIN ANALYZE SELECT * FROM opportunities.tab_odds WHERE id_odds = 1652734429;
                                                                 QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_tabodds_idodds_idopportunity on tab_odds  (cost=0.57..2.59 rows=1 width=229) (actual time=0.026..0.027 rows=0 loops=1)
   Index Cond: (id_odds = 1652734429)
 Planning Time: 0.248 ms
 Execution Time: 0.072 ms
(4 rows)

ROLLBACK;

The indexes are:
Indexes:
    "pk_tabodds_idodds" PRIMARY KEY, btree (id_odds)
    "ix_tabodds_idodds_idopportunity" btree (id_odds, id_opportunity)

While we can do another reindex and check query results, we would like to be able to identify such issues systematically rather than by en user complaints.

I've tried to use amcheck extension to validate the index:

select * from bt_index_check('opportunities.ix_tabodds_idodds_idopportunity'::regclass);
 bt_index_check
----------------
 
(1 row)

Running the pg_amcheck utility on the table (including its indexes):

/usr/pgsql-14/bin/pg_amcheck -d prematch -r opportunities.tab_odds -j 8 -P ; echo $?
heap table "prematch.opportunities.tab_odds", block 1579813, offset 62:
    xmax 4051057343 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 1580759, offset 43:
    xmin 4051053353 precedes relation freeze threshold 2:3960858664
....
heap table "prematch.opportunities.tab_odds", block 2164163, offset 5:
    xmin 4051075211 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 2164163, offset 6:
    xmin 4051075236 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 2164163, offset 7:
    xmin 4051075236 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 2164164, offset 1:
    xmin 4051075260 precedes relation freeze threshold 2:3960858664
2

The default access method check returns exit code2, while when using --rootdescend option  return code is zero and no xmin/xmax messages:

/usr/pgsql-14/bin/pg_amcheck -d prematch -r opportunities.tab_odds --rootdescend -j 8 -P ; echo $?                                                                            
 0/15 relations (0%),        0/16831435 pages (0%)
 9/15 relations (60%), 16831427/16831435 pages (99%)
10/15 relations (66%), 16831429/16831435 pages (99%)
11/15 relations (73%), 16831431/16831435 pages (99%)
12/15 relations (80%), 16831433/16831435 pages (99%)
13/15 relations (86%), 16831434/16831435 pages (99%)
14/15 relations (93%), 16831435/16831435 pages (100%)
15/15 relations (100%), 16831435/16831435 pages (100%)
0

I've thought about using pageinspect to dump the invalid index page, but haven't found a way how to identify the index block number.

Is there a way, how to diagnose such corruption to make sure that after (and ideally also before) upgrading the database no such corruption happened (I'd like to test the diagnostic approach before rebuilding the index)?

Unfortunately, the cluster was so old, and planned outage was limited - page checksums were not enabled yet.

Thanks for any recommendations for diagnostic and or mitigation.

Ales

Re: Pg14 possible index corruption after reindex concurrently

From
Thomas Munro
Date:
On Wed, May 25, 2022 at 6:17 AM Aleš Zelený <zeleny.ales@gmail.com> wrote:
> SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname)

This may be related to bug #17485, discussed at:

https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org



Re: Pg14 possible index corruption after reindex concurrently

From
Aleš Zelený
Date:
Thanks for the bug link, I haven't found it.

Ales

út 24. 5. 2022 v 21:58 odesílatel Thomas Munro <thomas.munro@gmail.com> napsal:
On Wed, May 25, 2022 at 6:17 AM Aleš Zelený <zeleny.ales@gmail.com> wrote:
> SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname)

This may be related to bug #17485, discussed at:

https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org

Re: Pg14 possible index corruption after reindex concurrently

From
Aleš Zelený
Date:
Hello,

you are right it looks to be related, on our running system with connected users such an issue happens not only on primary keys but also on other (non-unique) indexes.

I've checked all indexes using amcheck:

select * from bt_index_check(index => 'prematch.opportunities.pk_tabodds_idodds'::regclass::oid, heapallindexed => true);

Failed one rebuild and check again to ensure all things are OK.

Still have a problem and haven't found any explanation in the manuals:

Run amcheck to validate a table and its indexes:

/usr/pgsql-14/bin/pg_amcheck -d prematch -r opportunities.tab_odds --heapallindexed -j 16 -Pv ; echo $?
...
btree index "prematch.opportunities.pk_tabodds_idodds":
    ERROR:  heap tuple (2199116,5) from table "tab_odds" lacks matching index tuple within index "pk_tabodds_idodds"
    HINT:  Retrying verification using the function bt_index_parent_check() might provide a more specific error.
2


OK, rebuild the index (first give it a try to use the concurrently option):
prematch=# REINDEX INDEX CONCURRENTLY prematch.opportunities.pk_tabodds_idodds;

REINDEX
prematch=#
prematch=# select * from bt_index_check(index => 'prematch.opportunities.pk_tabodds_idodds'::regclass::oid, heapallindexed => true);                                                                                                                                                                                          
 bt_index_check
----------------
 
(1 row)

Looks we were lucky.

Just to be sure, one more check from the command line:

/usr/pgsql-14/bin/pg_amcheck -d prematch -r opportunities.tab_odds --heapallindexed -j 16 -P ; echo $?
 0/15 relations (0%),        0/16807950 pages (0%)
heap table "prematch.opportunities.tab_odds", block 1649057, offset 47:
    xmin 4062380236 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 1649057, offset 48:
    xmax 4062380236 precedes relation freeze threshold 2:3960858664
...
    xmax 4062380813 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 2210728, offset 8:
    xmax 4062380814 precedes relation freeze threshold 2:3960858664
2

So I got non-zero exit code, but no errors in the log file:

-bash-4.2$ grep -i ERROR 20220525_1710_prematch.opportunities.tab_odds.log
-bash-4.2$

We have multiple DB servers upgraded to 14, only a few amchecks:

( time /usr/pgsql-14/bin/pg_amcheck -a --heapallindexed -j 16 -P ; echo $? ) |& tee -a `date +%Y%m%d_%H%M`_`hostname`_amcheck.log

finished with zero exit code, most of them returned 2, within the xmin/xmax messages as shown above. Is that an issue (since missing entries in the index cause ERRRO reported in the log, but after rebuilding the indexes it was not the case anymore)?
If it is an issue, is there a way to fix it?

Thanks Ales

út 24. 5. 2022 v 21:58 odesílatel Thomas Munro <thomas.munro@gmail.com> napsal:
On Wed, May 25, 2022 at 6:17 AM Aleš Zelený <zeleny.ales@gmail.com> wrote:
> SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname)

This may be related to bug #17485, discussed at:

https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org