Re: BUG #17386: btree index corruption after reindex concurrently on write heavy table - Mailing list pgsql-bugs

From Maxim Boguk
Subject Re: BUG #17386: btree index corruption after reindex concurrently on write heavy table
Date
Msg-id CAK-MWwQ7fboHz_s8aTn08i0W2aNH6QoELeWwmH-y80tOfsdDBQ@mail.gmail.com
Whole thread Raw
Responses Re: BUG #17386: btree index corruption after reindex concurrently on write heavy table  (Noah Misch <noah@leadboat.com>)
List pgsql-bugs


On Fri, Jan 28, 2022 at 3:42 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      17386
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 14.1
Operating system:   Ubuntu Linux
Description:       

Hi,

I found multiple cases of Btree index corruption after REINDEX someindex
CONCURRENTLY;.

What I found so far:
1)corruption exists in master, on all replicas and test server recovered
from base backup + wal archive (so it isn't some local hardware error)
2)it doesn't happen every time but at least two cases of corrupted indexes
found after database verification with amcheck.
3)it happen on huge and write heavy table

Attempt fix situation with REINDEX INDEX jobs_pkey CONCURRENTLY;
lead to corrupted index again in different tuple:
ERROR:  heap tuple (69306318,15) from table "jobs" lacks matching index tuple within index "jobs_pkey"

SELECT * FROM heap_page_item_attrs(get_raw_page('jobs', 69306318), 'jobs'::regclass) where lp=15;
-[ RECORD 1 ]---
lp          | 15
lp_off      | 24
lp_flags    | 2
lp_len      | 0

select  ctid,id,updated_at,created_at from jobs where ctid='(69306318,24)'::tid;
-[ RECORD 1 ]--------------------------
ctid       | (69306318,24)
id         | 26192320674
updated_at | 2022-01-28 13:08:27.224275
created_at | 2021-11-17 14:35:59.441979

And again updated_at happens during the REINDEX run.

select ctid,id,updated_at,created_at from jobs where id=26192320674;
(0 rows)

SELECT t_ctid, raw_flags, combined_flags FROM heap_page_item_attrs(get_raw_page('jobs', 69306318), 'jobs'::regclass), LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2)  where lp=24;
-[ RECORD 1 ]--+---------------------------------------------------------------------------------------------------------------------------------------------
t_ctid         | (69306318,24)
raw_flags      | {HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_KEYSHR_LOCK,HEAP_XMAX_LOCK_ONLY,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}
combined_flags | {}


Seems something broken in combination of HOT update and REINDEX CONCURRENTLY.


--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7  985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"

pgsql-bugs by date:

Previous
From: "两个孩子的爹"
Date:
Subject: Re: BUG #17382: When vacuum full or vacuumdb - F is executed, a large number of empty files will be generated in the
Next
From: Dmitry Koval
Date:
Subject: Re: BUG #17385: "RESET transaction_isolation" inside serializable transaction causes Assert at the transaction end