Thread: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY
BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17568
Logged by:          Sergei Kornilov
Email address:      sk@zsrv.org
PostgreSQL version: 14.4
Operating system:   Ubuntu 20.04
Description:

Hello

I recently ran "REINDEX INDEX CONCURRENTLY i_sess_uuid;" (pg14.4, table around 700gb), but suddenly, after the start of the "index validation: scanning index" phase, insert and update operations started returning an error:

ERROR:  index "i_sess_uuid_ccnew" contains unexpected zero page at block 0
HINT:  Please REINDEX it.

i_sess_uuid_ccnew is exactly the new index that REINDEX CONCURRENTLY is building at this time. It is clear that the errors started after index_set_state_flags set INDEX_CREATE_SET_READY, because insert and update queries now need to update this index too. But it remains unclear how exactly page 0 turned out to be all zeros at this point.

I think some process may have loaded the btree metapage (page 0) into shared buffers prior to the end of _bt_load. In this case, the error is reproducible (14.4, 14 STABLE, HEAD):

create extension pageinspect;
create table test as select generate_series(1,1e4) as id;
create index test_id_idx on test(id);
-- prepare gdb for this backend with a breakpoint on _bt_uppershutdown
reindex index concurrently test_id_idx ;

While gdb is stopped on the breakpoint, run from a second session:

insert into test values (0);
SELECT * FROM bt_metap('test_id_idx_ccnew');
-[ RECORD 1 ]-------------+---
magic                     | 0
version                   | 0
root                      | 0
level                     | 0
fastroot                  | 0
fastlevel                 | 0
last_cleanup_num_delpages | 0
last_cleanup_num_tuples   | -1
allequalimage             | f

Then continue the reindex backend. New inserts, along with the reindex itself, will give the error: index "test_id_idx_ccnew" contains unexpected zero page at block 0.

The metapage on disk after the _bt_uppershutdown call will be written correctly and correctly replicated to the standby. But it is still erroneous in shared buffers on the primary. I still don't know if this is what happened to my database.
Monitoring queries (like pg_total_relation_size, pg_stat_user_indexes, pg_statio_user_indexes) do not load the metapage into shared buffers. A normal select/insert/update/delete should not touch a not-yet-ready index in any way. This database does not have any extensions installed other than those available in contrib.

Thoughts?

regards, Sergei
Re: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY
From
Sergei Kornilov
Date:
Hello

Luckily, I found a call of the bt_metap function in one query that was definitely executed while the reindex was running and before these errors started to appear. So this was accidental pilot error.

It would be nice to protect shared buffers from such premature page loading, but that's probably not possible without a performance penalty.

regards, Sergei
Re: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY
From
Andres Freund
Date:
Hi,

On 2022-08-03 14:23:34 +0000, PG Bug reporting form wrote:
> I think some process may have loaded btree metapage (page 0) into shared
> buffers prior the end of _bt_load. In this case, the error is reproduced
> (14.4, 14 STABLE, HEAD):
>
> create extension pageinspect;
> create table test as select generate_series(1,1e4) as id;
> create index test_id_idx on test(id);
> # prepare gdb for this backend with breakpoint on _bt_uppershutdown
> reindex index concurrently test_id_idx ;

Worth noting that this doesn't even require REINDEX CONCURRENTLY; it's also an issue for CIC (CREATE INDEX CONCURRENTLY).

The problem basically is that once the first non-meta page of the btree is written (e.g. _bt_blwritepage() calling smgrextend()), concurrent sessions can read the metapage (and potentially other pages that are zero-filled at that point) into shared_buffers. At the end of _bt_uppershutdown() we write the metapage to disk, bypassing shared buffers. And boom, the all-zeroes version read into memory earlier is suddenly out of date.

The easiest fix is likely to force all buffers to be forgotten at the end of index_concurrently_build() or such. I don't immediately see a nicer way to fix this; we can't just lock the new index relation exclusively.

We could of course also stop bypassing s_b for CIC/RIC, but that seems mighty invasive for a bugfix.

Greetings,

Andres Freund
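The race described above can be pictured with a toy buffer-cache model (plain Python, purely illustrative; the class and page names are invented and this is not PostgreSQL code): a concurrent reader caches the still-zeroed metapage, the index build then writes the final metapage directly to "disk" bypassing the cache, and later cached reads keep returning the stale zero page even though the on-disk copy is correct.

```python
# Toy model of the shared-buffers bypass race (illustration only,
# not PostgreSQL code; names are made up for this sketch).

ZERO_PAGE = b"\x00" * 8

class BufferCache:
    """Simplified shared_buffers: caches pages read from disk."""
    def __init__(self, disk):
        self.disk = disk
        self.pages = {}

    def read(self, blkno):
        # First access pulls the page from disk and keeps it cached.
        if blkno not in self.pages:
            self.pages[blkno] = self.disk.get(blkno, ZERO_PAGE)
        return self.pages[blkno]

disk = {}
cache = BufferCache(disk)

# 1. The index build extends the relation; the metapage (block 0)
#    is still all zeros (like _bt_blwritepage() -> smgrextend()).
disk[1] = b"leafpage"

# 2. A concurrent session (e.g. a bt_metap() call) reads block 0
#    *through the cache* and caches the zero page.
assert cache.read(0) == ZERO_PAGE

# 3. The build finishes and writes the real metapage directly to
#    disk, *bypassing the cache* (like _bt_uppershutdown()).
disk[0] = b"metapage"

# 4. Later readers go through the cache and still see the stale
#    all-zero page -> "unexpected zero page at block 0", even
#    though the on-disk (and replicated) copy is correct.
assert cache.read(0) == ZERO_PAGE
assert disk[0] == b"metapage"
```

The sketch also shows why dropping the cached buffers at the end of the build (or never bypassing the cache) would both close the window.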
Re: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> The easiest fix is likely to force all buffers to be forgotten at the end of
> index_concurrently_build() or such.

Race conditions there ...

> I don't immediately see a nicer way to fix
> this, we can't just lock the new index relation exclusively.

Why not? If the index isn't valid yet, other backends have zero business touching it. I'd think about taking an exclusive lock to start with, and releasing it (downgrading to a non-exclusive lock) once the index is valid enough that other backends can access it, which would be just before we set pg_index.indisready to true.

Basically, this is to enforce the previously-implicit contract that other sessions won't touch the index too soon against careless superusers.

regards, tom lane
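The lock-based idea sketched above can be modeled as a tiny state machine (toy Python, with invented names; PostgreSQL's real lock manager works quite differently): the builder holds the new index exclusively from before the first page write until just before indisready is set, so no other backend can read half-built pages through shared buffers.

```python
# Toy state machine for "hold exclusive until the index is ready"
# (invented names; not the real PostgreSQL lmgr API).

class RelationLock:
    def __init__(self):
        self.mode = None          # None, "SHARE", or "EXCLUSIVE"

    def acquire_exclusive(self):
        self.mode = "EXCLUSIVE"

    def downgrade_to_share(self):
        # Only an exclusive holder can downgrade.
        assert self.mode == "EXCLUSIVE"
        self.mode = "SHARE"

    def can_read(self):
        # Other backends may touch the index only once the builder
        # no longer holds it exclusively.
        return self.mode != "EXCLUSIVE"

lock = RelationLock()

lock.acquire_exclusive()      # taken before the first page is written
assert not lock.can_read()    # a bt_metap()-style probe would block here

# ... index build runs, pages written bypassing shared buffers ...

lock.downgrade_to_share()     # just before pg_index.indisready = true
assert lock.can_read()        # now other sessions may open the index
```

As the follow-up reply notes, the cost of this approach is that every code path that opens all indislive indexes would block on the exclusive lock, which is why it was not adopted as-is.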
Re: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY
From
Andres Freund
Date:
Hi,

On 2022-08-15 19:56:40 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > The easiest fix is likely to force all buffers to be forgotten at the end of
> > index_concurrently_build() or such.
>
> Race conditions there ...

Not immediately seeing it? New reads from disk will read valid data. But I agree, it's a shitty approach.

> > I don't immediately see a nicer way to fix
> > this, we can't just lock the new index relation exclusively.
>
> Why not? If the index isn't valid yet, other backends have zero
> business touching it. I'd think about taking an exclusive lock
> to start with, and releasing it (downgrading to a non-exclusive
> lock) once the index is valid enough that other backends can
> access it, which would be just before we set pg_index.indisready
> to true.

I'm afraid we'd start blocking in quite a few places, both inside and outside of core PG. E.g. ExecOpenIndices(), ExecInitPartitionInfo(), calculate_toast_table_size(), ... will open all indislive indexes, even if not indisready.

> Basically, this is to enforce the previously-implicit contract
> that other sessions won't touch the index too soon against
> careless superusers.

I suspect this isn't restricted to superusers, fwiw. E.g. pg_prewarm doesn't require superuser.

Greetings,

Andres Freund