Re: [BUG] Error in BRIN summarization - Mailing list pgsql-hackers
From | Anastasia Lubennikova |
---|---|
Subject | Re: [BUG] Error in BRIN summarization |
Date | |
Msg-id | 2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru Whole thread Raw |
In response to | Re: [BUG] Error in BRIN summarization (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: [BUG] Error in BRIN summarization
Re: [BUG] Error in BRIN summarization |
List | pgsql-hackers |
I don't think we need a recheck for a single tuple, because we only care about finding its root, which simply must exist somewhere on this page, as concurrent pruning is not allowed. We also may catch root_offsets[] for subsequent tuples, but it's okay if we don't. These tuples will do the same recheck on their turn.On 2020-Jul-27, Anastasia Lubennikova wrote:Here is the updated version of the fix. The problem can be reproduced on all supported versions, so I suggest to backpatch it. Code slightly changed in v12, so here are two patches: one for versions 9.5 to 11 and another for versions from 12 to master. (I was also considering whether it needs to be a loop to reobtain root tuples, in case a concurrent transaction can create a new item while we're checking that item; but I don't think that can really happen for one individual tuple.)
While testing this fix, Alexander Lakhin spotted another problem. I simplified the test case to this:
1) prepare a table with brin index
create table tbl (i int, b char(1000)) with (fillfactor=10); insert into tbl select i, md5(i::text) from generate_series(0,1000) as i; create index idx on tbl using brin(i, b) with (pages_per_range=1);
2) run brin_desummarize_range() in a loop:
echo "-- desummarize all ranges SELECT FROM generate_series(0, pg_relation_size('tbl')/8192 - 1) as i, lateral (SELECT brin_desummarize_range('idx', i)) as d; -- summarize them back VACUUM tbl" > brin_desum_test.sql pgbench postgres -f brin_desum_test.sql -n -T 120
3) run a search that invokes bringetbitmap:
set enable_seqscan to off; explain analyze select * from tbl where i>10 and i<100; \watch 1
After a few runs, it will fail with "ERROR: corrupted BRIN index: inconsistent range map"
The problem is caused by a race in page locking in brinGetTupleForHeapBlock [1]:
(1) bitmapsan locks revmap->rm_currBuf and finds the address of the tuple on a regular page "page", then unlocks revmap->rm_currBuf
(2) in another transaction desummarize locks both revmap->rm_currBuf and "page", cleans up the tuple and unlocks both buffers
(1) bitmapscan locks buffer, containing "page", attempts to access the tuple and fails to find it
At first, I tried to fix it by holding the lock on revmap->rm_currBuf until we locked the regular page, but it causes a deadlock with brinsummarize(), It can be easily reproduced with the same test as above.
Is there any rule about the order of locking revmap and regular pages in brin? I haven't found anything in README.
As an alternative, we can leave locks as is and add a recheck, before throwing an error.
What do you think?
[1] https://github.com/postgres/postgres/blob/master/src/backend/access/brin/brin_revmap.c#L269
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgsql-hackers by date: