Re: [BUG] Error in BRIN summarization - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: [BUG] Error in BRIN summarization
Date
Msg-id 2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru
Whole thread Raw
In response to Re: [BUG] Error in BRIN summarization  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: [BUG] Error in BRIN summarization
Re: [BUG] Error in BRIN summarization
List pgsql-hackers
On 27.07.2020 20:25, Alvaro Herrera wrote:
On 2020-Jul-27, Anastasia Lubennikova wrote:

Here is the updated version of the fix.
The problem can be reproduced on all supported versions, so I suggest to
backpatch it.
Code slightly changed in v12, so here are two patches: one for versions 9.5
to 11 and another for versions from 12 to master.



(I was also considering whether it needs to be a loop to reobtain root
tuples, in case a concurrent transaction can create a new item while
we're checking that item; but I don't think that can really happen for
one individual tuple.)
I don't think we need a recheck for a single tuple, because we only care about finding its root, which simply must exist somewhere on this page, as concurrent pruning is not allowed. We also may catch root_offsets[] for subsequent tuples, but it's okay if we don't. These tuples will do the same recheck on their turn.


While testing this fix, Alexander Lakhin spotted another problem. I simplified  the test case to this:

1) prepare a table with brin index

create table tbl (i int, b char(1000)) with (fillfactor=10);
insert into tbl select i, md5(i::text) from generate_series(0,1000) as i;
create index idx on tbl using brin(i, b) with (pages_per_range=1);

2) run brin_desummarize_range() in a loop:

echo "-- desummarize all ranges SELECT FROM generate_series(0, pg_relation_size('tbl')/8192 - 1) as i, lateral (SELECT brin_desummarize_range('idx', i)) as d; 
-- summarize them back 
VACUUM tbl" > brin_desum_test.sql

pgbench postgres -f  brin_desum_test.sql -n -T 120


3) run a search that invokes bringetbitmap:

 set enable_seqscan to off;
 explain analyze select * from tbl where i>10 and i<100; \watch 1

After a few runs, it will fail with "ERROR: corrupted BRIN index: inconsistent range map"

The problem is caused by a race in page locking in brinGetTupleForHeapBlock [1]:

(1) bitmapsan locks revmap->rm_currBuf and finds the address of the tuple on a regular page "page", then unlocks revmap->rm_currBuf
(2) in another transaction desummarize locks both revmap->rm_currBuf and "page", cleans up the tuple and unlocks both buffers
(1) bitmapscan locks buffer, containing "page", attempts to access the tuple and fails to find it


At first, I tried to fix it by holding the lock on revmap->rm_currBuf until we locked the regular page, but it causes a deadlock with brinsummarize(), It can be easily reproduced with the same test as above.
Is there any rule about the order of locking revmap and regular pages in brin? I haven't found anything in README.

As an alternative, we can leave locks as is and add a recheck, before throwing an error.

What do you think?

[1] https://github.com/postgres/postgres/blob/master/src/backend/access/brin/brin_revmap.c#L269

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Next
From: Stephen Frost
Date:
Subject: Re: Missing CFI in hlCover()?