On 30.07.2020 16:40, Anastasia Lubennikova wrote:
> While testing this fix, Alexander Lakhin spotted another problem.
>
> After a few runs, it will fail with "ERROR: corrupted BRIN index:
> inconsistent range map"
>
> The problem is caused by a race in page locking in
> brinGetTupleForHeapBlock [1]:
>
> (1) bitmapscan locks revmap->rm_currBuf and finds the address of the
> tuple on a regular page "page", then unlocks revmap->rm_currBuf
> (2) in another transaction desummarize locks both revmap->rm_currBuf
> and "page", cleans up the tuple and unlocks both buffers
> (1) bitmapscan locks the buffer containing "page", attempts to access
> the tuple and fails to find it
>
>
> At first, I tried to fix it by holding the lock on revmap->rm_currBuf
> until we locked the regular page, but that causes a deadlock with
> brinsummarize(). It can be easily reproduced with the same test as above.
> Is there any rule about the order of locking revmap and regular pages
> in BRIN? I haven't found anything in the README.
>
> As an alternative, we can leave the locks as is and add a recheck
> before throwing an error.
>
Here are the updated patches for both problems.
1) brin_summarize_fix_REL_12_v2 fixes
"failed to find parent tuple for heap-only tuple at (50661,130) in table
"tbl""
This patch checks that we only access initialized entries of the
root_offsets[] array. If necessary, the array is collected again. One
recheck is enough here, since concurrent pruning is not possible.
2) brin_pagelock_fix_REL_12_v1.patch fixes
"ERROR: corrupted BRIN index: inconsistent range map"
This patch adds a recheck, as suggested in the previous message.
I am not sure if one recheck is enough to eliminate the race completely,
but the problem cannot be reproduced anymore.
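
To make the recheck idea concrete, here is a minimal single-threaded sketch
of the pattern. It is a toy model, not the code from the patch: ToyIndex,
toy_lookup(), toy_desummarize() and the between_locks hook are hypothetical
names, and the real buffer locks are only marked by comments. The hook
stands in for whatever another backend does in the window between releasing
the revmap lock and taking the lock on the regular page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS   4
#define INVALID_SLOT (-1)

/* Toy stand-in for the index: one revmap entry and one regular page. */
typedef struct
{
    int  revmap_slot;            /* slot the range map points at, or INVALID_SLOT */
    bool slot_used[TOY_NSLOTS];  /* which slots on the regular page hold a tuple */
} ToyIndex;

typedef enum
{
    LOOKUP_FOUND,
    LOOKUP_NOT_SUMMARIZED,
    LOOKUP_CORRUPTED
} LookupResult;

/* Hypothetical hook: simulates another backend running while no lock is held. */
typedef void (*ConcurrentAction) (ToyIndex *idx);

/* Simulated desummarize: removes the tuple and clears the revmap entry. */
void
toy_desummarize(ToyIndex *idx)
{
    if (idx->revmap_slot != INVALID_SLOT)
    {
        idx->slot_used[idx->revmap_slot] = false;
        idx->revmap_slot = INVALID_SLOT;
    }
}

LookupResult
toy_lookup(ToyIndex *idx, ConcurrentAction between_locks)
{
    /* Step 1: (lock revmap) read the tuple address, (unlock revmap). */
    int slot = idx->revmap_slot;

    if (slot == INVALID_SLOT)
        return LOOKUP_NOT_SUMMARIZED;
    assert(slot >= 0 && slot < TOY_NSLOTS);

    /* Window in which neither buffer is locked. */
    if (between_locks != NULL)
        between_locks(idx);

    /* Step 2: (lock regular page) look for the tuple. */
    if (idx->slot_used[slot])
        return LOOKUP_FOUND;

    /*
     * Recheck: re-read the revmap before reporting corruption. If the
     * entry changed since step 1, the tuple was removed concurrently and
     * the range is simply no longer summarized at the old address.
     */
    if (idx->revmap_slot != slot)
        return LOOKUP_NOT_SUMMARIZED;

    /* Revmap still points at a missing tuple: a genuine inconsistency. */
    return LOOKUP_CORRUPTED;
}
```

Calling toy_lookup() with toy_desummarize as the hook models the race
described above and yields LOOKUP_NOT_SUMMARIZED instead of an error, while
an entry that still points at an empty slot is reported as
LOOKUP_CORRUPTED. The sketch does a single recheck only, so it does not
answer the question of whether one recheck is always sufficient.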
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company