Thread: [BUG] Error in BRIN summarization

[BUG] Error in BRIN summarization

From
Anastasia Lubennikova
Date:
One of our clients caught an error "failed to find parent tuple for 
heap-only tuple at (50661,130) in table "tbl'" in PostgreSQL v12.

Steps to reproduce (REL_12_STABLE):

1) Create table with primary key, create brin index, fill table with 
some initial data:

create table tbl (id int primary key, a int) with (fillfactor=50);
create index idx on tbl using brin (a) with (autosummarize=on);
insert into tbl select i, i from generate_series(0,100000) as i;

2) Run script test_brin.sql using pgbench:

  pgbench postgres -f ../review/brin_test.sql  -n -T 120

The script is a bit messy because I was trying to reproduce a 
problematic workload. Though I didn't manage to simplify it.
The idea is that it inserts new values into the table to produce 
unindexed pages and also updates some values to trigger HOT-updates on 
these pages.

3) Open psql session and run brin_summarize_new_values

select brin_summarize_new_values('idx'::regclass::oid); \watch 2

Wait a bit. And in psql you will see the ERROR.

This error is caused by the problem with root_offsets array bounds. It 
occurs if a new HOT tuple was inserted after we've collected 
root_offsets, and thus we don't have root_offset for tuple's offnum. 
Concurrent insertions are possible, because brin_summarize_new_values() 
only holds ShareUpdateLock on table and no lock (only pin) on the page.

The draft fix is in the attachments. It saves root_offsets_size and 
checks that we only access valid fields.
Patch also adds some debug messages, just to ensure that problem was caught.

TODO:

- check if  heapam_index_validate_scan() has the same problem
- code cleanup
- test other PostgreSQL versions

[1] 
https://www.postgresql.org/message-id/flat/CA%2BTgmoYgwjmmjK24Qxb_vWAu8_Hh7gfVFcr3%2BR7ocdLvYOWJXg%40mail.gmail.com

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [BUG] Error in BRIN summarization

From
Anastasia Lubennikova
Date:
On 23.07.2020 20:39, Anastasia Lubennikova wrote:
> One of our clients caught an error "failed to find parent tuple for 
> heap-only tuple at (50661,130) in table "tbl'" in PostgreSQL v12.
>
> Steps to reproduce (REL_12_STABLE):
>
> 1) Create table with primary key, create brin index, fill table with 
> some initial data:
>
> create table tbl (id int primary key, a int) with (fillfactor=50);
> create index idx on tbl using brin (a) with (autosummarize=on);
> insert into tbl select i, i from generate_series(0,100000) as i;
>
> 2) Run script test_brin.sql using pgbench:
>
>  pgbench postgres -f ../review/brin_test.sql  -n -T 120
>
> The script is a bit messy because I was trying to reproduce a 
> problematic workload. Though I didn't manage to simplify it.
> The idea is that it inserts new values into the table to produce 
> unindexed pages and also updates some values to trigger HOT-updates on 
> these pages.
>
> 3) Open psql session and run brin_summarize_new_values
>
> select brin_summarize_new_values('idx'::regclass::oid); \watch 2
>
> Wait a bit. And in psql you will see the ERROR.
>
> This error is caused by the problem with root_offsets array bounds. It 
> occurs if a new HOT tuple was inserted after we've collected 
> root_offsets, and thus we don't have root_offset for tuple's offnum. 
> Concurrent insertions are possible, because 
> brin_summarize_new_values() only holds ShareUpdateLock on table and no 
> lock (only pin) on the page.
>
> The draft fix is in the attachments. It saves root_offsets_size and 
> checks that we only access valid fields.
> Patch also adds some debug messages, just to ensure that problem was 
> caught.
>
> TODO:
>
> - check if  heapam_index_validate_scan() has the same problem
> - code cleanup
> - test other PostgreSQL versions
>
> [1] 
> https://www.postgresql.org/message-id/flat/CA%2BTgmoYgwjmmjK24Qxb_vWAu8_Hh7gfVFcr3%2BR7ocdLvYOWJXg%40mail.gmail.com
>

Here is the updated version of the fix.
The problem can be reproduced on all supported versions, so I suggest to 
backpatch it.
Code slightly changed in v12, so here are two patches: one for versions 
9.5 to 11 and another for versions from 12 to master.

As for heapam_index_validate_scan(), I've tried to reproduce the same 
error with CREATE INDEX CONCURRENTLY, but haven't found any problem with it.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Jul-27, Anastasia Lubennikova wrote:

> Here is the updated version of the fix.
> The problem can be reproduced on all supported versions, so I suggest to
> backpatch it.
> Code slightly changed in v12, so here are two patches: one for versions 9.5
> to 11 and another for versions from 12 to master.

Hi Anastasia, thanks for this report and fix.  I was considering this
last week and noticed that the patch changes the ABI of
heap_get_root_tuples, which may be problematic in back branches.  I
suggest that for unreleased branches (12 and prior) we need to create a
new function with the new signature, and keep heap_get_root_tuples
unchanged.  In 13 and master we don't need that trick, so we can keep
the code as you have it in this version of the patch.

OffsetNumber
heap_get_root_tuples_new(Page page, OffsetNumber *root_offsets)
{ .. full implementation ... }

/* ABI compatibility only */
void
heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
{
    (void) heap_get_root_tuples_new(page, root_offsets);
}


(I was also considering whether it needs to be a loop to reobtain root
tuples, in case a concurrent transaction can create a new item while
we're checking that item; but I don't think that can really happen for
one individual tuple.)

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG] Error in BRIN summarization

From
Peter Geoghegan
Date:
On Mon, Jul 27, 2020 at 10:25 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> (I was also considering whether it needs to be a loop to reobtain root
> tuples, in case a concurrent transaction can create a new item while
> we're checking that item; but I don't think that can really happen for
> one individual tuple.)

I wonder if something like that is the underlying problem in a recent
problem case involving a "REINDEX index
pg_class_tblspc_relfilenode_index" command that runs concurrently with
the regression tests:

https://postgr.es/m/CAH2-WzmBxu4o=pMsniur+bwHqCGCmV_AOLkuK6BuU7ngA6evqw@mail.gmail.com

We see a violation of the HOT invariant in this case, though only for
a system catalog index, and only in fairly particular circumstances
involving concurrency.

-- 
Peter Geoghegan



Re: [BUG] Error in BRIN summarization

From
Anastasia Lubennikova
Date:
On 27.07.2020 20:25, Alvaro Herrera wrote:
On 2020-Jul-27, Anastasia Lubennikova wrote:

Here is the updated version of the fix.
The problem can be reproduced on all supported versions, so I suggest to
backpatch it.
Code slightly changed in v12, so here are two patches: one for versions 9.5
to 11 and another for versions from 12 to master.



(I was also considering whether it needs to be a loop to reobtain root
tuples, in case a concurrent transaction can create a new item while
we're checking that item; but I don't think that can really happen for
one individual tuple.)
I don't think we need a recheck for a single tuple, because we only care about finding its root, which simply must exist somewhere on this page, as concurrent pruning is not allowed. We also may catch root_offsets[] for subsequent tuples, but it's okay if we don't. These tuples will do the same recheck on their turn.


While testing this fix, Alexander Lakhin spotted another problem. I simplified  the test case to this:

1) prepare a table with brin index

create table tbl (i int, b char(1000)) with (fillfactor=10);
insert into tbl select i, md5(i::text) from generate_series(0,1000) as i;
create index idx on tbl using brin(i, b) with (pages_per_range=1);

2) run brin_desummarize_range() in a loop:

echo "-- desummarize all ranges SELECT FROM generate_series(0, pg_relation_size('tbl')/8192 - 1) as i, lateral (SELECT brin_desummarize_range('idx', i)) as d; 
-- summarize them back 
VACUUM tbl" > brin_desum_test.sql

pgbench postgres -f  brin_desum_test.sql -n -T 120


3) run a search that invokes bringetbitmap:

 set enable_seqscan to off;
 explain analyze select * from tbl where i>10 and i<100; \watch 1

After a few runs, it will fail with "ERROR: corrupted BRIN index: inconsistent range map"

The problem is caused by a race in page locking in brinGetTupleForHeapBlock [1]:

(1) bitmapsan locks revmap->rm_currBuf and finds the address of the tuple on a regular page "page", then unlocks revmap->rm_currBuf
(2) in another transaction desummarize locks both revmap->rm_currBuf and "page", cleans up the tuple and unlocks both buffers
(1) bitmapscan locks buffer, containing "page", attempts to access the tuple and fails to find it


At first, I tried to fix it by holding the lock on revmap->rm_currBuf until we locked the regular page, but it causes a deadlock with brinsummarize(), It can be easily reproduced with the same test as above.
Is there any rule about the order of locking revmap and regular pages in brin? I haven't found anything in README.

As an alternative, we can leave locks as is and add a recheck, before throwing an error.

What do you think?

[1] https://github.com/postgres/postgres/blob/master/src/backend/access/brin/brin_revmap.c#L269

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [BUG] Error in BRIN summarization

From
Anastasia Lubennikova
Date:
On 30.07.2020 16:40, Anastasia Lubennikova wrote:
> While testing this fix, Alexander Lakhin spotted another problem.
>
> After a few runs, it will fail with "ERROR: corrupted BRIN index: 
> inconsistent range map"
>
> The problem is caused by a race in page locking in 
> brinGetTupleForHeapBlock [1]:
>
> (1) bitmapsan locks revmap->rm_currBuf and finds the address of the 
> tuple on a regular page "page", then unlocks revmap->rm_currBuf
> (2) in another transaction desummarize locks both revmap->rm_currBuf 
> and "page", cleans up the tuple and unlocks both buffers
> (1) bitmapscan locks buffer, containing "page", attempts to access the 
> tuple and fails to find it
>
>
> At first, I tried to fix it by holding the lock on revmap->rm_currBuf 
> until we locked the regular page, but it causes a deadlock with 
> brinsummarize(), It can be easily reproduced with the same test as above.
> Is there any rule about the order of locking revmap and regular pages 
> in brin? I haven't found anything in README.
>
> As an alternative, we can leave locks as is and add a recheck, before 
> throwing an error.
>
Here are the updated patches for both problems.

1) brin_summarize_fix_REL_12_v2 fixes
"failed to find parent tuple for heap-only tuple at (50661,130) in table 
"tbl'"

This patch checks that we only access initialized entries of 
root_offsets[] array. If necessary, collect the array again. One recheck 
is enough here, since concurrent pruning is not possible.

2) brin_pagelock_fix_REL_12_v1.patch fixes
"ERROR: corrupted BRIN index: inconsistent range map"

This patch adds a recheck as suggested in previous message.
I am not sure if one recheck is enough to eliminate the race completely, 
but the problem cannot be reproduced anymore.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Jul-30, Anastasia Lubennikova wrote:

> While testing this fix, Alexander Lakhin spotted another problem. I
> simplified  the test case to this:

Ah, good catch.  I think a cleaner way to fix this problem is to just
consider the range as not summarized and return NULL from there, as in
the attached patch.  Running your test case with a telltale WARNING
added at that point, it's clear that it's being hit.

By returning NULL, we're forcing the caller to scan the heap, which is
not great.  But note that if you retry, and your VACUUM hasn't run yet
by the time we go through the loop again, the same thing would happen.
So it seems to me a good enough answer.

A much more troubling thought is what happens if the range is
desummarized, then the index item is used for the summary of a different
range.  Then the index might end up returning corrupt results.

> At first, I tried to fix it by holding the lock on revmap->rm_currBuf until
> we locked the regular page, but it causes a deadlock with brinsummarize(),
> It can be easily reproduced with the same test as above.
> Is there any rule about the order of locking revmap and regular pages in
> brin? I haven't found anything in README.

Umm, I thought that stuff was in the README, but it seems I didn't add
it there.  I think I had a .org file with my notes on that ... must be
in an older laptop disk, because it's not in my worktree for that.  I'll
see if I can fish it out.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Jul-23, Anastasia Lubennikova wrote:

> This error is caused by the problem with root_offsets array bounds. It
> occurs if a new HOT tuple was inserted after we've collected root_offsets,
> and thus we don't have root_offset for tuple's offnum. Concurrent insertions
> are possible, because brin_summarize_new_values() only holds ShareUpdateLock
> on table and no lock (only pin) on the page.

Excellent detective work, thanks.

> The draft fix is in the attachments. It saves root_offsets_size and checks
> that we only access valid fields.

I think this is more complicated than necessary.  It seems easier to
solve this problem by just checking whether the given root pointer is
set to InvalidOffsetNumber, which is already done in the existing coding
of heap_get_root_tuples (only they spell it "0" rather than
InvalidOffsetNumber, which I propose to change).  AFAIR this should only
happen in the 'anyvisible' mode, so I added that in an assert.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Jul-28, Peter Geoghegan wrote:

> On Mon, Jul 27, 2020 at 10:25 AM Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > (I was also considering whether it needs to be a loop to reobtain root
> > tuples, in case a concurrent transaction can create a new item while
> > we're checking that item; but I don't think that can really happen for
> > one individual tuple.)
> 
> I wonder if something like that is the underlying problem in a recent
> problem case involving a "REINDEX index
> pg_class_tblspc_relfilenode_index" command that runs concurrently with
> the regression tests:
> 
> https://postgr.es/m/CAH2-WzmBxu4o=pMsniur+bwHqCGCmV_AOLkuK6BuU7ngA6evqw@mail.gmail.com
> 
> We see a violation of the HOT invariant in this case, though only for
> a system catalog index, and only in fairly particular circumstances
> involving concurrency.

Hmm.  As far as I understand, the bug Anastasia reports can only hit an
index build that occurs concurrently to heap updates; and that cannot
happen for a regular index build, only for CREATE INDEX CONCURRENTLY and
REINDEX CONCURRENTLY.  So unless I miss something, it's not related to
that other bug.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Aug-11, Alvaro Herrera wrote:

> I think this is more complicated than necessary.  It seems easier to
> solve this problem by just checking whether the given root pointer is
> set to InvalidOffsetNumber, which is already done in the existing coding
> of heap_get_root_tuples (only they spell it "0" rather than
> InvalidOffsetNumber, which I propose to change).  AFAIR this should only
> happen in the 'anyvisible' mode, so I added that in an assert.

'anyvisible' mode is not required AFAICS; reading the code, I think this
could also hit REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY, which
do not use that flag.  I didn't try to reproduce it there, though.
Anyway, I'm going to remove that Assert() I added.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Aug-11, Alvaro Herrera wrote:

> A much more troubling thought is what happens if the range is
> desummarized, then the index item is used for the summary of a different
> range.  Then the index might end up returning corrupt results.

Actually, this is not a concern because the brin tuple's bt_blkno is
rechecked before returning it, and if it doesn't match what we're
searching, the loop is restarted.  It becomes an infinite loop problem
if the revmap is pointing to a tuple that's labelled with a different
range's blkno.  So I think my patch as posted is a sufficient fix for
this problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Aug-12, Alvaro Herrera wrote:

> 'anyvisible' mode is not required AFAICS; reading the code, I think this
> could also hit REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY, which
> do not use that flag.  I didn't try to reproduce it there, though.
> Anyway, I'm going to remove that Assert() I added.

So this is what I propose as the final form of the fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [BUG] Error in BRIN summarization

From
Anastasia Lubennikova
Date:
On 12.08.2020 22:58, Alvaro Herrera wrote:
> On 2020-Aug-12, Alvaro Herrera wrote:
>
>> 'anyvisible' mode is not required AFAICS; reading the code, I think this
>> could also hit REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY, which
>> do not use that flag.  I didn't try to reproduce it there, though.
>> Anyway, I'm going to remove that Assert() I added.
> So this is what I propose as the final form of the fix.
>
Cool.
This version looks much simpler than mine and passes the tests fine.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Aug-13, Anastasia Lubennikova wrote:

> Cool.
> This version looks much simpler than mine and passes the tests fine.

Thanks, pushed it to all branches.  Thanks for diagnosing this problem!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG] Error in BRIN summarization

From
Tom Lane
Date:
hyrax's latest report suggests that this patch has issues under
CLOBBER_CACHE_ALWAYS:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-08-13%2005%3A09%3A58

Hard to tell whether there's an actual bug there or just test instability,
but either way it needs to be resolved.

            regards, tom lane



Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Aug-15, Tom Lane wrote:

> hyrax's latest report suggests that this patch has issues under
> CLOBBER_CACHE_ALWAYS:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-08-13%2005%3A09%3A58
> 
> Hard to tell whether there's an actual bug there or just test instability,
> but either way it needs to be resolved.

Hmm, the only explanation I can see for this is that autovacuum managed
to summarize the range before the test script did it.  So the solution
would simply be to disable autovacuum for the table across the whole
script.

I'm running the scripts and dependencies to verify that fix, but under
CLOBBER_CACHE_ALWAYS that takes quite a bit.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUG] Error in BRIN summarization

From
Alvaro Herrera
Date:
On 2020-Aug-17, Alvaro Herrera wrote:

> Hmm, the only explanation I can see for this is that autovacuum managed
> to summarize the range before the test script did it.  So the solution
> would simply be to disable autovacuum for the table across the whole
> script.
> 
> I'm running the scripts and dependencies to verify that fix, but under
> CLOBBER_CACHE_ALWAYS that takes quite a bit.

I ran a subset of tests a few times, but was unable to reproduce the
problem.  I'll just push this to all branches and hope for the best.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment