Thread: BRIN INDEX value

BRIN INDEX value

From
Tatsuo Ishii
Date:
When creating a brin index, it shows an interesting behavior when used
with VACUUM.

First, I created a brin index after inserting data.
===============================================================
DROP TABLE t1;
DROP TABLE
CREATE TABLE t1(i int);
CREATE TABLE
INSERT INTO t1 VALUES (generate_series(1, 100000));
INSERT 0 100000
CREATE INDEX brinidx ON t1 USING brin (i);
CREATE INDEX
SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages 
-------(2,1)(2,2)(2,3)(2,4)
(4 rows)

SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');itemoffset | blknum | attnum | allnulls | hasnulls
|placeholder |       value       
 
------------+--------+--------+----------+----------+-------------+-------------------         1 |      0 |      1 | f
     | f        | f           | {1 .. 28928}         2 |    128 |      1 | f        | f        | f           | {28929
..57856}         3 |    256 |      1 | f        | f        | f           | {57857 .. 86784}         4 |    384 |      1
|f        | f        | f           | {86785 .. 100000}
 
(4 rows)

SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages 
-------(2,1)(2,2)(2,3)(2,4)
(4 rows)

VACUUM;
VACUUM
SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages 
-------(2,1)(2,2)(2,3)(2,4)
(4 rows)

SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');itemoffset | blknum | attnum | allnulls | hasnulls
|placeholder |       value       
 
------------+--------+--------+----------+----------+-------------+-------------------         1 |      0 |      1 | f
     | f        | f           | {1 .. 28928}         2 |    128 |      1 | f        | f        | f           | {28929
..57856}         3 |    256 |      1 | f        | f        | f           | {57857 .. 86784}         4 |    384 |      1
|f        | f        | f           | {86785 .. 100000}
 
(4 rows)
===============================================================

As you can see brin index value for block 384 or more is {86785.. 100000}. Good.

However I inserted data *after* creating index, the value is
different.
===============================================================
psql -e -f test.sql test
Pager usage is off.
DROP TABLE t1;
DROP TABLE
CREATE TABLE t1(i int);
CREATE TABLE
CREATE INDEX brinidx ON t1 USING brin (i);
CREATE INDEX
SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages 
-------(2,1)
(1 row)

SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');itemoffset | blknum | attnum | allnulls | hasnulls
|placeholder | value 
 
------------+--------+--------+----------+----------+-------------+-------         1 |      0 |      1 | t        | f
    | f           | 
 
(1 row)

INSERT INTO t1 VALUES (generate_series(1, 100000));
INSERT 0 100000
SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages 
-------(2,1)
(1 row)

VACUUM;
VACUUM
SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages 
-------(2,1)(2,2)(2,3)(2,4)
(4 rows)

SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');itemoffset | blknum | attnum | allnulls | hasnulls
|placeholder |      value       
 
------------+--------+--------+----------+----------+-------------+------------------         1 |      0 |      1 | f
    | f        | f           | {1 .. 28928}         2 |    128 |      1 | f        | f        | f           | {28929 ..
57856}        3 |    256 |      1 | f        | f        | f           | {57857 .. 86784}         4 |    384 |      1 |
f       | f        | f           | {1 .. 100000}
 
(4 rows)
===============================================================

How the index value for block 384 could be {1 .. 100000}?

I have tested with 9.5 alpha2 and 9.5-stable head as of today.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: BRIN INDEX value

From
Amit Langote
Date:
On 9/3/2015 5:49 PM, Tatsuo Ishii wrote:
>
> However I inserted data *after* creating index, the value is
> different.
> VACUUM;
> VACUUM
> SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;
>  pages
> -------
>  (2,1)
>  (2,2)
>  (2,3)
>  (2,4)
> (4 rows)
>
> SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');
>  itemoffset | blknum | attnum | allnulls | hasnulls | placeholder |      value
> ------------+--------+--------+----------+----------+-------------+------------------
>           1 |      0 |      1 | f        | f        | f           | {1 .. 28928}
>           2 |    128 |      1 | f        | f        | f           | {28929 .. 57856}
>           3 |    256 |      1 | f        | f        | f           | {57857 .. 86784}
>           4 |    384 |      1 | f        | f        | f           | {1 .. 100000}
> (4 rows)
> ===============================================================
>
> How the index value for block 384 could be {1 .. 100000}?
>

The summarization during VACUUM invokes IndexBuildHeapRangeScan() which is
passed scanStartBlock and scanNumBlocks. If scanStartBlock + scanNumBlocks
> heapTotalBlocks, further down the line, heapgettup() may start returning
tuples from the beginning given the following code in it:

  page++;
  if (page >= scan->rs_nblocks)
      page = 0;

  finished = (page == scan->rs_startblock) ||
               (scan->rs_numblocks != InvalidBlockNumber ?
                 --scan->rs_numblocks == 0 :
                 false);

Where finished indicates whether it thinks the end of heap is reached.

In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan()
using heap_setscanlimits(). One can imagine how the above heap finish
criteria might not work as expected.

That helps explain why 1 becomes the min for that brin tuple.

Attached hack fixes the symptom but perhaps not the correct fix for this.

Thanks,
Amit

Attachment

Re: BRIN INDEX value

From
Tatsuo Ishii
Date:
> The summarization during VACUUM invokes IndexBuildHeapRangeScan() which is
> passed scanStartBlock and scanNumBlocks. If scanStartBlock + scanNumBlocks
>> heapTotalBlocks, further down the line, heapgettup() may start returning
> tuples from the beginning given the following code in it:
> 
>   page++;
>   if (page >= scan->rs_nblocks)
>       page = 0;
> 
>   finished = (page == scan->rs_startblock) ||
>                (scan->rs_numblocks != InvalidBlockNumber ?
>                  --scan->rs_numblocks == 0 :
>                  false);
> 
> Where finished indicates whether it thinks the end of heap is reached.
> 
> In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan()
> using heap_setscanlimits(). One can imagine how the above heap finish
> criteria might not work as expected.
> 
> That helps explain why 1 becomes the min for that brin tuple.
> 
> Attached hack fixes the symptom but perhaps not the correct fix for this.

Why can't we fix summarize_range() in brin.c:
IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true,                        heapBlk,
state->bs_pagesPerRange,                       brinbuildCallback, (void *) state);
 

This currently thoughtlessly passes scannumblocks as
state->bs_pagesPerRange. Shouldn't we change this so that
(scanStartBlock + scanNumBlocks) does not exceed scan->rs_nblocks?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: BRIN INDEX value

From
Tatsuo Ishii
Date:
> The summarization during VACUUM invokes IndexBuildHeapRangeScan() which is
> passed scanStartBlock and scanNumBlocks. If scanStartBlock + scanNumBlocks
>> heapTotalBlocks, further down the line, heapgettup() may start returning
> tuples from the beginning given the following code in it:
> 
>   page++;
>   if (page >= scan->rs_nblocks)
>       page = 0;
> 
>   finished = (page == scan->rs_startblock) ||
>                (scan->rs_numblocks != InvalidBlockNumber ?
>                  --scan->rs_numblocks == 0 :
>                  false);
> 
> Where finished indicates whether it thinks the end of heap is reached.
> 
> In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan()
> using heap_setscanlimits(). One can imagine how the above heap finish
> criteria might not work as expected.

What scares me is:

1) the bug will not be found unless someone inspects the internal data  of BRIN. Regression test is useless here.

2) the bug effectively causes vacuum scans the heap *twice*, which  will produce lots of I/O if the heap is not small.

3) I wonder if other index type is suffered by this type of bug.

I vaguely recall that more internal inspecting type brin regression
test was removed because it depends on contrib modules. I am not sure
whether the decision was appropreate or not.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: BRIN INDEX value

From
Amit Langote
Date:
On 9/4/2015 9:22 AM, Tatsuo Ishii wrote:
>>
>> In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan()
>> using heap_setscanlimits(). One can imagine how the above heap finish
>> criteria might not work as expected.
> 
> What scares me is:
> 
> 1) the bug will not be found unless someone inspects the internal data
>    of BRIN. Regression test is useless here.
> 
> 2) the bug effectively causes vacuum scans the heap *twice*, which
>    will produce lots of I/O if the heap is not small.
> 
> 3) I wonder if other index type is suffered by this type of bug.
> 

About 3, it seems unlikely. Both the IndexBuildHeapRangeScan() and
heap_setscanlimits() were introduced by the BRIN patch and I didn't find
anything else using it, yet.

Thanks,
Amit




Re: BRIN INDEX value

From
Amit Langote
Date:
On 9/4/2015 8:28 AM, Tatsuo Ishii wrote:
>>
>> Attached hack fixes the symptom but perhaps not the correct fix for this.
>
> Why can't we fix summarize_range() in brin.c:
>
>     IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true,
>                             heapBlk, state->bs_pagesPerRange,
>                             brinbuildCallback, (void *) state);
>
> This currently thoughtlessly passes scannumblocks as
> state->bs_pagesPerRange. Shouldn't we change this so that
> (scanStartBlock + scanNumBlocks) does not exceed scan->rs_nblocks?
>

Ah, it did cross my mind to the fix it in brin.c but was not sure. I did
it that way in the attached patch.

Thanks,
Amit

Attachment

Re: BRIN INDEX value

From
Alvaro Herrera
Date:
Amit Langote wrote:
> On 9/4/2015 9:22 AM, Tatsuo Ishii wrote:
> >>
> >> In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan()
> >> using heap_setscanlimits(). One can imagine how the above heap finish
> >> criteria might not work as expected.
> > 
> > What scares me is:
> > 
> > 1) the bug will not be found unless someone inspects the internal data
> >    of BRIN. Regression test is useless here.
> > 
> > 2) the bug effectively causes vacuum scans the heap *twice*, which
> >    will produce lots of I/O if the heap is not small.
> > 
> > 3) I wonder if other index type is suffered by this type of bug.
> > 
> 
> About 3, it seems unlikely. Both the IndexBuildHeapRangeScan() and
> heap_setscanlimits() were introduced by the BRIN patch and I didn't find
> anything else using it, yet.

As I recall, there's some patch that Robert Haas or Amit Kapila that
wants to use that function.  Maybe something in the parallel seqscan
patch series?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BRIN INDEX value

From
Amit Langote
Date:
On 9/4/2015 12:01 PM, Alvaro Herrera wrote:
> Amit Langote wrote:
>> On 9/4/2015 9:22 AM, Tatsuo Ishii wrote:
>>>
>>> 3) I wonder if other index type is suffered by this type of bug.
>>>
>>
>> About 3, it seems unlikely. Both the IndexBuildHeapRangeScan() and
>> heap_setscanlimits() were introduced by the BRIN patch and I didn't find
>> anything else using it, yet.
> 
> As I recall, there's some patch that Robert Haas or Amit Kapila that
> wants to use that function.  Maybe something in the parallel seqscan
> patch series?

I just checked Amit Kapila's latest[1] patch, and found they have since
abandoned the heap_setscanlimits() approach.

Thanks,
Amit

[1]
http://www.postgresql.org/message-id/CAA4eK1J6SqOEAkxidoxp7kqx2_D-XNspx4rq4AO17P+SqKygFw@mail.gmail.com




Re: BRIN INDEX value

From
Tatsuo Ishii
Date:
> On 9/4/2015 8:28 AM, Tatsuo Ishii wrote:
>>>
>>> Attached hack fixes the symptom but perhaps not the correct fix for this.
>> 
>> Why can't we fix summarize_range() in brin.c:
>> 
>>     IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true,
>>                             heapBlk, state->bs_pagesPerRange,
>>                             brinbuildCallback, (void *) state);
>> 
>> This currently thoughtlessly passes scannumblocks as
>> state->bs_pagesPerRange. Shouldn't we change this so that
>> (scanStartBlock + scanNumBlocks) does not exceed scan->rs_nblocks?
>> 
> 
> Ah, it did cross my mind to the fix it in brin.c but was not sure. I did
> it that way in the attached patch.

Amit,

I have looked into your patch and am a little bit worried because it
calls RelationGetNumberOfBlocks() which is an expensive function.
I think there are two ways to avoid it.

1) fall back to your former patch. However it may break existing  behavior what you said. I think there's very little
chanceit  actually happens because IndexBuildHeapRangeScan() is only used by  brin (I confirmed this). But Alvaro said
somepatches may be it's  customers. Robert, Amit, Can you please confirm this?
 

2) change the signature of IndexBuildHeapRangeScan() to be able to  teach it to limit the target block number to not
exceedthe last  page of the relation. Again this may break someone's patch and need  confirmation.
 

What do you think?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: BRIN INDEX value

From
Amit Langote
Date:
On 9/4/2015 1:33 PM, Tatsuo Ishii wrote:
>> Ah, it did cross my mind to the fix it in brin.c but was not sure. I did
>> it that way in the attached patch.
> 
> Amit,
> 
> I have looked into your patch and am a little bit worried because it
> calls RelationGetNumberOfBlocks() which is an expensive function.
> I think there are two ways to avoid it.
> 
> 1) fall back to your former patch. However it may break existing
>    behavior what you said. I think there's very little chance it
>    actually happens because IndexBuildHeapRangeScan() is only used by
>    brin (I confirmed this). But Alvaro said some patches may be it's
>    customers. Robert, Amit, Can you please confirm this?
> 
> 2) change the signature of IndexBuildHeapRangeScan() to be able to
>    teach it to limit the target block number to not exceed the last
>    page of the relation. Again this may break someone's patch and need
>    confirmation.
> 
> What do you think?

One thing I imagine we could do is to change the signature of
summrize_range() to also include heapNumBlks which its (only) caller
brinsummarize() already computes. It will look like:

static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state,                             Relation heapRel,
                          BlockNumber heapBlk,                             BlockNumber heapNumBlks);
 

I'd think changing summarize_range()'s signature would be relatively
easier/safer.

Thanks,
Amit




Re: BRIN INDEX value

From
Tatsuo Ishii
Date:
> One thing I imagine we could do is to change the signature of
> summrize_range() to also include heapNumBlks which its (only) caller
> brinsummarize() already computes. It will look like:
> 
> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state,
>                               Relation heapRel,
>                               BlockNumber heapBlk,
>                               BlockNumber heapNumBlks);
> 
> I'd think changing summarize_range()'s signature would be relatively
> easier/safer.

Yeah, sounds good.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: BRIN INDEX value

From
Amit Langote
Date:
On 9/4/2015 2:04 PM, Tatsuo Ishii wrote:
>> One thing I imagine we could do is to change the signature of
>> summrize_range() to also include heapNumBlks which its (only) caller
>> brinsummarize() already computes. It will look like:
>>
>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state,
>>                               Relation heapRel,
>>                               BlockNumber heapBlk,
>>                               BlockNumber heapNumBlks);
>>
>> I'd think changing summarize_range()'s signature would be relatively
>> easier/safer.
>
> Yeah, sounds good.

Here's a patch to do that.

Thanks,
Amit

Attachment

Re: BRIN INDEX value

From
Tatsuo Ishii
Date:
> On 9/4/2015 2:04 PM, Tatsuo Ishii wrote:
>>> One thing I imagine we could do is to change the signature of
>>> summrize_range() to also include heapNumBlks which its (only) caller
>>> brinsummarize() already computes. It will look like:
>>>
>>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state,
>>>                               Relation heapRel,
>>>                               BlockNumber heapBlk,
>>>                               BlockNumber heapNumBlks);
>>>
>>> I'd think changing summarize_range()'s signature would be relatively
>>> easier/safer.
>> 
>> Yeah, sounds good.
> 
> Here's a patch to do that.

Thanks. It looks good to me (and passed all the regression tests in
master branch). I will commit your patch if there's no objection.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: BRIN INDEX value

From
Amit Kapila
Date:
On Fri, Sep 4, 2015 at 10:03 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>
> I have looked into your patch and am a little bit worried because it
> calls RelationGetNumberOfBlocks() which is an expensive function.
> I think there are two ways to avoid it.
>
> 1) fall back to your former patch. However it may break existing
>    behavior what you said. I think there's very little chance it
>    actually happens because IndexBuildHeapRangeScan() is only used by
>    brin (I confirmed this). But Alvaro said some patches may be it's
>    customers. Robert, Amit, Can you please confirm this?
>

In earlier version of parallel_seqscan patch, heap_setscanlimits() was being
used, but now altogether a different mechanism is used.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: BRIN INDEX value

From
Tatsuo Ishii
Date:
> On Fri, Sep 4, 2015 at 10:03 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>
>> I have looked into your patch and am a little bit worried because it
>> calls RelationGetNumberOfBlocks() which is an expensive function.
>> I think there are two ways to avoid it.
>>
>> 1) fall back to your former patch. However it may break existing
>>    behavior what you said. I think there's very little chance it
>>    actually happens because IndexBuildHeapRangeScan() is only used by
>>    brin (I confirmed this). But Alvaro said some patches may be it's
>>    customers. Robert, Amit, Can you please confirm this?
>>
> 
> In earlier version of parallel_seqscan patch, heap_setscanlimits() was being
> used, but now altogether a different mechanism is used.

Thank you for the confirmation!

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: BRIN INDEX value

From
Alvaro Herrera
Date:
Tatsuo Ishii wrote:
> > On 9/4/2015 2:04 PM, Tatsuo Ishii wrote:
> >>> One thing I imagine we could do is to change the signature of
> >>> summrize_range() to also include heapNumBlks which its (only) caller
> >>> brinsummarize() already computes. It will look like:
> >>>
> >>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state,
> >>>                               Relation heapRel,
> >>>                               BlockNumber heapBlk,
> >>>                               BlockNumber heapNumBlks);
> >>>
> >>> I'd think changing summarize_range()'s signature would be relatively
> >>> easier/safer.
> >> 
> >> Yeah, sounds good.
> > 
> > Here's a patch to do that.
> 
> Thanks. It looks good to me (and passed all the regression tests in
> master branch). I will commit your patch if there's no objection.

Yeah, thanks, please go ahead.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BRIN INDEX value

From
Tatsuo Ishii
Date:
> Tatsuo Ishii wrote:
>> > On 9/4/2015 2:04 PM, Tatsuo Ishii wrote:
>> >>> One thing I imagine we could do is to change the signature of
>> >>> summrize_range() to also include heapNumBlks which its (only) caller
>> >>> brinsummarize() already computes. It will look like:
>> >>>
>> >>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state,
>> >>>                               Relation heapRel,
>> >>>                               BlockNumber heapBlk,
>> >>>                               BlockNumber heapNumBlks);
>> >>>
>> >>> I'd think changing summarize_range()'s signature would be relatively
>> >>> easier/safer.
>> >> 
>> >> Yeah, sounds good.
>> > 
>> > Here's a patch to do that.
>> 
>> Thanks. It looks good to me (and passed all the regression tests in
>> master branch). I will commit your patch if there's no objection.
> 
> Yeah, thanks, please go ahead.

Thanks. Fix committed.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: BRIN INDEX value

From
Amit Langote
Date:


On Saturday, September 5, 2015, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Tatsuo Ishii wrote:
>> > On 9/4/2015 2:04 PM, Tatsuo Ishii wrote:
>> >>> One thing I imagine we could do is to change the signature of
>> >>> summrize_range() to also include heapNumBlks which its (only) caller
>> >>> brinsummarize() already computes. It will look like:
>> >>>
>> >>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state,
>> >>>                               Relation heapRel,
>> >>>                               BlockNumber heapBlk,
>> >>>                               BlockNumber heapNumBlks);
>> >>>
>> >>> I'd think changing summarize_range()'s signature would be relatively
>> >>> easier/safer.
>> >>
>> >> Yeah, sounds good.
>> >
>> > Here's a patch to do that.
>>
>> Thanks. It looks good to me (and passed all the regression tests in
>> master branch). I will commit your patch if there's no objection.
>
> Yeah, thanks, please go ahead.

Thanks. Fix committed.

Thank you Ishii-san!

Amit