Thread: BRIN INDEX value
When creating a brin index, it shows an interesting behavior when used with VACUUM. First, I created a brin index after inserting data. =============================================================== DROP TABLE t1; DROP TABLE CREATE TABLE t1(i int); CREATE TABLE INSERT INTO t1 VALUES (generate_series(1, 100000)); INSERT 0 100000 CREATE INDEX brinidx ON t1 USING brin (i); CREATE INDEX SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages -------(2,1)(2,2)(2,3)(2,4) (4 rows) SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');itemoffset | blknum | attnum | allnulls | hasnulls |placeholder | value ------------+--------+--------+----------+----------+-------------+------------------- 1 | 0 | 1 | f | f | f | {1 .. 28928} 2 | 128 | 1 | f | f | f | {28929 ..57856} 3 | 256 | 1 | f | f | f | {57857 .. 86784} 4 | 384 | 1 |f | f | f | {86785 .. 100000} (4 rows) SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages -------(2,1)(2,2)(2,3)(2,4) (4 rows) VACUUM; VACUUM SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages -------(2,1)(2,2)(2,3)(2,4) (4 rows) SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');itemoffset | blknum | attnum | allnulls | hasnulls |placeholder | value ------------+--------+--------+----------+----------+-------------+------------------- 1 | 0 | 1 | f | f | f | {1 .. 28928} 2 | 128 | 1 | f | f | f | {28929 ..57856} 3 | 256 | 1 | f | f | f | {57857 .. 86784} 4 | 384 | 1 |f | f | f | {86785 .. 100000} (4 rows) =============================================================== As you can see brin index value for block 384 or more is {86785.. 100000}. Good. However I inserted data *after* creating index, the value is different. =============================================================== psql -e -f test.sql test Pager usage is off. DROP TABLE t1; DROP TABLE CREATE TABLE t1(i int); CREATE TABLE CREATE INDEX brinidx ON t1 USING brin (i); CREATE INDEX SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages -------(2,1) (1 row) SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');itemoffset | blknum | attnum | allnulls | hasnulls |placeholder | value ------------+--------+--------+----------+----------+-------------+------- 1 | 0 | 1 | t | f | f | (1 row) INSERT INTO t1 VALUES (generate_series(1, 100000)); INSERT 0 100000 SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages -------(2,1) (1 row) VACUUM; VACUUM SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid;pages -------(2,1)(2,2)(2,3)(2,4) (4 rows) SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx');itemoffset | blknum | attnum | allnulls | hasnulls |placeholder | value ------------+--------+--------+----------+----------+-------------+------------------ 1 | 0 | 1 | f | f | f | {1 .. 28928} 2 | 128 | 1 | f | f | f | {28929 .. 57856} 3 | 256 | 1 | f | f | f | {57857 .. 86784} 4 | 384 | 1 | f | f | f | {1 .. 100000} (4 rows) =============================================================== How the index value for block 384 could be {1 .. 100000}? I have tested with 9.5 alpha2 and 9.5-stable head as of today. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On 9/3/2015 5:49 PM, Tatsuo Ishii wrote: > > However I inserted data *after* creating index, the value is > different. > VACUUM; > VACUUM > SELECT * FROM brin_revmap_data(get_raw_page('brinidx', 1)) WHERE pages != '(0,0)'::tid; > pages > ------- > (2,1) > (2,2) > (2,3) > (2,4) > (4 rows) > > SELECT * FROM brin_page_items(get_raw_page('brinidx', 2), 'brinidx'); > itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value > ------------+--------+--------+----------+----------+-------------+------------------ > 1 | 0 | 1 | f | f | f | {1 .. 28928} > 2 | 128 | 1 | f | f | f | {28929 .. 57856} > 3 | 256 | 1 | f | f | f | {57857 .. 86784} > 4 | 384 | 1 | f | f | f | {1 .. 100000} > (4 rows) > =============================================================== > > How the index value for block 384 could be {1 .. 100000}? > The summarization during VACUUM invokes IndexBuildHeapRangeScan() which is passed scanStartBlock and scanNumBlocks. If scanStartBlock + scanNumBlocks > heapTotalBlocks, further down the line, heapgettup() may start returning tuples from the beginning given the following code in it: page++; if (page >= scan->rs_nblocks) page = 0; finished = (page == scan->rs_startblock) || (scan->rs_numblocks != InvalidBlockNumber ? --scan->rs_numblocks == 0 : false); Where finished indicates whether it thinks the end of heap is reached. In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan() using heap_setscanlimits(). One can imagine how the above heap finish criteria might not work as expected. That helps explain why 1 becomes the min for that brin tuple. Attached hack fixes the symptom but perhaps not the correct fix for this. Thanks, Amit
Attachment
> The summarization during VACUUM invokes IndexBuildHeapRangeScan() which is > passed scanStartBlock and scanNumBlocks. If scanStartBlock + scanNumBlocks >> heapTotalBlocks, further down the line, heapgettup() may start returning > tuples from the beginning given the following code in it: > > page++; > if (page >= scan->rs_nblocks) > page = 0; > > finished = (page == scan->rs_startblock) || > (scan->rs_numblocks != InvalidBlockNumber ? > --scan->rs_numblocks == 0 : > false); > > Where finished indicates whether it thinks the end of heap is reached. > > In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan() > using heap_setscanlimits(). One can imagine how the above heap finish > criteria might not work as expected. > > That helps explain why 1 becomes the min for that brin tuple. > > Attached hack fixes the symptom but perhaps not the correct fix for this. Why can't we fix summarize_range() in brin.c: IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true, heapBlk, state->bs_pagesPerRange, brinbuildCallback, (void *) state); This currently thoughtlessly passes scannumblocks as state->bs_pagesPerRange. Shouldn't we change this so that (scanStartBlock + scanNumBlocks) does not exceed scan->rs_nblocks? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
> The summarization during VACUUM invokes IndexBuildHeapRangeScan() which is > passed scanStartBlock and scanNumBlocks. If scanStartBlock + scanNumBlocks >> heapTotalBlocks, further down the line, heapgettup() may start returning > tuples from the beginning given the following code in it: > > page++; > if (page >= scan->rs_nblocks) > page = 0; > > finished = (page == scan->rs_startblock) || > (scan->rs_numblocks != InvalidBlockNumber ? > --scan->rs_numblocks == 0 : > false); > > Where finished indicates whether it thinks the end of heap is reached. > > In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan() > using heap_setscanlimits(). One can imagine how the above heap finish > criteria might not work as expected. What scares me is: 1) the bug will not be found unless someone inspects the internal data of BRIN. Regression test is useless here. 2) the bug effectively causes vacuum scans the heap *twice*, which will produce lots of I/O if the heap is not small. 3) I wonder if other index type is suffered by this type of bug. I vaguely recall that more internal inspecting type brin regression test was removed because it depends on contrib modules. I am not sure whether the decision was appropreate or not. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On 9/4/2015 9:22 AM, Tatsuo Ishii wrote: >> >> In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan() >> using heap_setscanlimits(). One can imagine how the above heap finish >> criteria might not work as expected. > > What scares me is: > > 1) the bug will not be found unless someone inspects the internal data > of BRIN. Regression test is useless here. > > 2) the bug effectively causes vacuum scans the heap *twice*, which > will produce lots of I/O if the heap is not small. > > 3) I wonder if other index type is suffered by this type of bug. > About 3, it seems unlikely. Both the IndexBuildHeapRangeScan() and heap_setscanlimits() were introduced by the BRIN patch and I didn't find anything else using it, yet. Thanks, Amit
On 9/4/2015 8:28 AM, Tatsuo Ishii wrote: >> >> Attached hack fixes the symptom but perhaps not the correct fix for this. > > Why can't we fix summarize_range() in brin.c: > > IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true, > heapBlk, state->bs_pagesPerRange, > brinbuildCallback, (void *) state); > > This currently thoughtlessly passes scannumblocks as > state->bs_pagesPerRange. Shouldn't we change this so that > (scanStartBlock + scanNumBlocks) does not exceed scan->rs_nblocks? > Ah, it did cross my mind to the fix it in brin.c but was not sure. I did it that way in the attached patch. Thanks, Amit
Attachment
Amit Langote wrote: > On 9/4/2015 9:22 AM, Tatsuo Ishii wrote: > >> > >> In this case, scan->rs_startblock is 384 set by IndexBuildHeapRangeScan() > >> using heap_setscanlimits(). One can imagine how the above heap finish > >> criteria might not work as expected. > > > > What scares me is: > > > > 1) the bug will not be found unless someone inspects the internal data > > of BRIN. Regression test is useless here. > > > > 2) the bug effectively causes vacuum scans the heap *twice*, which > > will produce lots of I/O if the heap is not small. > > > > 3) I wonder if other index type is suffered by this type of bug. > > > > About 3, it seems unlikely. Both the IndexBuildHeapRangeScan() and > heap_setscanlimits() were introduced by the BRIN patch and I didn't find > anything else using it, yet. As I recall, there's some patch that Robert Haas or Amit Kapila that wants to use that function. Maybe something in the parallel seqscan patch series? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/4/2015 12:01 PM, Alvaro Herrera wrote: > Amit Langote wrote: >> On 9/4/2015 9:22 AM, Tatsuo Ishii wrote: >>> >>> 3) I wonder if other index type is suffered by this type of bug. >>> >> >> About 3, it seems unlikely. Both the IndexBuildHeapRangeScan() and >> heap_setscanlimits() were introduced by the BRIN patch and I didn't find >> anything else using it, yet. > > As I recall, there's some patch that Robert Haas or Amit Kapila that > wants to use that function. Maybe something in the parallel seqscan > patch series? I just checked Amit Kapila's latest[1] patch, and found they have since abandoned the heap_setscanlimits() approach. Thanks, Amit [1] http://www.postgresql.org/message-id/CAA4eK1J6SqOEAkxidoxp7kqx2_D-XNspx4rq4AO17P+SqKygFw@mail.gmail.com
> On 9/4/2015 8:28 AM, Tatsuo Ishii wrote: >>> >>> Attached hack fixes the symptom but perhaps not the correct fix for this. >> >> Why can't we fix summarize_range() in brin.c: >> >> IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true, >> heapBlk, state->bs_pagesPerRange, >> brinbuildCallback, (void *) state); >> >> This currently thoughtlessly passes scannumblocks as >> state->bs_pagesPerRange. Shouldn't we change this so that >> (scanStartBlock + scanNumBlocks) does not exceed scan->rs_nblocks? >> > > Ah, it did cross my mind to the fix it in brin.c but was not sure. I did > it that way in the attached patch. Amit, I have looked into your patch and am a little bit worried because it calls RelationGetNumberOfBlocks() which is an expensive function. I think there are two ways to avoid it. 1) fall back to your former patch. However it may break existing behavior what you said. I think there's very little chanceit actually happens because IndexBuildHeapRangeScan() is only used by brin (I confirmed this). But Alvaro said somepatches may be it's customers. Robert, Amit, Can you please confirm this? 2) change the signature of IndexBuildHeapRangeScan() to be able to teach it to limit the target block number to not exceedthe last page of the relation. Again this may break someone's patch and need confirmation. What do you think? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On 9/4/2015 1:33 PM, Tatsuo Ishii wrote: >> Ah, it did cross my mind to the fix it in brin.c but was not sure. I did >> it that way in the attached patch. > > Amit, > > I have looked into your patch and am a little bit worried because it > calls RelationGetNumberOfBlocks() which is an expensive function. > I think there are two ways to avoid it. > > 1) fall back to your former patch. However it may break existing > behavior what you said. I think there's very little chance it > actually happens because IndexBuildHeapRangeScan() is only used by > brin (I confirmed this). But Alvaro said some patches may be it's > customers. Robert, Amit, Can you please confirm this? > > 2) change the signature of IndexBuildHeapRangeScan() to be able to > teach it to limit the target block number to not exceed the last > page of the relation. Again this may break someone's patch and need > confirmation. > > What do you think? One thing I imagine we could do is to change the signature of summrize_range() to also include heapNumBlks which its (only) caller brinsummarize() already computes. It will look like: static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel, BlockNumber heapBlk, BlockNumber heapNumBlks); I'd think changing summarize_range()'s signature would be relatively easier/safer. Thanks, Amit
> One thing I imagine we could do is to change the signature of > summrize_range() to also include heapNumBlks which its (only) caller > brinsummarize() already computes. It will look like: > > static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state, > Relation heapRel, > BlockNumber heapBlk, > BlockNumber heapNumBlks); > > I'd think changing summarize_range()'s signature would be relatively > easier/safer. Yeah, sounds good. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On 9/4/2015 2:04 PM, Tatsuo Ishii wrote: >> One thing I imagine we could do is to change the signature of >> summrize_range() to also include heapNumBlks which its (only) caller >> brinsummarize() already computes. It will look like: >> >> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state, >> Relation heapRel, >> BlockNumber heapBlk, >> BlockNumber heapNumBlks); >> >> I'd think changing summarize_range()'s signature would be relatively >> easier/safer. > > Yeah, sounds good. Here's a patch to do that. Thanks, Amit
Attachment
> On 9/4/2015 2:04 PM, Tatsuo Ishii wrote: >>> One thing I imagine we could do is to change the signature of >>> summrize_range() to also include heapNumBlks which its (only) caller >>> brinsummarize() already computes. It will look like: >>> >>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state, >>> Relation heapRel, >>> BlockNumber heapBlk, >>> BlockNumber heapNumBlks); >>> >>> I'd think changing summarize_range()'s signature would be relatively >>> easier/safer. >> >> Yeah, sounds good. > > Here's a patch to do that. Thanks. It looks good to me (and passed all the regression tests in master branch). I will commit your patch if there's no objection. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Fri, Sep 4, 2015 at 10:03 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>
> I have looked into your patch and am a little bit worried because it
> calls RelationGetNumberOfBlocks() which is an expensive function.
> I think there are two ways to avoid it.
>
> 1) fall back to your former patch. However it may break existing
> behavior what you said. I think there's very little chance it
> actually happens because IndexBuildHeapRangeScan() is only used by
> brin (I confirmed this). But Alvaro said some patches may be it's
> customers. Robert, Amit, Can you please confirm this?
>
>
> I have looked into your patch and am a little bit worried because it
> calls RelationGetNumberOfBlocks() which is an expensive function.
> I think there are two ways to avoid it.
>
> 1) fall back to your former patch. However it may break existing
> behavior what you said. I think there's very little chance it
> actually happens because IndexBuildHeapRangeScan() is only used by
> brin (I confirmed this). But Alvaro said some patches may be it's
> customers. Robert, Amit, Can you please confirm this?
>
In earlier version of parallel_seqscan patch, heap_setscanlimits() was being
used, but now altogether a different mechanism is used.
> On Fri, Sep 4, 2015 at 10:03 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> >> I have looked into your patch and am a little bit worried because it >> calls RelationGetNumberOfBlocks() which is an expensive function. >> I think there are two ways to avoid it. >> >> 1) fall back to your former patch. However it may break existing >> behavior what you said. I think there's very little chance it >> actually happens because IndexBuildHeapRangeScan() is only used by >> brin (I confirmed this). But Alvaro said some patches may be it's >> customers. Robert, Amit, Can you please confirm this? >> > > In earlier version of parallel_seqscan patch, heap_setscanlimits() was being > used, but now altogether a different mechanism is used. Thank you for the confirmation! Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Tatsuo Ishii wrote: > > On 9/4/2015 2:04 PM, Tatsuo Ishii wrote: > >>> One thing I imagine we could do is to change the signature of > >>> summrize_range() to also include heapNumBlks which its (only) caller > >>> brinsummarize() already computes. It will look like: > >>> > >>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state, > >>> Relation heapRel, > >>> BlockNumber heapBlk, > >>> BlockNumber heapNumBlks); > >>> > >>> I'd think changing summarize_range()'s signature would be relatively > >>> easier/safer. > >> > >> Yeah, sounds good. > > > > Here's a patch to do that. > > Thanks. It looks good to me (and passed all the regression tests in > master branch). I will commit your patch if there's no objection. Yeah, thanks, please go ahead. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> Tatsuo Ishii wrote: >> > On 9/4/2015 2:04 PM, Tatsuo Ishii wrote: >> >>> One thing I imagine we could do is to change the signature of >> >>> summrize_range() to also include heapNumBlks which its (only) caller >> >>> brinsummarize() already computes. It will look like: >> >>> >> >>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state, >> >>> Relation heapRel, >> >>> BlockNumber heapBlk, >> >>> BlockNumber heapNumBlks); >> >>> >> >>> I'd think changing summarize_range()'s signature would be relatively >> >>> easier/safer. >> >> >> >> Yeah, sounds good. >> > >> > Here's a patch to do that. >> >> Thanks. It looks good to me (and passed all the regression tests in >> master branch). I will commit your patch if there's no objection. > > Yeah, thanks, please go ahead. Thanks. Fix committed. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On Saturday, September 5, 2015, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Tatsuo Ishii wrote:
>> > On 9/4/2015 2:04 PM, Tatsuo Ishii wrote:
>> >>> One thing I imagine we could do is to change the signature of
>> >>> summrize_range() to also include heapNumBlks which its (only) caller
>> >>> brinsummarize() already computes. It will look like:
>> >>>
>> >>> static void summarize_range(IndexInfo *indexInfo, BrinBuildState *state,
>> >>> Relation heapRel,
>> >>> BlockNumber heapBlk,
>> >>> BlockNumber heapNumBlks);
>> >>>
>> >>> I'd think changing summarize_range()'s signature would be relatively
>> >>> easier/safer.
>> >>
>> >> Yeah, sounds good.
>> >
>> > Here's a patch to do that.
>>
>> Thanks. It looks good to me (and passed all the regression tests in
>> master branch). I will commit your patch if there's no objection.
>
> Yeah, thanks, please go ahead.
Thanks. Fix committed.
Thank you Ishii-san!
Amit