Thread: maximum columns for brin bloom indexes
Hi everyone, When testing brin bloom indexes I noted that we need to reduce the PAGES_PER_RANGE parameter of the index to allow more columns on it. Sadly, this could be a problem if you create the index before the table grows, once it reaches some number of rows (i see the error as early as 1000 rows) it starts error out. create table t1(i int, j int); -- uses default PAGES_PER_RANGE=128 create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; insert into t1 select random()*1000, random()*1000 from generate_series(1, 1000); ERROR: index row size 8968 exceeds maximum 8152 for index "t1_i_j_idx" if instead you create the index with a minor PAGES_PER_RANGE it goes fine, in this case it works once you reduce it to at least 116 create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) with (pages_per_range=116); so, for having: two int columns - PAGES_PER_RANGE should be max 116 three int columns - PAGES_PER_RANGE should be max 77 one int and one timestamp - PAGES_PER_RANGE should be max 121 and so on -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
On 4/8/21 9:08 AM, Jaime Casanova wrote: > Hi everyone, > > When testing brin bloom indexes I noted that we need to reduce the > PAGES_PER_RANGE parameter of the index to allow more columns on it. > > Sadly, this could be a problem if you create the index before the table > grows, once it reaches some number of rows (i see the error as early as > 1000 rows) it starts error out. > > create table t1(i int, j int); > > -- uses default PAGES_PER_RANGE=128 > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; > > insert into t1 > select random()*1000, random()*1000 from generate_series(1, 1000); > ERROR: index row size 8968 exceeds maximum 8152 for index "t1_i_j_idx" > > if instead you create the index with a minor PAGES_PER_RANGE it goes > fine, in this case it works once you reduce it to at least 116 > > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) > with (pages_per_range=116); > > > so, for having: > two int columns - PAGES_PER_RANGE should be max 116 > three int columns - PAGES_PER_RANGE should be max 77 > one int and one timestamp - PAGES_PER_RANGE should be max 121 > > and so on > No, because this very much depends on the number if distinct values in the page page range, which determines how well the bloom filter compresses. You used 1000, but that's just an arbitrary value and the actual data might have any other value. And it's unlikely that all three columns will have the same number of distinct values. Of course, this also depends on the false positive rate. FWIW I doubt people are using multi-column BRIN indexes very often. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Apr 08, 2021 at 12:18:36PM +0200, Tomas Vondra wrote: > On 4/8/21 9:08 AM, Jaime Casanova wrote: > > Hi everyone, > > > > When testing brin bloom indexes I noted that we need to reduce the > > PAGES_PER_RANGE parameter of the index to allow more columns on it. > > > > Sadly, this could be a problem if you create the index before the table > > grows, once it reaches some number of rows (i see the error as early as > > 1000 rows) it starts error out. > > > > create table t1(i int, j int); > > > > -- uses default PAGES_PER_RANGE=128 > > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; > > > > insert into t1 > > select random()*1000, random()*1000 from generate_series(1, 1000); > > ERROR: index row size 8968 exceeds maximum 8152 for index "t1_i_j_idx" > > > > if instead you create the index with a minor PAGES_PER_RANGE it goes > > fine, in this case it works once you reduce it to at least 116 > > > > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) > > with (pages_per_range=116); > > > > > > so, for having: > > two int columns - PAGES_PER_RANGE should be max 116 > > three int columns - PAGES_PER_RANGE should be max 77 > > one int and one timestamp - PAGES_PER_RANGE should be max 121 > > > > and so on > > > > No, because this very much depends on the number if distinct values in > the page page range, which determines how well the bloom filter > compresses. You used 1000, but that's just an arbitrary value and the > actual data might have any other value. And it's unlikely that all three > columns will have the same number of distinct values. > Ok, that makes sense. Still I see a few odd things: """ drop table if exists t1; create table t1(i int, j int); create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; -- This one will succeed, I guess because it has less different -- values insert into t1 select random()*20, random()*100 from generate_series(1, 1000); -- succeed insert into t1 select random()*20, random()*100 from generate_series(1, 100000); -- succeed insert into t1 select random()*200, random()*1000 from generate_series(1, 1000); -- succeed insert into t1 select random()*200, random()*1000 from generate_series(1, 1000); -- succeed? This is the case it has been causing problems before insert into t1 select random()*1000, random()*1000 from generate_series(1, 1000); """ Maybe this makes sense, but it looks random to me. If it makes sense this is something we should document better. Let's try another combination: """ drop table if exists t1; create table t1(i int, j int); create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; -- this fails again insert into t1 select random()*1000, random()*1000 from generate_series(1, 1000); -- and this starts to fail now, but this worked before insert into t1 select random()*20, random()*100 from generate_series(1, 1000); """ > Of course, this also depends on the false positive rate. > How the false positive rate work? > FWIW I doubt people are using multi-column BRIN indexes very often. > true. Another question, should we allow to create a brin multi column index that uses different opclasses? CREATE INDEX ON t1 USING brin (i int4_bloom_ops, j int4_minmax_ops); -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
On 4/8/21 4:49 PM, Jaime Casanova wrote: > On Thu, Apr 08, 2021 at 12:18:36PM +0200, Tomas Vondra wrote: >> On 4/8/21 9:08 AM, Jaime Casanova wrote: >>> Hi everyone, >>> >>> When testing brin bloom indexes I noted that we need to reduce the >>> PAGES_PER_RANGE parameter of the index to allow more columns on it. >>> >>> Sadly, this could be a problem if you create the index before the table >>> grows, once it reaches some number of rows (i see the error as early as >>> 1000 rows) it starts error out. >>> >>> create table t1(i int, j int); >>> >>> -- uses default PAGES_PER_RANGE=128 >>> create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; >>> >>> insert into t1 >>> select random()*1000, random()*1000 from generate_series(1, 1000); >>> ERROR: index row size 8968 exceeds maximum 8152 for index "t1_i_j_idx" >>> >>> if instead you create the index with a minor PAGES_PER_RANGE it goes >>> fine, in this case it works once you reduce it to at least 116 >>> >>> create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) >>> with (pages_per_range=116); >>> >>> >>> so, for having: >>> two int columns - PAGES_PER_RANGE should be max 116 >>> three int columns - PAGES_PER_RANGE should be max 77 >>> one int and one timestamp - PAGES_PER_RANGE should be max 121 >>> >>> and so on >>> >> >> No, because this very much depends on the number if distinct values in >> the page page range, which determines how well the bloom filter >> compresses. You used 1000, but that's just an arbitrary value and the >> actual data might have any other value. And it's unlikely that all three >> columns will have the same number of distinct values. >> > > Ok, that makes sense. Still I see a few odd things: > > """ > drop table if exists t1; > create table t1(i int, j int); > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; > > -- This one will succeed, I guess because it has less different > -- values > insert into t1 > select random()*20, random()*100 from generate_series(1, 1000); > > -- succeed > insert into t1 > select random()*20, random()*100 from generate_series(1, 100000); > > -- succeed > insert into t1 > select random()*200, random()*1000 from generate_series(1, 1000); > > -- succeed > insert into t1 > select random()*200, random()*1000 from generate_series(1, 1000); > > -- succeed? This is the case it has been causing problems before > insert into t1 > select random()*1000, random()*1000 from generate_series(1, 1000); > """ > > Maybe this makes sense, but it looks random to me. If it makes sense > this is something we should document better. > Presumably it's about where exactly are the new rows added, and when we summarize the page range. > Let's try another combination: > > """ > drop table if exists t1; > create table t1(i int, j int); > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; > > -- this fails again > insert into t1 > select random()*1000, random()*1000 from generate_series(1, 1000); > > -- and this starts to fail now, but this worked before > insert into t1 > select random()*20, random()*100 from generate_series(1, 1000); > """ > >> Of course, this also depends on the false positive rate. >> > > How the false positive rate work? > The lower the false positive rate, the more "accurate" the index is, because fewer page ranges not containing the value will be added to the bitmap. The bloom filter however has to be larger. >> FWIW I doubt people are using multi-column BRIN indexes very often. >> > > true. > > Another question, should we allow to create a brin multi column index > that uses different opclasses? > > CREATE INDEX ON t1 USING brin (i int4_bloom_ops, j int4_minmax_ops); > Why not? Without that you couldn't create index on (int, bigint) because those are in principle different opclasses too. I don't see what would this restriction give us. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company