Re: maximum columns for brin bloom indexes - Mailing list pgsql-hackers
From | Jaime Casanova |
---|---|
Subject | Re: maximum columns for brin bloom indexes |
Date | |
Msg-id | 20210408144918.GA3086@ahch-to Whole thread Raw |
In response to | Re: maximum columns for brin bloom indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: maximum columns for brin bloom indexes
|
List | pgsql-hackers |
On Thu, Apr 08, 2021 at 12:18:36PM +0200, Tomas Vondra wrote: > On 4/8/21 9:08 AM, Jaime Casanova wrote: > > Hi everyone, > > > > When testing brin bloom indexes I noted that we need to reduce the > > PAGES_PER_RANGE parameter of the index to allow more columns on it. > > > > Sadly, this could be a problem if you create the index before the table > > grows, once it reaches some number of rows (i see the error as early as > > 1000 rows) it starts error out. > > > > create table t1(i int, j int); > > > > -- uses default PAGES_PER_RANGE=128 > > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; > > > > insert into t1 > > select random()*1000, random()*1000 from generate_series(1, 1000); > > ERROR: index row size 8968 exceeds maximum 8152 for index "t1_i_j_idx" > > > > if instead you create the index with a minor PAGES_PER_RANGE it goes > > fine, in this case it works once you reduce it to at least 116 > > > > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) > > with (pages_per_range=116); > > > > > > so, for having: > > two int columns - PAGES_PER_RANGE should be max 116 > > three int columns - PAGES_PER_RANGE should be max 77 > > one int and one timestamp - PAGES_PER_RANGE should be max 121 > > > > and so on > > > > No, because this very much depends on the number if distinct values in > the page page range, which determines how well the bloom filter > compresses. You used 1000, but that's just an arbitrary value and the > actual data might have any other value. And it's unlikely that all three > columns will have the same number of distinct values. > Ok, that makes sense. Still I see a few odd things: """ drop table if exists t1; create table t1(i int, j int); create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; -- This one will succeed, I guess because it has less different -- values insert into t1 select random()*20, random()*100 from generate_series(1, 1000); -- succeed insert into t1 select random()*20, random()*100 from generate_series(1, 100000); -- succeed insert into t1 select random()*200, random()*1000 from generate_series(1, 1000); -- succeed insert into t1 select random()*200, random()*1000 from generate_series(1, 1000); -- succeed? This is the case it has been causing problems before insert into t1 select random()*1000, random()*1000 from generate_series(1, 1000); """ Maybe this makes sense, but it looks random to me. If it makes sense this is something we should document better. Let's try another combination: """ drop table if exists t1; create table t1(i int, j int); create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ; -- this fails again insert into t1 select random()*1000, random()*1000 from generate_series(1, 1000); -- and this starts to fail now, but this worked before insert into t1 select random()*20, random()*100 from generate_series(1, 1000); """ > Of course, this also depends on the false positive rate. > How the false positive rate work? > FWIW I doubt people are using multi-column BRIN indexes very often. > true. Another question, should we allow to create a brin multi column index that uses different opclasses? CREATE INDEX ON t1 USING brin (i int4_bloom_ops, j int4_minmax_ops); -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
pgsql-hackers by date: