Re: SP-GiST support for inet datatypes - Mailing list pgsql-hackers
From: Emre Hasegeli
Subject: Re: SP-GiST support for inet datatypes
Msg-id: CAE2gYzz264-9s3coGAc1Rv8bkN4vAoEALuY0rBMx+NJxZX=+2A@mail.gmail.com
In response to: Re: SP-GiST support for inet datatypes (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
> Aside from the disturbing frequency of 100-to-1 split ratios, it also
> looks like the inclusion of the masklen bit is hardly pulling its weight,
> though that might be an artifact of this data set.

I was expecting this.  Including the masklen bit in the decision was
something we could easily do.  It doesn't make the index more
complicated; it even makes it simpler.  I think it would be useful when
host addresses with masklen are indexed.  The IRRExplorer dataset is
just networks.

> I think that it'd be worth investigating changing to a split rule that
> uses the next two or three or four bits of the address, not just the
> single next bit, to index into more than four subnodes.  It's pretty clear
> how to adjust the insertion functions to support that, and a crude hack in
> that line suggested that the index does get noticeably smaller.  However,
> I didn't quite grok how to adjust the search (consistent) functions.
> Obviously we could apply the same rules in a loop, considering each
> successive address bit, but there may be a better way.

I experimented with such designs before posting this patch, but couldn't
make any of them work.  It gets very complicated when more than one bit
is used.  Even with only 2 bits, there would be 12 child nodes:

* network bits 00
* network bits 01
* network bits 10
* network bits 11
* network bit 0 and host bit 0
* network bit 0 and host bit 1
* network bit 1 and host bit 0
* network bit 1 and host bit 1
* host bits 00
* host bits 01
* host bits 10
* host bits 11

Another design would be a prefix tree.  We could split by using a byte
and store that byte as the label.  Then there wouldn't be a static
number of nodes.  We would need to iterate through the labels and
re-execute the condition on all of them.  Surely this would perform
better for some working sets, but I don't think for all of them.  I will
experiment with this if I get the chance.  I believe the current index
is useful as it is.
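To illustrate the single-bit design being discussed, here is a
simplified sketch (mine, not the committed network_spgist.c code) of
the 4-way split: each inner node classifies the next address bit by its
value and by whether that bit position still lies inside the network
prefix (a "network" bit) or beyond it (a "host" bit).  Splitting on two
bits at once multiplies these cases, because the masklen boundary can
fall between the two bits, which is where the 12 classes above come
from.

```python
def node_number(addr_bits: str, masklen: int, level: int) -> int:
    """Pick one of four child nodes for the bit at position `level`.

    addr_bits -- the address as a string of '0'/'1' characters
    masklen   -- the prefix length; bits before it are network bits
    level     -- number of leading bits already consumed by ancestors
    """
    bit = int(addr_bits[level])
    is_network_bit = level < masklen
    # Nodes: 0 = network bit 0, 1 = network bit 1,
    #        2 = host bit 0,    3 = host bit 1
    return (0 if is_network_bit else 2) + bit

# 192.168.0.0/16: the first bit is a network 1 -> node 1;
# bit 16 is the first host bit, value 0 -> node 2.
bits = format(0xC0A80000, "032b")
print(node_number(bits, 16, 0), node_number(bits, 16, 16))  # 1 2
```

The function names and representation here are illustrative only; the
real code works on the binary inet datum, not bit strings.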
The wasted space depends on the distribution of the keys:

> # create table a3 as select (trunc(random() * 256)::text || '.' || trunc(random() * 8)::text || '.0.0')::inet from generate_series(1,1000000) as i;
> SELECT 1000000
> # create table a4 as select (trunc(random() * 256)::text || '.' || trunc(random() * 16)::text || '.0.0')::inet from generate_series(1,1000000) as i;
> SELECT 1000000
> # create table a5 as select (trunc(random() * 256)::text || '.' || trunc(random() * 32)::text || '.0.0')::inet from generate_series(1,1000000) as i;
> SELECT 1000000
> # create table a6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1,1000000) as i;
> SELECT 1000000
> # create index on a3 using spgist (inet);
> CREATE INDEX
> # create index on a4 using spgist (inet);
> CREATE INDEX
> # create index on a5 using spgist (inet);
> CREATE INDEX
> # create index on a6 using spgist (inet);
> CREATE INDEX
> # \di+
>                          List of relations
>  Schema |    Name     | Type  |  Owner   | Table | Size  | Description
> --------+-------------+-------+----------+-------+-------+-------------
>  public | a3_inet_idx | index | hasegeli | a3    | 42 MB |
>  public | a4_inet_idx | index | hasegeli | a4    | 46 MB |
>  public | a5_inet_idx | index | hasegeli | a5    | 55 MB |
>  public | a6_inet_idx | index | hasegeli | a6    | 56 MB |
> (4 rows)

It also gets better with the number of keys:

> # create table a6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1,1000000) as i;
> SELECT 1000000
> # create table b6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1,2000000) as i;
> SELECT 2000000
> # create table c6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1,4000000) as i;
> SELECT 4000000
> # create table d6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1,8000000) as i;
> SELECT 8000000
> # create index on a6 using spgist (inet);
> CREATE INDEX
> # create index on b6 using spgist (inet);
> CREATE INDEX
> # create index on c6 using spgist (inet);
> CREATE INDEX
> # create index on d6 using spgist (inet);
> CREATE INDEX
> # \di+
>                          List of relations
>  Schema |    Name     | Type  |  Owner   | Table | Size   | Description
> --------+-------------+-------+----------+-------+--------+-------------
>  public | a6_inet_idx | index | hasegeli | a6    | 56 MB  |
>  public | b6_inet_idx | index | hasegeli | b6    | 109 MB |
>  public | c6_inet_idx | index | hasegeli | c6    | 181 MB |
>  public | d6_inet_idx | index | hasegeli | d6    | 336 MB |
> (4 rows)

Thank you for committing this.
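For reference, the arithmetic behind the "gets better" claim in the
second table: dividing each index size by its key count shows the
per-key overhead shrinking as the table grows (the sizes are taken from
the `\di+` output above).

```python
# Bytes of index per key for the a6/b6/c6/d6 series above.
sizes_mb = {1_000_000: 56, 2_000_000: 109, 4_000_000: 181, 8_000_000: 336}
bytes_per_key = {n: mb * 1024 * 1024 / n for n, mb in sizes_mb.items()}
for n, b in bytes_per_key.items():
    print(f"{n:>9} keys: {b:5.1f} bytes/key")
# Roughly 59 -> 57 -> 47 -> 44 bytes per key as the key count grows.
```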