Re: performance of implicit join vs. explicit conditions on inet queries? - Mailing list pgsql-performance
From: Qingqing Zhou
Subject: Re: performance of implicit join vs. explicit conditions on inet queries?
Date:
Msg-id: dk4p9d$30q4$1@news.hub.org
In response to: performance of implicit join vs. explicit conditions on inet queries? (Robert Edmonds <edmonds42@bellsouth.net>)
Responses: Re: performance of implicit join vs. explicit conditions on inet queries?
List: pgsql-performance
"Robert Edmonds" <edmonds42@bellsouth.net> wrote
>
> EXPLAIN ANALYZE
> SELECT *
> FROM inet_addresses
> WHERE addr << inet('10.2.0.0/24')
>    OR addr << inet('10.4.0.0/24')
>    OR addr << inet('10.8.0.0/24');
>
> Bitmap Heap Scan on inet_addresses  (cost=6.51..324.48 rows=1792335
> width=11) (actual time=0.350..1.104 rows=381 loops=1)
>   Recheck Cond: ((addr << '10.2.0.0/24'::inet) OR (addr <<
> '10.4.0.0/24'::inet) OR (addr << '10.8.0.0/24'::inet))
>   Filter: ((addr << '10.2.0.0/24'::inet) OR (addr << '10.4.0.0/24'::inet)
> OR (addr << '10.8.0.0/24'::inet))
>   ->  BitmapOr  (cost=6.51..6.51 rows=85 width=0) (actual
> time=0.336..0.336 rows=0 loops=1)
>         ->  Bitmap Index Scan on inet_addresses_pkey  (cost=0.00..2.17
> rows=28 width=0) (actual time=0.127..0.127 rows=127 loops=1)
>               Index Cond: ((addr > '10.2.0.0/24'::inet) AND (addr <=
> '10.2.0.255'::inet))
>         ->  Bitmap Index Scan on inet_addresses_pkey  (cost=0.00..2.17
> rows=28 width=0) (actual time=0.109..0.109 rows=127 loops=1)
>               Index Cond: ((addr > '10.4.0.0/24'::inet) AND (addr <=
> '10.4.0.255'::inet))
>         ->  Bitmap Index Scan on inet_addresses_pkey  (cost=0.00..2.17
> rows=28 width=0) (actual time=0.096..0.096 rows=127 loops=1)
>               Index Cond: ((addr > '10.8.0.0/24'::inet) AND (addr <=
> '10.8.0.255'::inet))
> Total runtime: 1.613 ms
>
> Instead of specifying explicit address ranges in the query, I'd like
> to store the ranges in a table:
>
> inet_test_db=# \d inet_ranges
>        Table "public.inet_ranges"
>   Column  |  Type   | Modifiers
> ----------+---------+-----------
>  range    | inet    | not null
>  range_id | integer |
> Indexes:
>     "inet_ranges_pkey" PRIMARY KEY, btree (range)
>     "inet_ranges_range_id_idx" btree (range_id)
>
> inet_test_db=# SELECT * FROM inet_ranges;
>     range     | range_id
> --------------+----------
>  10.2.0.0/24  |        1
>  10.4.0.0/24  |        1
>  10.8.0.0/24  |        1
>  10.16.0.0/24 |        2
>  10.32.0.0/24 |        2
>  10.64.0.0/24 |        2
> (6 rows)
>
> EXPLAIN ANALYZE
> SELECT *
> FROM inet_addresses as ia, inet_ranges as ir
> WHERE ia.addr << ir.range
>   AND ir.range_id = 1;
>
> Nested Loop  (cost=0.00..171485.93 rows=3072574 width=26) (actual
> time=1465.803..16922.979 rows=381 loops=1)
>   Join Filter: ("inner".addr << "outer".range)
>   ->  Seq Scan on inet_ranges ir  (cost=0.00..1.07 rows=3 width=15)
> (actual time=0.008..0.021 rows=3 loops=1)
>         Filter: (range_id = 1)
>   ->  Seq Scan on inet_addresses ia  (cost=0.00..31556.83 rows=2048383
> width=11) (actual time=0.003..2919.405 rows=2048383 loops=3)
> Total runtime: 16923.457 ms

Good illustration. I suspect we have a problem with the histogram statistics. That is, the histograms we use can effectively record linear ranges (for ordinary operators like <, >, =), but fail to do so for nonlinear ranges such as the inet data type. So the Nested Loop node makes a large error in estimating the number of rows (estimated: 3072574, actual: 381), and under that estimate a sequential scan looks better to the planner. I think the histogram problem is not easy to fix, but is there a way to change the inet type a little to make it linear for your range operators? (For example, align the length to 000.000.000.000/00?)

Regards,
Qingqing
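The linearization idea above can be sketched outside SQL: every CIDR network maps to a contiguous integer interval [network address, broadcast address], so the containment test `addr << range` is equivalent to a plain linear comparison of the kind a btree histogram can estimate. A minimal Python sketch using the standard `ipaddress` module (illustration only, not from the original thread):

```python
import ipaddress

def contains(addr_s: str, net_s: str) -> bool:
    """Linear equivalent of PostgreSQL's addr << net:
    a network is a contiguous integer interval [low, high]."""
    addr = int(ipaddress.ip_address(addr_s))
    net = ipaddress.ip_network(net_s)
    low = int(net.network_address)
    high = int(net.broadcast_address)
    # Plain range test -- the form a btree index (and its histogram)
    # can reason about, unlike the opaque << operator.
    return low <= addr <= high

# Ranges from the inet_ranges table where range_id = 1
ranges = ["10.2.0.0/24", "10.4.0.0/24", "10.8.0.0/24"]

print(contains("10.2.0.17", "10.2.0.0/24"))            # True
print(contains("10.3.0.17", "10.2.0.0/24"))            # False
print(any(contains("10.8.0.255", r) for r in ranges))  # True
```

This is exactly the rewrite the planner already performs for constant networks in the first EXPLAIN (turning `addr << '10.2.0.0/24'` into `addr > '10.2.0.0/24' AND addr <= '10.2.0.255'`); the join case fails because that transformation is not applied to a variable range from another table.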