performance of implicit join vs. explicit conditions on inet queries? - Mailing list pgsql-performance
From | Robert Edmonds |
---|---|
Subject | performance of implicit join vs. explicit conditions on inet queries? |
Date | |
Msg-id | 20051024035809.GA18261@edmonds.ath.cx Whole thread Raw |
List | pgsql-performance |
The preliminaries: - PostgreSQL 8.1 beta 3, Debian experimental - database has been VACUUMed FULL ANALYZE. - a pg_dump -Fc exists at http://199.77.129.48/inet_test.db - ia32 hardware with 2 GB physical memory and the following settings: shared_buffers = 40960 temp_buffers = 16384 work_mem = 131072 maintenance_work_mem = 262144 effective_cache_size = 65536 I've populated a table evenly with about 2 million rows of RFC 1918 addresses: Table "public.inet_addresses" Column | Type | Modifiers --------+------+----------- addr | inet | not null Indexes: "inet_addresses_pkey" PRIMARY KEY, btree (addr) The following query is very fast: EXPLAIN ANALYZE SELECT * FROM inet_addresses WHERE addr << inet('10.2.0.0/24') OR addr << inet('10.4.0.0/24') OR addr << inet('10.8.0.0/24'); Bitmap Heap Scan on inet_addresses (cost=6.51..324.48 rows=1792335 width=11) (actual time=0.350..1.104 rows=381 loops=1) Recheck Cond: ((addr << '10.2.0.0/24'::inet) OR (addr << '10.4.0.0/24'::inet) OR (addr << '10.8.0.0/24'::inet)) Filter: ((addr << '10.2.0.0/24'::inet) OR (addr << '10.4.0.0/24'::inet) OR (addr << '10.8.0.0/24'::inet)) -> BitmapOr (cost=6.51..6.51 rows=85 width=0) (actual time=0.336..0.336 rows=0 loops=1) -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 rows=28 width=0) (actual time=0.127..0.127 rows=127loops=1) Index Cond: ((addr > '10.2.0.0/24'::inet) AND (addr <= '10.2.0.255'::inet)) -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 rows=28 width=0) (actual time=0.109..0.109 rows=127loops=1) Index Cond: ((addr > '10.4.0.0/24'::inet) AND (addr <= '10.4.0.255'::inet)) -> Bitmap Index Scan on inet_addresses_pkey (cost=0.00..2.17 rows=28 width=0) (actual time=0.096..0.096 rows=127loops=1) Index Cond: ((addr > '10.8.0.0/24'::inet) AND (addr <= '10.8.0.255'::inet)) Total runtime: 1.613 ms Instead of specifying explicit address ranges in the query, I'd like to store the ranges in a table: inet_test_db=# \d inet_ranges Table "public.inet_ranges" Column | Type | Modifiers ----------+---------+----------- range | inet | not null range_id | integer | Indexes: "inet_ranges_pkey" PRIMARY KEY, btree (range) "inet_ranges_range_id_idx" btree (range_id) inet_test_db=# SELECT * FROM inet_ranges; range | range_id --------------+---------- 10.2.0.0/24 | 1 10.4.0.0/24 | 1 10.8.0.0/24 | 1 10.16.0.0/24 | 2 10.32.0.0/24 | 2 10.64.0.0/24 | 2 (6 rows) This query is far slower, even though it generates the same result: EXPLAIN ANALYZE SELECT * FROM inet_addresses as ia, inet_ranges as ir WHERE ia.addr << ir.range AND ir.range_id=1; Nested Loop (cost=0.00..171485.93 rows=3072574 width=26) (actual time=1465.803..16922.979 rows=381 loops=1) Join Filter: ("inner".addr << "outer".range) -> Seq Scan on inet_ranges ir (cost=0.00..1.07 rows=3 width=15) (actual time=0.008..0.021 rows=3 loops=1) Filter: (range_id = 1) -> Seq Scan on inet_addresses ia (cost=0.00..31556.83 rows=2048383 width=11) (actual time=0.003..2919.405 rows=2048383loops=3) Total runtime: 16923.457 ms Even when disabling sequential scans, the query planner is unable to make use of the inet_addresses_pkey index: Nested Loop (cost=100033605.21..100171874.11 rows=3072574 width=26) (actual time=2796.928..23453.585 rows=381 loops=1) Join Filter: ("inner".addr << "outer".range) -> Index Scan using inet_ranges_range_id_idx on inet_ranges ir (cost=0.00..3.04 rows=3 width=15) (actual time=0.069..0.095rows=3 loops=1) Index Cond: (range_id = 1) -> Materialize (cost=100033605.21..100054089.04 rows=2048383 width=11) (actual time=0.016..5133.349 rows=2048383 loops=3) -> Seq Scan on inet_addresses ia (cost=100000000.00..100031556.83 rows=2048383 width=11) (actual time=0.005..2938.012rows=2048383 loops=1) Total runtime: 23521.418 ms Is it possible to attain the speed of the first query and the flexibility of the second? Or will I have to resort to generating queries of the first form with the range table in the application layer? -- Robert Edmonds edmonds42@bellsouth.net
pgsql-performance by date: