I have a query that's cauing pgsql choose either a hash or merge join
depending on how I mess with the stats variables, but it won't choose an
nested loop, even though it's the fastest.
The estimate for the nested loop index scans always seems to be way high
on the high end. Note that it's 0-3 in one case and 0-2 in the other,
but the actual time is very low in both cases. Why is this? I haven't
been able to make much of a difference by changing the optimizer
variables.
This is on a solaris machine, if that matters. Tinput_data, locality,
and postal code have 1300, 28000 and 43000 rows, respectively, and
locality and postal code are very narrow tables (full definition below).
usps=# explain analyze SELECT key, pc.locality_id, l.state_code::varchar FROM Tinput_data i, postal_code pc, locality
lWHERE i.zip = pc.postal_code AND l.locality_id = pc.locality_id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=940.20..1417.94 rows=1380 width=36) (actual time=1727.30..2363.91 rows=1380 loops=1)
Merge Cond: ("outer".locality_id = "inner".locality_id)
-> Index Scan using locality_pkey on locality l (cost=0.00..455.99 rows=27789 width=10) (actual time=0.62..495.39
rows=27632loops=1)
-> Sort (cost=940.20..940.55 rows=1380 width=26) (actual time=1725.53..1726.71 rows=1380 loops=1)
Sort Key: pc.locality_id
-> Merge Join (cost=42.00..933.00 rows=1380 width=26) (actual time=56.27..1684.67 rows=1380 loops=1)
Merge Cond: ("outer".postal_code = "inner".zip)
-> Index Scan using postal_code_postal_code_key on postal_code pc (cost=0.00..869.31 rows=42704
width=13)(actual time=10.05..1396.11 rows=42418 loops=1)
-> Sort (cost=42.00..42.34 rows=1380 width=13) (actual time=39.63..40.97 rows=1380 loops=1)
Sort Key: i.zip
-> Seq Scan on tinput_data i (cost=0.00..34.80 rows=1380 width=13) (actual time=0.02..12.13
rows=1380loops=1)
Total runtime: 2367.50 msec
(12 rows)
usps=# set enable_mergejoin=0;
SET
usps=# set enable_hashjoin=0;
SET
usps=# explain analyze SELECT key, pc.locality_id, l.state_code::varchar FROM Tinput_data i, postal_code pc, locality
lWHERE i.zip = pc.postal_code AND l.locality_id = pc.locality_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..6991.66 rows=1380 width=36) (actual time=0.22..231.00 rows=1380 loops=1)
-> Nested Loop (cost=0.00..4203.23 rows=1380 width=26) (actual time=0.14..132.70 rows=1380 loops=1)
-> Seq Scan on tinput_data i (cost=0.00..34.80 rows=1380 width=13) (actual time=0.02..17.41 rows=1380
loops=1)
-> Index Scan using postal_code_postal_code_key on postal_code pc (cost=0.00..3.01 rows=1 width=13) (actual
time=0.06..0.06rows=1 loops=1380)
Index Cond: ("outer".zip = pc.postal_code)
-> Index Scan using locality_pkey on locality l (cost=0.00..2.01 rows=1 width=10) (actual time=0.05..0.05 rows=1
loops=1380)
Index Cond: (l.locality_id = "outer".locality_id)
Total runtime: 233.60 msec
(8 rows)
Table "pg_temp_1.tinput_data"
Column | Type | Modifiers
------------------+-----------------------+-----------
key | integer | not null
firm | character varying(40) |
address | integer |
address_v | character varying(10) |
odd_even | character(1) |
street_name | character varying(40) |
street_metaphone | character varying(4) |
apartment | integer |
apartment_v | character varying(10) |
apartment_label | character varying(5) |
city | character varying(40) |
city_metaphone | character varying(4) |
state | character varying(40) |
zip | character varying(5) |
Indexes: tinput_data_pkey primary key btree ("key")
usps=# \d postal_code
Table "public.postal_code"
Column | Type | Modifiers
----------------+-----------------------+-------------------------------------------------------------------------
postal_code_id | integer | not null default nextval('public.postal_code_postal_code_id_seq'::text)
postal_code | character varying(10) | not null
locality_id | integer | not null
Indexes: postal_code_pkey primary key btree (postal_code_id),
postal_code_postal_code_key unique btree (postal_code)
usps=# \d locality
Table "public.locality"
Column | Type | Modifiers
-------------+-----------------------+-------------------------------------------------------------------
locality_id | integer | not null default nextval('public.locality_locality_id_seq'::text)
locality | character varying(10) | not null
state_code | character(2) | not null
Indexes: locality_pkey primary key btree (locality_id)
Foreign Key constraints: $1 FOREIGN KEY (state_code) REFERENCES state(state_code) ON UPDATE NO ACTION ON DELETE NO
ACTION
--
Jim C. Nasby (aka Decibel!) jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"