JOIN performance - Mailing list pgsql-sql
From | Dean Gibson (DB Administrator) |
---|---|
Subject | JOIN performance |
Date | |
Msg-id | 5.1.0.14.2.20040920121606.00a89c68@imaps.mailpen.net Whole thread Raw |
In response to | Re: COUNT(*) to find records which have a certain number of (Greg Stark <gsstark@mit.edu>) |
Responses |
Re: JOIN performance
|
List | pgsql-sql |
I have a view that when used, is slow: CREATE VIEW "Data".genlic_a4avail AS SELECT genlic_a4.*, last_action_date, end_date, canceled FROM genlic_a4 LEFT JOIN lic_hd USING( sys_id) WHERE status != 'A'; Here is the EXPLAIN output: Merge Join (cost=155360.47..159965.70 rows=13063 width=75) Merge Cond: ("outer".sys_id = "inner".sys_id) -> Sort (cost=3912.51..3916.48 rows=1589 width=62) Sort Key: "_GenLicGroupA4".sys_id -> Nested Loop (cost=0.00..3828.04rows=1589 width=62) Join Filter: ("outer".callsign ~ ("inner".pattern)::text) -> Seq Scan on "_GenLicGroupA4" (cost=0.00..1667.40 rows=1589 width=21) Filter: ((status <> 'R'::bpchar) AND (status <> 'A'::bpchar) AND (geo_region = 12)) -> Seq Scan on "_GeoRestrict" (cost=0.00..1.16 rows=16 width=41) -> Sort (cost=262032.96..264249.96 rows=886799 width=72) Sort Key: lic_hd.sys_id -> SubqueryScan lic_hd (cost=0.00..24529.99 rows=886799 width=72) -> Seq Scan on "_LicHD" (cost=0.00..24529.99rows=886799 width=72) If I change the view to this: CREATE VIEW "Data".genlic_a4avail AS SELECT genlic_a4.*, (SELECT last_action_dateFROM lic_hd WHERE sys_id = genlic_a4.sys_id LIMIT 1) AS last_action_date, (SELECT end_date FROM lic_hd WHERE sys_id= genlic_a4.sys_id LIMIT 1) AS end_date, (SELECT canceled FROM lic_hd WHERE sys_id = genlic_a4.sys_id LIMIT 1) AS canceled FROM genlic_a4 WHERE status != 'A'; Then the performance is MUCH better: Subquery Scan genlic_a4avail (cost=0.00..3828.04 rows=1589 width=62) -> Nested Loop (cost=0.00..3828.04 rows=1589 width=62) Join Filter: ("outer".callsign ~ ("inner".pattern)::text) -> Seq Scan on "_GenLicGroupA4" (cost=0.00..1667.40rows=1589 width=21) Filter: ((status <> 'R'::bpchar) AND (status <> 'A'::bpchar) AND (geo_region = 12)) -> Seq Scan on "_GeoRestrict" (cost=0.00..1.16 rows=16 width=41) SubPlan -> Limit (cost=0.00..3.01 rows=1 width=4) -> Index Scan using "_LicHD_pkey" on "_LicHD" (cost=0.00..3.01 rows=1 width=4) Index Cond: (unique_system_identifier = $0) -> Limit (cost=0.00..3.01 rows=1 width=8) -> Index Scan using "_LicHD_pkey" on "_LicHD" (cost=0.00..3.01 rows=1 width=8) Index Cond: (unique_system_identifier = $0) -> Limit (cost=0.00..3.01 rows=1 width=8) -> Index Scan using "_LicHD_pkey" on "_LicHD" (cost=0.00..3.01 rows=1 width=8) Index Cond: (unique_system_identifier = $0) Note that genlic_a4 is small (4519), and lic_hd is large (886799), and lic_hd has sys_id as its PRIMARY KEY. Is there a better way to write the LEFT JOIN so as to achieve the performance of the second VIEW without the clumsiness of the three (SELECT ... LIMIT 1) ???