JOIN performance - Mailing list pgsql-sql

From Dean Gibson (DB Administrator)
Subject JOIN performance
Date
Msg-id 5.1.0.14.2.20040920121606.00a89c68@imaps.mailpen.net
Whole thread Raw
In response to Re: COUNT(*) to find records which have a certain number of  (Greg Stark <gsstark@mit.edu>)
Responses Re: JOIN performance  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
I have a view that when used, is slow:

CREATE  VIEW     "Data".genlic_a4avail  AS        SELECT                genlic_a4.*,                last_action_date,
            end_date,                canceled            FROM    genlic_a4                LEFT JOIN lic_hd  USING(
sys_id)            WHERE       status != 'A';
 

Here is the EXPLAIN output:
 Merge Join  (cost=155360.47..159965.70 rows=13063 width=75)   Merge Cond: ("outer".sys_id = "inner".sys_id)   ->  Sort
(cost=3912.51..3916.48 rows=1589 width=62)         Sort Key: "_GenLicGroupA4".sys_id         ->  Nested Loop
(cost=0.00..3828.04rows=1589 width=62)               Join Filter: ("outer".callsign ~ ("inner".pattern)::text)
    ->  Seq Scan on "_GenLicGroupA4"  (cost=0.00..1667.40 
 
rows=1589 width=21)                     Filter: ((status <> 'R'::bpchar) AND (status <> 
'A'::bpchar) AND (geo_region = 12))               ->  Seq Scan on "_GeoRestrict"  (cost=0.00..1.16 rows=16 
width=41)   ->  Sort  (cost=262032.96..264249.96 rows=886799 width=72)         Sort Key: lic_hd.sys_id         ->
SubqueryScan lic_hd  (cost=0.00..24529.99 rows=886799 width=72)               ->  Seq Scan on "_LicHD"
(cost=0.00..24529.99rows=886799 
 
width=72)

If I change the view to this:

CREATE  VIEW     "Data".genlic_a4avail  AS        SELECT                genlic_a4.*,                (SELECT
last_action_dateFROM lic_hd WHERE sys_id = 
 
genlic_a4.sys_id LIMIT 1) AS                last_action_date,                (SELECT end_date         FROM lic_hd WHERE
sys_id= 
 
genlic_a4.sys_id LIMIT 1) AS                end_date,                (SELECT canceled         FROM lic_hd WHERE sys_id
=
 
genlic_a4.sys_id LIMIT 1) AS                canceled            FROM    genlic_a4            WHERE       status !=
'A';

Then the performance is MUCH better:
 Subquery Scan genlic_a4avail  (cost=0.00..3828.04 rows=1589 width=62)   ->  Nested Loop  (cost=0.00..3828.04 rows=1589
width=62)        Join Filter: ("outer".callsign ~ ("inner".pattern)::text)         ->  Seq Scan on "_GenLicGroupA4"
(cost=0.00..1667.40rows=1589 
 
width=21)               Filter: ((status <> 'R'::bpchar) AND (status <> 
'A'::bpchar) AND (geo_region = 12))         ->  Seq Scan on "_GeoRestrict"  (cost=0.00..1.16 rows=16 width=41)
SubPlan          ->  Limit  (cost=0.00..3.01 rows=1 width=4)                 ->  Index Scan using "_LicHD_pkey" on 
 
"_LicHD"  (cost=0.00..3.01 rows=1 width=4)                       Index Cond: (unique_system_identifier = $0)
-> Limit  (cost=0.00..3.01 rows=1 width=8)                 ->  Index Scan using "_LicHD_pkey" on 
 
"_LicHD"  (cost=0.00..3.01 rows=1 width=8)                       Index Cond: (unique_system_identifier = $0)
-> Limit  (cost=0.00..3.01 rows=1 width=8)                 ->  Index Scan using "_LicHD_pkey" on 
 
"_LicHD"  (cost=0.00..3.01 rows=1 width=8)                       Index Cond: (unique_system_identifier = $0)

Note that genlic_a4 is small (4519), and lic_hd is large (886799), and 
lic_hd has sys_id as its PRIMARY KEY.

Is there a better way to write the LEFT JOIN so as to achieve the 
performance of the second VIEW without the clumsiness of the three (SELECT 
... LIMIT 1) ???



pgsql-sql by date:

Previous
From: Greg Stark
Date:
Subject: Re: COUNT(*) to find records which have a certain number of
Next
From: T E Schmitz
Date:
Subject: Re: COUNT(*) to find records which have a certain number of