Re: JOIN performance - Mailing list pgsql-sql

From Dean Gibson (DB Administrator)
Subject Re: JOIN performance
Date
Msg-id 5.1.0.14.2.20040920164348.00a99868@imaps.mailpen.net
Whole thread Raw
In response to Re: JOIN performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: JOIN performance  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Tom Lane wrote on 2004-09-20 16:06:
>"Dean Gibson (DB Administrator)" <postgresql3@ultimeth.com> writes:
> > I have a view that when used, is slow:
>
>... If you want useful help you need to be more complete.

I use views to "hide" tables so that I can populate new tables and then 
atomically switch to them with "CREATE OR REPLACE ...".  Here is the same 
data with the raw tables:

=> explain select * from "20040920_070010"."_GenLicGroupA4" AS x LEFT JOIN 
"20040919_070713"."_LicHD" AS y ON x.sys_id = 
y.unique_system_identifier;                                           QUERY 
PLAN
------------------------------------------------------------------------------------------------- Merge Join
(cost=5235.14..35123.51rows=43680 width=365)   Merge Cond: ("outer".unique_system_identifier = "inner".sys_id)   ->
IndexScan using "_LicHD_pkey" on "_LicHD" y  (cost=0.00..27361.79 
 
rows=886799 width=344)   ->  Sort  (cost=5235.14..5344.34 rows=43680 width=21)         Sort Key: x.sys_id         ->
SeqScan on "_GenLicGroupA4" x  (cost=0.00..1339.80 
 
rows=43680 width=21)

Using first level views, as mentioned above, the results are the same:

=> explain select * from "Base"."GenLicGroupA4" AS x LEFT JOIN 
"Base"."LicHD" AS y ON x.sys_id = 
y.unique_system_identifier;                                          QUERY 
PLAN
----------------------------------------------------------------------------------------------- Merge Join
(cost=5235.14..35123.51rows=43680 width=365)   Merge Cond: ("outer".unique_system_identifier = "inner".sys_id)   ->
IndexScan using "_LicHD_pkey" on "_LicHD"  (cost=0.00..27361.79 
 
rows=886799 width=344)   ->  Sort  (cost=5235.14..5344.34 rows=43680 width=21)         Sort Key:
"_GenLicGroupA4".sys_id        ->  Seq Scan on "_GenLicGroupA4"  (cost=0.00..1339.80 rows=43680 
 
width=21)

However, when I introduce a second-level view for the second table of:

CREATE  VIEW     "Data".lic_hd  AS        SELECT                unique_system_identifier                        AS
sys_id,               callsign                                        AS callsign,                uls_file_number
                         AS 
 
uls_file_num,                applicant_type_code                             AS 
applicant_type,                radio_service_code                              AS 
radio_service,                license_status                                  AS 
license_status,                grant_date                                      AS grant_date,
effective_date                                 AS 
 
effective_date,                cancellation_date                               AS 
cancel_date,                expired_date                                    AS 
expire_date,                last_action_date                                AS 
last_action_date,                CASE WHEN cancellation_date < expired_date                     THEN cancellation_date
                  ELSE expired_date                END                                             AS end_date,
      cancellation_date < expired_date                AS canceled            FROM    "Base"."LicHD";
 

And then change the query to use it, I get:

=> explain select * from "Base"."GenLicGroupA4" AS x LEFT JOIN lic_hd AS y 
ON x.sys_id = y.sys_id;                                     QUERY 
PLAN
------------------------------------------------------------------------------------ Merge Join
(cost=280258.11..289399.92rows=359154 width=98)   Merge Cond: ("outer".sys_id = "inner".sys_id)   ->  Sort
(cost=5235.14..5344.34rows=43680 width=21)         Sort Key: "_GenLicGroupA4".sys_id         ->  Seq Scan on
"_GenLicGroupA4" (cost=0.00..1339.80 rows=43680 
 
width=21)   ->  Sort  (cost=262032.96..264249.96 rows=886799 width=72)         Sort Key: y.sys_id         ->  Subquery
Scany  (cost=0.00..24529.99 rows=886799 width=72)               ->  Seq Scan on "_LicHD"  (cost=0.00..24529.99
rows=886799
 
width=72)

Note that the scan on _LicHD is now sequential.  If I change the above view 
to remove the last two columns, I get:
                                     QUERY 
PLAN
------------------------------------------------------------------------------------ Merge Join
(cost=5235.14..35123.51rows=43680 width=93)   Merge Cond: ("outer".unique_system_identifier = "inner".sys_id)   ->
IndexScan using "_LicHD_pkey" on "_LicHD"  (cost=0.00..27361.79 
 
rows=886799 width=72)   ->  Sort  (cost=5235.14..5344.34 rows=43680 width=21)         Sort Key: x.sys_id         ->
SeqScan on "_GenLicGroupA4" x  (cost=0.00..1339.80 
 
rows=43680 width=21)

Which is back to my original (good) performance.

Question:  why do the last two column definitions in the second VIEW change 
the scan on _LicHD from indexed to sequential ??

-- Dean



pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: JOIN performance
Next
From: Tom Lane
Date:
Subject: Re: JOIN performance