Performance Optimization for Dummies 2 - the SQL - Mailing list pgsql-performance

From Carlo Stonebanks
Subject Performance Optimization for Dummies 2 - the SQL
Date
Msg-id eft782$5a1$1@news.hub.org
Whole thread Raw
Responses Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure" <mmoncure@gmail.com>)
List pgsql-performance
Some very helpful people had asked that I post the troublesome code that was
generated by my import program.

I installed a SQL log feature in my import program. I have
posted samples of the SQL statements that cause the biggest delays.

Thanks for all of your help.

Carlo

----------
Sample 1:
This one is very expensive on my system.
----------
select
f.facility_id,
provider_practice_id
from
mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join mdx_core.facility_address as fa
on fa.facility_id = pp.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
pp.provider_id = 1411311
and f.facility_type_code != 'P'
and (
pp.facility_address_id is not null
and a.state_code = 'NY'
and '10001-2382' = a.postal_code||'%'
and a.city = 'New York'
) or (
f.default_state_code = 'NY'
and '10001-2382' like f.default_postal_code||'%'
and f.default_city = 'New York'
)
limit 1

Limit  (cost=3899.18..32935.21 rows=1 width=8)
  ->  Hash Join  (cost=3899.18..91007.27 rows=3 width=8)
        Hash Cond: ("outer".address_id = "inner".address_id)
        Join Filter: ((("outer".provider_id = 1411311) AND
("outer".facility_type_code <> 'P'::bpchar) AND ("outer".facility_address_id
IS NOT NULL) AND (("inner".state_code)::text = 'NY'::text) AND
('10001-2382'::text = (("inner".postal_code)::text || '%'::text)) AND
(("inner".city)::text = 'New York'::text)) OR (("outer".default_state_code =
'NY'::bpchar) AND ('10001-2382'::text ~~
(("outer".default_postal_code)::text || '%'::text)) AND
(("outer".default_city)::text = 'New York'::text)))
        ->  Merge Join  (cost=0.00..50589.20 rows=695598 width=57)
              Merge Cond: ("outer".facility_id = "inner".facility_id)
              ->  Merge Join  (cost=0.00..16873.90 rows=128268 width=49)
                    Merge Cond: ("outer".facility_id = "inner".facility_id)
                    ->  Index Scan using facility_pkey on facility f
(cost=0.00..13590.18 rows=162525 width=41)
                    ->  Index Scan using facility_address_facility_idx on
facility_address fa  (cost=0.00..4254.46 rows=128268 width=8)
              ->  Index Scan using provider_practice_facility_idx on
provider_practice pp  (cost=0.00..28718.27 rows=452129 width=16)
        ->  Hash  (cost=3650.54..3650.54 rows=99454 width=36)
              ->  Seq Scan on address a  (cost=0.00..3650.54 rows=99454
width=36)

----------
Sample 2:
This one includes a call to a custom function which performs lexical
comparisons
and returns a rating on the likelihood that the company names refer to the
same
facility. Replacing the code:
    mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as
comp
with
  1 as comp
-- to avoid the function call only shaved a fragment off the execution time,
which leads me to believe my problem is in the SQL structure itself.

----------
select
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as
comp,
facil.*
from (
select
f.facility_id,
fa.facility_address_id,
a.address_id,
f.facility_type_code,
f.name,
a.address,
a.city,
a.state_code,
a.postal_code,
a.country_code
from
mdx_core.facility as f
join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
facility_address_id is not null
and a.country_code = 'US'
and a.state_code = 'IL'
and '60640-5759' like a.postal_code||'%'
union select
f.facility_id,
null as facility_address_id,
null as address_id,
f.facility_type_code,
f.name,
null as address,
f.default_city as city,
f.default_state_code as state_code,
f.default_postal_code as postal_code,
f.default_country_code as country_code
from
mdx_core.facility as f
left outer join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
where
facility_address_id is null
and f.default_country_code = 'US'
and '60640-5759' like f.default_postal_code||'%'
) as facil
order by comp

Sort  (cost=20595.92..20598.01 rows=834 width=236)
  Sort Key: mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3
Inc'::text, (name)::text)
  ->  Subquery Scan facil  (cost=20522.10..20555.46 rows=834 width=236)
        ->  Unique  (cost=20522.10..20545.03 rows=834 width=103)
              ->  Sort  (cost=20522.10..20524.18 rows=834 width=103)
                    Sort Key: facility_id, facility_address_id, address_id,
facility_type_code, name, address, city, state_code, postal_code,
country_code
                    ->  Append  (cost=4645.12..20481.63 rows=834 width=103)
                          ->  Nested Loop  (cost=4645.12..8381.36 rows=21
width=103)
                                ->  Hash Join  (cost=4645.12..8301.35
rows=21 width=72)
                                      Hash Cond: ("outer".address_id =
"inner".address_id)
                                      ->  Seq Scan on facility_address fa
(cost=0.00..3014.68 rows=128268 width=12)
                                            Filter: (facility_address_id IS
NOT NULL)
                                      ->  Hash  (cost=4645.08..4645.08
rows=16 width=64)
                                            ->  Seq Scan on address a
(cost=0.00..4645.08 rows=16 width=64)
                                                  Filter: ((country_code =
'US'::bpchar) AND ((state_code)::text = 'IL'::text) AND ('60640-5759'::text
~~ ((postal_code)::text || '%'::text)))
                                ->  Index Scan using facility_pkey on
facility f  (cost=0.00..3.80 rows=1 width=35)
                                      Index Cond: ("outer".facility_id =
f.facility_id)
                          ->  Subquery Scan "*SELECT* 2"
(cost=0.00..12100.07 rows=813 width=73)
                                ->  Nested Loop Left Join
(cost=0.00..12091.94 rows=813 width=73)
                                      Filter: ("inner".facility_address_id
IS NULL)
                                      ->  Seq Scan on facility f
(cost=0.00..8829.19 rows=813 width=73)
                                            Filter: ((default_country_code =
'US'::bpchar) AND ('60640-5759'::text ~~ ((default_postal_code)::text ||
'%'::text)))
                                      ->  Index Scan using
facility_address_facility_idx on facility_address fa  (cost=0.00..3.99
rows=2 width=8)
                                            Index Cond: (fa.facility_id =
"outer".facility_id)



pgsql-performance by date:

Previous
From: Markus Schaber
Date:
Subject: Re: Performace Optimization for Dummies
Next
From: "Carlo Stonebanks"
Date:
Subject: Re: Performace Optimization for Dummies