Re: Performance Optimization for Dummies 2 - the SQL - Mailing list pgsql-performance
From | Carlo Stonebanks |
---|---|
Subject | Re: Performance Optimization for Dummies 2 - the SQL |
Date | |
Msg-id | eg0u6e$hbr$1@news.hub.org Whole thread Raw |
In response to | Performance Optimization for Dummies 2 - the SQL ("Carlo Stonebanks" <stonec.register@sympatico.ca>) |
Responses |
Re: Performance Optimization for Dummies 2 - the SQL
|
List | pgsql-performance |
> can you do explain analyze on the two select queries on either side of > the union separatly? the subquery is correctly written and unlikely > to be a problem (in fact, good style imo). so lets have a look at > both sides of facil query and see where the problem is. Sorry for the delay, the server was down yesterday and couldn't get anything. I have modified the sub-queries a little, trying to get the index scans to fire - all the tables involved here are large enough to benefit from index scans over sequential scans. I am mystified as to why PART 1 is giving me: "Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12) (actual time=0.007..99.033 rows=128268 loops=1)" which I assume is for the: "join mdx_core.facility_address as fa on fa.facility_id = f.facility_id" Then again, I am not sure how to read the EXPLAIN ANALYSE performance numbers. The other part of the UNION (PART 2) I have also modified, I think it's working nicely. Let me know if I'm mistaken on thinking that! The one remaining problem is that the UNION of these two sub-queries has a column which is a call to a custom TCL function that does a lexical analysis on the results, ranking the result names by their proximity to the imported name. his definitely eats up the performance and I hope that my decision to call this function on the results of the union (assuming union deletes redundent rows) is the correct one. Thanks! Carlo /* PART 1. The redundant expression "facility_address_id is NULL" was removed because only an OUTER join would have made this meaningful. We use only INNER joins in this sub-query Both facility_address and address have seq scans, even though there is an index for facility_address(facility_id( and an index for address( country_code, postal_code, address). The "like" operator appears to be making things expensive. This is used because we have to take into account that perhaps the import row is using the 5-number US ZIP, not the 9-number USZIP+4 standard (although this is not the case in this sample). /* explain analyse select f.facility_id, fa.facility_address_id, a.address_id, f.facility_type_code, f.name, a.address, a.city, a.state_code, a.postal_code, a.country_code from mdx_core.facility as f join mdx_core.facility_address as fa on fa.facility_id = f.facility_id join mdx_core.address as a on a.address_id = fa.address_id where a.country_code = 'US' and a.state_code = 'IL' and a.postal_code like '60640-5759'||'%' order by facility_id "Sort (cost=6392.50..6392.50 rows=1 width=103) (actual time=189.133..189.139 rows=12 loops=1)" " Sort Key: f.facility_id" " -> Nested Loop (cost=2732.88..6392.49 rows=1 width=103) (actual time=14.006..188.967 rows=12 loops=1)" " -> Hash Join (cost=2732.88..6388.91 rows=1 width=72) (actual time=13.979..188.748 rows=12 loops=1)" " Hash Cond: ("outer".address_id = "inner".address_id)" " -> Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12) (actual time=0.004..98.867 rows=128268 loops=1)" " -> Hash (cost=2732.88..2732.88 rows=1 width=64) (actual time=6.430..6.430 rows=3 loops=1)" " -> Bitmap Heap Scan on address a (cost=62.07..2732.88 rows=1 width=64) (actual time=2.459..6.417 rows=3 loops=1)" " Recheck Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))" " Filter: ((postal_code)::text ~~ '60640-5759%'::text)" " -> Bitmap Index Scan on address_country_state_postal_code_address_idx (cost=0.00..62.07 rows=3846 width=0) (actual time=1.813..1.813 rows=3554 loops=1)" " Index Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))" " -> Index Scan using facility_pkey on facility f (cost=0.00..3.56 rows=1 width=35) (actual time=0.012..0.013 rows=1 loops=12)" " Index Cond: ("outer".facility_id = f.facility_id)" "Total runtime: 189.362 ms" /* PART 2 - can you see anything that could work faster? */ explain analyse select f.facility_id, null as facility_address_id, null as address_id, f.facility_type_code, f.name, null as address, f.default_city as city, f.default_state_code as state_code, f.default_postal_code as postal_code, f.default_country_code as country_code from mdx_core.facility as f left outer join mdx_core.facility_address as fa on fa.facility_id = f.facility_id where fa.facility_address_id is null and f.default_country_code = 'US' and f.default_state_code = 'IL' and '60640-5759' like f.default_postal_code||'%' "Nested Loop Left Join (cost=0.00..6042.41 rows=32 width=73) (actual time=14.923..14.923 rows=0 loops=1)" " Filter: ("inner".facility_address_id IS NULL)" " -> Index Scan using facility_country_state_postal_code_idx on facility f (cost=0.00..5914.69 rows=32 width=73) (actual time=10.118..14.773 rows=10 loops=1)" " Index Cond: ((default_country_code = 'US'::bpchar) AND (default_state_code = 'IL'::bpchar))" " Filter: ('60640-5759'::text ~~ ((default_postal_code)::text || '%'::text))" " -> Index Scan using facility_address_facility_idx on facility_address fa (cost=0.00..3.97 rows=2 width=8) (actual time=0.009..0.011 rows=1 loops=10)" " Index Cond: (fa.facility_id = "outer".facility_id)" "Total runtime: 15.034 ms"
pgsql-performance by date: