Re: Performance Optimization for Dummies 2 - the SQL - Mailing list pgsql-performance
From | Carlo Stonebanks |
---|---|
Subject | Re: Performance Optimization for Dummies 2 - the SQL |
Date | |
Msg-id | eg1dai$1tfa$1@news.hub.org Whole thread Raw |
In response to | Performance Optimization for Dummies 2 - the SQL ("Carlo Stonebanks" <stonec.register@sympatico.ca>) |
Responses |
Re: Performance Optimization for Dummies 2 - the SQL
index growth problem |
List | pgsql-performance |
Hi Merlin, Here are the results. The query returned more rows (65 vs 12) because of the vague postal_code. In reality, we would have to modify the postal_code logic to take advantage of full zip codes when they were avalable, not unconditionally truncate them. Carlo explain analyze select f.facility_id, fa.facility_address_id, a.address_id, f.facility_type_code, f.name, a.address, a.city, a.state_code, a.postal_code, a.country_code from mdx_core.facility as f join mdx_core.facility_address as fa on fa.facility_id = f.facility_id join mdx_core.address as a on a.address_id = fa.address_id where (a.country_code, a.state_code, mdx_core.zip_trunc(a.postal_code)) = ('US', 'IL', mdx_core.zip_trunc('60640-5759')) order by facility_id "Sort (cost=6474.78..6474.84 rows=25 width=103) (actual time=217.279..217.311 rows=65 loops=1)" " Sort Key: f.facility_id" " -> Nested Loop (cost=2728.54..6474.20 rows=25 width=103) (actual time=35.828..217.059 rows=65 loops=1)" " -> Hash Join (cost=2728.54..6384.81 rows=25 width=72) (actual time=35.801..216.117 rows=65 loops=1)" " Hash Cond: ("outer".address_id = "inner".address_id)" " -> Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 width=12) (actual time=0.007..99.072 rows=128268 loops=1)" " -> Hash (cost=2728.50..2728.50 rows=19 width=64) (actual time=33.618..33.618 rows=39 loops=1)" " -> Bitmap Heap Scan on address a (cost=48.07..2728.50 rows=19 width=64) (actual time=2.569..33.491 rows=39 loops=1)" " Recheck Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))" " Filter: (mdx_core.zip_trunc(postal_code) = '60640'::text)" " -> Bitmap Index Scan on address_country_state_zip_trunc_idx (cost=0.00..48.07 rows=3846 width=0) (actual time=1.783..1.783 rows=3554 loops=1)" " Index Cond: ((country_code = 'US'::bpchar) AND ((state_code)::text = 'IL'::text))" " -> Index Scan using facility_pkey on facility f (cost=0.00..3.56 rows=1 width=35) (actual time=0.009..0.010 rows=1 loops=65)" " Index Cond: ("outer".facility_id = f.facility_id)" "Total runtime: 217.520 ms" ""Merlin Moncure"" <mmoncure@gmail.com> wrote in message news:b42b73150610041407y3554f311u1329c4f3bdc53999@mail.gmail.com... > On 10/4/06, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: >> > can you do explain analyze on the two select queries on either side of >> > the union separatly? the subquery is correctly written and unlikely >> > to be a problem (in fact, good style imo). so lets have a look at >> > both sides of facil query and see where the problem is. >> >> Sorry for the delay, the server was down yesterday and couldn't get >> anything. >> >> I have modified the sub-queries a little, trying to get the index scans >> to >> fire - all the tables involved here are large enough to benefit from >> index >> scans over sequential scans. I am mystified as to why PART 1 is giving >> me: >> > >> "Seq Scan on facility_address fa (cost=0.00..3014.68 rows=128268 >> width=12) >> (actual time=0.007..99.033 rows=128268 loops=1)" > > not sure on this, lets go back to that. > >> into account that perhaps the import row is using the 5-number US ZIP, >> not the 9-number USZIP+4 > > >> where >> a.country_code = 'US' >> and a.state_code = 'IL' >> and a.postal_code like '60640-5759'||'%' >> order by facility_id > > 1. create a small function, sql preferred which truncates the zip code > to 5 digits or reduces to so called 'fuzzy' matching criteria. lets > call it zip_trunc(text) and make it immutable which it is. write this > in sql, not tcl if possible (trust me). > > create index address_idx on address(country_code, state_code, > zip_trunc(postal_code)); > > rewrite above where clause as > > where (a.country_code, a.state_code, zip_trunc(postal_code)) = ('US', > 'IL', zip_trunc('60640-5759')); > > try it out, then lets see how it goes and then we can take a look at > any seqscan issues. > > merlin > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >
pgsql-performance by date: