Home > mailing lists

Re: Performance Optimization for Dummies 2 - the SQL - Mailing list pgsql-performance

From	Carlo Stonebanks
Subject	Re: Performance Optimization for Dummies 2 - the SQL
Date	October 4, 2006 22:41:41
Msg-id	eg1dai$1tfa$1@news.hub.org Whole thread Raw
In response to	Performance Optimization for Dummies 2 - the SQL ("Carlo Stonebanks" <stonec.register@sympatico.ca>)
Responses	Re: Performance Optimization for Dummies 2 - the SQL index growth problem
List	pgsql-performance

Tree view

Hi Merlin,

Here are the results. The query returned more rows (65 vs 12) because of the
vague postal_code.

In reality, we would have to modify the postal_code logic to take advantage
of full zip codes when they were avalable, not unconditionally truncate
them.

Carlo

explain analyze select
      f.facility_id,
      fa.facility_address_id,
      a.address_id,
      f.facility_type_code,
      f.name,
      a.address,
      a.city,
      a.state_code,
      a.postal_code,
      a.country_code
   from
      mdx_core.facility as f
   join mdx_core.facility_address as fa
      on fa.facility_id = f.facility_id
   join mdx_core.address as a
      on a.address_id = fa.address_id
   where
      (a.country_code, a.state_code, mdx_core.zip_trunc(a.postal_code)) =
('US', 'IL', mdx_core.zip_trunc('60640-5759'))
   order by facility_id

"Sort  (cost=6474.78..6474.84 rows=25 width=103) (actual
time=217.279..217.311 rows=65 loops=1)"
"  Sort Key: f.facility_id"
"  ->  Nested Loop  (cost=2728.54..6474.20 rows=25 width=103) (actual
time=35.828..217.059 rows=65 loops=1)"
"        ->  Hash Join  (cost=2728.54..6384.81 rows=25 width=72) (actual
time=35.801..216.117 rows=65 loops=1)"
"              Hash Cond: ("outer".address_id = "inner".address_id)"
"              ->  Seq Scan on facility_address fa  (cost=0.00..3014.68
rows=128268 width=12) (actual time=0.007..99.072 rows=128268 loops=1)"
"              ->  Hash  (cost=2728.50..2728.50 rows=19 width=64) (actual
time=33.618..33.618 rows=39 loops=1)"
"                    ->  Bitmap Heap Scan on address a  (cost=48.07..2728.50
rows=19 width=64) (actual time=2.569..33.491 rows=39 loops=1)"
"                          Recheck Cond: ((country_code = 'US'::bpchar) AND
((state_code)::text = 'IL'::text))"
"                          Filter: (mdx_core.zip_trunc(postal_code) =
'60640'::text)"
"                          ->  Bitmap Index Scan on
address_country_state_zip_trunc_idx  (cost=0.00..48.07 rows=3846 width=0)
(actual time=1.783..1.783 rows=3554 loops=1)"
"                                Index Cond: ((country_code = 'US'::bpchar)
AND ((state_code)::text = 'IL'::text))"
"        ->  Index Scan using facility_pkey on facility f  (cost=0.00..3.56
rows=1 width=35) (actual time=0.009..0.010 rows=1 loops=65)"
"              Index Cond: ("outer".facility_id = f.facility_id)"
"Total runtime: 217.520 ms"



""Merlin Moncure"" <mmoncure@gmail.com> wrote in message
news:b42b73150610041407y3554f311u1329c4f3bdc53999@mail.gmail.com...
> On 10/4/06, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
>> > can you do explain analyze on the two select queries on either side of
>> > the union separatly?  the subquery is correctly written and unlikely
>> > to be a problem (in fact, good style imo).  so lets have a look at
>> > both sides of facil query and see where the problem is.
>>
>> Sorry for the delay, the server was down yesterday and couldn't get
>> anything.
>>
>> I have modified the sub-queries a little, trying to get the index scans
>> to
>> fire - all the tables involved here are large enough to benefit from
>> index
>> scans over sequential scans. I am mystified as to why PART 1 is giving
>> me:
>>
>
>>  "Seq Scan on facility_address fa  (cost=0.00..3014.68 rows=128268
>> width=12)
>> (actual time=0.007..99.033 rows=128268 loops=1)"
>
> not sure on this, lets go back to that.
>
>>    into account that perhaps the import row is using the 5-number US ZIP,
>> not the 9-number USZIP+4
>
>
>>    where
>>       a.country_code = 'US'
>>       and a.state_code = 'IL'
>>       and a.postal_code like '60640-5759'||'%'
>>    order by facility_id
>
> 1. create a small function, sql preferred which truncates the zip code
> to 5 digits or reduces to so called 'fuzzy' matching criteria.  lets
> call it zip_trunc(text) and make it immutable which it is. write this
> in sql, not tcl if possible (trust me).
>
> create index address_idx on address(country_code, state_code,
> zip_trunc(postal_code));
>
> rewrite above where clause as
>
> where (a.country_code, a.state_code, zip_trunc(postal_code)) = ('US',
> 'IL', zip_trunc('60640-5759'));
>
> try it out, then lets see how it goes and then we can take a look at
> any seqscan issues.
>
> merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

pgsql-performance by date:

From: Ben
Date: 04 October 2006, 21:42:20
Subject: Re: any hope for my big query?

From: Ben
Date: 05 October 2006, 01:35:27
Subject: pg_trgm indexes giving bad estimations?

Re: Performance Optimization for Dummies 2 - the SQL - Mailing list pgsql-performance

Previous

Next