Re: Help with tuning this query (Some musings) - Mailing list pgsql-performance

From Ken Egervari
Subject Re: Help with tuning this query (Some musings)
Date
Msg-id 002a01c51f9c$1066d0e0$6a01a8c0@KEN
Whole thread Raw
In response to Help with tuning this query  ("Ken Egervari" <ken@upfactor.com>)
List pgsql-performance
I took John's advice and tried to work with sub-selects.  I tried this
variation, which actually seems like it would make a difference conceptually
since it drives on the person table quickly.  But to my surprise, the query
runs at about 375 milliseconds.  I think it's because it's going over that
shipment table multiple times, which is where the results are coming from.

select s.*, ss.*

from shipment s
 inner join shipment_status ss on s.current_status_id=ss.id
 inner join release_code rc on ss.release_code_id=rc.id
 left outer join driver d on s.driver_id=d.id
 left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.id in (
  select s.id
  from person p
   inner join carrier_to_person ctp on p.id=ctp.person_id
   inner join carrier c on ctp.carrier_id=c.id
   inner join carrier_code cc on cc.carrier_id = c.id
   inner join shipment s on s.carrier_code_id = cc.id
  where p.id = 355
      and s.current_status_id is not null
      and s.is_purged=false
 )
 and(rc.number='9' )
 and(ss.date>=current_date-31 )

order by ss.date desc

*** Musing 1
Also, "s.current_status_id is not null" is an important filter that I forgot
to mention.  In this example where p.id = 355, it filters out 90% of the
rows.  In general, that filter ratio is 0.46 though, which is not quite so
high.  However, this filter gets better over time because more and more
users will use a filter that will make this value null.  It's still not as
strong as person though and probably never will be.  But I thought I'd
mention it nonetheless.

*** Musing 2
I do think that the filter "ss.date>=current_date-31" is slowing this query
down.  I don't think it's the mention of "current_date" or even that it's
dynamic instead of static.  I think the range is just too big.  For example,
if I use:

and ss.date between '2005-02-01 00:00:00' and '2005-02-28 23:59:59'

The query still results in 250 milliseconds.  But if I make the range very
small - say Feb 22nd of 2005:

and ss.date between '2005-02-22 00:00:00' and '2005-02-22 23:59:59'

Now the entire query runs in 47 milliseconds on average.  If I can't make
this query perform any better, should I change the user interface to select
the date instead of showing the last 31 days to benefit from this single-day
filter?  This causes more clicks to select the day (like from a calendar),
but most users probably aren't interested in seeing the entire listing
anyway.  However, it's a very important requirement that users know that
shipment enteries exist in the last 31 days (because they are usually
sure-fire problems if they are still in this query after a few days).

I guess I'm wondering if tuning the query is futile and I should get the
requirements changed, or is there something I can do to really speed it up?

Thanks again,
Ken


pgsql-performance by date:

Previous
From: "Ken Egervari"
Date:
Subject: Re: Help with tuning this query (more musings)
Next
From: Mark Kirkwood
Date:
Subject: Re: Help with tuning this query