Re: Help with tuning this query (more musings) - Mailing list pgsql-performance

From John A Meinel
Subject Re: Help with tuning this query (more musings)
Date
Msg-id 42269AFB.9010700@arbash-meinel.com
Whole thread Raw
In response to Re: Help with tuning this query (more musings)  ("Ken Egervari" <ken@upfactor.com>)
List pgsql-performance
Ken Egervari wrote:

>> I took John's advice and tried to work with sub-selects.  I tried
>> this variation, which actually seems like it would make a difference
>> conceptually since it drives on the person table quickly.  But to my
>> surprise, the query runs at about 375 milliseconds.  I think it's
>> because it's going over that shipment table multiple times, which is
>> where the results are coming from.
>
>
> I also made a version that runs over shipment a single time, but it's
> exactly 250 milliseconds.  I guess the planner does the exact same thing.
>
Why are you now left joining driver and carrier code, but inner joining
shipment_status? I assume this is the *real* query that you are executing.

 From the earlier explain analyze, and your statements, the initial
person p should be the heavily selective portion.

And what does "driver" get you? It isn't in the return, and it isn't
part of a selectivity clause.
You are also double joining against carrier code, once as a left outer
join, and once in the inner join.

This query doesn't seem quite right. Are you sure it is generating the
rows you are expecting?

> select s.*, ss.*
>
> from shipment s
> inner join shipment_status ss on s.current_status_id=ss.id
> inner join release_code rc on ss.release_code_id=rc.id
> left outer join driver d on s.driver_id=d.id
> left outer join carrier_code cc on s.carrier_code_id=cc.id
> where s.carrier_code_id in (
>  select cc.id
>  from person p
>   inner join carrier_to_person ctp on p.id=ctp.person_id
>   inner join carrier c on ctp.carrier_id=c.id
>   inner join carrier_code cc on cc.carrier_id = c.id
>  where p.id = 355
> )
> and s.current_status_id is not null
> and s.is_purged=false
> and(rc.number='9' )
> and(ss.date>=current_date-31 )
>
> order by ss.date desc

You might want to post the explain analyze of this query to have a point
of reference, but what about something like this:
select s.*, ss.*

from shipment_status ss on s.current_status_id=ss.id
join (select s.* from shipment s
       where s.carrier_code_id in
            (select cc.id
               from person p
              inner join carrier_to_person ctp on p.id=ctp.person_id
              inner join carrier c on ctp.carrier_id=c.id
              inner join carrier_code cc on cc.carrier_id = c.id
             where p.id = 355
            )
         and s.current_status_id is not null
         and s.is_purged=false
) as i -- Just a name for the subselect since it is in a join
inner join release_code rc on ss.release_code_id=rc.id
where (rc.number='9' )
and(ss.date between current_date-31 and current_date())

order by ss.date desc

My idea with this query is to minimize the number of shipment rows that
need to be generated before joining with the other rows. My syntax is
probably a little bit off, since I can't actually run it against real
tables.
But looking at your *original* query, you were getting 15000 rows out of
shipment_status, and then 27700 rows out of shipment, which was then
being merge-joined down to only 300 rows, and then hash-joined down to 39.

I'm just trying to think of ways to prevent it from blossoming into 27k
rows to start with.

Please double check your query, because it seems to be grabbing
unnecessary rows with the left joins, and then post another explain
analyze with one (or several) different subselect forms.

John
=:->


Attachment

pgsql-performance by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: Help with tuning this query
Next
From: Josh Berkus
Date:
Subject: Re: Performance tradeoff