Re: Help with tuning this query - Mailing list pgsql-performance

From Ken Egervari
Subject Re: Help with tuning this query
Date
Msg-id 000901c51f55$aa4b7400$6a01a8c0@KEN
Whole thread Raw
In response to Help with tuning this query  ("Ken Egervari" <ken@upfactor.com>)
Responses Re: Help with tuning this query
List pgsql-performance
>> select s.*
>> from shipment s
>>     inner join carrier_code cc on s.carrier_code_id = cc.id
>>     inner join carrier c on cc.carrier_id = c.id
>>     inner join carrier_to_person ctp on ctp.carrier_id = c.id
>>     inner join person p on p.id = ctp.person_id
>>     inner join shipment_status cs on s.current_status_id = cs.id
>>     inner join release_code rc on cs.release_code_id = rc.id
>>     left join shipment_status ss on ss.shipment_id = s.id
>> where
>>     p.id = :personId and
>>     s.is_purged = false and
>>     rc.number = '9' and
>>     cs is not null and
>>     cs.date >= current_date - 31
>> order by cs.date desc
>> ...
>> shipment contains 40,000 rows
>> shipment_status contains 80,000 rows
>
> I may be missing something, but it looks like the second join
> on shipment_status (the left join) is not adding anything to your
> results, except more work. ss is not used for output, nor in the where
> clause, so what is its purpose ?

It does look strange doesn't it?  I would think the same thing if it were
the first time I looked at it.  But rest assured, it's done by design.  A
shipment relates to many shipment_status rows, but only 1 is the current
shipment_status for the shipment.  The first does queries on the current
status only and doesn't analyze the rest of the related items.  The second
left join is for eager loading so that I don't have to run a seperate query
to fetch the children for each shipment.  This really does improve
performance because otherwise you'll have to make N+1 queries to the
database, and that's just too much overhead.  Since I need all the
shipment_status children along with the shipment for the domain logic to
work on them, I have to load them all.

On average, a shipment will have 2 shipment_status rows.  So if the query
selects 100 shipments, the query returns 200 rows.  Hibernate is intelligent
enough to map the shipment_status children to the appropriate shipment
automatically.

> if cs.date has an upper limit, it might be helpful to change the
> condition to a BETWEEN

Well, I could create an upper limit. It would be the current date.  Would
adding in this redundant condition improve performance?  I've clustered the
shipment table so that the dates are together, which has improved
performance.  I'm not sure adding in this implicit condition will speed up
anything, but I will definately try it.

> in any case, i would think you might need an index on
>  shipment(carrier_code_id)
>  shipment(current_status_id)
>  shipment_status(id)

Unfortunately, I have indexes on all three (Postgres implicitly creates
indexes for unique keys).  Here are the other 2 that are already created:

CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree
(carrier_code_id);
CREATE INDEX shipment_current_status_id_idx ON shipment USING btree
(current_status_id);

So I guess we've been thinking the same thing.  Don't get me wrong.  These
indexes speed up the query from 1.6 seconds to 250 milliseconds.  I just
need to be around 30 milliseconds.

Another idea that had occured to me was trying to force postgres to driver
on the person table because that filter ratio is so great compared to
everything else, but I do remember looking at the explain days ago and it
was one of the last tables being filtered/joined.  Is there anyway to force
postgres to pick person?  The reason I ask is because this would really
reduce the number of rows it pulls out from the shipment table.

Thanks for comments.  I'll try making that date explicit and change the
query to use between to see if that does anything.

Regards and many thanks,
Ken


pgsql-performance by date:

Previous
From: Ragnar Hafstað
Date:
Subject: Re: Help with tuning this query
Next
From: Ragnar Hafstað
Date:
Subject: Re: Help with tuning this query