Re: Help with tuning this query (with explain analyze finally) - Mailing list pgsql-performance

From John Arbash Meinel
Subject Re: Help with tuning this query (with explain analyze finally)
Date
Msg-id 42289347.7040607@arbash-meinel.com
Whole thread Raw
In response to Re: Help with tuning this query (with explain analyze finally)  ("Ken" <ken@upfactor.com>)
Responses Re: Help with tuning this query (with explain analyze finally)
List pgsql-performance
Ken wrote:

> Richard,
>
> What do you mean by summary table?  Basically a cache of the query
> into a table with replicated column names of all the joins?  I'd
> probably have to whipe out the table every minute and re-insert the
> data for each carrier in the system.  I'm not sure how expensive this
> operation would be, but I'm guessing it would be fairly heavy-weight.
> And maintaince would be a lot harder because of the duplicated
> columns, making refactorings on the database more error-prone.  Am I
> understanding your suggestion correctly? Please correct me if I am.
>
>> Can you turn the problem around? Calculate what you want for all
>> users (once every 60 seconds) and stuff those results into a summary
>> table. Then let the users query the summary table as often as they
>> like (with the understanding that the figures aren't going to update
>> any faster than once a minute)
>
It's the same idea of a materialized view, or possibly just a lazy cache.

Just try this query:

CREATE TABLE cachedview AS
select p.id as person_id, s.*, ss.*

from shipment s
inner join shipment_status ss on s.current_status_id=ss.id
inner join release_code rc on ss.release_code_id=rc.id
left outer join driver d on s.driver_id=d.id
left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.carrier_code_id in (
 select cc.id
 from person p
  inner join carrier_to_person ctp on p.id=ctp.person_id
  inner join carrier c on ctp.carrier_id=c.id
  inner join carrier_code cc on cc.carrier_id = c.id
)
and s.current_status_id is not null
and s.is_purged=false
and(rc.number='9' )
and(ss.date>=current_date-31 )

order by ss.date desc ;

Notice that I took out the internal p.id = blah.
Then you can do:

CREATE INDEX cachedview_person_id_idx ON cachedview(person_id);

Then from the client side, you can just run:
SELECT * from cachedview WHERE person_id = <id>;

Now, this assumes that rc.number='9' is what you always want. If that
isn't the case, you could refactor a little bit.

This unrolls all of the work, a table which should be really fast to
query. If this query takes less than 10s to generate, than just have a
service run it every 60s. I think for refreshing, it is actually faster
to drop the table and recreate it, rather than deleteing the entries.
Dropping also has the advantage that if you ever add more rows to s or
ss, then the table automatically gets the new entries.

Another possibility, is to have the "cachedview" not use "s.*, ss.*",
but instead just include whatever the primary keys are for those tables.
Then your final query becomes:

SELECT s.*, ss.* FROM cachedview cv, s, ss WHERE cv.person_id = <id>,
cv.s_id = s.<pkey>, cv.ss_id = ss.<pkey>;

Again, this should be really fast, because you should have an index on
cv.person_id and only have say 300 rows there, and then you are just
fetching a few rows from s and ss. You can also use this time to do some
of your left joins against other tables.

Does this make sense? The biggest advantage you have is your "60s"
statement. With that in hand, I think you can do a lot of caching
optimizations.

John
=:->


Attachment

pgsql-performance by date:

Previous
From: "Ken"
Date:
Subject: Re: Help with tuning this query (with explain analyze finally)
Next
From: John Arbash Meinel
Date:
Subject: Re: Help with tuning this query (with explain analyze finally)