Thread: Help with tuning this query

From:
"Ken Egervari"
Date:

I've tried to use Dan Tow's tuning method and created all the right indexes from his diagraming method, but the query still performs quite slow both inside the application and just inside pgadmin III.  Can anyone be kind enough to help me tune it so that it performs better in postgres?  I don't think it's using the right indexes, or maybe postgres needs special treatment.
 
I've converted the below query to SQL from a Hibernate query, so the syntax is probably not perfect but it's semantics are exactly the same.  I've done so by looking at the source code, but I can't run it to get the exact SQL since I don't have the database on my home machine.
 
select s.*
from shipment s
    inner join carrier_code cc on s.carrier_code_id = cc.id
    inner join carrier c on cc.carrier_id = c.id
    inner join carrier_to_person ctp on ctp.carrier_id = c.id
    inner join person p on p.id = ctp.person_id
    inner join shipment_status cs on s.current_status_id = cs.id
    inner join release_code rc on cs.release_code_id = rc.id
    left join shipment_status ss on ss.shipment_id = s.id
where
    p.id = :personId and
    s.is_purged = false and
    rc.number = '9' and
    cs is not null and
    cs.date >= current_date - 31
order by cs.date desc
Just assume I have no indexes for the moment because while some of the indexes I made make it work faster, it's still around 250 milliseconds and under heavy load, the query performs very badly (6-7 seconds).
 
For your information:
 
shipment contains 40,000 rows
shipment_status contains 80,000 rows
release_code contains 8 rows
person contains 300 rows
carrier contains 60 rows
carrier_code contains 70 rows
 
The filter ratios are:
 
rc.number = '9' (0.125)
cs.date >= current_date - 31 (.10)
p.id = ? (0.003)
s.is_purged = false (.98)
 
I really hope someone can help since I'm pretty much stuck.
 
Best regards and many thanks,
Ken
From:
Richard Huxton
Date:

Ken Egervari wrote:
> I've tried to use Dan Tow's tuning method

Who? What?

 > and created all the right
> indexes from his diagraming method, but the query still performs
> quite slow both inside the application and just inside pgadmin III.
> Can anyone be kind enough to help me tune it so that it performs
> better in postgres?  I don't think it's using the right indexes, or
> maybe postgres needs special treatment.
>
> I've converted the below query to SQL from a Hibernate query, so the
> syntax is probably not perfect but it's semantics are exactly the
> same.  I've done so by looking at the source code, but I can't run it
> to get the exact SQL since I don't have the database on my home
> machine.

Hibernate is a java thing, no? It'd be helpful to have the actual SQL
the hibernate class (or whatever) generates. One of the problems with
SQL is that you can have multiple ways to get the same results and it's
not always possible for the planner to convert from one to the other.

Anyway, people will want to see EXPLAIN ANALYSE for the query in
question. Obviously, make sure you've vacuumed and analysed the tables
in question recently. Oh, and make sure yousay what version of PG you're
running.

> select s.* from shipment s inner join carrier_code cc on
> s.carrier_code_id = cc.id inner join carrier c on cc.carrier_id =
> c.id inner join carrier_to_person ctp on ctp.carrier_id = c.id inner
> join person p on p.id = ctp.person_id inner join shipment_status cs
> on s.current_status_id = cs.id inner join release_code rc on
> cs.release_code_id = rc.id left join shipment_status ss on
> ss.shipment_id = s.id where p.id = :personId and s.is_purged = false
> and rc.number = '9' and cs is not null and cs.date >= current_date -
> 31 order by cs.date desc

1. Why are you quoting the 9 when checking against rc.number?
2. The "cs is not null" doesn't appear to be qualified - which table?

> Just assume I have no indexes for the moment because while some of
> the indexes I made make it work faster, it's still around 250
> milliseconds and under heavy load, the query performs very badly (6-7
> seconds).

3. If you rewrite the "current_date - 31" as a suitable ago(31) function
then you can use an index on cs.date
4. Are you familiar with the configuration setting "join_collapse_limit"?

--
   Richard Huxton
   Archonet Ltd

From:
Bricklen Anderson
Date:

Richard Huxton wrote:
> Ken Egervari wrote:
>
>> I've tried to use Dan Tow's tuning method
> Who? What?

http://www.singingsql.com/
Dan has written some remarkable papers on sql tuning. Some of it is pretty complex, but his book
"SQL Tuning" is an excellent resource.

--
_______________________________

This e-mail may be privileged and/or confidential, and the sender does
not waive any related rights and obligations. Any distribution, use or
copying of this e-mail or the information it contains by other than an
intended recipient is unauthorized. If you received this e-mail in
error, please advise me (by return e-mail or otherwise) immediately.
_______________________________

From:
Bruce Momjian
Date:

Bricklen Anderson wrote:
> Richard Huxton wrote:
> > Ken Egervari wrote:
> >
> >> I've tried to use Dan Tow's tuning method
> > Who? What?
>
> http://www.singingsql.com/

That URL is invalid for me.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
                 |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

From:
John Arbash Meinel
Date:

Ken Egervari wrote:

> I've tried to use Dan Tow's tuning method and created all the right
> indexes from his diagraming method, but the query still performs quite
> slow both inside the application and just inside pgadmin III.  Can
> anyone be kind enough to help me tune it so that it performs better in
> postgres?  I don't think it's using the right indexes, or maybe
> postgres needs special treatment.
>

First, what version of postgres, and have you run VACUUM ANALYZE recently?
Also, please attach the result of running EXPLAIN ANALYZE.
(eg, explain analyze select s.* from shipment ...)

It's very possible that you don't have up-to-date statistics, which
causes postgres to make a bad estimate of what the fastest plan is.

Also, if you are using an older version of postgres (like 7.1) you
really should upgrade. There are quite a few performance and real bug fixes.

> I've converted the below query to SQL from a Hibernate query, so the
> syntax is probably not perfect but it's semantics are exactly the
> same.  I've done so by looking at the source code, but I can't run it
> to get the exact SQL since I don't have the database on my home machine.

I don't know how to make Hibernate do what you want, but if you change
the query to using subselects (not all databases support this, so
hibernate might not let you), you can see a performance improvement.
Also sometimes using explicit joins can be worse than just letting the
query manager figure it out. So something like
select s.* from shipment s, carrier_code cc, carrier c, ...
    where s.carrier_code_id = cc.id and c.id = cc.carrier_id and ....

But again, since this is generated from another program (Hibernate), I
really don't know how you tell it how to tune the SQL. Probably the
biggest "non-bug" performance improvements are from tuning the SQL.
But if postgres isn't using the right indexes, etc, you can probably fix
that.

John
=:->

>
> select s.*
> from shipment s
>     inner join carrier_code cc on s.carrier_code_id = cc.id
>     inner join carrier c on cc.carrier_id = c.id
>     inner join carrier_to_person ctp on ctp.carrier_id = c.id
>     inner join person p on p.id = ctp.person_id
>     inner join shipment_status cs on s.current_status_id = cs.id
>     inner join release_code rc on cs.release_code_id = rc.id
>     left join shipment_status ss on ss.shipment_id = s.id
> where
>     p.id = :personId and
>     s.is_purged = false and
>     rc.number = '9' and
>     cs is not null and
>     cs.date >= current_date - 31
> order by cs.date desc
> Just assume I have no indexes for the moment because while some of the
> indexes I made make it work faster, it's still around 250 milliseconds
> and under heavy load, the query performs very badly (6-7 seconds).
>
> For your information:
>
> shipment contains 40,000 rows
> shipment_status contains 80,000 rows
> release_code contains 8 rows
> person contains 300 rows
> carrier contains 60 rows
> carrier_code contains 70 rows
>
> The filter ratios are:
>
> rc.number = '9' (0.125)
> cs.date >= current_date - 31 (.10)
> p.id = ? (0.003)
> s.is_purged = false (.98)
>
> I really hope someone can help since I'm pretty much stuck.
>
> Best regards and many thanks,
> Ken



From:
"Ken Egervari"
Date:

>First, what version of postgres, and have you run VACUUM ANALYZE recently?
>Also, please attach the result of running EXPLAIN ANALYZE.
>(eg, explain analyze select s.* from shipment ...)

I'm using postgres 8.0.  I wish I could paste explain analyze, but I won't
be at work for a few days.  I was hoping some Postgres/SQL experts here
would be able to simply look at the query and make recommendations because
it's not a very difficult or unique query.

>It's very possible that you don't have up-to-date statistics, which
>causes postgres to make a bad estimate of what the fastest plan is.

I run VACUUM ANALYZE religiously.  I even dumped the production database and
used it as my test database after a full vacuum analyze.  It's really as
fresh as it can be.

>I don't know how to make Hibernate do what you want, but if you change
>the query to using subselects (not all databases support this, so
>hibernate might not let you), you can see a performance improvement.

Yes, Hibernate supports sub-selects.  In fact, I can even drop down to JDBC
explicitly, so whatever SQL tricks out there I can use will work on
Hibernate.  In what way will sub-selects improve this query?

>Also sometimes using explicit joins can be worse than just letting the
>query manager figure it out. So something like
>select s.* from shipment s, carrier_code cc, carrier c, ...
>    where s.carrier_code_id = cc.id and c.id = cc.carrier_id and ....

I think I can avoid using joins in Hibernate, but it makes the query harder
to maintain.  How much of a performance benefit are we talking with this
change?  Since hibernate is an object language, you don't actually have to
specify many joins.  You can use the "dot" notation.

     Query query = session.createQuery(
         "select shipment " +
         "from Shipment shipment " +
         "   inner join
shipment.cargoControlNumber.carrierCode.carrier.persons person " +
         "   inner join shipment.currentStatus currentStatus " +
         "   inner join currentStatus.releaseCode releaseCode " +
         "   left join fetch shipment.currentStatus " +
         "where " +
         "   person.id = :personId and " +
         "   shipment.isPurged = false and " +
         "   releaseCode.number = '9' and " +
         "   currentStatus is not null and " +
         "   currentStatus.date >= current_date - 31 " +
         "order by currentStatus.date desc"
     );

     query.setParameter( "personId", personId );

     query.setFirstResult( firstResult );
     query.setMaxResults( maxResults );

     return query.list();

As you can see, it's fairly elegant language and maps to SQL quite well.

>But again, since this is generated from another program (Hibernate), I
>really don't know how you tell it how to tune the SQL. Probably the
>biggest "non-bug" performance improvements are from tuning the SQL.

I agree, but the ones I've tried aren't good enough.  I have made these
indexes that apply to this query as well as others in my from looking at my
SQL scripts.  Many of my queries have really sped up to 14 milliseconds from
these indexes.  But I can't make this query run any faster.

CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person USING
btree (person_id);
CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person USING
btree (carrier_id);
CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree
(carrier_id);
CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree
(carrier_code_id);
CREATE INDEX current_status_date_idx ON shipment_status USING btree (date);
CREATE INDEX shipment_current_status_id_idx ON shipment USING btree
(current_status_id);
CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING btree
(shipment_id);

Thanks for your responses everyone.  I'll try and get you that explain
analyze.  I'm just not at work at the moment but this is a problem that I'm
simply puzzled and worried about.  I'm getting all of this from CVS on my
work server.

Ken


From:
John Arbash Meinel
Date:

Ken Egervari wrote:

>> First, what version of postgres, and have you run VACUUM ANALYZE
>> recently?
>> Also, please attach the result of running EXPLAIN ANALYZE.
>> (eg, explain analyze select s.* from shipment ...)
>
>
> I'm using postgres 8.0.  I wish I could paste explain analyze, but I
> won't be at work for a few days.  I was hoping some Postgres/SQL
> experts here would be able to simply look at the query and make
> recommendations because it's not a very difficult or unique query.
>
That's the problem. Without explain analyze, it's hard to say why it is
performing weird, because it *does* look like a straightforward query.

>> It's very possible that you don't have up-to-date statistics, which
>> causes postgres to make a bad estimate of what the fastest plan is.
>
>
> I run VACUUM ANALYZE religiously.  I even dumped the production
> database and used it as my test database after a full vacuum analyze.
> It's really as fresh as it can be.
>
Good. Again, this is just the first precaution, as not everyone is as
careful as you. And without the explain analyze, you can't tell what the
planner estimates are.

>> I don't know how to make Hibernate do what you want, but if you change
>> the query to using subselects (not all databases support this, so
>> hibernate might not let you), you can see a performance improvement.
>
>
> Yes, Hibernate supports sub-selects.  In fact, I can even drop down to
> JDBC explicitly, so whatever SQL tricks out there I can use will work
> on Hibernate.  In what way will sub-selects improve this query?
>
When doing massive joins across multiple tables (as you are doing) it is
frequently faster to do a couple of small joins where you only need a
couple of rows as input to the rest. Something like:

select * from shipment s
where s.carrier_code_id in
        (select cc.id from carrier_code cc join carrier c on
cc.carrier_id = c.id)
and s.current_status_id in (select cs.id from shipment_status cs where ...)

Again it's something that you can try. I have found quite a few of my
queries performed much better with subselects.
I'm guessing it's because with big queries it has a harder time figuring
out how to refactor (the decision tree becomes big). But I'm not really
sure. I just know it can work.

>> Also sometimes using explicit joins can be worse than just letting the
>> query manager figure it out. So something like
>> select s.* from shipment s, carrier_code cc, carrier c, ...
>>    where s.carrier_code_id = cc.id and c.id = cc.carrier_id and ....
>
>
> I think I can avoid using joins in Hibernate, but it makes the query
> harder to maintain.  How much of a performance benefit are we talking
> with this change?  Since hibernate is an object language, you don't
> actually have to specify many joins.  You can use the "dot" notation.
>
I'm not saying this *will* improve performance. It is just something to
try. It very easily could not be worth the overhead.

>     Query query = session.createQuery(
>         "select shipment " +
>         "from Shipment shipment " +
>         "   inner join
> shipment.cargoControlNumber.carrierCode.carrier.persons person " +
>         "   inner join shipment.currentStatus currentStatus " +
>         "   inner join currentStatus.releaseCode releaseCode " +
>         "   left join fetch shipment.currentStatus " +
>         "where " +
>         "   person.id = :personId and " +
>         "   shipment.isPurged = false and " +
>         "   releaseCode.number = '9' and " +
>         "   currentStatus is not null and " +
>         "   currentStatus.date >= current_date - 31 " +
>         "order by currentStatus.date desc"
>     );
>
>     query.setParameter( "personId", personId );
>
>     query.setFirstResult( firstResult );
>     query.setMaxResults( maxResults );
>
>     return query.list();
>
> As you can see, it's fairly elegant language and maps to SQL quite well.
>
>> But again, since this is generated from another program (Hibernate), I
>> really don't know how you tell it how to tune the SQL. Probably the
>> biggest "non-bug" performance improvements are from tuning the SQL.
>
>
> I agree, but the ones I've tried aren't good enough.  I have made
> these indexes that apply to this query as well as others in my from
> looking at my SQL scripts.  Many of my queries have really sped up to
> 14 milliseconds from these indexes.  But I can't make this query run
> any faster.
>
> CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person
> USING btree (person_id);
> CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person
> USING btree (carrier_id);
> CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree
> (carrier_id);
> CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree
> (carrier_code_id);
> CREATE INDEX current_status_date_idx ON shipment_status USING btree
> (date);
> CREATE INDEX shipment_current_status_id_idx ON shipment USING btree
> (current_status_id);
> CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING
> btree (shipment_id);
>
> Thanks for your responses everyone.  I'll try and get you that explain
> analyze.  I'm just not at work at the moment but this is a problem
> that I'm simply puzzled and worried about.  I'm getting all of this
> from CVS on my work server.
>
> Ken

There is also the possibility that you are having problems with
cross-column correlation, or poor distribution of a column. Postgres
doesn't keep cross-column statistics, so if 2 columns are correlated,
then it mis-estimates selectivity, and might pick the wrong plan.

In general your query looks decent, we just need to figure out what is
going on.

John
=:->


From:
Ragnar Hafstað
Date:

On Wed, 2005-03-02 at 01:51 -0500, Ken Egervari wrote:
>
> select s.*
> from shipment s
>     inner join carrier_code cc on s.carrier_code_id = cc.id
>     inner join carrier c on cc.carrier_id = c.id
>     inner join carrier_to_person ctp on ctp.carrier_id = c.id
>     inner join person p on p.id = ctp.person_id
>     inner join shipment_status cs on s.current_status_id = cs.id
>     inner join release_code rc on cs.release_code_id = rc.id
>     left join shipment_status ss on ss.shipment_id = s.id
> where
>     p.id = :personId and
>     s.is_purged = false and
>     rc.number = '9' and
>     cs is not null and
>     cs.date >= current_date - 31
> order by cs.date desc
> ...
> shipment contains 40,000 rows
> shipment_status contains 80,000 rows

I may be missing something, but it looks like the second join
on shipment_status (the left join) is not adding anything to your
results, except more work. ss is not used for output, nor in the where
clause, so what is its purpose ?

if cs.date has an upper limit, it might be helpful to change the
condition to a BETWEEN

in any case, i would think you might need an index on
  shipment(carrier_code_id)
  shipment(current_status_id)
  shipment_status(id)

gnari




From:
"Ken Egervari"
Date:

>> select s.*
>> from shipment s
>>     inner join carrier_code cc on s.carrier_code_id = cc.id
>>     inner join carrier c on cc.carrier_id = c.id
>>     inner join carrier_to_person ctp on ctp.carrier_id = c.id
>>     inner join person p on p.id = ctp.person_id
>>     inner join shipment_status cs on s.current_status_id = cs.id
>>     inner join release_code rc on cs.release_code_id = rc.id
>>     left join shipment_status ss on ss.shipment_id = s.id
>> where
>>     p.id = :personId and
>>     s.is_purged = false and
>>     rc.number = '9' and
>>     cs is not null and
>>     cs.date >= current_date - 31
>> order by cs.date desc
>> ...
>> shipment contains 40,000 rows
>> shipment_status contains 80,000 rows
>
> I may be missing something, but it looks like the second join
> on shipment_status (the left join) is not adding anything to your
> results, except more work. ss is not used for output, nor in the where
> clause, so what is its purpose ?

It does look strange doesn't it?  I would think the same thing if it were
the first time I looked at it.  But rest assured, it's done by design.  A
shipment relates to many shipment_status rows, but only 1 is the current
shipment_status for the shipment.  The first does queries on the current
status only and doesn't analyze the rest of the related items.  The second
left join is for eager loading so that I don't have to run a seperate query
to fetch the children for each shipment.  This really does improve
performance because otherwise you'll have to make N+1 queries to the
database, and that's just too much overhead.  Since I need all the
shipment_status children along with the shipment for the domain logic to
work on them, I have to load them all.

On average, a shipment will have 2 shipment_status rows.  So if the query
selects 100 shipments, the query returns 200 rows.  Hibernate is intelligent
enough to map the shipment_status children to the appropriate shipment
automatically.

> if cs.date has an upper limit, it might be helpful to change the
> condition to a BETWEEN

Well, I could create an upper limit. It would be the current date.  Would
adding in this redundant condition improve performance?  I've clustered the
shipment table so that the dates are together, which has improved
performance.  I'm not sure adding in this implicit condition will speed up
anything, but I will definately try it.

> in any case, i would think you might need an index on
>  shipment(carrier_code_id)
>  shipment(current_status_id)
>  shipment_status(id)

Unfortunately, I have indexes on all three (Postgres implicitly creates
indexes for unique keys).  Here are the other 2 that are already created:

CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree
(carrier_code_id);
CREATE INDEX shipment_current_status_id_idx ON shipment USING btree
(current_status_id);

So I guess we've been thinking the same thing.  Don't get me wrong.  These
indexes speed up the query from 1.6 seconds to 250 milliseconds.  I just
need to be around 30 milliseconds.

Another idea that had occured to me was trying to force postgres to driver
on the person table because that filter ratio is so great compared to
everything else, but I do remember looking at the explain days ago and it
was one of the last tables being filtered/joined.  Is there anyway to force
postgres to pick person?  The reason I ask is because this would really
reduce the number of rows it pulls out from the shipment table.

Thanks for comments.  I'll try making that date explicit and change the
query to use between to see if that does anything.

Regards and many thanks,
Ken


From:
Ragnar Hafstað
Date:

On Wed, 2005-03-02 at 13:28 -0500, Ken Egervari wrote:
> >> select s.*
> >> from shipment s
> >>     inner join carrier_code cc on s.carrier_code_id = cc.id
> >>     inner join carrier c on cc.carrier_id = c.id
> >>     inner join carrier_to_person ctp on ctp.carrier_id = c.id
> >>     inner join person p on p.id = ctp.person_id
> >>     inner join shipment_status cs on s.current_status_id = cs.id
> >>     inner join release_code rc on cs.release_code_id = rc.id
> >>     left join shipment_status ss on ss.shipment_id = s.id
> >> where
> >>     p.id = :personId and
> >>     s.is_purged = false and
> >>     rc.number = '9' and
> >>     cs is not null and
> >>     cs.date >= current_date - 31
> >> order by cs.date desc
> >
> > I may be missing something, but it looks like the second join
> > on shipment_status (the left join) is not adding anything to your
> > results, except more work. ss is not used for output, nor in the where
> > clause, so what is its purpose ?
> ...  The second
> left join is for eager loading so that I don't have to run a seperate query
> to fetch the children for each shipment.  This really does improve
> performance because otherwise you'll have to make N+1 queries to the
> database, and that's just too much overhead.

are you saying that you are actually doing a
  select s.*,ss.* ...
?

> > if cs.date has an upper limit, it might be helpful to change the
> > condition to a BETWEEN
>
> Well, I could create an upper limit. It would be the current date.  Would
> adding in this redundant condition improve performance?

it might help the planner estimate better the number of cs rows
affected. whether this improves performance depends on whether
the best plans are sensitive to this.

an EXPLAIN ANALYSE might reduce the guessing.

gnari



From:
"Ken Egervari"
Date:

>> left join is for eager loading so that I don't have to run a seperate
>> query
>> to fetch the children for each shipment.  This really does improve
>> performance because otherwise you'll have to make N+1 queries to the
>> database, and that's just too much overhead.
>
> are you saying that you are actually doing a
>  select s.*,ss.* ...
> ?

Yes, this is how the SQL should be written.  When I manually converted the
query, I forgot to include this detail.  In hibernate, you don't need to
specifiy the ss.* because you are dealing with objects, so you just say
shipment.  The ss.* is indicated in the "fetch" part of the Hibernate query.
That was my mistake.

> it might help the planner estimate better the number of cs rows
> affected. whether this improves performance depends on whether
> the best plans are sensitive to this.

This sounds like a good idea since cs rows are quite large.  shipment and
shipment_status are the largest tables in the database and they will grow
very large over time.


From:
"Ken Egervari"
Date:

>it might help the planner estimate better the number of cs rows
>affected. whether this improves performance depends on whether
>the best plans are sensitive to this.

I managed to try this and see if it did anything.  Unfortunately, it made no
difference.  It's still 250 milliseconds.  It was a good suggestion though.
I believed it work too.

> an EXPLAIN ANALYSE might reduce the guessing.

Okay, here is the explain analyze I managed to get from work.  It came out
to 312ms here, but without the analyze it actually runs at ~250ms.  It is
using indexes, so my guess is that there are too many joins or it's not
driving on person fast enough.  Release code is such a small table that I
dont think that sequencial scan matters.  Thanks for taking the time to
analyze this.

Sort  (cost=1902.27..1902.31 rows=17 width=91) (actual time=312.000..312.000
rows=39 loops=1)
  Sort Key: ss.date
  ->  Hash Join  (cost=617.07..1901.92 rows=17 width=91) (actual
time=234.000..312.000 rows=39 loops=1)
        Hash Cond: ("outer".carrier_code_id = "inner".id)
        ->  Merge Join  (cost=602.54..1882.73 rows=870 width=91) (actual
time=234.000..312.000 rows=310 loops=1)
              Merge Cond: ("outer".current_status_id = "inner".id)
              ->  Index Scan using shipment_current_status_id_idx on
shipment s  (cost=0.00..2552.13 rows=60327 width=66) (actual
time=0.000..61.000 rows=27711 loops=1)
                    Filter: (is_purged = false)
              ->  Sort  (cost=602.54..607.21 rows=1866 width=25) (actual
time=125.000..125.000 rows=6934 loops=1)
                    Sort Key: ss.id
                    ->  Hash Join  (cost=1.11..501.17 rows=1866 width=25)
(actual time=0.000..78.000 rows=6934 loops=1)
                          Hash Cond: ("outer".release_code_id = "inner".id)
                          ->  Index Scan using current_status_date_idx on
shipment_status ss  (cost=0.00..406.78 rows=14924 width=25) (actual
time=0.000..47.000 rows=15053 loops=1)
                                Index Cond: (date >= (('now'::text)::date -
31))
                                Filter: (id IS NOT NULL)
                          ->  Hash  (cost=1.10..1.10 rows=1 width=4) (actual
time=0.000..0.000 rows=0 loops=1)
                                ->  Seq Scan on release_code rc
(cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                      Filter: ((number)::text = '9'::text)
        ->  Hash  (cost=14.53..14.53 rows=2 width=4) (actual
time=0.000..0.000 rows=0 loops=1)
              ->  Nested Loop  (cost=4.92..14.53 rows=2 width=4) (actual
time=0.000..0.000 rows=2 loops=1)
                    ->  Index Scan using person_pkey on person p
(cost=0.00..5.75 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                          Index Cond: (id = 355)
                    ->  Hash Join  (cost=4.92..8.75 rows=2 width=8) (actual
time=0.000..0.000 rows=2 loops=1)
                          Hash Cond: ("outer".id = "inner".carrier_id)
                          ->  Seq Scan on carrier c  (cost=0.00..3.54
rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1)
                          ->  Hash  (cost=4.92..4.92 rows=2 width=16)
(actual time=0.000..0.000 rows=0 loops=1)
                                ->  Hash Join  (cost=3.04..4.92 rows=2
width=16) (actual time=0.000..0.000 rows=2 loops=1)
                                      Hash Cond: ("outer".carrier_id =
"inner".carrier_id)
                                      ->  Seq Scan on carrier_code cc
(cost=0.00..1.57 rows=57 width=8) (actual time=0.000..0.000 rows=57 loops=1)
                                      ->  Hash  (cost=3.04..3.04 rows=1
width=8) (actual time=0.000..0.000 rows=0 loops=1)
                                            ->  Index Scan using
carrier_to_person_person_id_idx on carrier_to_person ctp  (cost=0.00..3.04
rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
                                                  Index Cond: (355 =
person_id)
Total runtime: 312.000 ms

Ken


From:
Tom Lane
Date:

"Ken Egervari" <> writes:
> Okay, here is the explain analyze I managed to get from work.

What platform is this on?  It seems very strange/fishy that all the
actual-time values are exact integral milliseconds.

            regards, tom lane

From:
"Ken Egervari"
Date:

> "Ken Egervari" <> writes:
>> Okay, here is the explain analyze I managed to get from work.
>
> What platform is this on?  It seems very strange/fishy that all the
> actual-time values are exact integral milliseconds.
>
> regards, tom lane

My machine is WinXP professional, athon xp 2100, but I get similar results
on my Intel P4 3.0Ghz as well (which is also running WinXP).  Why do you
ask?


From:
Tom Lane
Date:

"Ken Egervari" <> writes:
>> What platform is this on?  It seems very strange/fishy that all the
>> actual-time values are exact integral milliseconds.

> My machine is WinXP professional, athon xp 2100, but I get similar results
> on my Intel P4 3.0Ghz as well (which is also running WinXP).  Why do you
> ask?

Well, what it suggests is that gettimeofday() is only returning a result
good to the nearest millisecond.  (Win32 hackers, does that sound right?)

If so, I'd have to take the EXPLAIN ANALYZE results with a big grain of
salt, because what it's trying to do is add up a lot of
mostly-sub-millisecond intervals.  What would essentially happen is that
whichever plan node had control at a particular millisecond boundary
would get charged for the whole preceding millisecond, and any other
nodes (which might have actually eaten most of the millisecond) would
get charged nothing.

Over a sufficiently long query run, the errors would average out, but
this wasn't that long --- 312 milliseconds, so in essence we are trying
to estimate the query's behavior from only 312 samples of where it was
at the millisecond boundaries.  I don't trust profiles based on less
than a few thousand samples ...

Most modern machines seem to have clocks that can count elapsed time
down to near the microsecond level.  Anyone know if it's possible to get
such numbers out of Windows, or are we stuck with milliseconds?

            regards, tom lane

From:
John A Meinel
Date:

Tom Lane wrote:

>"Ken Egervari" <> writes:
>
>
>>Okay, here is the explain analyze I managed to get from work.
>>
>>
>
>What platform is this on?  It seems very strange/fishy that all the
>actual-time values are exact integral milliseconds.
>
>
>
I always get round milliseconds on running. In fact, I think I've seen
cases where it was actually rounding to 15/16ms. Which is the resolution
of the "clock()" call (IIRC).

This is the function I have for returning time better than clock(), but
it looks like it is still stuck no better than 1ms.
/*
 * MSVC has a function called _ftime64, which is in
 * "sys/timeb.h", which should be accurate to milliseconds
 */

#include <sys/types.h>
#include <sys/timeb.h>

double mf::getTime()
{
    struct __timeb64 timeNow;
    _ftime64(&timeNow);
    return timeNow.time + timeNow.millitm / 1000.0;
}

I did, however, find this page:
http://www.wideman-one.com/gw/tech/dataacq/wintiming.htm

Which talks about the high performance counter, which is supposed to be
able to get better than 1us resolution.

GetSystemTimes() returns the idle/kernel/user times, and seems to have a
resolution of about 100ns (.1us) GetLocalTime()/GetSystemTime() only has
a resolution of milliseconds.

In my simple test, I was actually getting timings with a resolution of
.3us for the QueryPerformanceCounter(). That was the overhead of just
the call, since it was called either in a bare loop, or just one after
the other.

So probably we just need to switch to QueryPerformanceCounter()
[/Frequency].

John
=:->


From:
"Ken Egervari"
Date:

> If so, I'd have to take the EXPLAIN ANALYZE results with a big grain of
> salt, because what it's trying to do is add up a lot of
> mostly-sub-millisecond intervals.  What would essentially happen is that
> whichever plan node had control at a particular millisecond boundary
> would get charged for the whole preceding millisecond, and any other
> nodes (which might have actually eaten most of the millisecond) would
> get charged nothing.

Well, we do know that it's at least 75% accurate.  I'm only looking for a
relative increase in performance.  My goal is to try and get this query down
to 30 milliseconds. But even 125 or 75 would be an improvement.  Any
improvement, even based on fuzzy data, is still an improvement.  Being
precise isn't really that important, at least not to me or the people using
the application.  I can see how rounding can throw off results in the inner
parts of the plan though, but I think we should try and work with the
explain as it is.  If there is anything else I can give you to help me out,
please ask and I will kindly do it.  I want to make this easy for you.

> Over a sufficiently long query run, the errors would average out, but
> this wasn't that long --- 312 milliseconds, so in essence we are trying
> to estimate the query's behavior from only 312 samples of where it was
> at the millisecond boundaries.  I don't trust profiles based on less
> than a few thousand samples ...

I'm just using data from the production database, which only has 5 digits
worth of rows in the main tables.  I don't think I can get millions of rows
in these tables, although I wish I could.  I'd have to write a program to
insert the data randomly and try to make it distributed the way a real
production database might look in a few years if I wanted the most accurate
results.  I would try to make the dates bunched up correctly and add more
carriers and shipments over time (as more customers would use the system)
expoentially.

But I'm trying to be practical too.  This query is too slow for 5 digits of
rows in the database.  Imagine how bad it would be with millions!
Unfortunately, this query gets ran by hundreds of people logged in every 60
seconds on average.  It must be as fast as possible.  During peak times,
people have to wait 5 or 6 seconds just to see the results of this query.

I understand the app may be at fault too, but if this query performed
faster, I'm sure that would solve that problem because it's inheritly slow
and the app is very well layered.  It makes good use of frameworks like
Spring, Hibernate and database pooling, which have been used on many
applications and have been running very well for us.   The fact that the
query is slow in PgAdmin III or phpPgAdmin speaks that the query can be
tuned better.

I am no master tuner.  I have read as much as I could about database tuning
in general, about the proper use of Hibernate and so on.  Frankly, I am not
experienced enough to solve this problem and I wish to learn from the
experts, like you Tom, John, Ragnar and others that have responded kindly to
my request.

> Most modern machines seem to have clocks that can count elapsed time
> down to near the microsecond level.  Anyone know if it's possible to get
> such numbers out of Windows, or are we stuck with milliseconds?

These results came from PgAdmin III directly.  I'm not sure how I can get
different results even if I knew of a way.


From:
"Ken Egervari"
Date:

>I took John's advice and tried to work with sub-selects.  I tried this
>variation, which actually seems like it would make a difference
>conceptually since it drives on the person table quickly.  But to my
>surprise, the query runs at about 375 milliseconds.  I think it's because
>it's going over that shipment table multiple times, which is where the
>results are coming from.

I also made a version that runs over shipment a single time, but it's
exactly 250 milliseconds.  I guess the planner does the exact same thing.

select s.*, ss.*

from shipment s
 inner join shipment_status ss on s.current_status_id=ss.id
 inner join release_code rc on ss.release_code_id=rc.id
 left outer join driver d on s.driver_id=d.id
 left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.carrier_code_id in (
  select cc.id
  from person p
   inner join carrier_to_person ctp on p.id=ctp.person_id
   inner join carrier c on ctp.carrier_id=c.id
   inner join carrier_code cc on cc.carrier_id = c.id
  where p.id = 355
 )
 and s.current_status_id is not null
 and s.is_purged=false
 and(rc.number='9' )
 and(ss.date>=current_date-31 )

order by ss.date desc


From:
"Ken Egervari"
Date:

I took John's advice and tried to work with sub-selects.  I tried this
variation, which actually seems like it would make a difference conceptually
since it drives on the person table quickly.  But to my surprise, the query
runs at about 375 milliseconds.  I think it's because it's going over that
shipment table multiple times, which is where the results are coming from.

select s.*, ss.*

from shipment s
 inner join shipment_status ss on s.current_status_id=ss.id
 inner join release_code rc on ss.release_code_id=rc.id
 left outer join driver d on s.driver_id=d.id
 left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.id in (
  select s.id
  from person p
   inner join carrier_to_person ctp on p.id=ctp.person_id
   inner join carrier c on ctp.carrier_id=c.id
   inner join carrier_code cc on cc.carrier_id = c.id
   inner join shipment s on s.carrier_code_id = cc.id
  where p.id = 355
      and s.current_status_id is not null
      and s.is_purged=false
 )
 and(rc.number='9' )
 and(ss.date>=current_date-31 )

order by ss.date desc

*** Musing 1
Also, "s.current_status_id is not null" is an important filter that I forgot
to mention.  In this example where p.id = 355, it filters out 90% of the
rows.  In general, that filter ratio is 0.46 though, which is not quite so
high.  However, this filter gets better over time because more and more
users will use a filter that will make this value null.  It's still not as
strong as person though and probably never will be.  But I thought I'd
mention it nonetheless.

*** Musing 2
I do think that the filter "ss.date>=current_date-31" is slowing this query
down.  I don't think it's the mention of "current_date" or even that it's
dynamic instead of static.  I think the range is just too big.  For example,
if I use:

and ss.date between '2005-02-01 00:00:00' and '2005-02-28 23:59:59'

The query still results in 250 milliseconds.  But if I make the range very
small - say Feb 22nd of 2005:

and ss.date between '2005-02-22 00:00:00' and '2005-02-22 23:59:59'

Now the entire query runs in 47 milliseconds on average.  If I can't make
this query perform any better, should I change the user interface to select
the date instead of showing the last 31 days to benefit from this single-day
filter?  This causes more clicks to select the day (like from a calendar),
but most users probably aren't interested in seeing the entire listing
anyway.  However, it's a very important requirement that users know that
shipment enteries exist in the last 31 days (because they are usually
sure-fire problems if they are still in this query after a few days).

I guess I'm wondering if tuning the query is futile and I should get the
requirements changed, or is there something I can do to really speed it up?

Thanks again,
Ken


From:
Mark Kirkwood
Date:

Ken Egervari wrote:
> I've tried to use Dan Tow's tuning method and created all the right indexes from his diagraming method, but the query
stillperforms quite slow both inside the application and just inside pgadmin III.  Can anyone be kind enough to help me
tuneit so that it performs better in postgres?  I don't think it's using the right indexes, or maybe postgres needs
specialtreatment. 
>
> I've converted the below query to SQL from a Hibernate query, so the syntax is probably not perfect but it's
semanticsare exactly the same.  I've done so by looking at the source code, but I can't run it to get the exact SQL
sinceI don't have the database on my home machine. 
>
> select s.*
> from shipment s
>     inner join carrier_code cc on s.carrier_code_id = cc.id
>     inner join carrier c on cc.carrier_id = c.id
>     inner join carrier_to_person ctp on ctp.carrier_id = c.id
>     inner join person p on p.id = ctp.person_id
>     inner join shipment_status cs on s.current_status_id = cs.id
>     inner join release_code rc on cs.release_code_id = rc.id
>     left join shipment_status ss on ss.shipment_id = s.id
> where
>     p.id = :personId and
>     s.is_purged = false and
>     rc.number = '9' and
>     cs is not null and
>     cs.date >= current_date - 31
> order by cs.date desc
>

You might be able to coerce the planner to drive off person by
rearranging the join orders, plus a few other bits... hopefully I have
not brutalized the query to the point where it does not work :-)  :

select p.id, s*, ss.*
from person p
     inner join carrier_to_person ctp on p.id = ctp.person_id
     inner join carrier c on ctp.carrier_id = c.id
     inner join carrier_code cc on cc.carrier_id = c.id
     inner join shipment s on s.carrier_code_id = cc.id
     inner join shipment_status cs on s.current_status_id = cs.id
     inner join release_code rc on cs.release_code_id = rc.id
     left join shipment_status ss on ss.shipment_id = s.id
where
     p.id = :personId and
     s.is_purged = false and
     rc.number = 9 and
     cs is not null and
     cs.date between current_date - 31 and current_date
order by cs.date desc


I have added the 'p.id' in the select list in the hope that that might
encourage the planner to take seriously the idea of getting the person
row(?) first. In addition I made 9 a number and closed the inequality
(just in case it helps a bit).






From:
John A Meinel
Date:

Ken Egervari wrote:

>> I took John's advice and tried to work with sub-selects.  I tried
>> this variation, which actually seems like it would make a difference
>> conceptually since it drives on the person table quickly.  But to my
>> surprise, the query runs at about 375 milliseconds.  I think it's
>> because it's going over that shipment table multiple times, which is
>> where the results are coming from.
>
>
> I also made a version that runs over shipment a single time, but it's
> exactly 250 milliseconds.  I guess the planner does the exact same thing.
>
Why are you now left joining driver and carrier code, but inner joining
shipment_status? I assume this is the *real* query that you are executing.

 From the earlier explain analyze, and your statements, the initial
person p should be the heavily selective portion.

And what does "driver" get you? It isn't in the return, and it isn't
part of a selectivity clause.
You are also double joining against carrier code, once as a left outer
join, and once in the inner join.

This query doesn't seem quite right. Are you sure it is generating the
rows you are expecting?

> select s.*, ss.*
>
> from shipment s
> inner join shipment_status ss on s.current_status_id=ss.id
> inner join release_code rc on ss.release_code_id=rc.id
> left outer join driver d on s.driver_id=d.id
> left outer join carrier_code cc on s.carrier_code_id=cc.id
> where s.carrier_code_id in (
>  select cc.id
>  from person p
>   inner join carrier_to_person ctp on p.id=ctp.person_id
>   inner join carrier c on ctp.carrier_id=c.id
>   inner join carrier_code cc on cc.carrier_id = c.id
>  where p.id = 355
> )
> and s.current_status_id is not null
> and s.is_purged=false
> and(rc.number='9' )
> and(ss.date>=current_date-31 )
>
> order by ss.date desc

You might want to post the explain analyze of this query to have a point
of reference, but what about something like this:
select s.*, ss.*

from shipment_status ss on s.current_status_id=ss.id
join (select s.* from shipment s
       where s.carrier_code_id in
            (select cc.id
               from person p
              inner join carrier_to_person ctp on p.id=ctp.person_id
              inner join carrier c on ctp.carrier_id=c.id
              inner join carrier_code cc on cc.carrier_id = c.id
             where p.id = 355
            )
         and s.current_status_id is not null
         and s.is_purged=false
) as i -- Just a name for the subselect since it is in a join
inner join release_code rc on ss.release_code_id=rc.id
where (rc.number='9' )
and(ss.date between current_date-31 and current_date())

order by ss.date desc

My idea with this query is to minimize the number of shipment rows that
need to be generated before joining with the other rows. My syntax is
probably a little bit off, since I can't actually run it against real
tables.
But looking at your *original* query, you were getting 15000 rows out of
shipment_status, and then 27700 rows out of shipment, which was then
being merge-joined down to only 300 rows, and then hash-joined down to 39.

I'm just trying to think of ways to prevent it from blossoming into 27k
rows to start with.

Please double check your query, because it seems to be grabbing
unnecessary rows with the left joins, and then post another explain
analyze with one (or several) different subselect forms.

John
=:->


From:
Josh Berkus
Date:

Ken,

> I've tried to use Dan Tow's tuning method and created all the right indexes
> from his diagraming method, but the query still performs quite slow both
> inside the application and just inside pgadmin III.  Can anyone be kind
> enough to help me tune it so that it performs better in postgres?  I don't
> think it's using the right indexes, or maybe postgres needs special
> treatment.

FWIW, I picked up Dan Tow's book to give it a read, and they guy isn't
qualified to author "SQL Tuning".   You should chuck that book, it won't help
you -- not with Oracle or SQL Server, and certainly not with PostgreSQL.
O'Reilly continues to have trouble turning out quality database books.

Also, if you *were* using Dan's method, you'd be driving off Person, not
Shipment.

--
Josh Berkus
Aglio Database Solutions
San Francisco

From:
Josh Berkus
Date:

Ken,

>         ->  Merge Join  (cost=602.54..1882.73 rows=870 width=91) (actual
> time=234.000..312.000 rows=310 loops=1)
>               Merge Cond: ("outer".current_status_id = "inner".id)

Hmmm ... this merge join appears to be the majority of your execution
time .... at least within the resolution that PGWin allows us.    Please try
two things, and give us Explain Analyzes:

1) To determine your query order ala Dan Tow and drive off of person, please
SET JOIN_COLLAPSE_LIMIT = 1 and then run Mark Kirkwood's version of the
query.  (Not that I believe in Dan Tow ... see previous message ... but it
would be interesting to see the results.

2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE;

Also, please let us know some about the server you're using and your
configuration parameters, particularly:
shared_buffers
work_mem
effective_cache_size
random_page_cost

--
Josh Berkus
Aglio Database Solutions
San Francisco

From:
"Ken Egervari"
Date:

John,

>Why are you now left joining driver and carrier code, but inner joining
>shipment_status? I assume this is the *real* query that you are executing.

Well, the old and new versions are real queries.  I changed the query a bit
because I noticed for some users, the listing was pulling out many different
drivers.  Each separate query on the driver took about 10 milliseconds.  For
a listing of 39 results, that's a possible 390 milliseconds assuming all the
drivers are different and none of them are cached.  So, I just left joined
the driver and it added about 5 milliseconds of overhead to this query.  I
apoligize for not communicating this change, but I had to make it to speed
this stuff up during the day until I could fix the root of the problem.  One
thing that I learned is that left joining and including lots of columns
rarely slows the query.  The same was done for the carrier_code, although
this only saved 15 milliseconds.

The end result is still high because the query we are talking about is very
expensive, but at least the following queries that appeared after are
eliminated altogether.  The overhead and separate queries really places a
hamper on overall performance.  For the person 355, the overhead was about
300 milliseconds since 10 of the drivers were null.  I hope this makes
sense.

From the earlier explain analyze, and your statements, the initial
>person p should be the heavily selective portion.

I totally agree.  I just never really figured out how to tell postgres my
intentions.

>You are also double joining against carrier code, once as a left outer
>join, and once in the inner join.

Yes, that was my mistake since Hibernate didn't generate that - I manually
put in those sub-selects.

>This query doesn't seem quite right. Are you sure it is generating the
>rows you are expecting?

Yes, the results are the same with the left joins.  I didn't include d.* and
cc.* in the select, which again, is my mistake.  The main problem is when I
make changes to the query, I don't think about it in terms of how SQL does
it.  I think about Hibernate does it.  Earger loading rows is different from
selecting the main row at the top of the query.  I bet this comes as very
strange, but in Hibernate they are two-different things.  I've been using
Hibernate for so long that working with SQL is not so natural for me.  This
is my mistake and I apologize.

>You might want to post the explain analyze of this query to have a point
>of reference, but what about something like this:
>select s.*, ss.*

Okay.  Here is syntax-corrected version of your very creative query.  I
wouldn't have thought of doing something like this at all.  It makes perfect
sense that you are commanding the database to do what it should be doing,
which is something I really like since the concept of a planner picking
stuff for me makes me unsettled (even if it is doing it right).

select i.*, ss.*
from shipment_status ss
 inner join release_code rc on ss.release_code_id=rc.id,
 (
  select s.*
  from shipment s
  where s.current_status_id is not null
   and s.is_purged=false
   and s.carrier_code_id in (
    select cc.id
    from person p
     inner join carrier_to_person ctp on p.id=ctp.person_id
     inner join carrier c on ctp.carrier_id=c.id
     inner join carrier_code cc on cc.carrier_id = c.id
    where p.id = 355
   )
 ) as i
where (rc.number='9' )
 and(i.current_status_id = ss.id)
 and(ss.date between current_date-31 and current_date);

When running this on my production database, the speed is 265 milliseconds
on average running it 20 times (lowest was 250, highest was 281).  Not quite
what we want, but I'm sure the tuning of this new query hasn't really
started.  Here is the EXPLAIN ANALYZE.  It seems very similiar to the one
postgres picked out but it's a bit shorter.

Hash IN Join  (cost=676.15..1943.11 rows=14 width=91) (actual
time=250.000..328.000 rows=39 loops=1)
  Hash Cond: ("outer".carrier_code_id = "inner".id)
  ->  Merge Join  (cost=661.65..1926.51 rows=392 width=91) (actual
time=250.000..328.000 rows=310 loops=1)
        Merge Cond: ("outer".current_status_id = "inner".id)
        ->  Index Scan using shipment_current_status_id_idx on shipment s
(cost=0.00..2702.56 rows=27257 width=66) (actual time=0.000..110.000
rows=27711 loops=1)
              Filter: ((current_status_id IS NOT NULL) AND (is_purged =
false))
        ->  Sort  (cost=661.65..666.46 rows=1922 width=25) (actual
time=140.000..172.000 rows=6902 loops=1)
              Sort Key: ss.id
              ->  Hash Join  (cost=1.11..556.82 rows=1922 width=25) (actual
time=0.000..94.000 rows=6902 loops=1)
                    Hash Cond: ("outer".release_code_id = "inner".id)
                    ->  Index Scan using current_status_date_idx on
shipment_status ss  (cost=0.01..459.64 rows=15372 width=25) (actual
time=0.000..94.000 rows=14925 loops=1)
                          Index Cond: ((date >= (('now'::text)::date - 31))
AND (date <= ('now'::text)::date))
                    ->  Hash  (cost=1.10..1.10 rows=1 width=4) (actual
time=0.000..0.000 rows=0 loops=1)
                          ->  Seq Scan on release_code rc  (cost=0.00..1.10
rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                Filter: ((number)::text = '9'::text)
  ->  Hash  (cost=14.49..14.49 rows=2 width=4) (actual time=0.000..0.000
rows=0 loops=1)
        ->  Nested Loop  (cost=6.87..14.49 rows=2 width=4) (actual
time=0.000..0.000 rows=2 loops=1)
              ->  Index Scan using person_pkey on person p  (cost=0.00..5.73
rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                    Index Cond: (id = 355)
              ->  Hash Join  (cost=6.87..8.74 rows=2 width=8) (actual
time=0.000..0.000 rows=2 loops=1)
                    Hash Cond: ("outer".carrier_id = "inner".carrier_id)
                    ->  Seq Scan on carrier_code cc  (cost=0.00..1.57
rows=57 width=8) (actual time=0.000..0.000 rows=57 loops=1)
                    ->  Hash  (cost=6.86..6.86 rows=1 width=12) (actual
time=0.000..0.000 rows=0 loops=1)
                          ->  Hash Join  (cost=3.04..6.86 rows=1 width=12)
(actual time=0.000..0.000 rows=1 loops=1)
                                Hash Cond: ("outer".id = "inner".carrier_id)
                                ->  Seq Scan on carrier c  (cost=0.00..3.54
rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1)
                                ->  Hash  (cost=3.04..3.04 rows=1 width=8)
(actual time=0.000..0.000 rows=0 loops=1)
                                      ->  Index Scan using
carrier_to_person_person_id_idx on carrier_to_person ctp  (cost=0.00..3.04
rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
                                            Index Cond: (355 = person_id)
Total runtime: 344.000 ms

>My idea with this query is to minimize the number of shipment rows that
>need to be generated before joining with the other rows. My syntax is
>probably a little bit off, since I can't actually run it against real
>tables.

Yes, I tried adding redundant 'from clauses' with the shipment or
shipment_status tables and each caluse adds 100 milliseconds.  I wish they
weren't so expensive.

>But looking at your *original* query, you were getting 15000 rows out of
>shipment_status, and then 27700 rows out of shipment, which was then
>being merge-joined down to only 300 rows, and then hash-joined down to 39.
>I'm just trying to think of ways to prevent it from blossoming into 27k
>rows to start with.

Yes, nothing has changed from the original query.  By the looks of things,
the sub-select version returns slightly less rows but not much
unfortunately.  I'm trying to figure out how to minimize the rows
traversals.  Maybe I should explain a bit about the app so you can get an
idea on why the shipment rows are so big?

You see, the app keeps track of custom status for shipments.  Either the
status comes in, so the shipment row is created along with 1 or more
shipment_status rows, or the shipments are prepared in advance (so no
shipment_status rows are assigned to them immediately).

In the case of p.id = 355, there are ~27000 shipments.  But most of these
are prepared in advance, which don't concern this query at all and should be
filtered out.  That's why the "s.current_status is not null" is important.
This filter will reduce the rows from 27000 to about 3500, which is all the
real shipments with customs status.  The others will gain rows in
shipment_status over time, but new shipment rows will be created in advance
as well.

At some point, it will probably balance out, but since the features to
prepare shipments in advance are new, only some carriers will have more
shipments than shipment_status rows.  In some cases, there are no prepared
shipments.  When this happens, there is usually a 1:2 ratio between shipment
and shipment_status.  I think this weird distribution makes queries like
this kind of hard to predict the performance of.  Anyway, I think it's
better to assume that previous case where shipment rows > shipment_status
will tend to be the norm over time.

If the query won't perform properly, I'm wondering if the requirements
should really change.  For example, there is another table called
release_office that is also associated with shipment.  I could filter by
that too.  I could then offer a screen to select the release office first
and only show the shipments with that release office.  The will reduce the
number of shipments for some users, but not all.  Some users use only one or
two release offices, so it wouldn't be a big help.

I could also make the query select a certain day instead of a range.  Like I
said in a previous post, this makes the query run at 47 milliseconds.
However, this might make it harder for users to access the information...
and if they clicked 31 days on the calendar, that's really 47*31
milliseconds total.  I guess I'd have to ask for usability studies or
something to figure out what people really hope to gain from these listings
in the first place and how they'd want to work with them.  Maybe it's not a
performance problem - maybe it's a usability problem.  However, even if that
were the case, I'd still want to know how to fix something like this for my
own knowledge since I'm still learning.

I also know others are using postgres quite successfully with tables
containing millions of rows, in applications far more riskier than mine.
I'm not sure why this query is any different.  Is there a configuration
setting I can use to make things speed up perhaps?

Anyhow, thanks for taking the time helping me out John.  I'm going to play
with more sub-selects and see if I find a combination that works a bit
better.  I'll post my results in a bit.  If we do figure this out, it might
be worthwhile for me to make a case-study and make it available over
www.postgres.org so other people can benefit from this experience too.

Many thanks!

Ken


From:
"Ken Egervari"
Date:

Josh,

>1) To determine your query order ala Dan Tow and drive off of person,
>please
>SET JOIN_COLLAPSE_LIMIT = 1 and then run Mark Kirkwood's version of the
>query.  (Not that I believe in Dan Tow ... see previous message ... but it
>would be interesting to see the results.

Unfortunately, the query still takes 250 milliseconds.  I tried it with
other queries and the results are the same as before.  Here is the explain
analayze anyway:

Sort  (cost=2036.83..2036.87 rows=16 width=103) (actual
time=328.000..328.000 rows=39 loops=1)
  Sort Key: cs.date
  ->  Nested Loop Left Join  (cost=620.61..2036.51 rows=16 width=103)
(actual time=250.000..328.000 rows=39 loops=1)
        ->  Hash Join  (cost=620.61..1984.90 rows=16 width=78) (actual
time=250.000..328.000 rows=39 loops=1)
              Hash Cond: ("outer".carrier_code_id = "inner".id)
              ->  Merge Join  (cost=606.11..1965.99 rows=825 width=74)
(actual time=250.000..328.000 rows=310 loops=1)
                    Merge Cond: ("outer".current_status_id = "inner".id)
                    ->  Index Scan using shipment_current_status_id_idx on
shipment s  (cost=0.00..2701.26 rows=60307 width=66) (actual
time=0.000..77.000 rows=27711 loops=1)
                          Filter: (is_purged = false)
                    ->  Sort  (cost=606.11..610.50 rows=1756 width=12)
(actual time=141.000..141.000 rows=6902 loops=1)
                          Sort Key: cs.id
                          ->  Hash Join  (cost=1.11..511.48 rows=1756
width=12) (actual time=0.000..109.000 rows=6902 loops=1)
                                Hash Cond: ("outer".release_code_id =
"inner".id)
                                ->  Index Scan Backward using
current_status_date_idx on shipment_status cs  (cost=0.01..422.58 rows=14047
width=16) (actual time=0.000..78.000 rows=14925 loops=1)
                                      Index Cond: ((date >=
(('now'::text)::date - 31)) AND (date <= ('now'::text)::date))
                                      Filter: (cs.* IS NOT NULL)
                                ->  Hash  (cost=1.10..1.10 rows=1 width=4)
(actual time=0.000..0.000 rows=0 loops=1)
                                      ->  Seq Scan on release_code rc
(cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                            Filter: ((number)::text =
'9'::text)
              ->  Hash  (cost=14.49..14.49 rows=2 width=8) (actual
time=0.000..0.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=6.87..14.49 rows=2 width=8)
(actual time=0.000..0.000 rows=2 loops=1)
                          ->  Index Scan using person_pkey on person p
(cost=0.00..5.73 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                Index Cond: (id = 355)
                          ->  Hash Join  (cost=6.87..8.74 rows=2 width=8)
(actual time=0.000..0.000 rows=2 loops=1)
                                Hash Cond: ("outer".carrier_id =
"inner".carrier_id)
                                ->  Seq Scan on carrier_code cc
(cost=0.00..1.57 rows=57 width=8) (actual time=0.000..0.000 rows=57 loops=1)
                                ->  Hash  (cost=6.86..6.86 rows=1 width=12)
(actual time=0.000..0.000 rows=0 loops=1)
                                      ->  Hash Join  (cost=3.04..6.86 rows=1
width=12) (actual time=0.000..0.000 rows=1 loops=1)
                                            Hash Cond: ("outer".id =
"inner".carrier_id)
                                            ->  Seq Scan on carrier c
(cost=0.00..3.54 rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1)
                                            ->  Hash  (cost=3.04..3.04
rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1)
                                                  ->  Index Scan using
carrier_to_person_person_id_idx on carrier_to_person ctp  (cost=0.00..3.04
rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
                                                        Index Cond: (355 =
person_id)
        ->  Index Scan using shipment_status_shipment_id_idx on
shipment_status ss  (cost=0.00..3.20 rows=2 width=25) (actual
time=0.000..0.000 rows=1 loops=39)
              Index Cond: (ss.shipment_id = "outer".id)
Total runtime: 328.000 ms

>2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE;

Setting this option had no effect either  In fact, the query is a bit slower
(266 milliseconds but 250 came up once in 20 executions).

>Also, please let us know some about the server you're using and your
>configuration parameters, particularly:
>shared_buffers
>work_mem
>effective_cache_size
>random_page_cost

Well, I'm on a test machine so the settings haven't changed one bit from the
defaults.  This may sound embarrassing, but I bet the production server is
not custom configured either.  The computer I'm running these queries on is
just a simple Athon XP 2100+ on WinXP with 1GB of RAM.  The production
server is a faster P4, but the rest is the same.  Here are the 4 values in
my configuration, but 3 of them were commented:

shared_buffers = 1000
#work_mem = 1024
#effective_cache_size = 1000
#random_page_cost = 4

I'm not sure what these do, but I'm guessing the last 2 affect the planner
to do different things with the statistics.  Should I increase the first
two?

Regards,
Ken


From:
Richard Huxton
Date:

Ken Egervari wrote:
>
> Hash IN Join  (cost=676.15..1943.11 rows=14 width=91) (actual
> time=250.000..328.000 rows=39 loops=1)
>  Hash Cond: ("outer".carrier_code_id = "inner".id)
>  ->  Merge Join  (cost=661.65..1926.51 rows=392 width=91) (actual
> time=250.000..328.000 rows=310 loops=1)
>        Merge Cond: ("outer".current_status_id = "inner".id)
>        ->  Index Scan using shipment_current_status_id_idx on shipment s
> (cost=0.00..2702.56 rows=27257 width=66) (actual time=0.000..110.000
> rows=27711 loops=1)
>              Filter: ((current_status_id IS NOT NULL) AND (is_purged =
> false))

There's a feature in PG called partial indexes - see CREATE INDEX
reference for details. Basically you can do something like:

CREATE INDEX foo_idx ON shipment (carrier_code_id)
WHERE current_status_id IS NOT NULL
AND is_purged = FALSE;

Something similar may be a win here, although the above index might not
be quite right - sorry, bit tired at moment.

--
   Richard Huxton
   Archonet Ltd

From:
"Ken Egervari"
Date:

>2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE;

Actually, it was 312 milliseconds, so it got worse.

From:
Josh Berkus
Date:

Ken,

Well, I'm a bit stumped on troubleshooting the actual query since Windows'
poor time resolution makes it impossible to trust the actual execution times.
Obviously this is something we need to look into for the Win32 port for
8.1 ..

> shared_buffers = 1000

This may be slowing up that merge join.  Try resetting it to 6000.    I'm not
sure what system settings you might have to do on Windows to get it to
support higher shared buffers; see the docs.

> #work_mem = 1024

Up this to 4096 for testing purposes; your production value will vary
depending on several factors; see link below.

> #effective_cache_size = 1000

Increase this to the actual amount of RAM you have available, about 750MB (you
do the math)

> #random_page_cost = 4

Leave this for now.

See www.powerpostgresql.com/PerfList for more information.

--
Josh Berkus
Aglio Database Solutions
San Francisco

From:
"Ken Egervari"
Date:

Josh,

I did everything you said and my query does perform a bit better.  I've been
getting speeds from 203 to 219 to 234 milliseconds now.   I tried increasing
the work mem and the effective cache size from the values you provided, but
I didn't see any more improvement.  I've tried to looking into setting the
shared buffers for Windows XP, but I'm not sure how to do it.  I'm looking
in the manual at:
http://www.postgresql.org/docs/8.0/interactive/kernel-resources.html#SYSVIPC-PARAMETERS

It doesn't mention windows at all.  Does anyone have any ideas on have to
fix this?

Here is the new explain analyze.

Sort  (cost=1996.21..1996.26 rows=17 width=165) (actual
time=297.000..297.000 rows=39 loops=1)
  Sort Key: ss.date
  ->  Merge Right Join  (cost=1951.26..1995.87 rows=17 width=165) (actual
time=297.000..297.000 rows=39 loops=1)
        Merge Cond: ("outer".id = "inner".driver_id)
        ->  Index Scan using driver_pkey on driver d  (cost=0.00..42.16
rows=922 width=43) (actual time=0.000..0.000 rows=922 loops=1)
        ->  Sort  (cost=1951.26..1951.30 rows=17 width=122) (actual
time=297.000..297.000 rows=39 loops=1)
              Sort Key: s.driver_id
              ->  Hash Join  (cost=586.48..1950.91 rows=17 width=122)
(actual time=219.000..297.000 rows=39 loops=1)
                    Hash Cond: ("outer".carrier_code_id = "inner".id)
                    ->  Merge Join  (cost=571.97..1931.95 rows=830 width=87)
(actual time=219.000..297.000 rows=310 loops=1)
                          Merge Cond: ("outer".current_status_id =
"inner".id)
                          ->  Index Scan using
shipment_current_status_id_idx on shipment s  (cost=0.00..2701.26 rows=60307
width=66) (actual time=0.000..62.000 rows=27711 loops=1)
                                Filter: (is_purged = false)
                          ->  Sort  (cost=571.97..576.38 rows=1766 width=21)
(actual time=125.000..156.000 rows=6902 loops=1)
                                Sort Key: ss.id
                                ->  Hash Join  (cost=1.11..476.72 rows=1766
width=21) (actual time=0.000..93.000 rows=6902 loops=1)
                                      Hash Cond: ("outer".release_code_id =
"inner".id)
                                      ->  Index Scan Backward using
current_status_date_idx on shipment_status ss  (cost=0.00..387.35 rows=14122
width=21) (actual time=0.000..16.000 rows=14925 loops=1)
                                            Index Cond: (date >=
(('now'::text)::date - 31))
                                      ->  Hash  (cost=1.10..1.10 rows=1
width=4) (actual time=0.000..0.000 rows=0 loops=1)
                                            ->  Seq Scan on release_code rc
(cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                                  Filter: ((number)::text =
'9'::text)
                    ->  Hash  (cost=14.51..14.51 rows=2 width=35) (actual
time=0.000..0.000 rows=0 loops=1)
                          ->  Nested Loop  (cost=4.92..14.51 rows=2
width=35) (actual time=0.000..0.000 rows=2 loops=1)
                                ->  Index Scan using person_pkey on person p
(cost=0.00..5.73 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                                      Index Cond: (id = 355)
                                ->  Hash Join  (cost=4.92..8.75 rows=2
width=39) (actual time=0.000..0.000 rows=2 loops=1)
                                      Hash Cond: ("outer".id =
"inner".carrier_id)
                                      ->  Seq Scan on carrier c
(cost=0.00..3.54 rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1)
                                      ->  Hash  (cost=4.92..4.92 rows=2
width=43) (actual time=0.000..0.000 rows=0 loops=1)
                                            ->  Hash Join  (cost=3.04..4.92
rows=2 width=43) (actual time=0.000..0.000 rows=2 loops=1)
                                                  Hash Cond:
("outer".carrier_id = "inner".carrier_id)
                                                  ->  Seq Scan on
carrier_code cc  (cost=0.00..1.57 rows=57 width=35) (actual
time=0.000..0.000 rows=57 loops=1)
                                                  ->  Hash  (cost=3.04..3.04
rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1)
                                                        ->  Index Scan using
carrier_to_person_person_id_idx on carrier_to_person ctp  (cost=0.00..3.04
rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
                                                              Index Cond:
(355 = person_id)
Total runtime: 297.000 ms


From:
John Arbash Meinel
Date:

Ken Egervari wrote:

> Josh,
>
> I did everything you said and my query does perform a bit better.
> I've been getting speeds from 203 to 219 to 234 milliseconds now.   I
> tried increasing the work mem and the effective cache size from the
> values you provided, but I didn't see any more improvement.  I've
> tried to looking into setting the shared buffers for Windows XP, but
> I'm not sure how to do it.  I'm looking in the manual at:
> http://www.postgresql.org/docs/8.0/interactive/kernel-resources.html#SYSVIPC-PARAMETERS
>
>
You probably don't need to change anything for Windows. If you set
shared_buffers too high, then postgres won't start. If it is starting,
then you don't need to modify the OS to get more shared buffers. (For
instance, on my Mac, I can't get shared_buffers > 500 without changing
things, but on windows I run with 3000 and no modification).

> It doesn't mention windows at all.  Does anyone have any ideas on have
> to fix this?
>
Do you need the interior sort? It's taking ~93ms to get 7k rows from
shipment_status, and then another 30ms to sort them. This isn't a lot,
so it might be fine.

Also, did you ever try CLUSTER current_status_date_idx ON shipment_status.
This groups the rows in shipment_status by their status date, which
helps put items with the same date next to eachother. This may effect
other portions of the query, or other queries. Also, if you are
inserting sequentially, it would seem that the items would already be
naturally near eachother based on date.

The next big cost is having to merge the 28k rows with the fast hash
plan, which takes about 80ms.

I guess the biggest issue is that you are doing a lot of work, and it
takes time to do it. Also, I've noticed that this query is being run
with exactly the same data. Which is good to compare two methods. But
remember to test on multiple potential values. You might be better off
one way with this query, but much worse for a different dataset. I
noticed that this seems to have fewer rows than what postgres thinks the
*average* number would be. (It predicts 60k and you only get 28k rows).

If this query is performed a lot, and you can be okay with a slight
delay in updating, you could always switch to some sort of lazy
materialized view.

You could also always throw more hardware at it. :) If the
shipment_status is one of the bottlenecks, create a 4-disk raid10 and
move the table over.
I don't remember what your hardware is, but I don't remember it being a
quad opteron with 16GB ram, and 20 15k SCSI disks, with the transaction
log on a solid state disk. :)

Why do you need the query to be 30ms? ~250ms is still pretty fast. If
you are needing updates faster than that, you might look more into *why*
and then handle it from a higher level.

And naturally, the most important this is to test it under load. 250ms
is pretty good, but if under load it goes back to 6s, then we probably
should look for different alternatives. Also, what is the load that is
causing the problem? Is it that you have some other big seqscans which
are causing all of your tables to go out of cache?

Also, I believe I remember you saying that your production server is a
P4, is that a single P4? Because I know postgres prefers Opterons to
Pentium Xeons when in a multiprocessor machine. Look through the
archives about spinlocks and the context switch bug. (context storm,
etc). Plus, since opterons are 64-bit, you can throw a lot more RAM at
them. I believe opterons outperform xeons for the same cost, *and* you
can scale them up with extra ram.

But remember, the biggest bottleneck is almost *always* the I/O. So put
more & faster disks into the system first.

John
=:->

> Here is the new explain analyze.
>
> Sort  (cost=1996.21..1996.26 rows=17 width=165) (actual
> time=297.000..297.000 rows=39 loops=1)
>  Sort Key: ss.date
>  ->  Merge Right Join  (cost=1951.26..1995.87 rows=17 width=165)
> (actual time=297.000..297.000 rows=39 loops=1)
>        Merge Cond: ("outer".id = "inner".driver_id)
>        ->  Index Scan using driver_pkey on driver d  (cost=0.00..42.16
> rows=922 width=43) (actual time=0.000..0.000 rows=922 loops=1)
>        ->  Sort  (cost=1951.26..1951.30 rows=17 width=122) (actual
> time=297.000..297.000 rows=39 loops=1)
>              Sort Key: s.driver_id
>              ->  Hash Join  (cost=586.48..1950.91 rows=17 width=122)
> (actual time=219.000..297.000 rows=39 loops=1)
>                    Hash Cond: ("outer".carrier_code_id = "inner".id)
>                    ->  Merge Join  (cost=571.97..1931.95 rows=830
> width=87) (actual time=219.000..297.000 rows=310 loops=1)
>                          Merge Cond: ("outer".current_status_id =
> "inner".id)
>                          ->  Index Scan using
> shipment_current_status_id_idx on shipment s  (cost=0.00..2701.26
> rows=60307 width=66) (actual time=0.000..62.000 rows=27711 loops=1)
>                                Filter: (is_purged = false)
>                          ->  Sort  (cost=571.97..576.38 rows=1766
> width=21) (actual time=125.000..156.000 rows=6902 loops=1)
>                                Sort Key: ss.id
>                                ->  Hash Join  (cost=1.11..476.72
> rows=1766 width=21) (actual time=0.000..93.000 rows=6902 loops=1)
>                                      Hash Cond:
> ("outer".release_code_id = "inner".id)
>                                      ->  Index Scan Backward using
> current_status_date_idx on shipment_status ss  (cost=0.00..387.35
> rows=14122 width=21) (actual time=0.000..16.000 rows=14925 loops=1)
>                                            Index Cond: (date >=
> (('now'::text)::date - 31))
>                                      ->  Hash  (cost=1.10..1.10 rows=1
> width=4) (actual time=0.000..0.000 rows=0 loops=1)
>                                            ->  Seq Scan on
> release_code rc (cost=0.00..1.10 rows=1 width=4) (actual
> time=0.000..0.000 rows=1 loops=1)
>                                                  Filter:
> ((number)::text = '9'::text)
>                    ->  Hash  (cost=14.51..14.51 rows=2 width=35)
> (actual time=0.000..0.000 rows=0 loops=1)
>                          ->  Nested Loop  (cost=4.92..14.51 rows=2
> width=35) (actual time=0.000..0.000 rows=2 loops=1)
>                                ->  Index Scan using person_pkey on
> person p (cost=0.00..5.73 rows=1 width=4) (actual time=0.000..0.000
> rows=1 loops=1)
>                                      Index Cond: (id = 355)
>                                ->  Hash Join  (cost=4.92..8.75 rows=2
> width=39) (actual time=0.000..0.000 rows=2 loops=1)
>                                      Hash Cond: ("outer".id =
> "inner".carrier_id)
>                                      ->  Seq Scan on carrier c
> (cost=0.00..3.54 rows=54 width=4) (actual time=0.000..0.000 rows=54
> loops=1)
>                                      ->  Hash  (cost=4.92..4.92 rows=2
> width=43) (actual time=0.000..0.000 rows=0 loops=1)
>                                            ->  Hash Join
> (cost=3.04..4.92 rows=2 width=43) (actual time=0.000..0.000 rows=2
> loops=1)
>                                                  Hash Cond:
> ("outer".carrier_id = "inner".carrier_id)
>                                                  ->  Seq Scan on
> carrier_code cc  (cost=0.00..1.57 rows=57 width=35) (actual
> time=0.000..0.000 rows=57 loops=1)
>                                                  ->  Hash
> (cost=3.04..3.04 rows=1 width=8) (actual time=0.000..0.000 rows=0
> loops=1)
>                                                        ->  Index Scan
> using carrier_to_person_person_id_idx on carrier_to_person ctp
> (cost=0.00..3.04 rows=1 width=8) (actual time=0.000..0.000 rows=1
> loops=1)
>                                                              Index
> Cond: (355 = person_id)
> Total runtime: 297.000 ms
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to  so that your
>      message can get through to the mailing list cleanly



From:
"Ken Egervari"
Date:

Josh,

Thanks so much for your comments.  They are incredibly insightful and you
clearly know your stuff.  It's so great that I'm able to learn so much from
you.  I really appreciate it.

>Do you need the interior sort? It's taking ~93ms to get 7k rows from
>shipment_status, and then another 30ms to sort them. This isn't a lot,
>so it might be fine.

Running the query without the sort doesn't actually improve performance
unfortunately, which I find strange.  I think the analyze is giving bad
feedback because taking all sorts out completely makes no difference in
performance.  Dan Tow's book actually said the same thing... how sorting
rarely takes up the bulk of the work.  Although I know you didn't like his
book much, but I had observed that in my experience too.

>Also, did you ever try CLUSTER current_status_date_idx ON shipment_status.
>This groups the rows in shipment_status by their status date, which
>helps put items with the same date next to eachother. This may effect
>other portions of the query, or other queries. Also, if you are
>inserting sequentially, it would seem that the items would already be
>naturally near eachother based on date.

Yes, this was one of the first things I tried actually and it is currently
clustered.  Since shipment status comes into our system at real time, the
dates are more or less in order as well.

>The next big cost is having to merge the 28k rows with the fast hash
>plan, which takes about 80ms.
>
>I guess the biggest issue is that you are doing a lot of work, and it
>takes time to do it. Also, I've noticed that this query is being run
>with exactly the same data. Which is good to compare two methods. But
>remember to test on multiple potential values. You might be better off
>one way with this query, but much worse for a different dataset. I
>noticed that this seems to have fewer rows than what postgres thinks the
>*average* number would be. (It predicts 60k and you only get 28k rows).

Well, the example where p.id = 355 is an above normal case where performance
is typically bad.  If a user's company has very few shipments and
shipment_status rows, performance isn't going to matter much and those
queries usually perform much faster.  I really needed to tune this for the
larger customers who do have thousands of rows for their entire company and
will probably reach 6 digits by the end of next year.  For the person 355,
they've only been on the system for 3 months and they already have 27700
rows.  Even if this makes the smaller customers a bit slower, I think it's
worth it if I can speed up cases like this, who all have very similar data
distribution.

>If this query is performed a lot, and you can be okay with a slight
>delay in updating, you could always switch to some sort of lazy
>materialized view.

I thought about this, but it's very important since shipment and
shipment_status are both updated in real time 24/7/365.  I think I might be
able to cache it within the application for 60 seconds at most, but it would
make little difference since people tend to refresh within that time anyway.
It's very important that real-time inforamtion exists though.

>You could also always throw more hardware at it. :) If the
>shipment_status is one of the bottlenecks, create a 4-disk raid10 and
>move the table over.
>I don't remember what your hardware is, but I don't remember it being a
>quad opteron with 16GB ram, and 20 15k SCSI disks, with the transaction
>log on a solid state disk. :)

That sounds like an awesome system.  I loved to have something like that.
Unfortunately, the production server is just a single processor machine with
1 GB ram.  I think throwing more disks at it is probably the best bet,
moving the shipment and shipment_status tables over as you suggested.
That's great advice.

>Why do you need the query to be 30ms? ~250ms is still pretty fast. If
>you are needing updates faster than that, you might look more into *why*
>and then handle it from a higher level.

30ms is a good target, although I guess I was naive for setting that goal
perhaps.  I've just taken queries that ran at 600ms and with 1 or 2 indexes,
they went down to 15ms.

Let's say we have 200 users signed into the application at the same time.
The application refreshes their shipment information automatically to make
sure it's up to date on the user's screen.  The application will execute the
query we are trying to tune every 60 seconds for most of these users.  Users
can set the refresh time to be higher, but 60 is the lowest amount so I'm
just assuming everyone has it at 60.

Anyway, if you have 200 users logged in, that's 200 queries in the 60 second
period, which is about 3-4 queries every second.  As you can see, it's
getting maxed out, and because of bad luck, the queries are bunched together
and are being called at the same time, making 8-9 queries in the same second
and that's where the performance is starting to degrade.  I just know that
if I could get this down to 30 ms, or even 100, we'd be okay for a few
months without throwing hardware at the problem.   Also keep in mind that
other application logic and Hibernate mapping is occuring to, so 3-4 queries
a second is already no good when everything is running on a single machine.

This isn't the best setup, but it's the best we can afford.  We are just a
new startup company.  Cheaper servers and open source keep our costs low.
But money is starting to come in after 10 months of hard work, so we'll be
able to replace our server within the next 2 months.  It'll be a neccessity
because we are signing on some big clients now and they'll have 40 or 50
users for a single company.  If they are all logged in at the same time,
that's a lot of queries.

>And naturally, the most important this is to test it under load. 250ms
>is pretty good, but if under load it goes back to 6s, then we probably
>should look for different alternatives. Also, what is the load that is
>causing the problem? Is it that you have some other big seqscans which
>are causing all of your tables to go out of cache?

No, this query and another very close to it are probably the most executed
in the system.  In fact, even checking the page stats on the web server
tells us that the pages that use these queries are 80% of the pages viewed
in our application.  If I can fix this problem, I've fixed our performance
problems period.  The statistics queries are very slow too, but I don't care
about that since nobody goes to them much (maybe once a month.  People don't
mind waiting for that sort of information anyway).

I'm very interested in those other alternatives since I may have to
experiment with them.  I'm under the impression that this query is actually
performing quite well for what I'm throwing at it and the work that it's
doing.

>Also, I believe I remember you saying that your production server is a
>P4, is that a single P4? Because I know postgres prefers Opterons to
>Pentium Xeons when in a multiprocessor machine. Look through the
>archives about spinlocks and the context switch bug. (context storm,
>etc). Plus, since opterons are 64-bit, you can throw a lot more RAM at
>them. I believe opterons outperform xeons for the same cost, *and* you
>can scale them up with extra ram.

Yeah, we have nothing of that sort.  It's really just a P4 3.0 Ghz
processor.  Like I mentioned before, we just put computers together from
what we had and built our application on them.  Our business is new, we
don't have a lot of money and we're just starting to actually have a good
client base.  It's finally growing after all of this time but we are still
using the servers we started with.

>But remember, the biggest bottleneck is almost *always* the I/O. So put
>more & faster disks into the system first.

I will price that raid setup you recommended.  That will probably be the
first adjustment to our server if we don't just replace the entire thing.

Thanks again,
Ken


From:
Richard Huxton
Date:

Ken Egervari wrote:
> Let's say we have 200 users signed into the application at the same
> time. The application refreshes their shipment information automatically
> to make sure it's up to date on the user's screen.  The application will
> execute the query we are trying to tune every 60 seconds for most of
> these users.  Users can set the refresh time to be higher, but 60 is the
> lowest amount so I'm just assuming everyone has it at 60.
>
> Anyway, if you have 200 users logged in, that's 200 queries in the 60
> second period, which is about 3-4 queries every second.

Can you turn the problem around? Calculate what you want for all users
(once every 60 seconds) and stuff those results into a summary table.
Then let the users query the summary table as often as they like (with
the understanding that the figures aren't going to update any faster
than once a minute)
--
   Richard Huxton
   Archonet Ltd

From:
"Ken"
Date:

Richard,

What do you mean by summary table?  Basically a cache of the query into a
table with replicated column names of all the joins?  I'd probably have to
whipe out the table every minute and re-insert the data for each carrier in
the system.  I'm not sure how expensive this operation would be, but I'm
guessing it would be fairly heavy-weight.  And maintaince would be a lot
harder because of the duplicated columns, making refactorings on the
database more error-prone.  Am I understanding your suggestion correctly?
Please correct me if I am.

> Can you turn the problem around? Calculate what you want for all users
> (once every 60 seconds) and stuff those results into a summary table. Then
> let the users query the summary table as often as they like (with the
> understanding that the figures aren't going to update any faster than once
> a minute)


From:
John Arbash Meinel
Date:

Ken wrote:

> Richard,
>
> What do you mean by summary table?  Basically a cache of the query
> into a table with replicated column names of all the joins?  I'd
> probably have to whipe out the table every minute and re-insert the
> data for each carrier in the system.  I'm not sure how expensive this
> operation would be, but I'm guessing it would be fairly heavy-weight.
> And maintaince would be a lot harder because of the duplicated
> columns, making refactorings on the database more error-prone.  Am I
> understanding your suggestion correctly? Please correct me if I am.
>
>> Can you turn the problem around? Calculate what you want for all
>> users (once every 60 seconds) and stuff those results into a summary
>> table. Then let the users query the summary table as often as they
>> like (with the understanding that the figures aren't going to update
>> any faster than once a minute)
>
It's the same idea of a materialized view, or possibly just a lazy cache.

Just try this query:

CREATE TABLE cachedview AS
select p.id as person_id, s.*, ss.*

from shipment s
inner join shipment_status ss on s.current_status_id=ss.id
inner join release_code rc on ss.release_code_id=rc.id
left outer join driver d on s.driver_id=d.id
left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.carrier_code_id in (
 select cc.id
 from person p
  inner join carrier_to_person ctp on p.id=ctp.person_id
  inner join carrier c on ctp.carrier_id=c.id
  inner join carrier_code cc on cc.carrier_id = c.id
)
and s.current_status_id is not null
and s.is_purged=false
and(rc.number='9' )
and(ss.date>=current_date-31 )

order by ss.date desc ;

Notice that I took out the internal p.id = blah.
Then you can do:

CREATE INDEX cachedview_person_id_idx ON cachedview(person_id);

Then from the client side, you can just run:
SELECT * from cachedview WHERE person_id = <id>;

Now, this assumes that rc.number='9' is what you always want. If that
isn't the case, you could refactor a little bit.

This unrolls all of the work, a table which should be really fast to
query. If this query takes less than 10s to generate, than just have a
service run it every 60s. I think for refreshing, it is actually faster
to drop the table and recreate it, rather than deleteing the entries.
Dropping also has the advantage that if you ever add more rows to s or
ss, then the table automatically gets the new entries.

Another possibility, is to have the "cachedview" not use "s.*, ss.*",
but instead just include whatever the primary keys are for those tables.
Then your final query becomes:

SELECT s.*, ss.* FROM cachedview cv, s, ss WHERE cv.person_id = <id>,
cv.s_id = s.<pkey>, cv.ss_id = ss.<pkey>;

Again, this should be really fast, because you should have an index on
cv.person_id and only have say 300 rows there, and then you are just
fetching a few rows from s and ss. You can also use this time to do some
of your left joins against other tables.

Does this make sense? The biggest advantage you have is your "60s"
statement. With that in hand, I think you can do a lot of caching
optimizations.

John
=:->


From:
John Arbash Meinel
Date:

Ken Egervari wrote:

> Josh,
>
...

> I thought about this, but it's very important since shipment and
> shipment_status are both updated in real time 24/7/365.  I think I
> might be able to cache it within the application for 60 seconds at
> most, but it would make little difference since people tend to refresh
> within that time anyway. It's very important that real-time
> inforamtion exists though.
>
Is 60s real-time enough for you? That's what it sounds like. It would be
nice if you could have 1hr, but there's still a lot of extra work you
can do in 60s.

>> You could also always throw more hardware at it. :) If the
>> shipment_status is one of the bottlenecks, create a 4-disk raid10 and
>> move the table over.
>> I don't remember what your hardware is, but I don't remember it being a
>> quad opteron with 16GB ram, and 20 15k SCSI disks, with the transaction
>> log on a solid state disk. :)
>
>
> That sounds like an awesome system.  I loved to have something like
> that. Unfortunately, the production server is just a single processor
> machine with 1 GB ram.  I think throwing more disks at it is probably
> the best bet, moving the shipment and shipment_status tables over as
> you suggested. That's great advice.
>
Well, disk I/O is one side, but probably sticking another 1GB (2GB
total) also would be a fairly economical upgrade for performance.

You are looking for query performance, not really update performance,
right? So buy a 4-port SATA controller, and some WD Raptor 10k SATA
disks. With this you can create a RAID10 for < $2k (probably like $1k).

> 30ms is a good target, although I guess I was naive for setting that
> goal perhaps.  I've just taken queries that ran at 600ms and with 1 or
> 2 indexes, they went down to 15ms.

It all depends on your query. If you have a giant table (1M rows), and
you are doing a seqscan for only 5 rows, then adding an index will give
you enormous productivity gains. But you are getting 30k rows, and
combining them with 6k rows, plus a bunch of other stuff. I think we've
tuned the query about as far as we can.

>
> Let's say we have 200 users signed into the application at the same
> time. The application refreshes their shipment information
> automatically to make sure it's up to date on the user's screen.  The
> application will execute the query we are trying to tune every 60
> seconds for most of these users.  Users can set the refresh time to be
> higher, but 60 is the lowest amount so I'm just assuming everyone has
> it at 60.
>
> Anyway, if you have 200 users logged in, that's 200 queries in the 60
> second period, which is about 3-4 queries every second.  As you can
> see, it's getting maxed out, and because of bad luck, the queries are
> bunched together and are being called at the same time, making 8-9
> queries in the same second and that's where the performance is
> starting to degrade.  I just know that if I could get this down to 30
> ms, or even 100, we'd be okay for a few months without throwing
> hardware at the problem.   Also keep in mind that other application
> logic and Hibernate mapping is occuring to, so 3-4 queries a second is
> already no good when everything is running on a single machine.
>
The other query I just sent, where you do the query for all users at
once, and then cache the result, *might* be cheaper than doing a bunch
of different queries.
However, you may find that doing the query for *all* users takes to
long. So you could keep another table indicating who the most recent
people logged in are, and then only cache the info for those people.
This does start getting a little more involved, so see if you can do all
users before heading down this road.

> This isn't the best setup, but it's the best we can afford.  We are
> just a new startup company.  Cheaper servers and open source keep our
> costs low. But money is starting to come in after 10 months of hard
> work, so we'll be able to replace our server within the next 2
> months.  It'll be a neccessity because we are signing on some big
> clientsnow and they'll have 40 or 50 users for a single company.  If
> they are all logged in at the same time, that's a lot of queries.
>
Sure. Just realize you can't really support 200 concurrent connections
with a single P4 and 1GB of ram.

John
=:->


From:
Josh Berkus
Date:

Ken,

> I did everything you said and my query does perform a bit better.  I've
> been getting speeds from 203 to 219 to 234 milliseconds now.   I tried
> increasing the work mem and the effective cache size from the values you
> provided, but I didn't see any more improvement.  I've tried to looking
> into setting the shared buffers for Windows XP, but I'm not sure how to do
> it.  I'm looking in the manual at:

Now that you know how to change the shared_buffers, want to go ahead and run
the query again?

I'm pretty concerned about your case, because based on your description I
would expect < 100ms on a Linux machine.    So I'm wondering if this is a
problem with WindowsXP performance, or if it's something we can fix through
tuning.

--
Josh Berkus
Aglio Database Solutions
San Francisco

From:
John A Meinel
Date:

John Arbash Meinel wrote:

> Ken wrote:
>
>> Richard,
>>
>> What do you mean by summary table?  Basically a cache of the query
>> into a table with replicated column names of all the joins?  I'd
>> probably have to whipe out the table every minute and re-insert the
>> data for each carrier in the system.  I'm not sure how expensive this
>> operation would be, but I'm guessing it would be fairly heavy-weight.
>> And maintaince would be a lot harder because of the duplicated
>> columns, making refactorings on the database more error-prone.  Am I
>> understanding your suggestion correctly? Please correct me if I am.
>>
>>> Can you turn the problem around? Calculate what you want for all
>>> users (once every 60 seconds) and stuff those results into a summary
>>> table. Then let the users query the summary table as often as they
>>> like (with the understanding that the figures aren't going to update
>>> any faster than once a minute)
>>
>>
> It's the same idea of a materialized view, or possibly just a lazy cache.
>
...

> This unrolls all of the work, a table which should be really fast to
> query. If this query takes less than 10s to generate, than just have a
> service run it every 60s. I think for refreshing, it is actually faster
> to drop the table and recreate it, rather than deleteing the entries.
> Dropping also has the advantage that if you ever add more rows to s or
> ss, then the table automatically gets the new entries.
>
Just as a small update. If completely regenerating the cache takes to
long, the other way to do it, is to create insert and update triggers on
s and ss, such that as they change, they also update the cachedview table.

Something like

CREATE TRIGGER on_ss_ins AFTER INSERT ON ss FOR EACH ROW EXECUTE
    INSERT INTO cached_view SELECT p.id as person_id, s.*, ss.* FROM
<the big stuff> WHERE s.id = NEW.id;

This runs the same query, but notice that the WHERE means it only allows
the new row. So this query should run fast. It is a little bit of
overhead on each of your inserts, but it should keep the cache
up-to-date. With something like this, I would have the final client
query still include the date restriction, since you accumulate older
rows into the cached view. But you can run a daily process that prunes
out everything older than 31 days, which keeps the cachedview from
getting really large.

John
=:->