Re: Help with tuning this query (with explain analyze finally) - Mailing list pgsql-performance

From Ken Egervari
Subject Re: Help with tuning this query (with explain analyze finally)
Date
Msg-id 001a01c51f8f$32bba5b0$6a01a8c0@KEN
In response to Help with tuning this query  ("Ken Egervari" <ken@upfactor.com>)
List pgsql-performance
> If so, I'd have to take the EXPLAIN ANALYZE results with a big grain of
> salt, because what it's trying to do is add up a lot of
> mostly-sub-millisecond intervals.  What would essentially happen is that
> whichever plan node had control at a particular millisecond boundary
> would get charged for the whole preceding millisecond, and any other
> nodes (which might have actually eaten most of the millisecond) would
> get charged nothing.

Well, we do know that it's at least 75% accurate, and I'm only looking for a
relative increase in performance.  My goal is to get this query down to 30
milliseconds, but even 125 or 75 would be an improvement; any improvement,
even one based on fuzzy data, is still an improvement.  Precision isn't
really that important, at least not to me or to the people using the
application.  I can see how the rounding could throw off the results in the
inner parts of the plan, though, but I think we should try to work with the
EXPLAIN ANALYZE output as it is.  If there is anything else I can give you to
help, please ask and I will gladly provide it.  I want to make this easy for
you.

> Over a sufficiently long query run, the errors would average out, but
> this wasn't that long --- 312 milliseconds, so in essence we are trying
> to estimate the query's behavior from only 312 samples of where it was
> at the millisecond boundaries.  I don't trust profiles based on less
> than a few thousand samples ...

I'm just using data from the production database, which only has five digits'
worth of rows in its main tables.  I don't think I can get millions of rows
into these tables, although I wish I could.  To get the most accurate
results, I'd have to write a program that inserts the data randomly and
distributes it the way a real production database might look in a few years:
I would try to bunch the dates up realistically and add carriers and
shipments exponentially over time, as more customers came onto the system.
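To make that concrete, here is a rough sketch of the kind of generator
program I have in mind.  The column names, connection string, and growth
curve are all made up for illustration; the real schema is different:

/*
 * Sketch of a test-data generator (hypothetical column names and
 * distribution parameters).  It skews ship dates toward the present so
 * that the row count grows roughly exponentially over time, the way a
 * maturing production database would.
 *
 * Build with something like: cc gen.c -lpq -lm
 */
#include <libpq-fe.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    PGconn     *conn = PQconnectdb("dbname=testdb");
    PGresult   *res;
    int         i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    PQclear(PQexec(conn, "BEGIN"));
    for (i = 0; i < 100000; i++)
    {
        char    sql[256];
        double  u = (rand() + 1.0) / ((double) RAND_MAX + 1.0);
        /* Exponential skew: most rows land in the recent past. */
        int     days_ago = (int) (-120.0 * log(u));

        snprintf(sql, sizeof(sql),
                 "INSERT INTO shipment (carrier_id, ship_date) "
                 "VALUES (%d, now() - interval '%d days')",
                 rand() % 50 + 1, days_ago);
        res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}

Wrapping all the inserts in a single transaction keeps the load time
reasonable.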

But I'm trying to be practical too.  This query is already too slow at five
digits of rows; imagine how bad it would be with millions!  Unfortunately,
this query gets run by hundreds of logged-in users, each once every 60
seconds on average, so it must be as fast as possible.  During peak times,
people have to wait 5 or 6 seconds just to see its results.

I understand the app may be at fault too, but I'm sure that speeding up this
query would solve the problem, because the query is inherently slow and the
app itself is well layered.  It makes good use of frameworks like Spring,
Hibernate and database pooling, which have been used on many applications and
have run very well for us.  The fact that the query is slow even in PgAdmin
III or phpPgAdmin suggests that the query itself can be tuned further.

I am no master tuner.  I have read as much as I could about database tuning
in general, about the proper use of Hibernate, and so on.  Frankly, I am not
experienced enough to solve this problem, and I wish to learn from the
experts, like you, Tom, and like John, Ragnar and the others who have
responded so kindly to my request.

> Most modern machines seem to have clocks that can count elapsed time
> down to near the microsecond level.  Anyone know if it's possible to get
> such numbers out of Windows, or are we stuck with milliseconds?

These results came from PgAdmin III directly, and I'm not sure how I'd get
finer-grained numbers out of it even if I knew of a way.
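That said, I believe Win32 does expose a timer with sub-microsecond
resolution, QueryPerformanceCounter.  A minimal, untested sketch of reading
it:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, stop;

    QueryPerformanceFrequency(&freq);   /* counter ticks per second */
    QueryPerformanceCounter(&start);

    Sleep(1);                           /* stand-in for the work being timed */

    QueryPerformanceCounter(&stop);
    printf("elapsed: %.3f us\n",
           (stop.QuadPart - start.QuadPart) * 1e6 / (double) freq.QuadPart);
    return 0;
}

Whether the backend could use that for EXPLAIN ANALYZE on Windows is beyond
me, but the counter itself is there.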

