Re: [HACKERS] EXPLAIN ANALYZE on 8.2 - Mailing list pgsql-performance

From Martijn van Oosterhout
Subject Re: [HACKERS] EXPLAIN ANALYZE on 8.2
Date
Msg-id 20061215154845.GL958@svana.org
Whole thread Raw
In response to Re: [HACKERS] EXPLAIN ANALYZE on 8.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Fri, Dec 15, 2006 at 09:56:57AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Fri, Dec 15, 2006 at 12:20:46PM +0000, Simon Riggs wrote:
> >> Maybe sampling every 10 rows will bring things down to an acceptable
> >> level (after the first N). You tried less than 10 didn't you?
>
> > Yeah, it reduced the number of calls as the count got larger. It broke
> > somewhere, though I don't quite remember why.
>
> The fundamental problem with it was the assumption that different
> executions of a plan node will have the same timing.  That's not true,
> in fact not even approximately true.  IIRC the patch did realize
> that first-time-through is not a predictor for the rest, but some of
> our plan nodes have enormous variance even after the first time.
> I think the worst case is batched hash joins.

It didn't assume that because that's obviously bogus. It assumed the
durations would be spread as a normal distribution. Which meant that
over time the average of the measured iterations would approch the
actual average. It tried to take enough measurements to try and keep
expected error small, but it's statistics, you can only say "this will
give the right answer >95% of the time".

You are correct though, the error was caused by unexpectedly large
variations, or more likely, an unexpected distribution curve.
Statistically, we took enough samples to not be affected significantly
by large variations. Even if it looked more like a gamma distribution
it should not have been as far off as it was.

Looking at alternative approaches, like sampling with a timer, you end
up with the same problem: sometimes the calculations will fail and
produce something strange. The simplest example being than a 100Hz
timer is not going to produce any useful information for queries in the
millisecond range. A higher frequency timer than that is not going to
be available portably.

You could probably throw more effort into refining the statistics behind
it, but at some point we're going to have to draw a line and say: it's
going to be wrong X% of the time, deal with it. If we're not willing to
say that, then there's no point going ahead with any statistical
approach.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] EXPLAIN ANALYZE on 8.2
Next
From: "Merlin Moncure"
Date:
Subject: Re: New to PostgreSQL, performance considerations