EXPLAIN ANALYZE on 8.2

From: Evgeny Gridasov
Hi, everybody!

Running the same query on pg 8.2 through EXPLAIN ANALYZE takes 4x-10x as long as running it without it.
Is that OK?


Example:

testing=> select count(*) from auth_user;
  count
---------
 2575675
(1 row)

Time: 1450,829 ms
testing=> explain analyze select count(*) from auth_user;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=89814.87..89814.88 rows=1 width=0) (actual time=18460.436..18460.439 rows=1 loops=1)
   ->  Seq Scan on auth_user  (cost=0.00..83373.89 rows=2576389 width=0) (actual time=0.424..9871.520 rows=2575675 loops=1)
 Total runtime: 18460.535 ms
(3 rows)

Time: 18461,194 ms

Re: EXPLAIN ANALYZE on 8.2

From: Tom Lane
Evgeny Gridasov <eugrid@fpm.kubsu.ru> writes:
> Running the same query on pg 8.2 through EXPLAIN ANALYZE takes 4x-10x as long as running it without it.

If your machine has slow gettimeofday() this is not surprising.  8.2 is
no worse (or better) than any prior version.

Some quick arithmetic from your results suggests that gettimeofday() is
taking about 3.3 microseconds, which is indeed pretty awful.  What sort
of machine is this exactly?

            regards, tom lane

Re: EXPLAIN ANALYZE on 8.2

From: Evgeny Gridasov
Tom,
Hello.

This is Debian Linux 3.1 on top of 2x Xeon 3.4 GHz.
PostgreSQL is 8.2, checked out from CVS REL8_2_STABLE yesterday.
I'm running the same Postgres on another machine
with Debian Etch and get the same results.

On Thu, 14 Dec 2006 11:11:42 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Evgeny Gridasov <eugrid@fpm.kubsu.ru> writes:
> > Running the same query on pg 8.2 through EXPLAIN ANALYZE takes 4x-10x as long as running it without it.
>
> If your machine has slow gettimeofday() this is not surprising.  8.2 is
> no worse (or better) than any prior version.
>
> Some quick arithmetic from your results suggests that gettimeofday() is
> taking about 3.3 microseconds, which is indeed pretty awful.  What sort
> of machine is this exactly?
>
>             regards, tom lane
>

Re: EXPLAIN ANALYZE on 8.2

From: Tom Lane
Evgeny Gridasov <eugrid@fpm.kubsu.ru> writes:
> This is Debian Linux 3.1 on top of 2x Xeon 3.4 GHz.
> PostgreSQL is 8.2, checked out from CVS REL8_2_STABLE yesterday.
> I'm running the same Postgres on another machine
> with Debian Etch and get the same results.

Hmph.  With 8.2 on Fedora 5 on a 2.8GHz dual Xeon, I get this:


regression=# create table foo as select x from generate_series(1,2500000) x;
SELECT
regression=# vacuum foo;
VACUUM
regression=# checkpoint;
CHECKPOINT
regression=# \timing
Timing is on.
regression=# select count(*) from foo;
  count
---------
 2500000
(1 row)

Time: 666.639 ms
regression=# select count(*) from foo;
  count
---------
 2500000
(1 row)

Time: 609.514 ms
regression=# explain analyze select count(*) from foo;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=44764.00..44764.01 rows=1 width=0) (actual time=1344.812..1344.813 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..38514.00 rows=2500000 width=0) (actual time=0.031..748.571 rows=2500000 loops=1)
 Total runtime: 1344.891 ms
(3 rows)

Time: 1345.755 ms
regression=# explain analyze select count(*) from foo;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=44764.00..44764.01 rows=1 width=0) (actual time=1324.846..1324.847 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..38514.00 rows=2500000 width=0) (actual time=0.046..748.582 rows=2500000 loops=1)
 Total runtime: 1324.902 ms
(3 rows)

Time: 1325.591 ms
regression=#

which works out to about 0.14 microsec per gettimeofday call, on a
machine that ought to be slower than yours.  So I think you've got
either a crummy motherboard, or a kernel that doesn't know the best
way to read the clock on that hardware.  There is some discussion
of this in the archives (probably in pgsql-hackers); look back around
May or so when we were unsuccessfully trying to hack EXPLAIN to use
fewer gettimeofday calls.

            regards, tom lane

Re: EXPLAIN ANALYZE on 8.2

From: "Kelly Burkhart"
On 12/14/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Evgeny Gridasov <eugrid@fpm.kubsu.ru> writes:
> > This is Debian Linux 3.1 on top of 2x Xeon 3.4 GHz.
> > PostgreSQL is 8.2, checked out from CVS REL8_2_STABLE yesterday.
> > I'm running the same Postgres on another machine
> > with Debian Etch and get the same results.
>
> Hmph.  With 8.2 on Fedora 5 on a 2.8GHz dual Xeon, I get this:
>
<snip>
> regression=# explain analyze select count(*) from foo;
>                                                      QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=44764.00..44764.01 rows=1 width=0) (actual time=1324.846..1324.847 rows=1 loops=1)
>    ->  Seq Scan on foo  (cost=0.00..38514.00 rows=2500000 width=0) (actual time=0.046..748.582 rows=2500000 loops=1)
>  Total runtime: 1324.902 ms
> (3 rows)
>
> Time: 1325.591 ms
> regression=#
>
> which works out to about 0.14 microsec per gettimeofday call, on a
> machine that ought to be slower than yours.  So I think you've got
> either a crummy motherboard, or a kernel that doesn't know the best
> way to read the clock on that hardware.  There is some discussion
> of this in the archives (probably in pgsql-hackers); look back around
> May or so when we were unsuccessfully trying to hack EXPLAIN to use
> fewer gettimeofday calls.

Yow!  I notice the same thing on our HP BL25p blades with 2x Opteron 270
(four total cores, AMD 8111 or 8131 chipset): 1.25 microsec/call vs. my
new desktop (Intel Core2 6300) at 0.16 microsec/call.

I hope this isn't a "crummy mainboard", but I can't seem to affect
things by changing the clock source (kernel 2.6.16, SLES10).  I tried
the kernel command-line option clock=XXX where XXX is one of
(cyclone, hpet, pmtmr, tsc, pit); no option was significantly better
than the default.

Anyone know how this might be improved (short of replacing hardware)?

-K

Re: EXPLAIN ANALYZE on 8.2

From: Mark Kirkwood
Kelly Burkhart wrote:
>
> I hope this isn't a "crummy mainboard" but I can't seem to affect
> things by changing clock source (kernel 2.6.16 SLES10).  I tried
> kernel command option clock=XXX where XXX in
> (cyclone,hpet,pmtmr,tsc,pit), no option was significantly better than
> the default.
>
> Anyone know how this might be improved (short of replacing hardware)?
>

Updating the BIOS might be worth investigating, and then bugging your
Linux distro mailing list/support etc for more help. (What sort of
motherboard is it?)

Cheers

Mark

Re: EXPLAIN ANALYZE on 8.2

From: Tom Lane
"Kelly Burkhart" <kelly.burkhart@gmail.com> writes:
> I hope this isn't a "crummy mainboard" but I can't seem to affect
> things by changing clock source (kernel 2.6.16 SLES10).  I tried
> kernel command option clock=XXX where XXX in
> (cyclone,hpet,pmtmr,tsc,pit), no option was significantly better than
> the default.

I believe that on machines where gettimeofday is really nice and fast,
it doesn't require entry to the kernel at all; there's some hack that
makes the clock readable from userspace.  (Obviously a switch to kernel
mode would set you back a lot of the cycles involved here.)  So it's not
so much the kernel that you need to teach as glibc.  How you do that is
beyond my expertise, but maybe that will help you google for the right
thing ...

            regards, tom lane

Re: EXPLAIN ANALYZE on 8.2

From: "Simon Riggs"
On Thu, 2006-12-14 at 19:05 -0500, Tom Lane wrote:
> "Kelly Burkhart" <kelly.burkhart@gmail.com> writes:
> > I hope this isn't a "crummy mainboard" but I can't seem to affect
> > things by changing clock source (kernel 2.6.16 SLES10).  I tried
> > kernel command option clock=XXX where XXX in
> > (cyclone,hpet,pmtmr,tsc,pit), no option was significantly better than
> > the default.
>
> I believe that on machines where gettimeofday is really nice and fast,
> it doesn't require entry to the kernel at all; there's some hack that
> makes the clock readable from userspace.  (Obviously a switch to kernel
> mode would set you back a lot of the cycles involved here.)  So it's not
> so much the kernel that you need to teach as glibc.  How you do that is
> beyond my expertise, but maybe that will help you google for the right
> thing ...

Until we work out a better solution we can fix this in two ways:

1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...

2. enable_analyze_timer = off | on (default) (USERSET)

A performance drop of 4x-10x is simply unacceptable when trying to tune
queries where the current untuned time is already too high. Tying down
production servers for hours on end when we know for certain all they
are doing is calling gettimeofday millions of times is not good. This
quickly leads to the view from objective people that PostgreSQL doesn't
have a great optimizer, whatever we say in its defence. I don't want to
leave this alone, but I don't want to spend a month fixing it either.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Martijn van Oosterhout
On Fri, Dec 15, 2006 at 10:28:08AM +0000, Simon Riggs wrote:
> Until we work out a better solution we can fix this in two ways:
>
> 1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...
>
> 2. enable_analyze_timer = off | on (default) (USERSET)

What exactly would this do? Only count actual rows or something? I
wrote a patch that tried statistical sampling, but the figures were too
far off for people's liking.

> A performance drop of 4x-10x is simply unacceptable when trying to tune
> queries where the current untuned time is already too high. Tying down
> production servers for hours on end when we know for certain all they
> are doing is calling gettimeofday millions of times is not good. This
> quickly leads to the view from objective people that PostgreSQL doesn't
> have a great optimizer, whatever we say in its defence. I don't want to
> leave this alone, but I don't want to spend a month fixing it either.

I think the best option is setitimer(), but it's not POSIX so
platform support is going to be patchy.

BTW, doing gettimeofday() without kernel entry is not really possible.
You could use the cycle counter but it has the problem that if you have
multiple CPUs you need to calibrate the result. If the CPU goes to
sleep, there is no way for the userspace process to know. Only the
kernel has all the relevant information about what "time" is to get a
reasonable result.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Dimitri Fontaine
Hi list,

On Friday, 15 December 2006 at 11:50, Martijn van Oosterhout wrote:
> BTW, doing gettimeofday() without kernel entry is not really possible.
> You could use the cycle counter but it has the problem that if you have
> multiple CPUs you need to calibrate the result. If the CPU goes to
> sleep, there is no way for the userspace process to know. Only the
> kernel has all the relevant information about what "time" is to get a
> reasonable result.

I remember playing with the Intel RDTSC (time stamp counter) for some timing
measurements, but I have since read from several sources (including Linux
kernel hackers discussing its use for the gettimeofday() implementation) that
the TSC is not an accurate way to obtain elapsed-time information.

Maybe some methods other than gettimeofday() are available that are cheaper
and still accurate (Lamport timestamps, say, since the PGDG may have to
consider a distributed-processing-ready EXPLAIN ANALYZE at some point)?
After all, the discussion, as far as I understand it, is about having an
accurate measure of the duration of events; knowing when in the day they
occurred does not seem to be the point.

My 2¢, hoping this is somehow helpful,
--
Dimitri Fontaine
http://www.dalibo.com/


Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: "Simon Riggs"
On Fri, 2006-12-15 at 11:50 +0100, Martijn van Oosterhout wrote:
> On Fri, Dec 15, 2006 at 10:28:08AM +0000, Simon Riggs wrote:
> > Until we work out a better solution we can fix this in two ways:
> >
> > 1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...
> >
> > 2. enable_analyze_timer = off | on (default) (USERSET)
>
> What exactly would this do? Only count actual rows or something?

Yes. It's better to have this than nothing at all.

> I
> wrote a patch that tried statistical sampling, but the figures were too
> far off for people's liking.

Well, I like your ideas, so if you have any more...

Maybe sampling every 10 rows will bring things down to an acceptable
level (after the first N). You tried fewer than 10, didn't you?

Maybe we can count how many real I/Os were required to process each
particular row, so we can adjust the time per row based upon I/Os. ISTM
that sampling at too low a rate means we can't spot the effects of cache
and I/O, which can often be low frequency but high impact.

> I think the best option is setitimer(), but it's not POSIX so
> platform support is going to be patchy.

Don't understand that. I thought that was to do with alarms and signals.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Martijn van Oosterhout
On Fri, Dec 15, 2006 at 12:15:59PM +0000, Gregory Stark wrote:
> There are various attempts at providing better timing infrastructure at low
> overhead but I'm not sure what's out there currently. I expect what
> we'll have to do is invent a pg_* abstraction that has various implementations
> on different architectures. On Solaris it can use DTrace internally, on Linux
> it might have something else (or more likely several different options
> depending on the age and config options of the kernel).

I think we need to move to a sampling approach. setitimer is good,
except it doesn't tell you if signals have been lost. Given they are
most likely to be lost during high disk I/O, they're actually
significant. I'm trying to think of a way around that. Then you don't
need a cheap gettimeofday at all...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Martijn van Oosterhout
On Fri, Dec 15, 2006 at 12:20:46PM +0000, Simon Riggs wrote:
> > I
> > wrote a patch that tried statistical sampling, but the figures were too
> > far off for people's liking.
>
> Well, I like your ideas, so if you have any more...
>
> Maybe sampling every 10 rows will bring things down to an acceptable
> level (after the first N). You tried less than 10 didn't you?

Yeah, it reduced the number of calls as the count got larger. It broke
somewhere, though I don't quite remember why.

> Maybe we can count how many real I/Os were required to perform each
> particular row, so we can adjust the time per row based upon I/Os. ISTM
> that sampling at too low a rate means we can't spot the effects of cache
> and I/O which can often be low frequency but high impact.

One idea is to sample at a fixed interval. Where the I/O cost is high,
there will be a lot of sampling points. The issue is that the final
results are no longer totally accurate.

> > I think the best option is setitimer(), but it's not POSIX so
> > platform support is going to be patchy.
>
> Don't understand that. I thought that was to do with alarms and signals.

On my system the manpage says setitimer() conforms to SVr4 and 4.4BSD.
I'm unsure how many of the supported platforms fall under that
categorisation.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Florian Weimer
* Simon Riggs:

>> I think the best option is setitimer(), but it's not POSIX so
>> platform support is going to be patchy.
>
> Don't understand that. I thought that was to do with alarms and
> signals.

You could use it for sampling.  Every few milliseconds, you record
which code is executing (possibly using a global variable which is set
and reset accordingly).  But this might lead to tons of interrupted
system calls, and not all systems mask them, so this might not be an
option for the PostgreSQL code base.

On the other hand, if performance in more recent Linux releases (from
kernel.org) is acceptable, you should assume that the problem will
eventually fix itself.  FWIW, I see the 9x overhead on something that
is close to 2.6.17 (on AMD64/Opteron), so this could be wishful
thinking. 8-(

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Tom Lane
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Fri, Dec 15, 2006 at 12:20:46PM +0000, Simon Riggs wrote:
>> Maybe sampling every 10 rows will bring things down to an acceptable
>> level (after the first N). You tried less than 10 didn't you?

> Yeah, it reduced the number of calls as the count got larger. It broke
> somewhere, though I don't quite remember why.

The fundamental problem with it was the assumption that different
executions of a plan node will have the same timing.  That's not true,
in fact not even approximately true.  IIRC the patch did realize
that first-time-through is not a predictor for the rest, but some of
our plan nodes have enormous variance even after the first time.
I think the worst case is batched hash joins.

            regards, tom lane

Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Tom Lane
Gregory Stark <stark@enterprisedb.com> writes:
> There are various attempts at providing better timing infrastructure at low
> overhead but I'm not sure what's out there currently. I expect what
> we'll have to do is invent a pg_* abstraction that has various implementations
> on different architectures.

You've got to be kidding.  Surely it's glibc's responsibility, not ours,
to implement gettimeofday correctly for the hardware.

            regards, tom lane

Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Martijn van Oosterhout
On Fri, Dec 15, 2006 at 09:56:57AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Fri, Dec 15, 2006 at 12:20:46PM +0000, Simon Riggs wrote:
> >> Maybe sampling every 10 rows will bring things down to an acceptable
> >> level (after the first N). You tried less than 10 didn't you?
>
> > Yeah, it reduced the number of calls as the count got larger. It broke
> > somewhere, though I don't quite remember why.
>
> The fundamental problem with it was the assumption that different
> executions of a plan node will have the same timing.  That's not true,
> in fact not even approximately true.  IIRC the patch did realize
> that first-time-through is not a predictor for the rest, but some of
> our plan nodes have enormous variance even after the first time.
> I think the worst case is batched hash joins.

It didn't assume that, because that's obviously bogus. It assumed the
durations would be spread as a normal distribution, which meant that
over time the average of the measured iterations would approach the
actual average. It tried to take enough measurements to keep the
expected error small, but it's statistics: you can only say "this will
give the right answer >95% of the time".

You are correct though, the error was caused by unexpectedly large
variations, or more likely, an unexpected distribution curve.
Statistically, we took enough samples to not be affected significantly
by large variations. Even if it looked more like a gamma distribution
it should not have been as far off as it was.

Looking at alternative approaches, like sampling with a timer, you end
up with the same problem: sometimes the calculations will fail and
produce something strange. The simplest example being that a 100Hz
timer is not going to produce any useful information for queries in the
millisecond range. A higher frequency timer than that is not going to
be available portably.

You could probably throw more effort into refining the statistics behind
it, but at some point we're going to have to draw a line and say: it's
going to be wrong X% of the time, deal with it. If we're not willing to
say that, then there's no point going ahead with any statistical
approach.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
> On Friday, 15 December 2006 at 11:28, Simon Riggs wrote:
>> Until we work out a better solution we can fix this in two ways:
>>
>> 1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...
>> 2. enable_analyze_timer = off | on (default) (USERSET)

> The second one is enough in my mind.

I don't see any point in either one.  If you're not going to collect
timing data then the only useful info EXPLAIN ANALYZE could provide is
knowledge of which rowcount estimates are badly off ... and to get that,
you have to wait for the query to finish, which may well be impractical
even without the gettimeofday overhead.  We had discussed upthread the
idea of having an option to issue a NOTICE as soon as any actual
rowcount exceeds the estimate by some-configurable-percentage, and that
seems to me to be a much more useful response to the problem of
"E.A. takes too long" than removing gettimeofday.

One thing that's not too clear to me though is how the NOTICE would
identify the node at which the rowcount was exceeded...

            regards, tom lane

Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Gregory Stark
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
> > There are various attempts at providing better timing infrastructure at low
> > overhead but I'm not sure what's out there currently. I expect what
> > we'll have to do is invent a pg_* abstraction that has various implementations
> > on different architectures.
>
> You've got to be kidding.  Surely it's glibc's responsibility, not ours,
> to implement gettimeofday correctly for the hardware.

Except for two things:

a) We don't really need gettimeofday. That means we don't need something
sensitive to adjustments made by ntp etc. In fact that would be actively bad.
Currently if the user runs "date" to reset his clock back a few days I bet
interesting things happen to a large explain analyze that's running.

In fact we don't need something that represents any absolute time, only time
elapsed since some other point we choose. That might be easier to implement
than what glibc has to do to implement gettimeofday fully.

b) glibc may not want to endure an overhead on every syscall and context
switch to make gettimeofday faster on the assumption that gettimeofday is a
rare call and it should pay the price rather than imposing an overhead on
everything else.

Postgres knows when it's running an explain analyze and a 1% overhead would be
entirely tolerable, especially if it affected the process pretty much evenly
unlike the per-gettimeofday-overhead which can get up as high as 100% on some
types of subplans and is negligible on others. And more to the point Postgres
wouldn't have to endure this overhead at all when it's not needed whereas
glibc has no idea when you're going to need gettimeofday next.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Ron
At 10:45 AM 12/15/2006, Tom Lane wrote:
>Gregory Stark <stark@enterprisedb.com> writes:
> > There are various attempts at providing better timing infrastructure at low
> > overhead but I'm not sure what's out there currently. I expect what
> > we'll have to do is invent a pg_* abstraction that has various
> > implementations on different architectures.
>
>You've got to be kidding.  Surely it's glibc's responsibility, not ours,
>to implement gettimeofday correctly for the hardware.
>
>                         regards, tom lane

I agree with Tom on this.  Perhaps the best compromise is for the pg
community to make thoughtful suggestions to the glibc community?

Ron Peacetree


Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: "Simon Riggs"
On Fri, 2006-12-15 at 10:57 -0500, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On Friday, 15 December 2006 at 11:28, Simon Riggs wrote:
> >> Until we work out a better solution we can fix this in two ways:
> >>
> >> 1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...
> >> 2. enable_analyze_timer = off | on (default) (USERSET)
>
> > The second one is enough in my mind.
>
> I don't see any point in either one.  If you're not going to collect
> timing data then the only useful info EXPLAIN ANALYZE could provide is
> knowledge of which rowcount estimates are badly off ... and to get that,
> you have to wait for the query to finish, which may well be impractical
> even without the gettimeofday overhead.

On a different part of this thread, you say:

On Fri, 2006-12-15 at 09:56 -0500, Tom Lane wrote:

> The fundamental problem with it was the assumption that different
> executions of a plan node will have the same timing.  That's not true,
> in fact not even approximately true.

It doesn't make sense to me to claim that the timing is so important
that we cannot do without it, while at the same time saying it isn't
even approximately repeatable because it is so highly variable.

>  We had discussed upthread the
> idea of having an option to issue a NOTICE as soon as any actual
> rowcount exceeds the estimate by some-configurable-percentage, and that
> seems to me to be a much more useful response to the problem of
> "E.A. takes too long" than removing gettimeofday.

> One thing that's not too clear to me though is how the NOTICE would
> identify the node at which the rowcount was exceeded...

We'd have to output the whole EXPLAIN as a NOTICE for it to make any
sense. If we can't do without the timings, then half an EXPLAIN would be
even worse.

We'd need to take account of non-linear nodes. Hash nodes react badly
beyond a certain point, HashAgg even worse. Sort performs poorly after
the end of memory, as does Materialize. Other nodes are more linear so
would need a different percentage. I don't like the sound of a whole
gaggle of GUCs to describe that. Any ideas?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Tom Lane
"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Fri, 2006-12-15 at 09:56 -0500, Tom Lane wrote:
>> The fundamental problem with it was the assumption that different
>> executions of a plan node will have the same timing.  That's not true,
>> in fact not even approximately true.

> It doesn't make sense to me to claim that the timing is so important
> that we cannot do without it, while at the same time saying it isn't
> even approximately repeatable because it is so highly variable.

Huh?  What I said was that successive executions of the same plan node
may take considerably different amounts of time, and the proposed
sampling patch failed to handle that situation with acceptable accuracy.

            regards, tom lane