Thread: New EXPLAIN ANALYZE statement
Well, I don't know how many people will have seen this one already, so I'm
posting it here. This patch probably needs more feedback before being applied.

- What it does:

Adds a new EXPLAIN ANALYZE statement which displays the plan tree, just like
EXPLAIN, but also executes the query and collects statistics on what actually
happened. The current statistics collected are:

Time till first tuple is returned
Time till all tuples are returned
Total number of tuples returned
Number of times Rescan is called

I wrote it to help analyze why certain queries were taking much longer than
the planner expected. I think it is useful for debugging because, by asking
people to post the output of EXPLAIN ANALYZE, you can immediately see the
actual statistics and the planner's estimates next to each other.

- How it does it:

It adds a few calls in a few strategic places in the executor. Each node in
the query tree is accounted for separately. Only time spent processing the
node is counted, not the time before and after. It also adds two new files.

- Remaining issues:

Currently only SELECT statements are allowed. This is because I can't work
out how to suppress the actual effect of an UPDATE or DELETE query. I'm also
assuming that SELECT queries can't be rewritten into something that actually
changes something. Ouch! I just realised that functions can be called which
could do anything, so I truly need something to suppress the results.

Documentation, but I'll only do that if people are interested.

Do people like the name? The other thing I was thinking of was just
ANALYZE [query], but I thought that might be too much overloading of that
name. Or anything else you people come up with.
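[Editor's note: to make the four statistics above concrete, here is a rough,
self-contained sketch of the kind of per-node bookkeeping being described.
The struct and function names echo the patch, but the bodies are illustrative
guesses, not the actual code.]

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Rough per-node counters: one of these would hang off each plan node. */
typedef struct Instrumentation
{
    struct timeval starttime;   /* set each time we enter the node */
    struct timeval firsttuple;  /* elapsed time until the first tuple */
    struct timeval counter;     /* total elapsed time inside the node */
    double  tuplecount;         /* tuples returned so far this loop */
    double  ntuples;            /* tuples over all loops */
    double  nloops;             /* how many times Rescan restarted us */
} Instrumentation;

/* Record one tuple leaving the node. */
static void
InstrCountTuple(Instrumentation *instr)
{
    if (!instr)                 /* node not instrumented: do nothing */
        return;
    instr->tuplecount += 1;
}

/* A NULL tuple ends the current loop: fold this loop's count in. */
static void
InstrEndLoop(Instrumentation *instr)
{
    if (!instr)
        return;
    instr->ntuples += instr->tuplecount;
    instr->tuplecount = 0;
    instr->nloops += 1;
}
```

With a layout like this, a node that is rescanned reports both a total tuple
count and a loop count, which is what makes the `rows=... loops=...` pairs in
the example output below meaningful.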
- Example:

# explain analyse select sum(cost) from dailycalls where band = 1 and billid is null;
NOTICE:  QUERY PLAN:

Aggregate  (cost=53723.88..53723.88 rows=1 width=8) (actual time=24956.02..24956.02 rows=1 loops=1)
  ->  Seq Scan on dailycalls  (cost=0.00..53708.34 rows=6216 width=8) (actual time=463.16..24941.63 rows=5418 loops=1)

EXPLAIN

- Where to get it:

It's attached, but also available at:
http://svana.org/kleptog/pgsql/timing-explain.patch

Please CC any replies, as I'm subscribed nopost.
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.
Martijn van Oosterhout <kleptog@svana.org> writes:
> It adds a few calls in a few strategic places in the executor. Each node in
> the query tree is accounted for separately. Only time processing the node is
> counted, not the time before and after. It also adds two new files.

I'm still unconvinced that this approach will yield values that can
meaningfully be compared to the planner's cost estimates. However, the thing
I *really* object to about this patch is that the statistics-collection
overhead is paid by every query whether it's being EXPLAINed or not. Kindly
set it up so that that's not true. (One way to do that is to have EXPLAIN be
responsible for traversing the tree and attaching instrumentation structs to
the nodes; then at runtime, the work is done iff the struct links are not
NULL.)

> Currently only select statements are allowed. This is because I can't work
> out how to suppress the actual effect of an UPDATE or DELETE query. I'm also
> assuming that SELECT queries can't be rewritten into something that actually
> changes something. Ouch! I just realised that functions can be called which
> could do anything, so I truly need something to suppress the results.

I think this is wrongheaded; one of the things that might be interesting is
to know how much of the runtime is actually spent in the executor toplevel,
doing the update/delete/trigger firing/whatever. And if your query is calling
expensive functions, why would you not want to know that? The correct
approach is to indeed do the query, full up, as-is. (And you should probably
include the end-to-end time, user, system and wall-clock, in the printout.)
What's needed is to design the user interface so that it does not surprise
people that the query is executed. This suggests to me that it should not be
called EXPLAIN, but something else. (No, not ANALYZE, that's taken.)
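[Editor's note: the attach-on-demand scheme Tom suggests might look roughly
like this. The node layout here is a pared-down hypothetical, not the actual
Postgres Plan struct; the point is only that normal queries pay a single NULL
test per call.]

```c
#include <stdlib.h>

typedef struct Instrumentation { double ntuples; } Instrumentation;

/* Hypothetical, pared-down plan node; real Plan nodes carry much more. */
typedef struct PlanNode
{
    struct PlanNode *lefttree;
    struct PlanNode *righttree;
    Instrumentation *instrument;    /* NULL unless EXPLAIN attached one */
} PlanNode;

/* EXPLAIN-side pass: hang counters on every node of the tree. */
static void
AttachInstrumentation(PlanNode *node)
{
    if (!node)
        return;
    node->instrument = calloc(1, sizeof(Instrumentation));
    AttachInstrumentation(node->lefttree);
    AttachInstrumentation(node->righttree);
}

/* Executor-side hot path: for un-EXPLAINed queries the only cost
 * is the NULL test. */
static void
CountTuple(PlanNode *node)
{
    if (node->instrument)
        node->instrument->ntuples += 1;
}
```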
Random other comments:

> +    InstrStopNode( node->instrument );
> +
> +    if( TupIsNull(result) )
> +        InstrEndLoop( node->instrument );

I think this isn't gonna work right for Group nodes.

> +    int tuplecount;        /* Tuples so far this loop */

Call me silly, but I think the tuple and iteration counts should be doubles,
not ints, to avoid overflow issues.

> +    bzero( i, sizeof(Instrumentation) );

Don't use bzero; use the ANSI-standard routines (memset, etc).

> +InstrStartNode( Instrumentation *i )   /* When we enter the node */
> +{
> +    if( !i )
> +        return;

This is just a personal thing, perhaps, but I *never* use a name like "i"
for anything except loop counters. A struct argument ought to have a less
anonymous name.

> +    while( i->counter.tv_usec < 0 )
> +    {
> +        i->counter.tv_usec += 1000000;
> +        i->counter.tv_sec--;
> +    }

Don't you need to also reduce tv_usec to less than 1000000? Else you risk
overflow of the usec field over successive executions.

> +    if( !timerisset( &i->counter ) )   /* Skip if nothing has happened */

Are these "timerisset" and "timerclear" thingies portable? I can't find
anything about them in the man pages. Since they're not used anywhere in
Postgres at the moment, I'd just as soon not see you add a new platform
dependency. Just write out the clearing/testing of tv_sec and tv_usec
instead --- it's not like your code will work if those fields don't exist
with those names...

A general rule when dealing with system-defined stuff is to look around and
see how similar stuff is coded elsewhere in the Postgres backend, and do it
the same way. That way you can take advantage of literally years' worth of
portability knowledge that we have accumulated, rather than repeating old
mistakes that we have fixed.

			regards, tom lane
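[Editor's note: Tom's tv_usec point, written out. A subtract-and-accumulate
helper has to normalise in both directions, because repeated additions can
push tv_usec past a million even when each individual delta is in range. This
is an assumed helper for illustration, not code from the patch.]

```c
#include <sys/time.h>

/* Add (stop - start) into *acc, keeping 0 <= tv_usec < 1000000. */
static void
AccumTimevalDiff(struct timeval *acc,
                 const struct timeval *stop,
                 const struct timeval *start)
{
    acc->tv_sec += stop->tv_sec - start->tv_sec;
    acc->tv_usec += stop->tv_usec - start->tv_usec;
    while (acc->tv_usec < 0)            /* borrow from the seconds field */
    {
        acc->tv_usec += 1000000;
        acc->tv_sec--;
    }
    while (acc->tv_usec >= 1000000)     /* the direction Tom flagged */
    {
        acc->tv_usec -= 1000000;
        acc->tv_sec++;
    }
}
```

The upward normalisation matters for an accumulator even if each delta is
already in range: adding a 0.6-second delta to an accumulator holding 600000
usec leaves tv_usec at 1200000 without it.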
On Fri, Jul 20, 2001 at 01:39:34PM -0400, Tom Lane wrote:
> I'm still unconvinced that this approach will yield values that can
> meaningfully be compared to the planner's cost estimates. However,
> the thing I *really* object to about this patch is that the statistics-
> collection overhead is paid by every query whether it's being EXPLAINed
> or not. Kindly set it up so that that's not true. (One way to do that
> is to have EXPLAIN be responsible for traversing the tree and attaching
> instrumentation structs to the nodes; then at runtime, the work is done
> iff the struct links are not NULL.)

Why do you think the results cannot be compared to the estimates? The
estimates are, after all, an estimate of how long it would take.

The code already handles having NULLs in the structure, in which case it
doesn't do anything. I was thinking of just having a flag in the top-level
query node and using that to decide whether to allocate the structure in
ExecInitNode.

> I think this is wrongheaded; one of the things that might be interesting
> is to know how much of the runtime is actually spent in the executor
> toplevel, doing the update/delete/trigger firing/whatever. And if your
> query is calling expensive functions, why would you not want to know
> that? The correct approach is to indeed do the query, full up, as-is.
> (And you should probably include the end-to-end time, user, system and
> wall-clock, in the printout.) What's needed is to design the user
> interface so that it does not surprise people that the query is
> executed. This suggests to me that it should not be called EXPLAIN,
> but something else. (No, not ANALYZE, that's taken.)

Unfortunately (as far as I can tell) calls to functions don't have nodes in
the tree, so I don't see anywhere to store the actual data. Maybe there is
some other non-shared structure I can use. I'll look.

What you are suggesting would make the explain output very much more
verbose, if that's what you're interested in.
I'm not sure adding all those other time measurements would be particularly
helpful. I agree with the renaming, but no-one seems to have any suggestions.
Maybe TIME [query], like at a command prompt.

> Random other comments:
>
> > +    InstrStopNode( node->instrument );
> > +
> > +    if( TupIsNull(result) )
> > +        InstrEndLoop( node->instrument );
>
> I think this isn't gonna work right for Group nodes.

I noticed that Group nodes act strangely: the tuplecount becomes the whole
number of tuples involved and the loop count is the number of groups. I was
going to look into it but am not convinced it is wrong as is.

> > +    int tuplecount;        /* Tuples so far this loop */
>
> Call me silly, but I think the tuple and iteration counts should be
> doubles, not ints, to avoid overflow issues.

I'd prefer a 64-bit int, actually. Floating point is accurate but not
precise.

> > +    bzero( i, sizeof(Instrumentation) );
>
> Don't use bzero; use the ANSI-standard routines (memset, etc).

OK.

> > +InstrStartNode( Instrumentation *i )   /* When we enter the node */
> > +{
> > +    if( !i )
> > +        return;
>
> This is just a personal thing, perhaps, but I *never* use a name like
> "i" for anything except loop counters. A struct argument ought to have
> a less anonymous name.

s/i/instr/ then.

> > +    while( i->counter.tv_usec < 0 )
> > +    {
> > +        i->counter.tv_usec += 1000000;
> > +        i->counter.tv_sec--;
> > +    }
>
> Don't you need to also reduce tv_usec to less than 1000000? Else you
> risk overflow of the usec field over successive executions.

I think the way the calculation is written, tv_usec can't be greater than
1000000.

> > +    if( !timerisset( &i->counter ) )   /* Skip if nothing has happened */
>
> Are these "timerisset" and "timerclear" thingies portable? I can't find
> anything about them in the man pages. Since they're not used anywhere
> in Postgres at the moment, I'd just as soon not see you add a new
> platform dependency.
> Just write out the clearing/testing of tv_sec and
> tv_usec instead --- it's not like your code will work if those fields
> don't exist with those names...

I'm not sure. On my system (Debian Linux) they are described in the manpage
for gettimeofday. They are just one-line #defines, not a major drama. I can
copy them in if you like.

> A general rule when dealing with system-defined stuff is to look
> around and see how similar stuff is coded elsewhere in the Postgres
> backend, and do it the same way. That way you can take advantage of
> literally years' worth of portability knowledge that we have
> accumulated, rather than repeating old mistakes that we have fixed.

Hmm, ok. gettimeofday is used elsewhere. There are no other system calls
involved.
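[Editor's note: the BSD macros under discussion are typically defined along
the following lines in <sys/time.h>. Writing equivalents out by hand, as Tom
suggests, avoids depending on the platform providing them; the names below
are made up to avoid clashing with any system definitions.]

```c
#include <sys/time.h>

/* Hand-rolled equivalents of the (non-standard) BSD timer macros. */
#define TIMEVAL_ISSET(tvp)  ((tvp)->tv_sec != 0 || (tvp)->tv_usec != 0)
#define TIMEVAL_CLEAR(tvp)  ((tvp)->tv_sec = (tvp)->tv_usec = 0)
```

Since both are one-liners over fields the code already assumes exist, there
is no portability cost to spelling them out.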
On Sat, Jul 21, 2001 at 02:18:45AM +1000, Justin Clift wrote:
> Hi Martijn,
>
> Why would you want to suppress the actual effect of non-select
> statements?
>
> If people want to time them, I think they should. If they don't, they
> shouldn't run the query. :)

Well, it depends whether most of the time is taken working out what to enter
or actually inserting the data (for an INSERT statement). This patch is aimed
at helping optimise the query, and just as the planner doesn't take into
account the time taken to send the data to the frontend, this doesn't care
how long it takes to insert the data, since that is not something you can
change.

If you want overall resource usage and the time taken to execute the whole
query, that already exists. If you want to find bottlenecks in the
processing, then you need a profiler. This patch is aimed at the planner
because that's the *only* area that can be tuned and adjusted by people
trying to improve the performance of their queries.

> Is it harder to create or manage?

Changing the way this code is called is easy: just fiddle the grammar. Once
people agree on a name, it'll be done.
At 00:49 21/07/01 +1000, Martijn van Oosterhout wrote:
>- What it does:
>Adds a new EXPLAIN ANALYZE statement which displays the plan tree, just like
>EXPLAIN, but also executes the query and collects statistics on what
>actually happened. The current statistics collected are:
>
>Time till first tuple is returned
>Time till all tuples are returned
>Total number of tuples returned
>Number of times Rescan is called

This is a great idea; it's a good first step in getting some stats on how
accurate the optimizer settings are, as well as how good the assumptions are.
But you probably need to collect information about IOs & buffer hits as well.
Perhaps you should look at Jan's stats collection patches and see if you can
piggyback on them somehow?

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
On Sat, Jul 21, 2001 at 12:04:47PM +1000, Philip Warner wrote:
> At 00:49 21/07/01 +1000, Martijn van Oosterhout wrote:
> >- What it does:
> >Adds a new EXPLAIN ANALYZE statement which displays the plan tree, just like
> >EXPLAIN, but also executes the query and collects statistics on what
> >actually happened. The current statistics collected are:
> >
> >Time till first tuple is returned
> >Time till all tuples are returned
> >Total number of tuples returned
> >Number of times Rescan is called
>
> This is a great idea; it's a good first step in getting some stats on how
> accurate the optimizer settings are, as well as how good the assumptions
> are. But you probably need to collect information about IOs & buffer hits
> as well. Perhaps you should look at Jan's stats collection patches and see
> if you can piggyback on them somehow?

Well, the above statistics are done because they can be calculated on each
node. The buffer hits and IOs are harder because they only apply to certain
nodes.

As for Jan's stats collection patches, where are they? Google finds nothing.
At 12:29 21/07/01 +1000, Martijn van Oosterhout wrote:
>As for Jan's stats collection patches, where are they? Google finds nothing.

I *think* they are part of current CVS.