Thread: WIP: explain analyze with 'rows' but not timing

WIP: explain analyze with 'rows' but not timing

From
Tomas Vondra
Date:
Hi all,

most of the time I use auto_explain, all I need is duration of the query
and the plan with estimates and actual row counts. And it would be handy
to be able to catch long running queries with estimates that are
significantly off (say 100x lower or higher compared to actual row numbers).

The gettimeofday() calls are not exactly cheap in some cases, so why to
pay that price when all you need is the number of rows?

The patch attached does this:

1) adds INSTRUMENT_ROWS, a new InstrumentOption

   - counts rows without timing (no gettimeofday() callse)
   - if you want timing info, use INSTRUMENT_TIMER

2) adds new option "TIMING" to EXPLAIN, i.e.

    EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

3) adds auto_explain.log_rows_only (false by default)

   - if you set this to 'true', then the instrumentation will just
     count rows, without calling gettimeofday()


It works quite well, except one tiny issue - when the log_rows_only is
set to false (so that auto_explain requires timing), it silently
overrides the EXPLAIN option. So that even when the user explicitly
disables timing (TIMING OFF), it's overwritten and the explain collects
the timing data.

I could probably hide the timing info, but that'd make the issue even
worse (the user would not notice that the timing was actually enabled).

Maybe the right thing would be to explicitly disable timing for queries
executed with "EXPLAIN (TIMING OFF)". Any other ideas how to make this
work reasonably?

The patch does not implement any checks (how far is the estimate from
the reality) yet, that'll be round two.

regards
Tomas

Attachment

Re: WIP: explain analyze with 'rows' but not timing

From
Pavel Stehule
Date:
Hello

2011/12/23 Tomas Vondra <tv@fuzzy.cz>:
> Hi all,
>
> most of the time I use auto_explain, all I need is duration of the query
> and the plan with estimates and actual row counts. And it would be handy
> to be able to catch long running queries with estimates that are
> significantly off (say 100x lower or higher compared to actual row numbers).
>
> The gettimeofday() calls are not exactly cheap in some cases, so why to
> pay that price when all you need is the number of rows?
>
> The patch attached does this:
>
> 1) adds INSTRUMENT_ROWS, a new InstrumentOption
>
>   - counts rows without timing (no gettimeofday() callse)
>   - if you want timing info, use INSTRUMENT_TIMER
>
> 2) adds new option "TIMING" to EXPLAIN, i.e.
>
>    EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...
>
> 3) adds auto_explain.log_rows_only (false by default)
>
>   - if you set this to 'true', then the instrumentation will just
>     count rows, without calling gettimeofday()
>
>
> It works quite well, except one tiny issue - when the log_rows_only is
> set to false (so that auto_explain requires timing), it silently
> overrides the EXPLAIN option. So that even when the user explicitly
> disables timing (TIMING OFF), it's overwritten and the explain collects
> the timing data.
>
> I could probably hide the timing info, but that'd make the issue even
> worse (the user would not notice that the timing was actually enabled).
>
> Maybe the right thing would be to explicitly disable timing for queries
> executed with "EXPLAIN (TIMING OFF)". Any other ideas how to make this
> work reasonably?
>
> The patch does not implement any checks (how far is the estimate from
> the reality) yet, that'll be round two.

It is interesting idea - but maybe we can have a have a different
metric than time - this is very unstable quantity - mainly on
production overloaded servers.

It is good idea - we need a tool for bad statistic searching that is
relative cheap.

Regards

Pavel

>
> regards
> Tomas
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: WIP: explain analyze with 'rows' but not timing

From
Robert Haas
Date:
2011/12/22 Tomas Vondra <tv@fuzzy.cz>:
> The gettimeofday() calls are not exactly cheap in some cases, so why to
> pay that price when all you need is the number of rows?

Fair point.

> The patch attached does this:
>
> 1) adds INSTRUMENT_ROWS, a new InstrumentOption
>
>   - counts rows without timing (no gettimeofday() callse)
>   - if you want timing info, use INSTRUMENT_TIMER
>
> 2) adds new option "TIMING" to EXPLAIN, i.e.
>
>    EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...
>
> 3) adds auto_explain.log_rows_only (false by default)
>
>   - if you set this to 'true', then the instrumentation will just
>     count rows, without calling gettimeofday()

This seems like an unnecessarily confusing interface, because you've
named the auto_explain option differently from the EXPLAIN option and
given it (almost) the opposite sense: timing=off means the same thing
as log_rows_only=on.

I think the EXPLAIN (TIMING) option is good the way you have it, but
then just have auto_explain.log_timing, with a default value of on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP: explain analyze with 'rows' but not timing

From
Tomas Vondra
Date:
On 23.12.2011 14:57, Robert Haas wrote:
> 2011/12/22 Tomas Vondra <tv@fuzzy.cz>:
>> The gettimeofday() calls are not exactly cheap in some cases, so why to
>> pay that price when all you need is the number of rows?
>
> Fair point.
>
>> The patch attached does this:
>>
>> 1) adds INSTRUMENT_ROWS, a new InstrumentOption
>>
>>   - counts rows without timing (no gettimeofday() callse)
>>   - if you want timing info, use INSTRUMENT_TIMER
>>
>> 2) adds new option "TIMING" to EXPLAIN, i.e.
>>
>>    EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...
>>
>> 3) adds auto_explain.log_rows_only (false by default)
>>
>>   - if you set this to 'true', then the instrumentation will just
>>     count rows, without calling gettimeofday()
>
> This seems like an unnecessarily confusing interface, because you've
> named the auto_explain option differently from the EXPLAIN option and
> given it (almost) the opposite sense: timing=off means the same thing
> as log_rows_only=on.
>
> I think the EXPLAIN (TIMING) option is good the way you have it, but
> then just have auto_explain.log_timing, with a default value of on.

Makes sense. I've updated the patch to reflect this, so the option is
called auto_explain.log_timing and is true by default. I'll add the
patch to the next commit fest.

One thing I'm wondering about is that the InstrumentOptions are not
exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while
INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to
redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e.
to get the current behaviour, you'd have to do this

   instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS)

It's quite trivial change in explain.c, the problem I have with that is
that it might break extensions.

Tomas

Attachment

Re: WIP: explain analyze with 'rows' but not timing

From
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> One thing I'm wondering about is that the InstrumentOptions are not
> exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while
> INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to
> redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e.
> to get the current behaviour, you'd have to do this

>    instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS)

> It's quite trivial change in explain.c, the problem I have with that is
> that it might break extensions.

I'm not especially concerned by that angle --- we make bigger API
changes all the time.  But you could change the name, eg
instrument_options |= (INSTRUMENT_TIMING | INSTRUMENT_ROWS)

and then #define INSTRUMENT_TIMER as the OR of the two real bits
for backward compatibility.
        regards, tom lane


Re: WIP: explain analyze with 'rows' but not timing

From
Alvaro Herrera
Date:
Excerpts from Tomas Vondra's message of vie dic 23 11:45:40 -0300 2011:

> One thing I'm wondering about is that the InstrumentOptions are not
> exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while
> INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to
> redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e.
> to get the current behaviour, you'd have to do this
>
>    instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS)
>
> It's quite trivial change in explain.c, the problem I have with that is
> that it might break extensions.

No, maybe add INSTRUMENT_TIMER_ONLY and then define INSTRUMENT_TIMER as
setting both.  That way you don't break compatibility.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: WIP: explain analyze with 'rows' but not timing

From
Tomas Vondra
Date:
On 23.12.2011 16:14, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> One thing I'm wondering about is that the InstrumentOptions are not
>> exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while
>> INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to
>> redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e.
>> to get the current behaviour, you'd have to do this
> 
>>    instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS)
> 
>> It's quite trivial change in explain.c, the problem I have with that is
>> that it might break extensions.
> 
> I'm not especially concerned by that angle --- we make bigger API
> changes all the time.  But you could change the name, eg
> 
>     instrument_options |= (INSTRUMENT_TIMING | INSTRUMENT_ROWS)
> 
> and then #define INSTRUMENT_TIMER as the OR of the two real bits
> for backward compatibility.

OK, that seems like a good solution. But is it worth the additional
complexity in explain.c?

The motivation for this patch was that collection timing data often
causes performance issues and in some cases it's not needed. But is this
true for row counts?

Are there machines where collecting row counts is above noise level?
I've never seen that, but that's not a proof of nonexistence.

If the overhead of this is negligible, then I could just hide the row
counts from the output.

Tomas


Re: WIP: explain analyze with 'rows' but not timing

From
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> The motivation for this patch was that collection timing data often
> causes performance issues and in some cases it's not needed. But is this
> true for row counts?

Perhaps more to the point, is there a use case for collecting timing
data without row counts?  I find it hard to visualize a valid reason.
        regards, tom lane


Re: WIP: explain analyze with 'rows' but not timing

From
Pavel Stehule
Date:
2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> The motivation for this patch was that collection timing data often
>> causes performance issues and in some cases it's not needed. But is this
>> true for row counts?
>
> Perhaps more to the point, is there a use case for collecting timing
> data without row counts?  I find it hard to visualize a valid reason.

yes - a searching of bad prediction

Regards

Pavel

>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: WIP: explain analyze with 'rows' but not timing

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
>> Tomas Vondra <tv@fuzzy.cz> writes:
>>> The motivation for this patch was that collection timing data often
>>> causes performance issues and in some cases it's not needed. But is this
>>> true for row counts?

>> Perhaps more to the point, is there a use case for collecting timing
>> data without row counts?  I find it hard to visualize a valid reason.

> yes - a searching of bad prediction

No, because timing alone proves nothing at all.  The machine could just
have been overloaded.
        regards, tom lane


Re: WIP: explain analyze with 'rows' but not timing

From
Tomas Vondra
Date:
Dne 23.12.2011 22:37, Pavel Stehule napsal(a):
> 2011/12/23 Tom Lane<tgl@sss.pgh.pa.us>:
>> Tomas Vondra<tv@fuzzy.cz>  writes:
>>> The motivation for this patch was that collection timing data often
>>> causes performance issues and in some cases it's not needed. But is this
>>> true for row counts?
>>
>> Perhaps more to the point, is there a use case for collecting timing
>> data without row counts?  I find it hard to visualize a valid reason.
>
> yes - a searching of bad prediction

But that's the purpose of collecting row counts without timing data.

TL is asking about the opposite case - collecting timing data without 
row counts. I can't imagine such use case ...

Tomas


Re: WIP: explain analyze with 'rows' but not timing

From
Pavel Stehule
Date:
2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
>>> Tomas Vondra <tv@fuzzy.cz> writes:
>>>> The motivation for this patch was that collection timing data often
>>>> causes performance issues and in some cases it's not needed. But is this
>>>> true for row counts?
>
>>> Perhaps more to the point, is there a use case for collecting timing
>>> data without row counts?  I find it hard to visualize a valid reason.
>
>> yes - a searching of bad prediction
>
> No, because timing alone proves nothing at all.  The machine could just
> have been overloaded.

sorry, I didn't understand to question.

Using only time is not practical

Regards

Pavel

>
>                        regards, tom lane