Thread: WIP: explain analyze with 'rows' but not timing
Hi all, most of the time I use auto_explain, all I need is duration of the query and the plan with estimates and actual row counts. And it would be handy to be able to catch long running queries with estimates that are significantly off (say 100x lower or higher compared to actual row numbers). The gettimeofday() calls are not exactly cheap in some cases, so why to pay that price when all you need is the number of rows? The patch attached does this: 1) adds INSTRUMENT_ROWS, a new InstrumentOption - counts rows without timing (no gettimeofday() callse) - if you want timing info, use INSTRUMENT_TIMER 2) adds new option "TIMING" to EXPLAIN, i.e. EXPLAIN (ANALYZE ON, TIMING ON) SELECT ... 3) adds auto_explain.log_rows_only (false by default) - if you set this to 'true', then the instrumentation will just count rows, without calling gettimeofday() It works quite well, except one tiny issue - when the log_rows_only is set to false (so that auto_explain requires timing), it silently overrides the EXPLAIN option. So that even when the user explicitly disables timing (TIMING OFF), it's overwritten and the explain collects the timing data. I could probably hide the timing info, but that'd make the issue even worse (the user would not notice that the timing was actually enabled). Maybe the right thing would be to explicitly disable timing for queries executed with "EXPLAIN (TIMING OFF)". Any other ideas how to make this work reasonably? The patch does not implement any checks (how far is the estimate from the reality) yet, that'll be round two. regards Tomas
Attachment
Hello 2011/12/23 Tomas Vondra <tv@fuzzy.cz>: > Hi all, > > most of the time I use auto_explain, all I need is duration of the query > and the plan with estimates and actual row counts. And it would be handy > to be able to catch long running queries with estimates that are > significantly off (say 100x lower or higher compared to actual row numbers). > > The gettimeofday() calls are not exactly cheap in some cases, so why to > pay that price when all you need is the number of rows? > > The patch attached does this: > > 1) adds INSTRUMENT_ROWS, a new InstrumentOption > > - counts rows without timing (no gettimeofday() callse) > - if you want timing info, use INSTRUMENT_TIMER > > 2) adds new option "TIMING" to EXPLAIN, i.e. > > EXPLAIN (ANALYZE ON, TIMING ON) SELECT ... > > 3) adds auto_explain.log_rows_only (false by default) > > - if you set this to 'true', then the instrumentation will just > count rows, without calling gettimeofday() > > > It works quite well, except one tiny issue - when the log_rows_only is > set to false (so that auto_explain requires timing), it silently > overrides the EXPLAIN option. So that even when the user explicitly > disables timing (TIMING OFF), it's overwritten and the explain collects > the timing data. > > I could probably hide the timing info, but that'd make the issue even > worse (the user would not notice that the timing was actually enabled). > > Maybe the right thing would be to explicitly disable timing for queries > executed with "EXPLAIN (TIMING OFF)". Any other ideas how to make this > work reasonably? > > The patch does not implement any checks (how far is the estimate from > the reality) yet, that'll be round two. It is interesting idea - but maybe we can have a have a different metric than time - this is very unstable quantity - mainly on production overloaded servers. It is good idea - we need a tool for bad statistic searching that is relative cheap. Regards Pavel > > regards > Tomas > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
2011/12/22 Tomas Vondra <tv@fuzzy.cz>: > The gettimeofday() calls are not exactly cheap in some cases, so why to > pay that price when all you need is the number of rows? Fair point. > The patch attached does this: > > 1) adds INSTRUMENT_ROWS, a new InstrumentOption > > - counts rows without timing (no gettimeofday() callse) > - if you want timing info, use INSTRUMENT_TIMER > > 2) adds new option "TIMING" to EXPLAIN, i.e. > > EXPLAIN (ANALYZE ON, TIMING ON) SELECT ... > > 3) adds auto_explain.log_rows_only (false by default) > > - if you set this to 'true', then the instrumentation will just > count rows, without calling gettimeofday() This seems like an unnecessarily confusing interface, because you've named the auto_explain option differently from the EXPLAIN option and given it (almost) the opposite sense: timing=off means the same thing as log_rows_only=on. I think the EXPLAIN (TIMING) option is good the way you have it, but then just have auto_explain.log_timing, with a default value of on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23.12.2011 14:57, Robert Haas wrote: > 2011/12/22 Tomas Vondra <tv@fuzzy.cz>: >> The gettimeofday() calls are not exactly cheap in some cases, so why to >> pay that price when all you need is the number of rows? > > Fair point. > >> The patch attached does this: >> >> 1) adds INSTRUMENT_ROWS, a new InstrumentOption >> >> - counts rows without timing (no gettimeofday() callse) >> - if you want timing info, use INSTRUMENT_TIMER >> >> 2) adds new option "TIMING" to EXPLAIN, i.e. >> >> EXPLAIN (ANALYZE ON, TIMING ON) SELECT ... >> >> 3) adds auto_explain.log_rows_only (false by default) >> >> - if you set this to 'true', then the instrumentation will just >> count rows, without calling gettimeofday() > > This seems like an unnecessarily confusing interface, because you've > named the auto_explain option differently from the EXPLAIN option and > given it (almost) the opposite sense: timing=off means the same thing > as log_rows_only=on. > > I think the EXPLAIN (TIMING) option is good the way you have it, but > then just have auto_explain.log_timing, with a default value of on. Makes sense. I've updated the patch to reflect this, so the option is called auto_explain.log_timing and is true by default. I'll add the patch to the next commit fest. One thing I'm wondering about is that the InstrumentOptions are not exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e. to get the current behaviour, you'd have to do this instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS) It's quite trivial change in explain.c, the problem I have with that is that it might break extensions. Tomas
Attachment
Tomas Vondra <tv@fuzzy.cz> writes: > One thing I'm wondering about is that the InstrumentOptions are not > exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while > INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to > redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e. > to get the current behaviour, you'd have to do this > instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS) > It's quite trivial change in explain.c, the problem I have with that is > that it might break extensions. I'm not especially concerned by that angle --- we make bigger API changes all the time. But you could change the name, eg instrument_options |= (INSTRUMENT_TIMING | INSTRUMENT_ROWS) and then #define INSTRUMENT_TIMER as the OR of the two real bits for backward compatibility. regards, tom lane
Excerpts from Tomas Vondra's message of vie dic 23 11:45:40 -0300 2011: > One thing I'm wondering about is that the InstrumentOptions are not > exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while > INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to > redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e. > to get the current behaviour, you'd have to do this > > instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS) > > It's quite trivial change in explain.c, the problem I have with that is > that it might break extensions. No, maybe add INSTRUMENT_TIMER_ONLY and then define INSTRUMENT_TIMER as setting both. That way you don't break compatibility. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 23.12.2011 16:14, Tom Lane wrote: > Tomas Vondra <tv@fuzzy.cz> writes: >> One thing I'm wondering about is that the InstrumentOptions are not >> exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while >> INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to >> redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e. >> to get the current behaviour, you'd have to do this > >> instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS) > >> It's quite trivial change in explain.c, the problem I have with that is >> that it might break extensions. > > I'm not especially concerned by that angle --- we make bigger API > changes all the time. But you could change the name, eg > > instrument_options |= (INSTRUMENT_TIMING | INSTRUMENT_ROWS) > > and then #define INSTRUMENT_TIMER as the OR of the two real bits > for backward compatibility. OK, that seems like a good solution. But is it worth the additional complexity in explain.c? The motivation for this patch was that collection timing data often causes performance issues and in some cases it's not needed. But is this true for row counts? Are there machines where collecting row counts is above noise level? I've never seen that, but that's not a proof of nonexistence. If the overhead of this is negligible, then I could just hide the row counts from the output. Tomas
Tomas Vondra <tv@fuzzy.cz> writes: > The motivation for this patch was that collection timing data often > causes performance issues and in some cases it's not needed. But is this > true for row counts? Perhaps more to the point, is there a use case for collecting timing data without row counts? I find it hard to visualize a valid reason. regards, tom lane
2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>: > Tomas Vondra <tv@fuzzy.cz> writes: >> The motivation for this patch was that collection timing data often >> causes performance issues and in some cases it's not needed. But is this >> true for row counts? > > Perhaps more to the point, is there a use case for collecting timing > data without row counts? I find it hard to visualize a valid reason. yes - a searching of bad prediction Regards Pavel > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>: >> Tomas Vondra <tv@fuzzy.cz> writes: >>> The motivation for this patch was that collection timing data often >>> causes performance issues and in some cases it's not needed. But is this >>> true for row counts? >> Perhaps more to the point, is there a use case for collecting timing >> data without row counts? I find it hard to visualize a valid reason. > yes - a searching of bad prediction No, because timing alone proves nothing at all. The machine could just have been overloaded. regards, tom lane
Dne 23.12.2011 22:37, Pavel Stehule napsal(a): > 2011/12/23 Tom Lane<tgl@sss.pgh.pa.us>: >> Tomas Vondra<tv@fuzzy.cz> writes: >>> The motivation for this patch was that collection timing data often >>> causes performance issues and in some cases it's not needed. But is this >>> true for row counts? >> >> Perhaps more to the point, is there a use case for collecting timing >> data without row counts? I find it hard to visualize a valid reason. > > yes - a searching of bad prediction But that's the purpose of collecting row counts without timing data. TL is asking about the opposite case - collecting timing data without row counts. I can't imagine such use case ... Tomas
2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>: >>> Tomas Vondra <tv@fuzzy.cz> writes: >>>> The motivation for this patch was that collection timing data often >>>> causes performance issues and in some cases it's not needed. But is this >>>> true for row counts? > >>> Perhaps more to the point, is there a use case for collecting timing >>> data without row counts? I find it hard to visualize a valid reason. > >> yes - a searching of bad prediction > > No, because timing alone proves nothing at all. The machine could just > have been overloaded. sorry, I didn't understand to question. Using only time is not practical Regards Pavel > > regards, tom lane