Thread: performance bug in instrument.c

performance bug in instrument.c

From

Tomas Vondra

Date:

07 February 2013, 21:56:58

Hi,

it seems there's a issue in instrument.c that may have significant
performance impact for some plans when running explain analyze with
"TIMING OFF".

There's this piece of code in InstrStartNode:
   if (instr->need_timer && INSTR_TIME_IS_ZERO(instr->starttime))       INSTR_TIME_SET_CURRENT(instr->starttime);
else      elog(DEBUG2, "InstrStartNode called twice in a row");
 

but it should actually be like this
   if (instr->need_timer && INSTR_TIME_IS_ZERO(instr->starttime))       INSTR_TIME_SET_CURRENT(instr->starttime);
elseif (instr->need_timer)       elog(DEBUG2, "InstrStartNode called twice in a row");
 

or maybe
   if (instr->need_timer)   {       if (INSTR_TIME_IS_ZERO(instr->starttime))
INSTR_TIME_SET_CURRENT(instr->starttime);      else           elog(DEBUG2, "InstrStartNode called twice in a row");
}

The current code calls the "else" branch everytime when "TIMING OFF" is
used, and this may lead to serious performance degradation - see for
example this:
 CREATE TABLE x AS SELECT id FROM generate_series(1,15000) s(id); ANALYZE x; EXPLAIN ANALYZE SELECT a.id FROM x a, x
b;

Current code:
                              QUERY PLAN
-----------------------------------------------------------------------Nested Loop  (cost=0.00..2812971.50
rows=225000000width=4) (actual
 
time=0.013..29611.859 rows=225000000 loops=1)  ->  Seq Scan on x a  (cost=0.00..217.00 rows=15000 width=4) (actual
time=0.006..2.847 rows=15000 loops=1)  ->  Materialize  (cost=0.00..292.00 rows=15000 width=0) (actual
time=0.000..0.740 rows=15000 loops=15000)        ->  Seq Scan on x b  (cost=0.00..217.00 rows=15000 width=0)
(actual time=0.003..1.186 rows=15000 loops=1)Total runtime: 36054.079 ms
(5 rows)

and with the fix
                              QUERY PLAN
-----------------------------------------------------------------------Nested Loop  (cost=0.00..1458333.00
rows=116640000width=4) (actual
 
time=0.021..13158.399 rows=100000000 loops=1)  ->  Seq Scan on x a  (cost=0.00..153.00 rows=10800 width=4) (actual
time=0.014..1.960 rows=10000 loops=1)  ->  Materialize  (cost=0.00..207.00 rows=10800 width=0) (actual
time=0.000..0.499 rows=10000 loops=10000)        ->  Seq Scan on x b  (cost=0.00..153.00 rows=10800 width=0)
(actual time=0.003..1.037 rows=10000 loops=1)Total runtime: 16017.953 ms
(5 rows)

Obviously this is a quite extreme example, most plans won't be hit this
hard.

This was discovered by Pavel Stehule when backporting my "TIMING OFF"
patch (which introduced the bug).

regards
Tomas

Re: performance bug in instrument.c

From

Tom Lane

Date:

08 February 2013, 00:28:12

Tomas Vondra <tv@fuzzy.cz> writes:
> There's this piece of code in InstrStartNode:

>     if (instr->need_timer && INSTR_TIME_IS_ZERO(instr->starttime))
>         INSTR_TIME_SET_CURRENT(instr->starttime);
>     else
>         elog(DEBUG2, "InstrStartNode called twice in a row");

> but it should actually be like this

>     if (instr->need_timer)
>     {
>         if (INSTR_TIME_IS_ZERO(instr->starttime))
>         INSTR_TIME_SET_CURRENT(instr->starttime);
>         else
>             elog(DEBUG2, "InstrStartNode called twice in a row");
>     }

Hm.  It's a bit annoying that we can't detect the "called twice"
condition when !need_timer, but I suppose that any such bug would be a
caller logic error that would probably not be sensitive to need_timer,
so it's likely not worth adding overhead to handle that.

A bigger question is why this is elog(DEBUG2) and not elog(ERROR).
Had it been the latter, we'd have noticed the mistake immediately.
The current choice might be masking any caller-logic errors that
exist, too.
        regards, tom lane

Re: performance bug in instrument.c

From

Tomas Vondra

Date:

08 February 2013, 01:52:27

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> There's this piece of code in InstrStartNode:
>
>>     if (instr->need_timer && INSTR_TIME_IS_ZERO(instr->starttime))
>>         INSTR_TIME_SET_CURRENT(instr->starttime);
>>     else
>>         elog(DEBUG2, "InstrStartNode called twice in a row");
>
>> but it should actually be like this
>
>>     if (instr->need_timer)
>>     {
>>         if (INSTR_TIME_IS_ZERO(instr->starttime))
>>       INSTR_TIME_SET_CURRENT(instr->starttime);
>>         else
>>             elog(DEBUG2, "InstrStartNode called twice in a row");
>>     }
>
> Hm.  It's a bit annoying that we can't detect the "called twice"
> condition when !need_timer, but I suppose that any such bug would be a
> caller logic error that would probably not be sensitive to need_timer,
> so it's likely not worth adding overhead to handle that.

Yes, that's annoying. But if we need / want to detect that, wouldn't it
be cleaner to add there a separate "already executed" flag, instead of
misusing starttime for that?

>
> A bigger question is why this is elog(DEBUG2) and not elog(ERROR).
> Had it been the latter, we'd have noticed the mistake immediately.
> The current choice might be masking any caller-logic errors that
> exist, too.

Not sure why it's DEBUG2, but if it really is an error then it should be
logged as ERROR I guess.

Tomas

Re: performance bug in instrument.c

From

Tom Lane

Date:

08 February 2013, 03:55:46

Tomas Vondra <tv@fuzzy.cz> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A bigger question is why this is elog(DEBUG2) and not elog(ERROR).
>> Had it been the latter, we'd have noticed the mistake immediately.
>> The current choice might be masking any caller-logic errors that
>> exist, too.

> Not sure why it's DEBUG2, but if it really is an error then it should be
> logged as ERROR I guess.

Perhaps there was a reason for that once, but AFAICT the logic is clean
now.  I changed these elogs to ERROR and ran all the regression tests
under auto_explain, and nothing failed.  That's hardly conclusive of
course, but it's enough evidence to persuade me to make the change in
HEAD.
        regards, tom lane