Thread: Function runtime increases after 5 calls in the same session.

Function runtime increases after 5 calls in the same session.

From: "Ilya I. Ashchepkov"
Date:
Hi, all.

I wrote a function and while testing it I came across some strange behaviour.
The function runtime is about 200 ms for the first 5 calls; the 6th and further calls take ~22000 ms.
I simplified my schema; you can see it in the attached file.
I've tested on 9.4beta3 and 9.3.5.

When I removed security_barrier from the view, or altered the "time" column's type to integer (or bigint, I don't remember), everything went as expected.

It looks very strange to me.
I would be pleased if somebody could check this.

--
Best regards,
Ilya Ashchepkov koctep@gmail.com
Attachment
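
Judging from the plans quoted later in the thread, the attached test case is roughly along these lines (a sketch only; column types, the id column, and the real can_read check are guesses):

CREATE SCHEMA test;

CREATE TABLE test.data (
    id        serial PRIMARY KEY,
    object_id integer,
    "time"    timestamptz,
    data      integer
);
CREATE INDEX data_object_id_time_idx ON test.data (object_id, "time");

-- Stand-in for the real access-control check.
CREATE FUNCTION test.can_read(_object_id integer) RETURNS boolean
LANGUAGE sql STABLE AS $$ SELECT true $$;

-- Removing security_barrier from this view made the slowdown go away.
CREATE VIEW test.view WITH (security_barrier) AS
    SELECT id, object_id, "time", data
      FROM test.data
     WHERE test.can_read(object_id);

-- The function under test, matching Tom Lane's PREPARE example below.
CREATE FUNCTION test.fun(_object_id integer, _from timestamptz, _to timestamptz)
RETURNS bigint LANGUAGE plpgsql AS $$
DECLARE
    result bigint;
BEGIN
    SELECT sum(V1.data + V2.data) INTO result
      FROM test.view V1
      INNER JOIN test.view V2 USING (id)
     WHERE V1.object_id = _object_id
       AND V1."time" BETWEEN _from AND _to
       AND V2."time" BETWEEN _from AND _to;
    RETURN result;
END;
$$;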

Re: Function runtime increases after 5 calls in the same session.

From: Marti Raudsepp
Date:
Hi

On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
> I wrote a function and while testing it I came across some strange
> behaviour.
> The function runtime is about 200 ms for the first 5 calls; the 6th and
> further calls take ~22000 ms.
> I simplified my schema; you can see it in the attached file.
> I've tested on 9.4beta3 and 9.3.5.

I didn't look at the test case. But most likely the problem is that
after 5 executions, plancache decides that a generic plan is no more
expensive than a custom plan, and reverts to executing the generic
one. But in reality there is a big difference in execution time.

See this for how the logic works:
https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1036

As a workaround you could use PL/pgSQL EXECUTE to run the query, which
isn't subject to generic plans:
http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
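
For example, something like this untested sketch (the function and table names here are made up; adapt it to your actual query):

-- A plain SQL statement in a PL/pgSQL body is prepared and can switch to a
-- generic plan; EXECUTE re-plans with the actual parameter values each call.
CREATE OR REPLACE FUNCTION sum_in_range(_from timestamptz, _to timestamptz)
RETURNS bigint LANGUAGE plpgsql AS $$
DECLARE
    result bigint;
BEGIN
    EXECUTE 'SELECT sum(data) FROM some_table WHERE "time" BETWEEN $1 AND $2'
       INTO result
      USING _from, _to;
    RETURN result;
END;
$$;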

Regards,
Marti


Re: Function runtime increases after 5 calls in the same session.

From: Ilya Ashchepkov
Date:
Marti Raudsepp <marti@juffo.org> wrote:
> Hi
>
> On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
>> I wrote a function and while testing it I came across some strange
>> behaviour.
>> The function runtime is about 200 ms for the first 5 calls; the 6th and
>> further calls take ~22000 ms.
>> I simplified my schema; you can see it in the attached file.
>> I've tested on 9.4beta3 and 9.3.5.
>
> I didn't look at the test case. But most likely the problem is that
> after 5 executions, plancache decides that a generic plan is no more
> expensive than a custom plan, and reverts to executing the generic
> one. But in reality there is a big difference in execution time.
>
> See this for how the logic works:
> https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1036
>
> As a workaround you could use PL/pgSQL EXECUTE to run the query, which
> isn't subject to generic plans:
> http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
>
> Regards,
> Marti
>
>


Thank you. I tried EXECUTE and it helps.
--



Re: Function runtime increases after 5 calls in the same session.

From: Tom Lane
Date:
Marti Raudsepp <marti@juffo.org> writes:
> On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
>> I wrote a function and while testing it I came across some strange
>> behaviour.
>> The function runtime is about 200 ms for the first 5 calls; the 6th and
>> further calls take ~22000 ms.
>> I simplified my schema; you can see it in the attached file.
>> I've tested on 9.4beta3 and 9.3.5.

> I didn't look at the test case. But most likely the problem is that
> after 5 executions, plancache decides that a generic plan is no more
> expensive than a custom plan, and reverts to executing the generic
> one. But in reality there is a big difference in execution time.

Yeah.  Ilya didn't show the exact case he was testing, but I suppose
it was one where the timestamptz range covered the whole table (since
the test script created a table with a mighty tiny range of date values,
it would have taken some effort to do otherwise).  Anyway I tried it with
this case:
    select test.fun (1, 'yesterday', 'tomorrow');
and indeed reproduced a big slowdown in later executions.

You can examine what the planner is doing with prepared statements,
such as ones in plpgsql functions, via PREPARE/EXECUTE:

regression=# prepare foo(int, timestamptz, timestamptz) as
select sum(V1.data + V2.data)
  from test.view V1
  inner join test.view V2 using(id)
  where
  V1.object_id=$1
  and V1.time between $2 and $3
  and V2.time between $2 and $3;
PREPARE

The first time you try it you get:

regression=# explain execute foo (1, 'yesterday', 'tomorrow');
                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7037.25..7037.26 rows=1 width=8)
   ->  Merge Join  (cost=5909.71..6759.53 rows=55544 width=8)
         Merge Cond: (v1.id = v2.id)
         ->  Sort  (cost=2967.35..2975.69 rows=3333 width=8)
               Sort Key: v1.id
               ->  Subquery Scan on v1  (cost=0.00..2772.33 rows=3333 width=8)
                     ->  Seq Scan on data  (cost=0.00..2739.00 rows=3333 width=8)
                           Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-22 00:00:00-04'::timestamp with time zone) AND (object_id = 1) AND test.can_read(object_id))
         ->  Sort  (cost=2942.35..2950.69 rows=3333 width=8)
               Sort Key: v2.id
               ->  Subquery Scan on v2  (cost=0.00..2747.33 rows=3333 width=8)
                     ->  Seq Scan on data data_1  (cost=0.00..2714.00 rows=3333 width=8)
                           Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-22 00:00:00-04'::timestamp with time zone) AND test.can_read(object_id))
(13 rows)

After repeating that a few times it switches to:

regression=# explain execute foo (1, 'yesterday', 'tomorrow');
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=428.63..428.64 rows=1 width=8)
   ->  Nested Loop  (cost=5.20..428.54 rows=17 width=8)
         Join Filter: (data.id = v2.id)
         ->  Bitmap Heap Scan on data  (cost=4.91..81.29 rows=17 width=8)
               Recheck Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3))
               Filter: test.can_read(object_id)
               ->  Bitmap Index Scan on data_object_id_time_idx  (cost=0.00..4.91 rows=50 width=0)
                     Index Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3))
         ->  Materialize  (cost=0.29..342.79 rows=17 width=8)
               ->  Subquery Scan on v2  (cost=0.29..342.71 rows=17 width=8)
                     ->  Index Scan using data_object_id_time_idx on data data_1  (cost=0.29..342.54 rows=17 width=8)
                           Index Cond: (("time" >= $2) AND ("time" <= $3))
                           Filter: test.can_read(object_id)
(13 rows)

Notice the plan now contains parameter markers $n instead of literal
constants; this is because this is a generic plan.  The problem is
that the planner has estimated this case a lot cheaper than it did
when it saw the actual parameter values (and could determine that the
query was going to have to scan the entire table).  Without the actual
parameter values, it has to fall back on rules of thumb to estimate
how much of the table will be scanned.  Although it doesn't know the
comparison values, it does recognize the "between" constructs as being
range restrictions, and its rule of thumb is that those are fairly
selective --- about 1% IIRC.  So it thinks the query will not need to
read very much of the table and chooses a plan that'd be appropriate
if that were true.
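
For comparison, a plain EXPLAIN of the same query with the literal values written in (rather than through the prepared statement) shows the whole-table estimates that the custom plans above were based on; something like:

EXPLAIN
SELECT sum(V1.data + V2.data)
  FROM test.view V1
  INNER JOIN test.view V2 USING (id)
 WHERE V1.object_id = 1
   AND V1."time" BETWEEN 'yesterday' AND 'tomorrow'
   AND V2."time" BETWEEN 'yesterday' AND 'tomorrow';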

The custom-vs-generic-plan heuristic is designed for cases where the
generic plan is based on unduly pessimistic assumptions; which is where
most of the problems have been historically.  Here we've got one where
the generic plan is based on unduly optimistic assumptions.  Not sure
how we might fix that without breaking the more commonly complained-of
case.

            regards, tom lane


Re: Function runtime increases after 5 calls in the same session.

From: Ilya Ashchepkov
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marti Raudsepp <marti@juffo.org> writes:
>> On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
>>> I wrote a function and while testing it I came across some strange
>>> behaviour.
>>> The function runtime is about 200 ms for the first 5 calls; the 6th and
>>> further calls take ~22000 ms.
>>> I simplified my schema; you can see it in the attached file.
>>> I've tested on 9.4beta3 and 9.3.5.
>
>> I didn't look at the test case. But most likely the problem is that
>> after 5 executions, plancache decides that a generic plan is no more
>> expensive than a custom plan, and reverts to executing the generic
>> one. But in reality there is a big difference in execution time.
>
> Yeah.  Ilya didn't show the exact case he was testing, but I suppose
> it was one where the timestamptz range covered the whole table (since
> the test script created a table with a mighty tiny range of date values,
> it would have taken some effort to do otherwise).  Anyway I tried it with
> this case:
>     select test.fun (1, 'yesterday', 'tomorrow');
> and indeed reproduced a big slowdown in later executions.
>
> You can examine what the planner is doing with prepared statements,
> such as ones in plpgsql functions, via PREPARE/EXECUTE:
>
> regression=# prepare foo(int, timestamptz, timestamptz) as
> select sum(V1.data + V2.data)
>   from test.view V1
>   inner join test.view V2 using(id)
>   where
>   V1.object_id=$1
>   and V1.time between $2 and $3
>   and V2.time between $2 and $3;
> PREPARE
>
> The first time you try it you get:
>
> regression=# explain execute foo (1, 'yesterday', 'tomorrow');
>                                                                   QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=7037.25..7037.26 rows=1 width=8)
>    ->  Merge Join  (cost=5909.71..6759.53 rows=55544 width=8)
>          Merge Cond: (v1.id = v2.id)
>          ->  Sort  (cost=2967.35..2975.69 rows=3333 width=8)
>                Sort Key: v1.id
>                ->  Subquery Scan on v1  (cost=0.00..2772.33 rows=3333 width=8)
>                      ->  Seq Scan on data  (cost=0.00..2739.00 rows=3333 width=8)
>                            Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-22 00:00:00-04'::timestamp with time zone) AND (object_id = 1) AND test.can_read(object_id))
>          ->  Sort  (cost=2942.35..2950.69 rows=3333 width=8)
>                Sort Key: v2.id
>                ->  Subquery Scan on v2  (cost=0.00..2747.33 rows=3333 width=8)
>                      ->  Seq Scan on data data_1  (cost=0.00..2714.00 rows=3333 width=8)
>                            Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-22 00:00:00-04'::timestamp with time zone) AND test.can_read(object_id))
> (13 rows)
>
> After repeating that a few times it switches to:
>
> regression=# explain execute foo (1, 'yesterday', 'tomorrow');
>                                                       QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=428.63..428.64 rows=1 width=8)
>    ->  Nested Loop  (cost=5.20..428.54 rows=17 width=8)
>          Join Filter: (data.id = v2.id)
>          ->  Bitmap Heap Scan on data  (cost=4.91..81.29 rows=17 width=8)
>                Recheck Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3))
>                Filter: test.can_read(object_id)
>                ->  Bitmap Index Scan on data_object_id_time_idx  (cost=0.00..4.91 rows=50 width=0)
>                      Index Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3))
>          ->  Materialize  (cost=0.29..342.79 rows=17 width=8)
>                ->  Subquery Scan on v2  (cost=0.29..342.71 rows=17 width=8)
>                      ->  Index Scan using data_object_id_time_idx on data data_1  (cost=0.29..342.54 rows=17 width=8)
>                            Index Cond: (("time" >= $2) AND ("time" <= $3))
>                            Filter: test.can_read(object_id)
> (13 rows)
>
> Notice the plan now contains parameter markers $n instead of literal
> constants; this is because this is a generic plan.  The problem is
> that the planner has estimated this case a lot cheaper than it did
> when it saw the actual parameter values (and could determine that the
> query was going to have to scan the entire table).  Without the actual
> parameter values, it has to fall back on rules of thumb to estimate
> how much of the table will be scanned.  Although it doesn't know the
> comparison values, it does recognize the "between" constructs as being
> range restrictions, and its rule of thumb is that those are fairly
> selective --- about 1% IIRC.  So it thinks the query will not need to
> read very much of the table and chooses a plan that'd be appropriate
> if that were true.
>
> The custom-vs-generic-plan heuristic is designed for cases where the
> generic plan is based on unduly pessimistic assumptions; which is where
> most of the problems have been historically.  Here we've got one where
> the generic plan is based on unduly optimistic assumptions.  Not sure
> how we might fix that without breaking the more commonly complained-of
> case.
>
>             regards, tom lane
>
>

Thank you for the detailed explanation.
You guessed my test right; I forgot to include it in the first message.
I'll add more data and try a more realistic test.
--

Ilya