Re: Function runtime increases after 5 calls in the same session. - Mailing list pgsql-general

From Tom Lane
Subject Re: Function runtime increases after 5 calls in the same session.
Date
Msg-id 27317.1413950039@sss.pgh.pa.us
Whole thread Raw
In response to Re: Function runtime increases after 5 calls in the same session.  (Marti Raudsepp <marti@juffo.org>)
List pgsql-general
Marti Raudsepp <marti@juffo.org> writes:
> On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
>> I wrote a function and during testing it I came across the strange
>> behaviour.
>> Function runtime is about 200ms first 5 times, 6th and futher calls takes
>> ~22000 ms.
>> I simplified my schema, you can see it in attached file.
>> I've tested on 9.4beta3 and 9.3.5.

> I didn't look at the test case. But most likely the problem is that
> after 5 executions, plancache decides that a generic plan is no more
> expensive than a custom plan, and reverts to executing the generic
> one. But in reality there is a big difference in execution time.

Yeah.  Ilya didn't show the exact case he was testing, but I suppose
it was one where the timestamptz range covered the whole table (since
the test script created a table with a mighty tiny range of date values,
it would have taken some effort to do otherwise).  Anyway I tried it with
this case:
    select test.fun (1, 'yesterday', 'tomorrow');
and indeed reproduced a big slowdown in later executions.

You can examine what the planner is doing with prepared statements,
such as ones in plpgsql functions, via PREPARE/EXECUTE:

regression=# prepare foo(int, timestamptz, timestamptz) as
select sum(V1.data + V2.data)
  from test.view V1
  inner join test.view V2 using(id)
  where
  V1.object_id=$1
  and V1.time between $2 and $3
  and V2.time between $2 and $3;
PREPARE

The first time you try it you get:

regression=# explain execute foo (1, 'yesterday', 'tomorrow');
                                                                                                       QUERY PLAN
                                                                                                

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7037.25..7037.26 rows=1 width=8)
   ->  Merge Join  (cost=5909.71..6759.53 rows=55544 width=8)
         Merge Cond: (v1.id = v2.id)
         ->  Sort  (cost=2967.35..2975.69 rows=3333 width=8)
               Sort Key: v1.id
               ->  Subquery Scan on v1  (cost=0.00..2772.33 rows=3333 width=8)
                     ->  Seq Scan on data  (cost=0.00..2739.00 rows=3333 width=8)
                           Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <=
'2014-10-2200:00:00-04'::timestamp with time zone) AND (object_id = 1) AND test.can_read(object_id)) 
         ->  Sort  (cost=2942.35..2950.69 rows=3333 width=8)
               Sort Key: v2.id
               ->  Subquery Scan on v2  (cost=0.00..2747.33 rows=3333 width=8)
                     ->  Seq Scan on data data_1  (cost=0.00..2714.00 rows=3333 width=8)
                           Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <=
'2014-10-2200:00:00-04'::timestamp with time zone) AND test.can_read(object_id)) 
(13 rows)

After repeating that a few times it switches to:

regression=# explain execute foo (1, 'yesterday', 'tomorrow');
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=428.63..428.64 rows=1 width=8)
   ->  Nested Loop  (cost=5.20..428.54 rows=17 width=8)
         Join Filter: (data.id = v2.id)
         ->  Bitmap Heap Scan on data  (cost=4.91..81.29 rows=17 width=8)
               Recheck Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3))
               Filter: test.can_read(object_id)
               ->  Bitmap Index Scan on data_object_id_time_idx  (cost=0.00..4.91 rows=50 width=0)
                     Index Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3))
         ->  Materialize  (cost=0.29..342.79 rows=17 width=8)
               ->  Subquery Scan on v2  (cost=0.29..342.71 rows=17 width=8)
                     ->  Index Scan using data_object_id_time_idx on data data_1  (cost=0.29..342.54 rows=17 width=8)
                           Index Cond: (("time" >= $2) AND ("time" <= $3))
                           Filter: test.can_read(object_id)
(13 rows)

Notice the plan now contains parameter markers $n instead of literal
constants; this is because this is a generic plan.  The problem is
that the planner has estimated this case a lot cheaper than it did
when it saw the actual parameter values (and could determine that the
query was going to have to scan the entire table).  Without the actual
parameter values, it has to fall back on rules of thumb to estimate
how much of the table will be scanned.  Although it doesn't know the
comparison values, it does recognize the "between" constructs as being
range restrictions, and its rule of thumb is that those are fairly
selective --- about 1% IIRC.  So it thinks the query will not need to
read very much of the table and chooses a plan that'd be appropriate
if that were true.

The custom-vs-generic-plan heuristic is designed for cases where the
generic plan is based on unduly pessimistic assumptions; which is where
most of the problems have been historically.  Here we've got one where
the generic plan is based on unduly optimistic assumptions.  Not sure
how we might fix that without breaking the more commonly complained-of
case.

            regards, tom lane


pgsql-general by date:

Previous
From: Ilya Ashchepkov
Date:
Subject: Re: Function runtime increases after 5 calls in the same session.
Next
From: Nikhil Daddikar
Date:
Subject: To increase RAM or not