Thread: Function runtime increases after 5 calls in the same session.
Hi, all.
I wrote a function and during testing it I came across the strange behaviour.
I've tested on 9.4beta3 and 9.3.5.
When I removed security_barrier from the view or alter column "time" type to integer (bigint, I don't remember) all goes as expected.
It looks very strange for me.
I will be pleased if somebody checks this.
Attachment
Hi On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote: > I wrote a function and during testing it I came across the strange > behaviour. > Function runtime is about 200ms first 5 times, 6th and futher calls takes > ~22000 ms. > I simplified my schema, you can see it in attached file. > I've tested on 9.4beta3 and 9.3.5. I didn't look at the test case. But most likely the problem is that after 5 executions, plancache decides that a generic plan is no more expensive than a custom plan, and reverts to executing the generic one. But in reality there is a big difference in execution time. See this for how the logic works: https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1036 As a workaround you could use PL/pgSQL EXECUTE to run the query, which isn't subject to generic plans: http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN Regards, Marti
Marti Raudsepp <marti@juffo.org> Wrote in message: > Hi > > On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote: >> I wrote a function and during testing it I came across the strange >> behaviour. >> Function runtime is about 200ms first 5 times, 6th and futher calls takes >> ~22000 ms. >> I simplified my schema, you can see it in attached file. >> I've tested on 9.4beta3 and 9.3.5. > > I didn't look at the test case. But most likely the problem is that > after 5 executions, plancache decides that a generic plan is no more > expensive than a custom plan, and reverts to executing the generic > one. But in reality there is a big difference in execution time. > > See this for how the logic works: > https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1036 > > As a workaround you could use PL/pgSQL EXECUTE to run the query, which > isn't subject to generic plans: > http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN > > Regards, > Marti > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > Thank you. I tried execute and it helps. --
Marti Raudsepp <marti@juffo.org> writes: > On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote: >> I wrote a function and during testing it I came across the strange >> behaviour. >> Function runtime is about 200ms first 5 times, 6th and futher calls takes >> ~22000 ms. >> I simplified my schema, you can see it in attached file. >> I've tested on 9.4beta3 and 9.3.5. > I didn't look at the test case. But most likely the problem is that > after 5 executions, plancache decides that a generic plan is no more > expensive than a custom plan, and reverts to executing the generic > one. But in reality there is a big difference in execution time. Yeah. Ilya didn't show the exact case he was testing, but I suppose it was one where the timestamptz range covered the whole table (since the test script created a table with a mighty tiny range of date values, it would have taken some effort to do otherwise). Anyway I tried it with this case: select test.fun (1, 'yesterday', 'tomorrow'); and indeed reproduced a big slowdown in later executions. You can examine what the planner is doing with prepared statements, such as ones in plpgsql functions, via PREPARE/EXECUTE: regression=# prepare foo(int, timestamptz, timestamptz) as select sum(V1.data + V2.data) from test.view V1 inner join test.view V2 using(id) where V1.object_id=$1 and V1.time between $2 and $3 and V2.time between $2 and $3; PREPARE The first time you try it you get: regression=# explain execute foo (1, 'yesterday', 'tomorrow'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=7037.25..7037.26 rows=1 width=8) -> Merge Join (cost=5909.71..6759.53 rows=55544 width=8) Merge Cond: (v1.id = v2.id) -> Sort (cost=2967.35..2975.69 rows=3333 width=8) Sort Key: v1.id -> Subquery Scan on v1 (cost=0.00..2772.33 rows=3333 width=8) -> Seq Scan on data (cost=0.00..2739.00 rows=3333 width=8) Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-2200:00:00-04'::timestamp with time zone) AND (object_id = 1) AND test.can_read(object_id)) -> Sort (cost=2942.35..2950.69 rows=3333 width=8) Sort Key: v2.id -> Subquery Scan on v2 (cost=0.00..2747.33 rows=3333 width=8) -> Seq Scan on data data_1 (cost=0.00..2714.00 rows=3333 width=8) Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-2200:00:00-04'::timestamp with time zone) AND test.can_read(object_id)) (13 rows) After repeating that a few times it switches to: regression=# explain execute foo (1, 'yesterday', 'tomorrow'); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Aggregate (cost=428.63..428.64 rows=1 width=8) -> Nested Loop (cost=5.20..428.54 rows=17 width=8) Join Filter: (data.id = v2.id) -> Bitmap Heap Scan on data (cost=4.91..81.29 rows=17 width=8) Recheck Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3)) Filter: test.can_read(object_id) -> Bitmap Index Scan on data_object_id_time_idx (cost=0.00..4.91 rows=50 width=0) Index Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3)) -> Materialize (cost=0.29..342.79 rows=17 width=8) -> Subquery Scan on v2 (cost=0.29..342.71 rows=17 width=8) -> Index Scan using data_object_id_time_idx on data data_1 (cost=0.29..342.54 rows=17 width=8) Index Cond: (("time" >= $2) AND ("time" <= $3)) Filter: test.can_read(object_id) (13 rows) Notice the plan now contains parameter markers $n instead of literal constants; this is because this is a generic plan. The problem is that the planner has estimated this case a lot cheaper than it did when it saw the actual parameter values (and could determine that the query was going to have to scan the entire table). Without the actual parameter values, it has to fall back on rules of thumb to estimate how much of the table will be scanned. Although it doesn't know the comparison values, it does recognize the "between" constructs as being range restrictions, and its rule of thumb is that those are fairly selective --- about 1% IIRC. So it thinks the query will not need to read very much of the table and chooses a plan that'd be appropriate if that were true. The custom-vs-generic-plan heuristic is designed for cases where the generic plan is based on unduly pessimistic assumptions; which is where most of the problems have been historically. Here we've got one where the generic plan is based on unduly optimistic assumptions. Not sure how we might fix that without breaking the more commonly complained-of case. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> Wrote in message: > Marti Raudsepp <marti@juffo.org> writes: >> On Tue, Oct 21, 2014 at 12:53 PM, Ilya I. Ashchepkov <koctep@gmail.com> wrote: >>> I wrote a function and during testing it I came across the strange >>> behaviour. >>> Function runtime is about 200ms first 5 times, 6th and futher calls takes >>> ~22000 ms. >>> I simplified my schema, you can see it in attached file. >>> I've tested on 9.4beta3 and 9.3.5. > >> I didn't look at the test case. But most likely the problem is that >> after 5 executions, plancache decides that a generic plan is no more >> expensive than a custom plan, and reverts to executing the generic >> one. But in reality there is a big difference in execution time. > > Yeah. Ilya didn't show the exact case he was testing, but I suppose > it was one where the timestamptz range covered the whole table (since > the test script created a table with a mighty tiny range of date values, > it would have taken some effort to do otherwise). Anyway I tried it with > this case: > select test.fun (1, 'yesterday', 'tomorrow'); > and indeed reproduced a big slowdown in later executions. > > You can examine what the planner is doing with prepared statements, > such as ones in plpgsql functions, via PREPARE/EXECUTE: > > regression=# prepare foo(int, timestamptz, timestamptz) as > select sum(V1.data + V2.data) > from test.view V1 > inner join test.view V2 using(id) > where > V1.object_id=$1 > and V1.time between $2 and $3 > and V2.time between $2 and $3; > PREPARE > > The first time you try it you get: > > regression=# explain execute foo (1, 'yesterday', 'tomorrow'); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Aggregate (cost=7037.25..7037.26 rows=1 width=8) > -> Merge Join (cost=5909.71..6759.53 rows=55544 width=8) > Merge Cond: (v1.id = v2.id) > -> Sort (cost=2967.35..2975.69 rows=3333 width=8) > Sort Key: v1.id > -> Subquery Scan on v1 (cost=0.00..2772.33 rows=3333 width=8) > -> Seq Scan on data (cost=0.00..2739.00 rows=3333 width=8) > Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-2200:00:00-04'::timestamp with time zone) AND (object_id = 1) AND test.can_read(object_id)) > -> Sort (cost=2942.35..2950.69 rows=3333 width=8) > Sort Key: v2.id > -> Subquery Scan on v2 (cost=0.00..2747.33 rows=3333 width=8) > -> Seq Scan on data data_1 (cost=0.00..2714.00 rows=3333 width=8) > Filter: (("time" >= '2014-10-20 00:00:00-04'::timestamp with time zone) AND ("time" <= '2014-10-2200:00:00-04'::timestamp with time zone) AND test.can_read(object_id)) > (13 rows) > > After repeating that a few times it switches to: > > regression=# explain execute foo (1, 'yesterday', 'tomorrow'); > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=428.63..428.64 rows=1 width=8) > -> Nested Loop (cost=5.20..428.54 rows=17 width=8) > Join Filter: (data.id = v2.id) > -> Bitmap Heap Scan on data (cost=4.91..81.29 rows=17 width=8) > Recheck Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3)) > Filter: test.can_read(object_id) > -> Bitmap Index Scan on data_object_id_time_idx (cost=0.00..4.91 rows=50 width=0) > Index Cond: ((object_id = $1) AND ("time" >= $2) AND ("time" <= $3)) > -> Materialize (cost=0.29..342.79 rows=17 width=8) > -> Subquery Scan on v2 (cost=0.29..342.71 rows=17 width=8) > -> Index Scan using data_object_id_time_idx on data data_1 (cost=0.29..342.54 rows=17 width=8) > Index Cond: (("time" >= $2) AND ("time" <= $3)) > Filter: test.can_read(object_id) > (13 rows) > > Notice the plan now contains parameter markers $n instead of literal > constants; this is because this is a generic plan. The problem is > that the planner has estimated this case a lot cheaper than it did > when it saw the actual parameter values (and could determine that the > query was going to have to scan the entire table). Without the actual > parameter values, it has to fall back on rules of thumb to estimate > how much of the table will be scanned. Although it doesn't know the > comparison values, it does recognize the "between" constructs as being > range restrictions, and its rule of thumb is that those are fairly > selective --- about 1% IIRC. So it thinks the query will not need to > read very much of the table and chooses a plan that'd be appropriate > if that were true. > > The custom-vs-generic-plan heuristic is designed for cases where the > generic plan is based on unduly pessimistic assumptions; which is where > most of the problems have been historically. Here we've got one where > the generic plan is based on unduly optimistic assumptions. Not sure > how we might fix that without breaking the more commonly complained-of > case. > > regards, tom lane > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > Thank you for detailed explanation. You guessed right my test, I forgot to add it to the first message. I'll add more data and try more realistic test -- Ilya