Thread: Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

We have an interesting problem, and the reporter has been kind enough to provide logs which we can't explain.

I'd be interested to hear any plausible explanations for a prepared plan suddenly going from 2ms to 60ms for the same input values?



---------- Forwarded message ----------
From: Thomas Kellerer <spam_eater@gmx.net>
Date: 12 January 2016 at 04:03
Subject: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102
To: pgsql-jdbc@postgresql.org


> Is it possible to get server logs?

I have picked out one of the statements that suffered from this and sanitized the logfile.

   http://sql-workbench.net/pg_jdbc_94.log

The complete statement is at the top of the file; in the messages themselves,
I have replaced each occurrence of the statement with "select ...."

The interesting thing (at least for me) is that the first few executions of the
server-side statement have pretty much the same runtime as the ones before the prepare,
and then suddenly the runtime shoots through the roof, going up from 1ms to 40ms or even 60ms.

Regards
Thomas





On 12/01/16 13:00, Dave Cramer wrote:
> We have an interesting problem, and the reporter has been kind enough to
> provide logs which we can't explain.
>
> I'd be interested to hear any plausible explanations for a prepared plan
> suddenly going from 2ms to 60ms for the same input values?

This is a new feature in 9.2, where on the fifth (or sixth, not sure) 
execution the planner might choose to use a generic plan.  From the 9.2 
release notes (though I'm fairly certain this is documented somewhere in 
the manual as well):

In the past, a prepared statement always had a single "generic" plan 
that was used for all parameter values, which was frequently much 
inferior to the plans used for non-prepared statements containing 
explicit constant values. Now, the planner attempts to generate custom 
plans for specific parameter values. A generic plan will only be used 
after custom plans have repeatedly proven to provide no benefit. This 
change should eliminate the performance penalties formerly seen from use 
of prepared statements (including non-dynamic statements in PL/pgSQL).
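
For illustration, here is a minimal psql sketch of that behaviour (a hypothetical table t with an indexed column x, not taken from the report). The first five EXPLAIN EXECUTE runs show a custom plan with the constant inlined; from the sixth on, the planner may switch to the generic plan, recognizable by "x = $1" in the output:

  PREPARE q(int) AS SELECT count(*) FROM t WHERE x = $1;
  EXPLAIN EXECUTE q(42);   -- executions 1-5: custom plan, condition shown as (x = 42)
  EXPLAIN EXECUTE q(42);
  EXPLAIN EXECUTE q(42);
  EXPLAIN EXECUTE q(42);
  EXPLAIN EXECUTE q(42);
  EXPLAIN EXECUTE q(42);   -- execution 6 onwards: the generic plan may be used, condition shown as (x = $1)
  DEALLOCATE q;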


.m



Hi Marko,

Interesting. So why would it choose a worse plan at that point? Why would it change at all if the current plan is working well?


On 12 January 2016 at 07:15, Marko Tiikkaja <marko@joh.to> wrote:
On 12/01/16 13:00, Dave Cramer wrote:
We have an interesting problem, and the reporter has been kind enough to
provide logs which we can't explain.

I'd be interested to hear any plausible explanations for a prepared plan
suddenly going from 2ms to 60ms for the same input values?

This is a new feature in 9.2, where on the fifth (or sixth, not sure) execution the planner might choose to use a generic plan.  From the 9.2 release notes (though I'm fairly certain this is documented somewhere in the manual as well):

In the past, a prepared statement always had a single "generic" plan that was used for all parameter values, which was frequently much inferior to the plans used for non-prepared statements containing explicit constant values. Now, the planner attempts to generate custom plans for specific parameter values. A generic plan will only be used after custom plans have repeatedly proven to provide no benefit. This change should eliminate the performance penalties formerly seen from use of prepared statements (including non-dynamic statements in PL/pgSQL).


.m

Here's the simplified testcase:
https://gist.github.com/vlsi/df08cbef370b2e86a5c1

It reproduces the problem in both 9.4.4 and 9.5rc1.
It is reproducible via both psql and pgjdbc.

I use a single table; however, my production case includes a join of
two tables and the query is like
select ... from foo, bar where foo.skewed=? and bar.non_skewed=? and
foo.bar_id=bar.id

Note: my application _always_ sends *the same* *bad* value for the skewed
column (it is effectively used as a filtering column in that particular
query).
Unfortunately, on the 6th execution the backend switches to the plan that uses
the skewed index access.

Is it something that can be fixed/improved?

Good plan (the first 5 executions):
Index Scan using non_skewed__flipper on plan_flipper
(cost=0.43..42.77 rows=10 width=113) (actual time=0.030..0.072 rows=10
loops=1)
  Index Cond: (non_skewed = 42)
  Filter: (skewed = 0)
  Rows Removed by Filter: 10
  Buffers: shared hit=20 read=3
Execution time: 0.094 ms

Bad plan (all the subsequent executions):
Index Scan using skewed__flipper on plan_flipper  (cost=0.43..6.77
rows=1 width=113) (actual time=0.067..355.867 rows=10 loops=1)
  Index Cond: (skewed = $1)
  Filter: (non_skewed = $2)
  Rows Removed by Filter: 999990
  Buffers: shared hit=18182 read=2735
Execution time: 355.901 ms
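
(For reference, a rough schema sketch in the spirit of the gist above; the table, column and index names are taken from the plans, while the row counts and data distribution are assumptions, so see the gist for the actual script:)

  CREATE TABLE plan_flipper (
      skewed     int,            -- dominated by a single value (0)
      non_skewed int,            -- each value appears only a handful of times
      padding    varchar(100)
  );
  INSERT INTO plan_flipper            -- assumed: 1,000,000 rows sharing skewed = 0
      SELECT 0, g % 100000, repeat('x', 100)
        FROM generate_series(1, 1000000) g;
  INSERT INTO plan_flipper            -- assumed: as many rows with unique skewed values
      SELECT g, g % 100000, repeat('x', 100)
        FROM generate_series(1, 1000000) g;
  CREATE INDEX skewed__flipper     ON plan_flipper (skewed);
  CREATE INDEX non_skewed__flipper ON plan_flipper (non_skewed);
  VACUUM ANALYZE plan_flipper;

  PREPARE flipper(int, int) AS
      SELECT * FROM plan_flipper WHERE skewed = $1 AND non_skewed = $2;
  -- EXPLAIN ANALYZE EXECUTE flipper(0, 42);  -- fast for the first 5 executions, slow from the 6th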


Vladimir

Vladimir Sitnikov wrote:
> Here's the simplified testcase:
> https://gist.github.com/vlsi/df08cbef370b2e86a5c1
> 
> It reproduces the problem in both 9.4.4 and 9.5rc1.
> It is reproducible via both psql and pgjdbc.
> 
> I use a single table, however my production case includes a join of
> two tables and the query is like
> select ... from foo, bar where foo.skewed=? and bar.non_skewed=? and
> foo.bar_id=bar.id
> 
> Note: my application _always_ sends *the same* *bad* value for skewed
> column (it effectively is used as a filtering column in the particular
> query).
> Unfortunately, on 6th execution backend switches to the plan that uses
> skewed index access.
> 
> Is it something that can be fixed/improved?
> 
> Good plan (the first 5 executions):
> Index Scan using non_skewed__flipper on plan_flipper
> (cost=0.43..42.77 rows=10 width=113) (actual time=0.030..0.072 rows=10
> loops=1)
>   Index Cond: (non_skewed = 42)
>   Filter: (skewed = 0)
>   Rows Removed by Filter: 10
>   Buffers: shared hit=20 read=3
> Execution time: 0.094 ms
> 
> Bad plan (all the subsequent executions):
> Index Scan using skewed__flipper on plan_flipper  (cost=0.43..6.77
> rows=1 width=113) (actual time=0.067..355.867 rows=10 loops=1)
>   Index Cond: (skewed = $1)
>   Filter: (non_skewed = $2)
>   Rows Removed by Filter: 999990
>   Buffers: shared hit=18182 read=2735
> Execution time: 355.901 ms

The problem is that the index "skewed__flipper" is more selective than
"non_skewed__flipper" except when "skewed = 0", so the generic plan prefers it.

I don't know if there is a good solution except disabling server prepared statements.

Yours,
Laurenz Albe

> I don't know if there is a good solution except disabling server prepared statements.

Why doesn't the backend reuse the already existing good plan?
The plan does account for the skew.

Can the backend take selectivities from the original bind values?

Vladimir





2016-01-12 16:52 GMT+01:00 Vladimir Sitnikov <sitnikov.vladimir@gmail.com>:
> I don't know if there is a good solution except disabling server prepared statements.

Why doesn't the backend reuse the already existing good plan?

probably the good plan is more expensive than the wrong plan :(

this logic is driven by plan cost, not by plan execution time.

 
The plan does account for the skew.

Can the backend take selectivities from the original bind values?

Vladimir



VS>>Why doesn't the backend reuse the already existing good plan?
PS>this logic is driven by plan cost, not by plan execution time.

I do understand that currently PG replans with $1, $2 and uses
default selectivities for that.

What I am asking is to make PG aware of "previously used bind values",
so it would calculate proper selectivities for $1, $2.

PS. It is not the first time the problem bites me, so I hand-crafted a testcase.

Vladimir





2016-01-12 17:01 GMT+01:00 Vladimir Sitnikov <sitnikov.vladimir@gmail.com>:
VS>>Why doesn't the backend reuse the already existing good plan?
PS>this logic is driven by plan cost, not by plan execution time.

I do understand that currently PG replans with $1, $2 and uses
default selectivities for that.

What I am asking is to make PG aware of "previously used bind values",
so it would calculate proper selectivities for $1, $2.

the implementation is simple - but it is hard to design something really general - it is a task for AI

Pavel

 

PS. It is not the first time the problem bites me, so I hand-crafted a testcase.

 

Vladimir

>the implementation is simple - but it is hard to design something really general - it is a task for AI

Can you please rephrase?

The current design is "if the cost of a generic plan is less than the one
of a custom plan + replan, prefer generic".
I think that is wrong.

The "generic plan" misestimates the cost in the sense that it assumes
some pre-defined selectivities.
In other words, if "skewed" values are used, the "custom plan" would
likely have a *worse cost* than the generic plan, yet the custom
plan is much more suitable for that particular parameter set.
As the backend refers to boundParams, it does see that a particular
condition is tough, while the generic estimator sees just the cost.

Looking into plancache.c comments I see 3 possible plans:
1) custom plan with PARAM_FLAG_CONST=1. It should probably
constant-fold based on input parameters.

2) custom plan with PARAM_FLAG_CONST=0. I think it should just use
given parameters for selectivity estimations. The generated plan
should still be valid for use with other input values.
3) generic plan. The plan with all variables. <-- here's current behavior

1 has a replan cost.

2&3 can be cached and reused.

Is that correct?
I think #2 is a better option than #3 since it gives better plan
stability, and thus it is much easier to test and reason about.

This all boils down to an adjustment in a single line:

https://github.com/postgres/postgres/blob/ee943004466418595363d567f18c053bae407792/src/backend/utils/cache/plancache.c#L1151-L1152

Does that make sense?

Vladimir





2016-01-13 11:44 GMT+01:00 Vladimir Sitnikov <sitnikov.vladimir@gmail.com>:
>the implementation is simple - but it is hard to design something really general - it is a task for AI

Can you please rephrase?

Sorry - it is a task for artificial intelligence.

The current design is "if the cost of a generic plan is less than the one
of a custom plan + replan, prefer generic".
I think that is wrong.

The "generic plan" misestimates the cost in the sense that it assumes
some pre-defined selectivities.

The generic plan in Postgres is optimized for the most common values - so on average it should be optimal. But the reality is different - wrong estimates can be everywhere, and an estimate can be lower or higher than reality.
 
In other words, if "skewed" values are used, the "custom plan" would
likely have a *worse cost* than the generic plan, yet the custom
plan is much more suitable for that particular parameter set.
As the backend refers to boundParams, it does see that a particular
condition is tough, while the generic estimator sees just the cost.

And there is a second issue - you have no idea what parameter vector will follow. You cannot check and optimize plans everywhere, because planning can be expensive, and you should reuse a plan multiple times. What was true for the first iterations may not be true in following iterations.
 
I like a strategy based on risks. There are probably situations where the generic plan is great every time - INSERTs, UPDATEs via PK, simple SELECTs via PK. A generic plan can work well if almost all data has similar probability. Elsewhere, on bigger data, the probability of a pretty slow plan is higher, and then we should prefer a custom plan.

so the strategy: if the cost of the generic plan is less than some MAGIC CONSTANT (which can be specified by a GUC), then use the generic plan. Otherwise use a custom plan every time.

It allows controlling the plan reuse: when MAGIC CONSTANT = 0, use a custom plan every time; when MAGIC CONSTANT = M, use the generic plan always.

Regards

Pavel


Looking into plancache.c comments I see 3 possible plans:
1) custom plan with PARAM_FLAG_CONST=1. It should probably
constant-fold based on input parameters.

2) custom plan with PARAM_FLAG_CONST=0. I think it should just use
given parameters for selectivity estimations. The generated plan
should still be valid for use with other input values.
3) generic plan. The plan with all variables. <-- here's current behavior

1 has a replan cost.

2&3 can be cached and reused.

Is that correct?
I think #2 is a better option than #3 since it gives better plan
stability, and thus it is much easier to test and reason about.

This all boils down to an adjustment in a single line:
https://github.com/postgres/postgres/blob/ee943004466418595363d567f18c053bae407792/src/backend/utils/cache/plancache.c#L1151-L1152

Does that make sense?

Vladimir

Please, read my suggestion again.

TL;DR: I suggest creating the "generic plan" with regard to the current bind values.
What's wrong with that approach?

Vladimir



On 13/01/16 14:12, Pavel Stehule wrote:
> so the strategy: if the cost of the generic plan is less than some MAGIC CONSTANT
> (which can be specified by a GUC), then use the generic plan. Otherwise use a custom
> plan every time.
>
> It allows controlling the plan reuse: when MAGIC CONSTANT = 0, use
> a custom plan every time; when MAGIC CONSTANT = M, use the generic plan
> always.

I don't think that would solve even the original problem without 
effectively disabling generic plans, despite the problem being 
relatively simple.  The generic plan appears to be really cheap, because 
the planner doesn't have the concept of a "worst case".


.m



On 13/01/16 14:27, Vladimir Sitnikov wrote:
> TL;DR: I suggest creating the "generic plan" with regard to the current bind values.
> What's wrong with that approach?

I don't understand how this is supposed to work.  Say you already have a 
plan which looks like this:
 Seq Scan on foo  (cost=0.00..100.00 rows=1 width=630)
   Filter: (bar = $1)

Now the plan gets invoked with  $1 = 5.  What exactly in your mind would 
happen here?


.m



Note: I do not suggest changing already cached plans yet.
I suggest looking into the "6th bind values" when building a cached plan.
In other words, "if the first 5 executions do not reveal a dependence on bind
values, then cache the generated plan".

>Say you already have a plan which looks like this:
>Now the plan gets invoked with  $1 = 5.  What exactly in your mind would happen here?

A sequential scan with the $1 = 5 condition. What else could be there?

Vladimir



On 13/01/16 14:36, Vladimir Sitnikov wrote:
>> Say you already have a plan which looks like this:
>> Now the plan gets invoked with  $1 = 5.  What exactly in your mind would happen here?
>
> A sequential scan with $1=5 condition. What else could be there?

I don't know, it's your proposal :-)  But it looks like I misunderstood.

> Note: I do not suggest changing already cached plans yet.
> I suggest looking into "6th bind values" when building a cached plan.

But that wouldn't have helped your case.  The custom plan is *more 
expensive*; I'm guessing because the generic plan gambles on a better 
average case instead of preparing for the worst case.


.m



> The custom plan is *more expensive*;

You compare costs of custom vs generic plans.
I suggest: do not compare costs *at all*.

>I don't know, it's your proposal :-)  But it looks like I misunderstood.

It is not.

My suggestion is: build a generic plan (that is, a plan that will
return the proper result for every possible bind value), yet refer to the
values of the 6th binds when estimating cardinalities.
Is it clear now?

Vladimir



On January 13, 2016 3:02:27 PM GMT+01:00, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
>> The custom plan is *more expensive*;
>
>You compare costs of custom vs generic plans.
>I suggest: do not compare costs *at all*.
>
>>I don't know, it's your proposal :-)  But it looks like I
>misunderstood.
>
>It is not.
>
>My suggestion is: build a generic plan (that is, a plan that will
>return the proper result for every possible bind value), yet refer to the
>values of the 6th binds when estimating cardinalities.
>Is it clear now?

That's not going to fly for two reasons: for one, custom plans can be much better than the generic plan, independent of
cardinalities. Consider e.g. a partitioned table, where the generic plan will scan all partitions. For another, just
using the specific values for the generic plan will have horrible results if the distribution isn't entirely boring;
consider e.g. a table with one somewhat common value and otherwise just unique values.
 

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.



Maybe more work, but:

Could a custom plan be cached /along with the values of
the parameters for which it was planned/?  Then it's kind of
a no-brainer to say "if the parameters are the same this time,
I'm probably looking at the best plan already."  Pretty simpleminded,
but at least it would catch the testcase where the plan flips to
a bad one even when literally the very same parameters are used.

Generalizing a little, how about caching a plan along with a
boolean expression tree over the parameters, evaluating to true
if the new parameters are "close enough" to the planned ones so
that the plan is probably still better than generic?

Version zero could just create the expression p1 = oldp1
AND p2 = oldp2 AND p3 = oldp3, and be simply the same as the
first suggestion. But then, how to generate more interesting
and useful validity-expressions for different plan types and
statistics could be a future-work area with some meat to it,
promising successive improvements. Maybe plugins could
supplement it for particular characterized workloads....

-Chap


On 01/13/2016 08:28 AM, Marko Tiikkaja wrote:
> On 13/01/16 14:12, Pavel Stehule wrote:
>> so the strategy: if the cost of the generic plan is less than some MAGIC
>> CONSTANT (which can be specified by a GUC), then use the generic plan.
>> Otherwise use a custom plan every time.
>>
>> It allows controlling the plan reuse: when MAGIC CONSTANT = 0, use
>> a custom plan every time; when MAGIC CONSTANT = M, use the generic plan
>> always.
> 
> I don't think that would solve even the original problem without
> effectively disabling generic plans, despite the problem being
> relatively simple.  The generic plan appears to be really cheap, because
> the planner doesn't have the concept of a "worst case".
> 
> 
> .m
> 




> for one, custom plans can be much better than the generic plan, independent of cardinalities

So what? I do not suggest dropping custom plans entirely.
I perfectly understand there are cases when it is better to replan every time.

> consider e.g. a table with one somewhat common value and otherwise just unique values.

So what?
If I understand you properly, you mean: "if the client sends unique binds
for the first 5-6 executions and bad non-unique ones afterwards, then the cached plan
would be bad". Is that what you are saying?
I agree that is the corner case for my suggestion.
Is it really happening often?

I state the following:
1) It is way easier to debug & analyze.
For instance: the current documentation does *not* list a way to get a
*generic plan*.
Is it obvious that you "just need to EXPLAIN ANALYZE EXECUTE *6
times in a row*" just to get a generic plan?

2) It is likely to be more performant. We just need to explain to users
that "if different plans are required, just use different statements".
Isn't that obvious?
Frankly speaking, I do not like the "plug&pray" kind of code that just
sends bind values and expects a magically optimized plan for each bind
combination.

3) What about "client sends top most common value 5 times in a row"?
Why assume "it will stop doing that"?
I think the better assumption is "it will continue doing that".

In the end, if a client wants specific treatment of a query, then
he/she might be better off using separate server-prepared statements (one
for "unique values" and another one for "non-unique").

Vladimir



On 13/01/16 15:26, Vladimir Sitnikov wrote:
> 2) It is likely to be more performant. We just need to explain to users
> that "if different plans are required, just use different statements".

This completely screws over PL/PgSQL, among other things.


.m



>This completely screws over PL/PgSQL, among other things.

Can you elaborate a bit?

Vladimir



On 2016-01-13 17:26:43 +0300, Vladimir Sitnikov wrote:
> > consider e.g. a table with one somewhat common value and otherwise just unique values.
>
> So what?
> If I understand you properly, you mean: "if the client sends unique binds
> for the first 5-6 executions and bad non-unique ones afterwards, then the cached plan
> would be bad". Is that what you are saying?

That's one of several problems, yes. Generally, using a very small sample
("the bind values in the first query") to plan every future query isn't
going to be fun.

> I agree that is the corner case for my suggestion.
> Is it really happening often?

Yes.

> I state the following:
> 1) It is way easier to debug & analyze.

Meh. That a prepared statement suddenly performs way differently
depending on what the first bind values are is not, in any way, easier
to debug.

> For instance: current documentation does *not* list a way to get a
> *generic plan*.

Which doesn't have anything to do with your proposal. That'd not change
with the change you propose.

> Is it obvious that you "just need to EXPLAIN ANALYZE EXECUTE *6
> times in a row*" just to get a generic plan?

No. And I hate that. I think it'd be very good to expand EXPLAIN's
output to include information about custom/generic plans.


> 3) What about "client sends top most common value 5 times in a row"?
> Why assume "it will stop doing that"?
> I think the better assumption is "it will continue doing that".

If 20% of your values are non-unique and the rest are unique you'll get
*drastically* different plans, each performing badly for the other case,
with the unique-cardinality plan being extremely bad.

Andres



On 13/01/16 15:34, Vladimir Sitnikov wrote:
>> This completely screws over PL/PgSQL, among other things.
>
> Can you elaborate a bit?

You just write a query like this:
  SELECT * FROM foo WHERE bar = _Variable;

so you don't get to (or want to) have any control over the underlying 
prepared statement.


.m



>so you don't get to (or want to) have any control over the underlying prepared statement.

That is pl/pgsql's problem, isn't it?
In the meantime, the user can use different query texts (e.g. by adding
OFFSET 0, OFFSET 0*1, OFFSET 0*2, and similar tricks they typically use
to tune queries) to convince plpgsql to use different statement ids.
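
(A sketch of that workaround, reusing Marko's foo/bar example; the function name and parameter types are made up. The two textually different statements get separate entries in PL/pgSQL's statement cache, so each can end up with its own plan:)

  -- hypothetical illustration of forcing two distinct statement ids in PL/pgSQL
  CREATE OR REPLACE FUNCTION lookup_foo(p_bar int, p_is_mcv boolean)
  RETURNS SETOF foo
  LANGUAGE plpgsql AS $$
  BEGIN
      IF p_is_mcv THEN
          -- statement text #1: used only for the most common value(s)
          RETURN QUERY SELECT * FROM foo WHERE bar = p_bar OFFSET 0;
      ELSE
          -- statement text #2 ("offset 0*1"): used for everything else
          RETURN QUERY SELECT * FROM foo WHERE bar = p_bar OFFSET 0*1;
      END IF;
  END;
  $$;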

Vladimir



On 2016-01-13 17:38:22 +0300, Vladimir Sitnikov wrote:
> >so you don't get to (or want to) have any control over the underlying prepared statement.
> 
> That is pl/pgsql's problem, isn't it?
> In the meantime, the user can use different query texts (e.g. by adding
> OFFSET 0, OFFSET 0*1, OFFSET 0*2, and similar tricks they typically use
> to tune queries) to convince plpgsql to use different statement ids.

Basically you're arguing to fix one specific edge case which bugs you
personally, by creating a lot of others, which don't bug you. Not
convincing.

Andres Freund



> Generally using a very small sample

That is another issue. Inventing some other algorithm instead of the
current "cache after 5 executions" is another effort.

However, I suggest "learning" from what the client is sending.
You suggest completely ignoring that and just preparing for the case where
he/she will send "a random value".
Why expect the client to stop sending MCVs if we have already seen them
during the previous 5 executions?

> That'd not change with the change you propose.

It will.
In my suggestion, the first "explain analyze execute" will match the
"finally cached plan" provided the plan is not treated in a special
way (e.g. replan every time, etc).

> That a prepared statement suddenly performs way differently
>depending on what the first bind values are is not, in any way, easier
>to debug.

It is way easier to debug since *the first* execution plan you get out
of "explain" *matches* the one that will finally be used.
Lots of developers are just not aware of the "5 replans by the backend".
Lots of the remaining ones confuse it with the "5 non-server-prepared
executions by the pgjdbc driver".

In other words: in order to identify the root cause of a slow query you
find the bind values. Then you perform explain analyze and you find a shiny
fast plan.
Does that immediately ring bells that you need to execute it 6 times
to ensure the plan would still be good?
Well, try being someone not as smart as you are when answering
this question.

2) In my suggestion, "the first execution plan would likely match the cached plan".

VS>> 3) What about "client sends top most common value 5 times in a row"?
VS>> Why assume "it will stop doing that"?
AF>If 20% of your values are non-unique and the rest are unique you'll get
AF>*drastically* different plans, each performing badly for the other case,
AF>with the unique-cardinality plan being extremely bad.

Can you elaborate a bit? I can hardly follow that.



Vladimir



>Basically you're arguing to fix one specific edge case which bugs you
>personally, by creating a lot of others, which don't bug you. Not
>convincing.

It bugs me.
It bugs clients of pgjdbc (e.g. Thomas, who started the thread).

Note: support of prepared statements for Java applications has just
landed. The release dates are 2015-08-27 for pgjdbc and 2014-02-24
for pgjdbc-ng.
I think the current report is just the tip of the iceberg.

> by creating a lot of others, which don't bug you

I think it will not create "lots of others".
Do you have any statistics on why the particular flavour of generic plan was
committed in 9.2?

Suppose there are two types of binds: "non_unique" (N) and "unique" (U),
that require different plans for perfect response times.

I see the following sequences:

UUUUUUUU -- all clear, all the approaches would converge to the plan for
"unique values".

NNNNNNNN -- a query for a non-unique value is executed again and again. A perfect
optimizer would either replan or reuse the plan with regard to the "MCV". The current
behaviour switches to the "optimistic" plan at the 6th iteration. This is the case of
this thread. My suggestion is to learn that "an MCV is used -> use the plan optimized
for the MCV".

^^^ note that the above are "recommended" uses of the database. Each
statement is used for its own purpose: one for MCVs, another for "good
values".

Then there are cases of mixed executions.
Note: I state that mixing "kinds" of bind values is a bad application
design anyway. In other words, application developer should understand
if a query is DWH-like (requires replans) or OLTP-like (does not
require replans). Agreed?

NNNNNUUUUUUU -- the current behavior is optimized for exactly this pattern. Well, why was it chosen over "UUUUUNNNNNNN"?

In other words, a pattern like UUUUUNNNNNNN would "create a lot of
others", as you say.

NUNUNUNUNUN -- a perfect optimizer would replan every time (or have two
sets of plans, but let's leave that out). Neither my suggestion nor the
current behaviour properly covers this case.


I suggest sacrificing the "NNNNNUUUUUUU" pattern in order to improve "NNNNNNNN".

Vladimir



Pavel Stehule wrote:
> I like a strategy based on risks. There are probably situations where the generic plan is great every
> time - INSERTs, UPDATEs via PK, simple SELECTs via PK. A generic plan can work well if almost all data
> has similar probability. Elsewhere, on bigger data, the probability of a pretty slow plan is higher,
> and then we should prefer a custom plan.
>
> so the strategy: if the cost of the generic plan is less than some MAGIC CONSTANT (which can be
> specified by a GUC), then use the generic plan. Otherwise use a custom plan every time.
>
> It allows controlling the plan reuse: when MAGIC CONSTANT = 0, use a custom plan every time; when
> MAGIC CONSTANT = M, use the generic plan always.

I have a different idea:

What about a GUC "custom_plan_threshold" that controls after how many
executions a generic plan will be considered, with a default value of 5?

A value of -1 could disable generic plans.
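
(Purely for illustration -- this GUC is only a proposal at this point; if it existed, usage might look like this:)

  SET custom_plan_threshold = 5;    -- current behaviour: consider a generic plan after 5 executions
  SET custom_plan_threshold = 20;   -- collect more custom-plan samples before going generic
  SET custom_plan_threshold = -1;   -- never switch: always plan with the actual parameter values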

Yours,
Laurenz Albe

>A value of -1 could disable generic plans

I do not like the idea.

I've seen dramatic performance improvements from using cached plans.
The numbers are like "20ms to plan vs 1ms to execute" for an often-used
OLTP query. The query text is involved (~5-10 KiB).

Vladimir





2016-01-13 16:18 GMT+01:00 Albe Laurenz <laurenz.albe@wien.gv.at>:
Pavel Stehule wrote:
> I like a strategy based on risks. There are probably situations where the generic plan is great every
> time - INSERTs, UPDATEs via PK, simple SELECTs via PK. A generic plan can work well if almost all data
> has similar probability. Elsewhere, on bigger data, the probability of a pretty slow plan is higher,
> and then we should prefer a custom plan.
>
> so the strategy: if the cost of the generic plan is less than some MAGIC CONSTANT (which can be
> specified by a GUC), then use the generic plan. Otherwise use a custom plan every time.
>
> It allows controlling the plan reuse: when MAGIC CONSTANT = 0, use a custom plan every time; when
> MAGIC CONSTANT = M, use the generic plan always.

I have a different idea:

What about a GUC "custom_plan_threshold" that controls after how many
executions a generic plan will be considered, with a default value of 5?

A value of -1 could disable generic plans.

yes, I thought about it too - it is simple, and almost deterministic

Pavel
 

Yours,
Laurenz Albe



2016-01-13 16:22 GMT+01:00 Vladimir Sitnikov <sitnikov.vladimir@gmail.com>:
>A value of -1 could disable generic plans

I do not like the idea.

I've seen dramatic performance improvements from using cached plans.
The numbers are like "20ms to plan vs 1ms to execute" for an often
used OLTP query. Query text is involved (~5-10KiB).

but currently we do not have any tool to check the quality of a plan for a new set of parameters. If a plan is ok for one set of parameter values, it can be pretty bad for the following parameters.

Albe's proposal can be good enough for 2/3 of the cases and it doesn't block any other enhancement. There is still the other 1/3 of queries - too complex (slow planning) or too dynamic a plan (the generic plan doesn't work).

Regards

Pavel
 

Vladimir

Vladimir Sitnikov <sitnikov.vladimir@gmail.com> writes:
> Note: I state that mixing "kinds" of bind values is a bad application
> design anyway. In other words, application developer should understand
> if a query is DWH-like (requires replans) or OLTP-like (does not
> require replans). Agreed?

No, not agreed.  As was already pointed out upthread, such information
is not available in many use-cases for the plancache.

The real problem here IMO is inaccurate plan cost estimates, and that's
not something that there is any easy fix for.

However ... one specific aspect of that is that to some extent, the cost
estimate made for the generic plan is incommensurate with the estimates
for the custom plans because the latter are made with more information.
I don't remember the details of your specific case anymore, but we've
seen cases where the generic plan is falsely estimated to be cheaper
than custom plans because of this.

I wonder whether it would be useful to reject a generic plan anytime its
estimate is less than the average (or minimum?) estimate for the custom
plans.  If it is less, then either (1) the generic plan is falsely
optimistic, or (2) the specific parameter values provided for the custom
plans were all ones for which the planner could see that the generic plan
was non-optimal.  If (2) holds for the first few custom plans then it's
not unreasonable to suppose that it will keep on holding, and we had
better not use the generic plan.

Basically, the case we're *expecting* to see is that a custom plan is the
same or better cost as the generic plan --- same cost if it's really the
same plan, better cost if knowing the parameter values allows some
optimization to be performed (LIKE-pattern-to-index conversion, partition
scan suppression via constraint exclusion, etc).  If we get a higher cost
estimate for the custom plan then something is fishy and we shouldn't
believe it.

Maybe I'm missing some case where that situation would arise naturally.
Or maybe such a rule wouldn't actually help in very many real-world
cases.  But it seems worth looking into.
        regards, tom lane



>(1) the generic plan is falsely optimistic

That is my case.
The application is sending the most common value on every execution, while
the backend is optimistic and thinks that the app will stop sending
MCVs.

The costs for the plans are OK. However, there is data skew, so it is
hard to tell what the "true" selectivity of the skewed column is in
general, hence the discussion.

VS>>In other words, application developer should understand
VS>> if a query is DWH-like (requires replans) or OLTP-like (does not
VS>> require replans). Agreed?
Tom>No, not agreed.  As was already pointed out upthread, such information
Tom>is not available in many use-cases for the plancache.

I think you answered the wrong question.
I was asking if you agree that the _application_ developer (not the pg backend
developer) should know if a query is OLTP-like or DWH-like.

Do you really think an app developer should not care which plan would be
chosen for a particular query he is working on?
Why all that "explain" stuff in the documentation then?

In plancache.c you have the CURSOR_OPT_GENERIC_PLAN and
CURSOR_OPT_CUSTOM_PLAN flags.
It is obvious that those flags are not yet exposed/used by
applications, but my message is that "one should *not* think that the DB
has the artificial intelligence to properly identify a plan for each bind
set and cache plans at the same time".

Vladimir



>If a plan is ok for one set of parameter values, it can be pretty bad for the following parameters.

Happy statements are all alike; every unhappy statement is unhappy in
its own way (see [1]).
If a user is sending different kinds of parameters, he is shooting himself in the foot.

>Albe's proposal can be good enough for 2/3 of the cases and it doesn't block any other enhancement

Albe's proposal effectively disables the plan cache, thus it blocks enhancement.
If a user goes the "replan every time" route, there is no way you can
introduce plan caching there.

[1]: https://en.wikipedia.org/wiki/Anna_Karenina_principle

Vladimir





2016-01-13 17:12 GMT+01:00 Vladimir Sitnikov <sitnikov.vladimir@gmail.com>:
>If a plan is ok for one set of parameter values, it can be pretty bad for the following parameters.

Happy statements are all alike; every unhappy statement is unhappy in
its own way (see [1]).
If a user is sending different kinds of parameters, he is shooting himself in the foot.

>Albe's proposal can be good enough for 2/3 of the cases and it doesn't block any other enhancement

Albe's proposal effectively disables the plan cache, thus it blocks enhancement.
If a user goes the "replan every time" route, there is no way you can
introduce plan caching there.

I am sorry, I disagree. Albe's proposal should be compatible with the current state, so your argument is too strong. The default is the same.

Pavel
 

[1]: https://en.wikipedia.org/wiki/Anna_Karenina_principle

Vladimir

On Wed, Jan 13, 2016 at 10:47:18AM -0500, Tom Lane wrote:
> Vladimir Sitnikov <sitnikov.vladimir@gmail.com> writes:
> > Note: I state that mixing "kinds" of bind values is a bad application
> > design anyway. In other words, application developer should understand
> > if a query is DWH-like (requires replans) or OLTP-like (does not
> > require replans). Agreed?
> 
> No, not agreed.  As was already pointed out upthread, such information
> is not available in many use-cases for the plancache.
> 
> The real problem here IMO is inaccurate plan cost estimates, and that's
> not something that there is any easy fix for.
> 
> However ... one specific aspect of that is that to some extent, the cost
> estimate made for the generic plan is incommensurate with the estimates
> for the custom plans because the latter are made with more information.
> I don't remember the details of your specific case anymore, but we've
> seen cases where the generic plan is falsely estimated to be cheaper
> than custom plans because of this.

I never understood why we don't just keep the selectivity estimates of
previous plans and just reuse the plan if the selectivity estimates are
similar.  Isn't parameter selectivity the only thing that distinguishes
one plan with parameters from another?

Checking selectivity estimates must be cheaper than replanning.  This
could be done at the second use of the prepared plan, and maybe for all
plan reuses, rather than waiting for five and then perhaps getting this
bad behavior.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Bruce Momjian <bruce@momjian.us> writes:
> I never understood why we don't just keep the selectivity estimates of
> previous plans and just reuse the plan if the selectivity estimates are
> similar.  Isn't parameter selectivity the only thing that distinguishes
> one plan with parameters from another?

> Checking selectivity estimates must be cheaper than replanning.  This
> could be done at the second use of the prepared plan, and maybe for all
> plan reuses, rather than waiting for five and then perhaps getting this
> bad behavior.

You're imagining that a selectivity recheck could be separated out from
the rest of the planner.  That's nowhere near feasible, IMO.  Even if it
were, what would we do with it?  There's no reliable way to determine
whether X% change in one or another selectivity number would change the
selected plan, other than by redoing practically all of the planning work.
        regards, tom lane



On Mon, Jan 18, 2016 at 02:14:11PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I never understood why we don't just keep the selectivity estimates of
> > previous plans and just reuse the plan if the selectivity estimates are
> > similar.  Isn't parameter selectivity the only thing that distinguishes
> > one plan with parameters from another?
> 
> > Checking selectivity estimates must be cheaper than replanning.  This
> > could be done at the second use of the prepared plan, and maybe for all
> > plan reuses, rather than waiting for five and then perhaps getting this
> > bad behavior.
> 
> You're imagining that a selectivity recheck could be separated out from
> the rest of the planner.  That's nowhere near feasible, IMO.  Even if it

I think you would have to do the checks before entering the planner and
save them off for use in the planner.

> were, what would we do with it?  There's no reliable way to determine
> whether X% change in one or another selectivity number would change the
> selected plan, other than by redoing practically all of the planning work.

Well, if it is +/-1%, I think we can assume we can reuse the plan just
fine from the second prepared call until we see a major selectivity
change.  While we have never exposed the count of prepared queries
before we choose a generic plan, I can see us exposing this percentage.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



On Wed, Jan 13, 2016 at 10:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Vladimir Sitnikov <sitnikov.vladimir@gmail.com> writes:
>> Note: I state that mixing "kinds" of bind values is a bad application
>> design anyway. In other words, application developer should understand
>> if a query is DWH-like (requires replans) or OLTP-like (does not
>> require replans). Agreed?
>
> No, not agreed.  As was already pointed out upthread, such information
> is not available in many use-cases for the plancache.
>
> The real problem here IMO is inaccurate plan cost estimates, and that's
> not something that there is any easy fix for.

Not really.  Even if the cost estimates for all of the plans tried are
perfectly accurate, you'll have only seen 5 values when you decide to
switch to a generic plan.  If the 6th, 60th, 600th, or 6000th
execution uses a parameter where a custom plan would have been a big
win, you will blindly use the generic plan anyway and lose bigtime.
On the other hand, if the first five plans are all equivalent to each
other and to the generic plan, then you've spent the cost of uselessly
replanning six times instead of just caching the first plan and being
done with it.  I'm aware of an actual case where that extra
re-planning causes a serious performance problem, aggregated across
many queries and many backends.

This isn't the first complaint about this mechanism that we've gotten,
and it won't be the last.  Way too many of our users are way more
aware than they should be that the threshold here is five rather than
any other number, which to me is a clear-cut sign that this needs to
be improved.  How to improve it is a harder question.  We lack the
ability to do any kind of sensitivity analysis on a plan, so we can't
know whether there are other parameter values that would have resulted
in a different plan, nor can we test whether a particular set of
parameter values would have changed the outcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> This isn't the first complaint about this mechanism that we've gotten,
> and it won't be the last.  Way too many of our users are way more
> aware than they should be that the threshold here is five rather than
> any other number, which to me is a clear-cut sign that this needs to
> be improved.

No argument there.

> How to improve it is a harder question.

Exactly.  I'm very suspicious of any easy answers to this; they'll
most likely just shift the pain around.
        regards, tom lane



Robert Haas wrote:
> This isn't the first complaint about this mechanism that we've gotten,
> and it won't be the last.  Way too many of our users are way more
> aware than they should be that the threshold here is five rather than
> any other number, which to me is a clear-cut sign that this needs to
> be improved.  How to improve it is a harder question.  We lack the
> ability to do any kind of sensitivity analysis on a plan, so we can't
> know whether there are other parameter values that would have resulted
> in a different plan, nor can we test whether a particular set of
> parameter values would have changed the outcome.

(I initially posted that question on the JDBC mailing list)

To be honest: looking at the efforts Oracle has made from version 9 up until 12, I
am not sure this is a problem that can be solved by caching plans.

Even with the new "in-flight" re-planning in Oracle 12 ("cardinality
feedback") and all the effort that goes into caching plans we are still
seeing similar problems with (prepared) statements that are suddenly slow.
And as far as I can tell, the infrastructure around plan caching,
invalidation, bind variable peeking and all that seems to be a *lot* more
complex ("sophisticated") in Oracle compared to Postgres. And the results
don't seem to justify the effort (at least in my experience).

With all the problems I have seen (in Oracle and Postgres) I think that
maybe a better solution to this problem is to make the planner fast (and
reliable) enough so that plan caching isn't necessary in the first place. 

However I have no idea how feasible that is. 










2016-01-18 23:50 GMT+01:00 Thomas Kellerer <spam_eater@gmx.net>:
Robert Haas wrote:
> This isn't the first complaint about this mechanism that we've gotten,
> and it won't be the last.  Way too many of our users are way more
> aware than they should be that the threshold here is five rather than
> any other number, which to me is a clear-cut sign that this needs to
> be improved.  How to improve it is a harder question.  We lack the
> ability to do any kind of sensitivity analysis on a plan, so we can't
> know whether there are other parameter values that would have resulted
> in a different plan, nor can we test whether a particular set of
> parameter values would have changed the outcome.

(I initially posted that question on the JDBC mailing list)

To be honest: looking at the efforts Oracle has done since 9 up until 12 I
am not sure this is a problem that can be solved by caching plans.

Even with the new "in-flight" re-planning in Oracle 12 ("cardinality
feedback") and all the effort that goes into caching plans we are still
seeing similar problems with (prepared) statements that are suddenly slow.
And as far as I can tell, the infrastructure around plan caching,
invalidation, bind variable peeking and all that seems to be a *lot* more
complex ("sophisticated") in Oracle compared to Postgres. And the results
don't seem to justify the effort (at least in my experience).

With all the problems I have seen (in Oracle and Postgres) I think that
maybe a better solution to this problem is to make the planner fast (and
reliable) enough so that plan caching isn't necessary in the first place.

However I have no idea how feasible that is.

For statements like INSERT INTO tab VALUES (..) or UPDATE tab SET x = .. WHERE id = .., the planner will be significant overhead. But these statements are relatively simple, and probably some solution exists.

Regards

Pavel

 






On Tue, Jan 19, 2016 at 4:20 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> Robert Haas wrote:
> > This isn't the first complaint about this mechanism that we've gotten,
> > and it won't be the last.  Way too many of our users are way more
> > aware than they should be that the threshold here is five rather than
> > any other number, which to me is a clear-cut sign that this needs to
> > be improved.  How to improve it is a harder question.  We lack the
> > ability to do any kind of sensitivity analysis on a plan, so we can't
> > know whether there are other parameter values that would have resulted
> > in a different plan, nor can we test whether a particular set of
> > parameter values would have changed the outcome.
>
> (I initially posted that question on the JDBC mailing list)
>
> To be honest: looking at the efforts Oracle has done since 9 up until 12 I
> am not sure this is a problem that can be solved by caching plans.
>
> Even with the new "in-flight" re-planning in Oracle 12 ("cardinality
> feedback") and all the effort that goes into caching plans we are still
> seeing similar problems with (prepared) statements that are suddenly slow.
> And as far as I can tell, the infrastructure around plan caching,
> invalidation, bind variable peeking and all that seems to be a *lot* more
> complex ("sophisticated") in Oracle compared to Postgres. And the results
> don't seem to justify the effort (at least in my experience).
>

I have heard the same feedback as above some time back from some
research fellows working in the query optimization area.  They
came up with a different concept called "Plan Bouquet" [1], wherein
they try to execute multiple plans during execution and proceed with
the best among those, or something like that, to address bad
plan-selection problems, and their claim is that this technique proves to
be better on benchmarks than the existing mechanisms used for query
optimization.

I am not advocating any such mechanism, but rather sharing
information I came across.
On Mon, Jan 18, 2016 at 5:50 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> With all the problems I have seen (in Oracle and Postgres) I think that
> maybe a better solution to this problem is to make the planner fast (and
> reliable) enough so that plan caching isn't necessary in the first place.
>
> However I have no idea how feasible that is.

The problem is that the floor is already littered with
potentially-very-beneficial query planning ideas that got discarded
because they would add too many cycles to planning time.  Despite
that, planning time is a killer on some workloads.  So right now we've
got workloads where we plan too much and workloads where we plan too
little.  Argh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company