Thread: Prepared statements and generic plans

Prepared statements and generic plans

From
Bruce Momjian
Date:
In Postgres 9.2 we improved the logic of when generic plans are used by
EXECUTE.  We weren't sure how well it would work, and the docs included
a vague description on when generic plans are chosen.

I have gotten a few questions lately about how prepared statements are
handled with multiple executions so I have updated the PREPARE manual
page with the attached patch to more clearly explain generic plans and
when they are chosen.

I would like to apply this to the 9.6 docs.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Attachment

Re: Prepared statements and generic plans

From
Bruce Momjian
Date:
On Thu, Jun  2, 2016 at 09:56:48PM -0400, Bruce Momjian wrote:
> In Postgres 9.2 we improved the logic of when generic plans are used by
> EXECUTE.  We weren't sure how well it would work, and the docs included
> a vague description on when generic plans are chosen.
> 
> I have gotten a few questions lately about how prepared statements are
> handled with multiple executions so I have updated the PREPARE manual
> page with the attached patch to more clearly explain generic plans and
> when they are chosen.
> 
> I would like to apply this to the 9.6 docs.

FYI, I used this set of queries for testing:
DROP TABLE IF EXISTS test;CREATE TABLE test (x INT, y INT);INSERT INTO test SELECT 0, y FROM generate_series(1, 10000)
ASz(y);INSERT INTO test SELECT 1, y FROM generate_series(10001, 20000) AS z(y);-- INSERT INTO test SELECT 2, 20001;--
INSERTINTO test SELECT 3, 20002;CREATE INDEX i_test_x ON test(x);CREATE INDEX i_test_y ON test(y);ANALYZE test;PREPARE
prep_xAS SELECT * FROM test WHERE x = $1;PREPARE prep_y AS SELECT * FROM test WHERE y = $1;
 

Doing execute 5+ times on the two prepared statements with the constants
'0' and '2' show the documented behavior, e.g.:
EXPLAIN EXECUTE prep_x(2);EXPLAIN EXECUTE prep_y(2);

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Prepared statements and generic plans

From
"David G. Johnston"
Date:
On Thu, Jun 2, 2016 at 9:56 PM, Bruce Momjian <bruce@momjian.us> wrote:
In Postgres 9.2 we improved the logic of when generic plans are used by
EXECUTE.  We weren't sure how well it would work, and the docs included
a vague description on when generic plans are chosen.

I have gotten a few questions lately about how prepared statements are
handled with multiple executions so I have updated the PREPARE manual
page with the attached patch to more clearly explain generic plans and
when they are chosen.

I would like to apply this to the 9.6 docs.

​While putting the proposed patch in context I came across this.

​"""
Prepared statements have the largest performance advantage when a single session is being used to execute a large number of similar statements. The performance difference will be particularly significant if the statements are complex to plan or rewrite, for example, if the query involves a join of many tables or requires the application of several rules. If the statement is relatively simple to plan and rewrite but relatively expensive to execute, the performance advantage of prepared statements will be less noticeable.
​"""

Until and unless the generic plan is used the  "...if statements are complex to plan..." doesn't make sense; and no where in the description itself do we introduce the generic plan concept.  This is inconsistent but I'm not addressing it below though its worth considering before a patch to this area is committed.

As to the patch...

Isn't this backwards? <note> [change]

"""
otherwise it <switching to a generic plan> occurs only after five or more executions produce [execution /strike plans] plus planning costs that are, on average, [roughly equal to /strike cheaper] than the generic plan.
"""

I'd maybe go with something like this:

All executions of a prepared statement having zero parameters will use the same plan so the planning time taken during the first execution will be spread across all subsequent executions.  For statements having parameters the first five executions will result in value-specific plans as previously described.  However, on the sixth execution a generic plan will also be computed and if the average planning + execution cost of all previous value-specific plans is about equal to the execution cost of the generic plan the generic plan will be chosen for that and all subsequent executions.

<existing next paragraph>

If we are getting generic plans significantly cheaper than the value-specific plans I suspect there is a problem...so any comparison that indicates "less-than" is prone to cause confusion.  The original is worded well on this point: "...generic plan appears to be not much more expensive..." but lacks detail elsewhere.

This part:

!    A generic plan assumes each value supplied to <command>EXECUTE</command>
!    is one of the column's distinct values and that column values are
!    uniformly distributed.  For example, if statistics records three
!    distinct column values, a generic plan assumes a column equality
!    comparison will match 33% of processed rows.  Column statistics
!    also allows generic plans to accurately compute the selectivity of
!    unique columns.  Comparisons on non-uniformly-distributed columns and
!    specification of non-existent values affects the average plan cost,
!    and hence if and when a generic plan is chosen.  [elided the last sentence, placed in the first paragraph]

I'm not sure of the specific goal here but this level detail seems a bit out-of-place in the SQL Command documentation.  So, do we want this user-facing and if so do we want it here?

So while looking to find a home for this detail I came across the fact that Chapter 66 doesn't have a table of contents....quite possibly because it is presently a single section chapter...


It seems like this content would find a nice home there as well (though we should fix any TOC bug before adding the new section...)

IMO there are now three possibilities for the existing paragraph:

Status-quo
First portion of Bruce's patch
Mine, above

I think the second part of Bruce's should be, possibly with additional detail and maybe some concrete examples, added to Chapter 66 instead.  A pointer to that section in the SQL Command docs could be made - though I'm not sure how much pointing into internals we do from the user-facing areas of the docs.  In any case the material would be available to those who wish to see it.

The pertinent points, 5 value-specific executions and comparing the average of all previous executions to the generic plan generated on the 6th execution, both do seem worth upsetting the status-quo for.  Beyond that it is style - except I think Bruce got the comparison direction and magnitude wrong as noted above.

This leaves Bruce's second alteration: which probably should follow the rest over to chapter 66.  The point of the existing sentence is to give the casual user the means to detect the current type of plan and I think that is all that is needed here.

David J.


Re: Prepared statements and generic plans

From
Albe Laurenz
Date:
David G. Johnston wrote:
> On Thu, Jun 2, 2016 at 9:56 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> In Postgres 9.2 we improved the logic of when generic plans are used by
>> EXECUTE.  We weren't sure how well it would work, and the docs included
>> a vague description on when generic plans are chosen.
>> 
>> I have gotten a few questions lately about how prepared statements are
>> handled with multiple executions so I have updated the PREPARE manual
>> page with the attached patch to more clearly explain generic plans and
>> when they are chosen.
>> 
>> I would like to apply this to the 9.6 docs.

I think that this is a very good idea.

I had to dig through the sources before to answer such questions.

> ​While putting the proposed patch in context I came across this.
> 
> ​"""
> ​
> Prepared statements have the largest performance advantage when a single session is being used to
> execute a large number of similar statements. The performance difference will be particularly
> significant if the statements are complex to plan or rewrite, for example, if the query involves a
> join of many tables or requires the application of several rules. If the statement is relatively
> simple to plan and rewrite but relatively expensive to execute, the performance advantage of prepared
> statements will be less noticeable.
> 
> ​"""
> 
> Until and unless the generic plan is used the  "...if statements are complex to plan..." doesn't make
> sense; and no where in the description itself do we introduce the generic plan concept.  This is
> inconsistent but I'm not addressing it below though its worth considering before a patch to this area
> is committed.

I think that the paragraph is still true; it talks about "a large number of similar statements".
As long as "large" is sufficiently greater than five, that is.

> As to the patch...
> 
> Isn't this backwards? <note> [change]
> 
> """
> otherwise it <switching to a generic plan> occurs only after five or more executions produce
> [execution /strike plans] plus planning costs that are, on average, [roughly equal to /strike cheaper]
> than the generic plan.
> """

I agree.
I also fouund this sentence hard to read.

> I'd maybe go with something like this:
> 
> All executions of a prepared statement having zero parameters will use the same plan so the planning
> time taken during the first execution will be spread across all subsequent executions.  For statements
> having parameters the first five executions will result in value-specific plans as previously
> described.  However, on the sixth execution a generic plan will also be computed and if the average
> planning + execution cost of all previous value-specific plans is about equal to the execution cost of
> the generic plan the generic plan will be chosen for that and all subsequent executions.

I think that is much better, but I suggest this wording:

"All executions of a prepared statement having zero parameters use the same plan, so theywill use the generic plan
immediately. For statements having parameters the first five executionswill result in value-specific plans as
previouslydescribed.However, on the sixth execution a generic plan will also be computed, and if the average cost
estimateofall previous value-specific plans is about equal to the cost estimate of the generic plan,the generic plan
willbe chosen for that and all subsequent executions."
 

This emphasizes that it is only estimates we are dealing with, otherwise it would be hard
to understand why estimation errors can lead to generic plans being chosen that are much worse.

> <existing next paragraph>
> 
> If we are getting generic plans significantly cheaper than the value-specific plans I suspect there is
> a problem...so any comparison that indicates "less-than" is prone to cause confusion.  The original is
> worded well on this point: "...generic plan appears to be not much more expensive..." but lacks detail
> elsewhere.

I don't quite get that.  Do you mean the same thing that I wrote above?

> This part:
> 
> !    A generic plan assumes each value supplied to <command>EXECUTE</command>
> !    is one of the column's distinct values and that column values are
> !    uniformly distributed.  For example, if statistics records three
> !    distinct column values, a generic plan assumes a column equality
> !    comparison will match 33% of processed rows.  Column statistics
> !    also allows generic plans to accurately compute the selectivity of
> !    unique columns.  Comparisons on non-uniformly-distributed columns and
> !    specification of non-existent values affects the average plan cost,
> !    and hence if and when a generic plan is chosen.  [elided the last sentence, placed in the first
> paragraph]
> 
> I'm not sure of the specific goal here but this level detail seems a bit out-of-place in the SQL
> Command documentation.  So, do we want this user-facing and if so do we want it here?

[...]

> This leaves Bruce's second alteration: which probably should follow the rest over to chapter 66.  The
> point of the existing sentence is to give the casual user the means to detect the current type of plan
> and I think that is all that is needed here.

I agree that this is too much detail.
I would vote for omitting it altogether.

Anybody who needs that level of detail is better served with the source anyway.

Yours,
Laurenz Albe

Re: Prepared statements and generic plans

From
"David G. Johnston"
Date:
On Fri, Jun 3, 2016 at 3:17 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

> I'd maybe go with something like this:
>
> All executions of a prepared statement having zero parameters will use the same plan so the planning
> time taken during the first execution will be spread across all subsequent executions.  For statements
> having parameters the first five executions will result in value-specific plans as previously
> described.  However, on the sixth execution a generic plan will also be computed and if the average
> planning + execution cost of all previous value-specific plans is about equal to the execution cost of
> the generic plan the generic plan will be chosen for that and all subsequent executions.

I think that is much better, but I suggest this wording:

"All executions of a prepared statement having zero parameters use the same plan, so they
 will use the generic plan immediately.  For statements having parameters the first five executions
 will result in value-specific plans as previously described.
 However, on the sixth execution a generic plan will also be computed, and if the average cost estimate
 of all previous value-specific plans is about equal to the cost estimate of the generic plan,
 the generic plan will be chosen for that and all subsequent executions."

This emphasizes that it is only estimates we are dealing with, otherwise it would be hard
to understand why estimation errors can lead to generic plans being chosen that are much worse.


​You've dropped what I think is the essential point of comparing (planning + execution) cost of the per-value setup to just the execution costs of the generic plan.  You also make it sound like only the sixth plan is performs the comparison and if the value-specific plan is still preferred the generic plan will never again be considered.  Change "on" to "beginning with" and that goes away.  That, and adding back in the costs types, will make me happy.

 
> <existing next paragraph>
>
> If we are getting generic plans significantly cheaper than the value-specific plans I suspect there is
> a problem...so any comparison that indicates "less-than" is prone to cause confusion.  The original is
> worded well on this point: "...generic plan appears to be not much more expensive..." but lacks detail
> elsewhere.

I don't quite get that.  Do you mean the same thing that I wrote above?

​Yeah.  I was pointing out the Bruce's was trying to compare using a less-than which I think is prone to confusion if not outright wrong.
 

> This part:
>
> !    A generic plan assumes each value supplied to <command>EXECUTE</command>
> !    is one of the column's distinct values and that column values are
> !    uniformly distributed.  For example, if statistics records three
> !    distinct column values, a generic plan assumes a column equality
> !    comparison will match 33% of processed rows.  Column statistics
> !    also allows generic plans to accurately compute the selectivity of
> !    unique columns.  Comparisons on non-uniformly-distributed columns and
> !    specification of non-existent values affects the average plan cost,
> !    and hence if and when a generic plan is chosen.  [elided the last sentence, placed in the first
> paragraph]
>
> I'm not sure of the specific goal here but this level detail seems a bit out-of-place in the SQL
> Command documentation.  So, do we want this user-facing and if so do we want it here?

[...]

> This leaves Bruce's second alteration: which probably should follow the rest over to chapter 66.  The
> point of the existing sentence is to give the casual user the means to detect the current type of plan
> and I think that is all that is needed here.

I agree that this is too much detail.
I would vote for omitting it altogether.

Anybody who needs that level of detail is better served with the source anyway.

​This goes back to Bruce's motivation but as long as it goes into the internals section I have no problem adding material that someone felt was worth their time to write.

​David J.

Re: Prepared statements and generic plans

From
Bruce Momjian
Date:
On Fri, Jun  3, 2016 at 08:27:38AM -0400, David G. Johnston wrote:
> ​This goes back to Bruce's motivation but as long as it goes into the internals
> section I have no problem adding material that someone felt was worth their

OK, updated version attached.  I added "potential" to the first
paragraph, and added "estimated cost" to the later part, fixed the
"cheaper than", and clarified that we add the plan time cost to the
non-generic plan, which is how it can be cheaper than the generic plan.
I also moved the "Once a generic plan is chosen" line.

Yeah, that's a lot of changes, but they all improved the text.  Thanks.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Attachment

Re: Prepared statements and generic plans

From
Albe Laurenz
Date:
Bruce Momjian wrote:
> OK, updated version attached.  I added "potential" to the first
> paragraph, and added "estimated cost" to the later part, fixed the
> "cheaper than", and clarified that we add the plan time cost to the
> non-generic plan, which is how it can be cheaper than the generic plan.
> I also moved the "Once a generic plan is chosen" line.
> 
> Yeah, that's a lot of changes, but they all improved the text.  Thanks.

Thanks for working on this.

!    Prepared statements can optionally use generic plans rather than
!    re-planning with each set of supplied <command>EXECUTE</command> values.
!    This occurs immediately for prepared statements with no parameters;
!    otherwise it occurs only after five or more executions produce estimated
!    plan costs, with planning overhead added, that are, on average, more
!    expensive than the generic plan cost.

The following might be easier to understand:
... after five or more executions produce plans whose estimated cost average
(including planning overhead) is more expensive than the generic plan cost estimate.

!    A generic plan assumes each value supplied to <command>EXECUTE</command>

... assumes *that* each value ...

!    is one of the column's distinct values and that column values are
!    uniformly distributed.  For example, if statistics records three

Shouldn't it be "record"?
The documentation treats "statistics" as a plural word throughout.

!    distinct column values, a generic plan assumes a column equality
!    comparison will match 33% of processed rows.  Column statistics

... assumes *that* a column equality comparison will match 33% of *the* processed rows.

!    also allows generic plans to accurately compute the selectivity of

Column statistics also *allow* ...

!    unique columns.  Comparisons on non-uniformly-distributed columns and
!    specification of non-existent values affects the average plan cost,
!    and hence if and when a generic plan is chosen.

(Disclaimer: I am not a native speaker.)
Other than that, a hearty +1.

Yours,
Laurenz Albe

Re: Prepared statements and generic plans

From
'Bruce Momjian *EXTERN*'
Date:
On Mon, Jun  6, 2016 at 07:19:37AM +0000, Albe Laurenz wrote:
> Bruce Momjian wrote:
> > OK, updated version attached.  I added "potential" to the first
> > paragraph, and added "estimated cost" to the later part, fixed the
> > "cheaper than", and clarified that we add the plan time cost to the
> > non-generic plan, which is how it can be cheaper than the generic plan.
> > I also moved the "Once a generic plan is chosen" line.
> >
> > Yeah, that's a lot of changes, but they all improved the text.  Thanks.
>
> Thanks for working on this.
>
> !    Prepared statements can optionally use generic plans rather than
> !    re-planning with each set of supplied <command>EXECUTE</command> values.
> !    This occurs immediately for prepared statements with no parameters;
> !    otherwise it occurs only after five or more executions produce estimated
> !    plan costs, with planning overhead added, that are, on average, more
> !    expensive than the generic plan cost.
>
> The following might be easier to understand:
> ... after five or more executions produce plans whose estimated cost average
> (including planning overhead) is more expensive than the generic plan cost estimate.

Agreed.

> !    A generic plan assumes each value supplied to <command>EXECUTE</command>
>
> ... assumes *that* each value ...

Agreed.

> !    is one of the column's distinct values and that column values are
> !    uniformly distributed.  For example, if statistics records three
>
> Shouldn't it be "record"?
> The documentation treats "statistics" as a plural word throughout.

Agreed, not sure how I missed that.

> !    distinct column values, a generic plan assumes a column equality
> !    comparison will match 33% of processed rows.  Column statistics
>
> ... assumes *that* a column equality comparison will match 33% of *the* processed rows.

Uh, that seems overly wordy.  I think the rule is that if the sentence
makes sense without the words, you should not use them, but it is
clearly a judgement call in this case.  Do you agree?

> !    also allows generic plans to accurately compute the selectivity of
>
> Column statistics also *allow* ...

Yep.

Updated patch attached.

One more thing --- there was talk of moving some of this into chapter
66, but as someone already mentioned, there are no subsections there
because it is a dedicated topic:

    66. How the Planner Uses Statistics.

I am not inclined to add a prepare-only section to that chapter.  On the
other hand, the issues described apply to PREPARE and to protocol-level
prepare, so having it in PREPARE also seems illogical.  However, I am
inclined to leave it in PREPARE until we are ready to move all of this
to chapter 66.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Attachment

Re: Prepared statements and generic plans

From
"David G. Johnston"
Date:
On Mon, Jun 6, 2016 at 11:45 AM, Bruce Momjian *EXTERN* <bruce@momjian.us> wrote:

One more thing --- there was talk of moving some of this into chapter
66, but as someone already mentioned, there are no subsections there
because it is a dedicated topic:

        66. How the Planner Uses Statistics.

I am not inclined to add a prepare-only section to that chapter.  On the
other hand, the issues described apply to PREPARE and to protocol-level
prepare, so having it in PREPARE also seems illogical.  However, I am
inclined to leave it in PREPARE until we are ready to move all of this
to chapter 66.


​Actually, 66.1 exists ​- Row Estimation Examples

We should create 66.2 - e.g. Statistics of Parameterized Plans
My point was 66. How the Planner Uses Statistics doesn't have a table of contents.  I suspect it will if we add 66.2.  But, should it have one now, when the only section is 66.1?

The wording improvement are good; but my recommended split between the SQL command section and 66.2 still applies.

David J.

Re: Prepared statements and generic plans

From
Albe Laurenz
Date:
Bruce Momjian wrote:
>> !    distinct column values, a generic plan assumes a column equality
>> !    comparison will match 33% of processed rows.  Column statistics
>>
>> ... assumes *that* a column equality comparison will match 33% of *the* processed rows.
> 
> Uh, that seems overly wordy.  I think the rule is that if the sentence
> makes sense without the words, you should not use them, but it is
> clearly a judgement call in this case.  Do you agree?

My gut feeling is that at least the "the" should be retained, but mine
are the guts of a German speaker.
It is clearly a judgement call, so follow your instincts.

> Updated patch attached.
> 
> One more thing --- there was talk of moving some of this into chapter
> 66, but as someone already mentioned, there are no subsections there
> because it is a dedicated topic:
> 
>     66. How the Planner Uses Statistics.
> 
> I am not inclined to add a prepare-only section to that chapter.  On the
> other hand, the issues described apply to PREPARE and to protocol-level
> prepare, so having it in PREPARE also seems illogical.  However, I am
> inclined to leave it in PREPARE until we are ready to move all of this
> to chapter 66.

I think it would be ok to leave it where it is in your patch; while the
paragraph goes into technical detail, it is still alright in the general
documentation (but only just).

Yours,
Laurenz Albe

Re: Prepared statements and generic plans

From
"'bruce@momjian.us'"
Date:
On Tue, Jun  7, 2016 at 06:52:15AM +0000, Albe Laurenz wrote:
> Bruce Momjian wrote:
> >> !    distinct column values, a generic plan assumes a column equality
> >> !    comparison will match 33% of processed rows.  Column statistics
> >>
> >> ... assumes *that* a column equality comparison will match 33% of *the* processed rows.
> >
> > Uh, that seems overly wordy.  I think the rule is that if the sentence
> > makes sense without the words, you should not use them, but it is
> > clearly a judgement call in this case.  Do you agree?
>
> My gut feeling is that at least the "the" should be retained, but mine
> are the guts of a German speaker.
> It is clearly a judgement call, so follow your instincts.

I think "that/the" would make sense if this sentence was referencing a
specific result.  The sentence is referencing a hypothetical, so I don't
think "that/the" is needed.

> > One more thing --- there was talk of moving some of this into chapter
> > 66, but as someone already mentioned, there are no subsections there
> > because it is a dedicated topic:
> >
> >     66. How the Planner Uses Statistics.
> >
> > I am not inclined to add a prepare-only section to that chapter.  On the
> > other hand, the issues described apply to PREPARE and to protocol-level
> > prepare, so having it in PREPARE also seems illogical.  However, I am
> > inclined to leave it in PREPARE until we are ready to move all of this
> > to chapter 66.
>
> I think it would be ok to leave it where it is in your patch; while the
> paragraph goes into technical detail, it is still alright in the general
> documentation (but only just).

I researched moving some of this text into chapter 66, but found that
only some of it related to the optimizer.  I also realized that the text
applies to the libpq/wire protocol prepare cases too, so rather than
bounce readers to the PREPARE manual page, and then to chapter 66, I
just kept it all in PREPARE, with a reference from the wire protocol
prepare section.

Also, is it possible to do an EXPLAIN prepare() with the libpq/wire
protocol?  I can't do PREPARE EXPLAIN, but I can do EXPLAIN EXECUTE.
However, I don't see any way to inject EXPLAIN into the libpq/wire
prepare case.  Can you specify prepare(EXPLAIN SELECT)?  (PREPARE
EXPLAIN SELECT throws a syntax error.)

Looking at how the code behaves, it seems custom plans that are _more_
expensive (plus planning cost) than the generic plan switch to the
generic plan after five executions, as now documented.  Custom plans
that are significantly _cheaper_ than the generic plan _never_ use the
generic plan.

Here is an example --- first load this SQL:

    DROP TABLE IF EXISTS test;
    CREATE TABLE test (c1 INT, c2 INT);
    INSERT INTO test SELECT c1, abs(floor(random() * 4)-1) FROM generate_series(1, 10000) AS a(c1);
    INSERT INTO test SELECT c1, abs(floor(random() * 4)-1) FROM generate_series(10001, 15000) AS a(c1);
    INSERT INTO test SELECT c1, abs(floor(random() * 4)-1) FROM generate_series(15001, 20000) AS a(c1);
    -- add non-uniformly-distributed values to 'c2'
    INSERT INTO test SELECT 20001, 3;
    INSERT INTO test SELECT 20002, 4;
    CREATE INDEX i_test_c1 ON test (c1);
    CREATE INDEX i_test_c2 ON test (c2);
    ANALYZE test;
    PREPARE prep_c1 AS SELECT * FROM test WHERE c1 = $1;
    PREPARE prep_c2 AS SELECT * FROM test WHERE c2 = $1;

prep_c1 references 'c1', which is a unique column.  Any value used in
the EXECUTE, e.g. EXPLAIN EXECUTE prep_c1(1), existent or non-existent,
generates an index scan, and after five executions a generic index
scan is used.

For prep_c2, if you use the 50% common value '1', the first five
executions use a sequential scan, then the sixth is a generic Bitmap
Heap Scan. For the 25% value of '0' or '2',  the first five runs
generate a Bitmap Heap Scan, and a generic Bitmap Heap Scan on the sixth
and after.

For a prep_c2 value of 3 or any non-existent value, an Index Scan is
used, and a generic plan is never chosen, because the Index Scan is
significantly cheaper than the generic plan.

Updated patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Attachment

Re: Prepared statements and generic plans

From
Albe Laurenz
Date:
Bruce Momjian wrote:
> Also, is it possible to do an EXPLAIN prepare() with the libpq/wire
> protocol?  I can't do PREPARE EXPLAIN, but I can do EXPLAIN EXECUTE.
> However, I don't see any way to inject EXPLAIN into the libpq/wire
> prepare case.  Can you specify prepare(EXPLAIN SELECT)?  (PREPARE
> EXPLAIN SELECT throws a syntax error.)

I am not sure what you mean:
EXPLAIN PREPARE to get EXPLAIN for PREPARE, or PREPARE ... FOR EXPLAIN
to get an EXPLAIN statement with parameters.
What should EXPLAIN PREPARE show that EXPLAIN SELECT wouldn't?
Why the need for EXPLAIN statements with parameters?

> Looking at how the code behaves, it seems custom plans that are _more_
> expensive (plus planning cost) than the generic plan switch to the
> generic plan after five executions, as now documented.  Custom plans
> that are significantly _cheaper_ than the generic plan _never_ use the
> generic plan.

Yes, that's what the suggested documentation improvement says as well,
right?

> Updated patch attached.

Upon re-read, one tiny question:

!    Prepared statements can optionally use generic plans rather than
!    re-planning with each set of supplied <command>EXECUTE</command> values.

Maybe the "optionally" should be omitted, since the user has no choice.

It is true that there is a cursor option CURSOR_OPT_CUSTOM_PLAN, but that
cannot be used on the SQL level.

Yours,
Laurenz Albe

Re: Prepared statements and generic plans

From
"'bruce@momjian.us'"
Date:
On Mon, Jun 13, 2016 at 01:26:04PM +0000, Albe Laurenz wrote:
> Bruce Momjian wrote:
> > Also, is it possible to do an EXPLAIN prepare() with the libpq/wire
> > protocol?  I can't do PREPARE EXPLAIN, but I can do EXPLAIN EXECUTE.
> > However, I don't see any way to inject EXPLAIN into the libpq/wire
> > prepare case.  Can you specify prepare(EXPLAIN SELECT)?  (PREPARE
> > EXPLAIN SELECT throws a syntax error.)
>
> I am not sure what you mean:
> EXPLAIN PREPARE to get EXPLAIN for PREPARE, or PREPARE ... FOR EXPLAIN
> to get an EXPLAIN statement with parameters.
> What should EXPLAIN PREPARE show that EXPLAIN SELECT wouldn't?
> Why the need for EXPLAIN statements with parameters?

Well, you can't use EXPLAIN with SQL PREPARE:

    test=> EXPLAIN PREPARE SELECT * FROM pg_class;
    ERROR:  syntax error at or near "PREPARE"
    LINE 1: EXPLAIN PREPARE SELECT * FROM pg_class;
                    ^
    test=> PREPARE EXPLAIN SELECT * FROM pg_class;
    ERROR:  syntax error at or near "SELECT"
    LINE 1: PREPARE EXPLAIN SELECT * FROM pg_class;
                        ^
You can only do EXPLAIN EXECUTE ..., which works fine, e.g.:

    EXPLAIN EXECUTE prep_c1(0);

However, for the wire protocol prepare/execute, how do you do EXPLAIN?
The only way I can see doing it is to put the EXPLAIN in the prepare
query, but I wasn't sure that works.  So, I just wrote and tested the
attached C program and it properly output the explain information, e.g.

    res = PQprepare(conn, "prep1", "EXPLAIN SELECT * FROM pg_language", 0, NULL);
                                    -------
generated:

    QUERY PLAN

    Seq Scan on pg_language  (cost=0.00..1.04 rows=4 width=114)

so that works --- good.

> > Looking at how the code behaves, it seems custom plans that are _more_
> > expensive (plus planning cost) than the generic plan switch to the
> > generic plan after five executions, as now documented.  Custom plans
> > that are significantly _cheaper_ than the generic plan _never_ use the
> > generic plan.
>
> Yes, that's what the suggested documentation improvement says as well,
> right?

Yes.  What is odd is that it isn't the plan of the actual supplied
parameters that is cheaper, just the generic plan that assumes each
distinct value in the query is equally likely to be used.  So, when we
say the generic plan is cheaper, it is just comparing the custom plan
with the supplied parameters vs. the generic plan --- it is not saying
that running the supplied constants with the generic plan will execute
faster, because in fact we might be using a sub-optimial generic plan.

For example, giving my test table that I posted earlier, if you ran the
most common constant (50% common) the first five time, the custom plan
would use a sequential scan.  On the sixth run of that same constant, a
bitmap scan generic plan would be used.  Now, that does have a lower
cost, but only for the _average_ distinct value, not for the 50%
constant that is being used.  A bitmap scan on a constant that would
normally use a sequential scan will take longer than even a sequential
scan, because if it didn't, the custom plan would have chosen the bitmap
scan.

I am not sure how we can improve things, but I wanted to clarify exactly
what is happening.

> > Updated patch attached.
>
> Upon re-read, one tiny question:
>
> !    Prepared statements can optionally use generic plans rather than
> !    re-planning with each set of supplied <command>EXECUTE</command> values.
>
> Maybe the "optionally" should be omitted, since the user has no choice.
>
> It is true that there is a cursor option CURSOR_OPT_CUSTOM_PLAN, but that
> cannot be used on the SQL level.

Right.  Updated patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Attachment

Re: Prepared statements and generic plans

From
"David G. Johnston"
Date:
On Mon, Jun 13, 2016 at 3:40 PM, bruce@momjian.us <bruce@momjian.us> wrote:

> > Looking at how the code behaves, it seems custom plans that are _more_
> > expensive (plus planning cost) than the generic plan switch to the
> > generic plan after five executions, as now documented.  Custom plans
> > that are significantly _cheaper_ than the generic plan _never_ use the
> > generic plan.
>
> Yes, that's what the suggested documentation improvement says as well,
> right?

Yes.  What is odd is that it isn't the plan of the actual supplied
parameters that is cheaper, just the generic plan that assumes each
distinct value in the query is equally likely to be used.  So, when we
say the generic plan is cheaper, it is just comparing the custom plan
with the supplied parameters vs. the generic plan --- it is not saying
that running the supplied constants with the generic plan will execute
faster, because in fact we might be using a sub-optimial generic plan.

For example, giving my test table that I posted earlier, if you ran the
most common constant (50% common) the first five time, the custom plan
would use a sequential scan.  On the sixth run of that same constant, a
bitmap scan generic plan would be used.  Now, that does have a lower
cost, but only for the _average_ distinct value, not for the 50%
constant that is being used.  A bitmap scan on a constant that would
normally use a sequential scan will take longer than even a sequential
scan, because if it didn't, the custom plan would have chosen the bitmap
scan.

I am not sure how we can improve things, but I wanted to clarify exactly
what is happening.

​"""
Comparisons on non-uniformly-distributed
columns and specification of non-existent values affects the average
plan cost, and hence if and when a generic plan is chosen
​"""

If we are going to be more precise lets do so here as well.  I have, just reading this, no clue whether having non-uniformity and often searching for non-existent value will increase or decrease the average plan cost.

I'm still not certain how this is particularly useful.  If we are willing to draw a conclusion here in what circumstances would I, as an end-user, want to forgo using a prepared statement and instead dynamically construct an SQL statement?  Because at this point while this seems like good detail often times my choice of parameters is influenced by what I consider data external to the query proper and not any kind of inherent performance aspect.  I'd consider this advanced usage which doesn't neatly fit into the SQL Command section of the docs.

David J.

Re: Prepared statements and generic plans

From
"bruce@momjian.us"
Date:
On Mon, Jun 13, 2016 at 04:29:26PM -0400, David G. Johnston wrote:
> On Mon, Jun 13, 2016 at 3:40 PM, bruce@momjian.us <bruce@momjian.us> wrote:
>     I am not sure how we can improve things, but I wanted to clarify exactly
>     what is happening.
> 
> 
> ​"""
> Comparisons on non-uniformly-distributed
> columns and specification of non-existent values affects the average
> plan cost, and hence if and when a generic plan is chosen
> ​"""
> 
> If we are going to be more precise lets do so here as well.  I have, just
> reading this, no clue whether having non-uniformity and often searching for
> non-existent value will increase or decrease the average plan cost.

Well, we can't be more precise here as the average plan cost could go up
or down, depending on the constants used, becuase the values are not
uniformly-distributed.

> I'm still not certain how this is particularly useful.  If we are willing to
> draw a conclusion here in what circumstances would I, as an end-user, want to
> forgo using a prepared statement and instead dynamically construct an SQL
> statement?  Because at this point while this seems like good detail often times
> my choice of parameters is influenced by what I consider data external to the
> query proper and not any kind of inherent performance aspect.  I'd consider
> this advanced usage which doesn't neatly fit into the SQL Command section of
> the docs.

True, but we have lumped all the "prepared" information into that
section, and I don't see a more logical location for this tidbit of
information.

The big point is that the constants don't affect the generic plan, they
just choose if and when the generic plan is chosen.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Prepared statements and generic plans

From
Albe Laurenz
Date:
Bruce Momjian wrote:
> However, for the wire protocol prepare/execute, how do you do EXPLAIN?
> The only way I can see doing it is to put the EXPLAIN in the prepare
> query, but I wasn't sure that works.  So, I just wrote and tested the
> attached C program and it properly output the explain information, e.g.
> 
>     res = PQprepare(conn, "prep1", "EXPLAIN SELECT * FROM pg_language", 0, NULL);
>                                     -------
> generated:
> 
>     QUERY PLAN
> 
>     Seq Scan on pg_language  (cost=0.00..1.04 rows=4 width=114)
> 
> so that works --- good.

Hm, yes.

Were you just curious or is it relevant for the documentation update?

>>> Looking at how the code behaves, it seems custom plans that are _more_
>>> expensive (plus planning cost) than the generic plan switch to the
>>> generic plan after five executions, as now documented.  Custom plans
>>> that are significantly _cheaper_ than the generic plan _never_ use the
>>> generic plan.
>>
>> Yes, that's what the suggested documentation improvement says as well,
>> right?
> 
> Yes.  What is odd is that it isn't the plan of the actual supplied
> parameters that is cheaper, just the generic plan that assumes each
> distinct value in the query is equally likely to be used.  So, when we
> say the generic plan is cheaper, it is just comparing the custom plan
> with the supplied parameters vs. the generic plan --- it is not saying
> that running the supplied constants with the generic plan will execute
> faster, because in fact we might be using a sub-optimial generic plan.

Right, that's why it is important to document that it is estimates that are
compared, not actual costs.

This has caused confussion in the past, see
https://www.postgresql.org/message-id/flat/561E749D.4090301%40socialserve.com#561E749D.4090301@socialserve.com

> Right.  Updated patch attached.

I am happy with the patch as it is.

Yours,
Laurenz Albe

Re: Prepared statements and generic plans

From
"''bruce@momjian.us' *EXTERN*'"
Date:
On Tue, Jun 14, 2016 at 08:37:12AM +0000, Albe Laurenz wrote:
> Bruce Momjian wrote:
> > However, for the wire protocol prepare/execute, how do you do EXPLAIN?
> > The only way I can see doing it is to put the EXPLAIN in the prepare
> > query, but I wasn't sure that works.  So, I just wrote and tested the
> > attached C program and it properly output the explain information, e.g.
> > 
> >     res = PQprepare(conn, "prep1", "EXPLAIN SELECT * FROM pg_language", 0, NULL);
> >                                     -------
> > generated:
> > 
> >     QUERY PLAN
> > 
> >     Seq Scan on pg_language  (cost=0.00..1.04 rows=4 width=114)
> > 
> > so that works --- good.
> 
> Hm, yes.
> 
> Were you just curious or is it relevant for the documentation update?

I was curious because if there was no way to do it, I should document
that.

> >>> Looking at how the code behaves, it seems custom plans that are _more_
> >>> expensive (plus planning cost) than the generic plan switch to the
> >>> generic plan after five executions, as now documented.  Custom plans
> >>> that are significantly _cheaper_ than the generic plan _never_ use the
> >>> generic plan.
> >>
> >> Yes, that's what the suggested documentation improvement says as well,
> >> right?
> > 
> > Yes.  What is odd is that it isn't the plan of the actual supplied
> > parameters that is cheaper, just the generic plan that assumes each
> > distinct value in the query is equally likely to be used.  So, when we
> > say the generic plan is cheaper, it is just comparing the custom plan
> > with the supplied parameters vs. the generic plan --- it is not saying
> > that running the supplied constants with the generic plan will execute
> > faster, because in fact we might be using a sub-optimial generic plan.
> 
> Right, that's why it is important to document that it is estimates that are
> compared, not actual costs.
> 
> This has caused confussion in the past, see
> https://www.postgresql.org/message-id/flat/561E749D.4090301%40socialserve.com#561E749D.4090301@socialserve.com
> 
> > Right.  Updated patch attached.
> 
> I am happy with the patch as it is.

Good.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Prepared statements and generic plans

From
Bruce Momjian
Date:
On Tue, Jun 14, 2016 at 08:20:12AM -0400, ''bruce@momjian.us' *EXTERN*' wrote:
> > This has caused confussion in the past, see
> > https://www.postgresql.org/message-id/flat/561E749D.4090301%40socialserve.com#561E749D.4090301@socialserve.com
> > 
> > > Right.  Updated patch attached.
> > 
> > I am happy with the patch as it is.
> 
> Good.

Patch applied.  Thanks for the assistance.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +