Thread: [HACKERS] Cached plans and statement generalization

[HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
Hi hackers,

There were a lot of discussions about query plan caching in the hackers 
mailing list, but I failed to find a clear answer to my question, or the 
current consensus on it in the Postgres community. As far as I 
understand, the current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements,
for example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve 
now) is partitioning.
Efficient execution of a query against a partitioned table requires a 
hardcoded value for the partitioning key.
Only in this case can the optimizer construct an efficient query 
plan which accesses only the affected tables (partitions).

My small benchmark for a distributed partitioned table based on pg_pathman 
+ postgres_fdw shows a threefold performance degradation when prepared 
statements are used.
But without prepared statements a substantial amount of time is spent on 
query compilation and planning. I was able to speed up the benchmark more 
than two times by sending prepared queries directly to the remote nodes.

So what I am thinking about now is implicit query caching. If the same query 
with different literal values is repeated many times, then we can try to 
generalize this query and replace it with a prepared query with 
parameters. I am not considering a shared query cache for now: it seems to 
be much harder to implement. But local caching of generalized queries seems 
not so difficult to implement and requires few changes in the 
Postgres code. And it can be useful not only for sharding, but for many 
other cases where prepared statements cannot be used.
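
The idea can be sketched roughly as follows (a Python sketch for illustration only; a real implementation would live in the backend in C, and the cache class and threshold here are invented):

```python
import re

# Replace quoted SQL string literals with $n placeholders and count how
# often each generalized form repeats; once a form repeats often enough,
# preparing it implicitly would pay off. The regex handles the standard
# SQL escape of a doubled quote ('') but ignores comments and dollar
# quoting, which a real implementation would have to consider.
LITERAL = re.compile(r"'(?:[^']|'')*'")

def generalize(query):
    """Return (generalized query, list of extracted literals)."""
    params = []
    def repl(m):
        params.append(m.group(0))
        return "$%d" % len(params)
    return LITERAL.sub(repl, query), params

class AutoprepareCache:
    THRESHOLD = 5  # arbitrary repetition count before preparing

    def __init__(self):
        self.counts = {}

    def should_prepare(self, query):
        gen, _ = generalize(query)
        n = self.counts[gen] = self.counts.get(gen, 0) + 1
        return n >= self.THRESHOLD
```

For example, `generalize("select * from T where x='123'")` yields `"select * from T where x=$1"` with the literal `'123'` captured, so repeated queries differing only in the literal collapse to one cache entry.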

I wonder if such an option was already considered and, if it was rejected 
for some reason, can you point me to those reasons?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Alexander Korotkov
Date:
Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
There were a lot of discussions about query plan caching in hackers mailing list, but I failed to find some clear answer for my question and the current consensus on this question in Postgres community. As far as I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve now) is partitioning.
Efficient execution of query to partitioned table requires hardcoded value for partitioning key.
Only in this case optimizer will be able to construct efficient query plan which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on pg_pathman + postgres_fdw shows 3 times degrade of performance in case of using prepared statements.
But without prepared statements substantial amount of time is spent in query compilation and planning. I was be able to speed up benchmark more than two time by
sending prepared queries directly to the remote nodes.

I don't think it's correct to ask PostgreSQL hackers about a problem which arises with pg_pathman, since pg_pathman is an extension supported by Postgres Pro.
Since we have declarative partitioning committed to 10, I think the community should address this issue in the context of declarative partitioning.
However, it's unlikely we can spot this issue with declarative partitioning because it still uses the very inefficient constraint exclusion mechanism.  Thus, the issues you are writing about would become visible with declarative partitioning only once constraint exclusion is replaced with something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:


On 24.04.2017 13:24, Alexander Korotkov wrote:
Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
There were a lot of discussions about query plan caching in hackers mailing list, but I failed to find some clear answer for my question and the current consensus on this question in Postgres community. As far as I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve now) is partitioning.
Efficient execution of query to partitioned table requires hardcoded value for partitioning key.
Only in this case optimizer will be able to construct efficient query plan which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on pg_pathman + postgres_fdw shows 3 times degrade of performance in case of using prepared statements.
But without prepared statements substantial amount of time is spent in query compilation and planning. I was be able to speed up benchmark more than two time by
sending prepared queries directly to the remote nodes.

I don't think it's correct to ask PostgreSQL hackers about problem which arises with pg_pathman while pg_pathman is an extension supported by Postgres Pro.
Since we have declarative partitioning committed to 10, I think that community should address this issue in the context of declarative partitioning.
However, it's unlikely we can spot this issue with declarative partitioning because it still uses very inefficient constraint exclusion mechanism.  Thus, issues you are writing about would become visible on declarative partitioning only when constraint exclusion would be replaced with something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

 
Sorry, I mentioned pg_pathman just as an example. 
The same problem takes place with partitioning based on the standard Postgres inheritance mechanism (when I manually create derived tables and specify constraints for them).
I have not yet tested the declarative partitioning committed to 10, but I expect that it will also suffer from this problem (because it is based on inheritance).
But as I wrote, I think that the problem with plan caching is wider and is not limited to partitioning.
 


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Cached plans and statement generalization

From
Andres Freund
Date:
Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> So what I am thinking now is implicit query caching. If the same query with
> different literal values is repeated many times, then we can try to
> generalize this query and replace it with prepared query with
> parameters.

That's not actually all that easy:
- You pretty much have to do parse analysis to be able to do an accurate match. How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to fully parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not nice if that happens implicitly.

Greetings,

Andres Freund



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 24.04.2017 21:43, Andres Freund wrote:
Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.
That's not actually all that easy:
- You pretty much have to do parse analysis to be able to do an accurate match. How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to fully parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not nice if that happens implicitly.

Well, first of all I want to share the results I have already got: pgbench with default parameters, scale 10 and one connection:

protocol               TPS
simple                 3492
extended               2927
prepared               6865
simple + autoprepare   6844

So autoprepare is as efficient as explicit prepare and can increase performance almost two times.

My current implementation replaces only string literals in the query with parameters, i.e. select * from T where x='123'; -> select * from T where x=$1;
It greatly simplifies matching of parameters: it is just necessary to locate the '\'' character and then correctly handle pairs of quotes.
Handling integer and real literals is a really challenging task.
One source of problems is negation: it is not so easy to correctly understand whether a minus should be treated as part of a literal or as an operator:
(-1), (1-1), (1-1)-1
Another problem is caused by integer literals used in contexts where parameters cannot be used, for example "order by 1".
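
The string-literal substitution described above can be sketched as a character-level scan (illustrative Python, not the actual patch code):

```python
# Locate each "'" character and copy the literal verbatim, treating a
# doubled '' inside the literal as an escaped quote. Numeric literals
# are deliberately left alone, for the reasons given above.
def replace_string_literals(query):
    out, params = [], []
    i, n = 0, len(query)
    while i < n:
        if query[i] == "'":
            j = i + 1
            while j < n:
                if query[j] == "'":
                    if j + 1 < n and query[j + 1] == "'":
                        j += 2          # doubled quote: still inside literal
                        continue
                    break               # closing quote found
                j += 1
            params.append(query[i:j + 1])
            out.append("$%d" % len(params))
            i = j + 1
        else:
            out.append(query[i])
            i += 1
    return "".join(out), params
```

So `select 'it''s', 'b'` becomes `select $1, $2` with the two literals (including the embedded doubled quote) extracted intact.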

A fully correct substitution can be done by first parsing the query, then transforming the parse tree, replacing literal nodes with parameter nodes, and finally deparsing the tree into a generalized query. postgres_fdw already contains such deparse code; it could be moved to the Postgres core and reused for autoprepare (and maybe elsewhere).
But in this case the overhead will be much higher.
I still think that query parsing time is significantly smaller than the time needed for building and optimizing the query execution plan.
But it should be measured, if the community is interested in such an approach.

There is an obvious question: how did I manage to get these pgbench results if currently only substitution of string literals is supported and the queries constructed by pgbench don't contain string literals? I just made a small patch to pgbench's replaceVariable method, wrapping the value's representation in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries.

I have attached my patch to this mail. It is just a first version of the patch (based on the REL9_6_STABLE branch), intended to illustrate the proposed approach.
I will be glad to receive any comments, and if this optimization is considered useful, I will continue working on the patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Serge Rielau
Date:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.

Doug Doole did this work in DB2 LUW and he may be able to point to more places to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with (poorly designed) web apps that just glue strings together.
Protecting it behind a GUC would allow doing the work only when it’s deemed likely to help.
Another rule I find useful is to abort any effort to substitute literals if any bind variable is found in the query.
That can be taken as a cue that the author of the SQL left the remaining literals in on purpose.
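
This heuristic is easy to sketch (illustrative Python; the bind-marker pattern assumes PostgreSQL-style $n parameters):

```python
import re

# If the statement already contains bind markers ($1, $2, ...), assume
# the author parameterized deliberately and skip literal substitution.
BIND_MARKER = re.compile(r"\$\d+")

def worth_generalizing(query):
    return BIND_MARKER.search(query) is None
```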

A follow-up feature would be to formalize different flavors of peeking. 
I.e. can you produce a generic plan, but still recruit the initial set of bind values/substituted literals to do costing?

Cheers
Serge Rielau

PS: FWIW, I like this feature.

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:


On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.

Sorry, I do not completely understand how the presence of type modifiers can affect string literals used in a query.
Can you provide me with some example?


Doug Doole did this work in DB2 LUW and he may be able to point to more places to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with (poorly designed) web apps that just glue together strings.

I do not think that this optimization will be useful only for poorly designed applications.
I already pointed at two use cases where prepared statements cannot be used:
1. pgbouncer without session-level pooling.
2. partitioning

Protecting it under a GUC would allow to only do the work if it’s deemed likely to help.
Another rule I find useful is to abort any efforts to substitute literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining literals in on purpose.

A follow up feature would be to formalize different flavors of peeking. 
I.e. can you produce a generic plan, but still recruit the initial set of bind values/substituted literals to dos costing?
Here the situation is the same as for explicitly prepared statements, isn't it?
Sometimes it is preferable to use a specialized plan rather than a generic plan.
I am not sure whether Postgres is currently able to do that.



Cheers
Serge Rielau

PS: FWIW, I like this feature.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Cached plans and statement generalization

From
David Fetter
Date:
On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> On 24.04.2017 21:43, Andres Freund wrote:
> > Hi,
> > 
> > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > So what I am thinking now is implicit query caching. If the same query with
> > > different literal values is repeated many times, then we can try to
> > > generalize this query and replace it with prepared query with
> > > parameters.
> > That's not actuall all that easy:
> > - You pretty much do parse analysis to be able to do an accurate match.
> >    How much overhead is parse analysis vs. planning in your cases?
> > - The invalidation infrastructure for this, if not tied to to fully
> >    parse-analyzed statements, is going to be hell.
> > - Migrating to parameters can actually cause significant slowdowns, not
> >    nice if that happens implicitly.
> 
> Well, first of all I want to share results I already get: pgbench with
> default parameters,  scale 10 and one connection:
> 
> protocol               TPS
> simple                 3492
> extended               2927
> prepared               6865
> simple + autoprepare   6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Cached plans and statement generalization

From
Serge Rielau
Date:

On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.
Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
Can you provide me some example?
SELECT 'hello'::CHAR(10) || 'World', 5 + 6;

You can substitute 'hello', 'World', 5, and 6. But not 10.

Also some OLAP syntax like “rows preceding”

It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.

Cheers
Serge

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 04/25/2017 07:54 PM, David Fetter wrote:
> On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
>> On 24.04.2017 21:43, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
>>>> So what I am thinking now is implicit query caching. If the same query with
>>>> different literal values is repeated many times, then we can try to
>>>> generalize this query and replace it with prepared query with
>>>> parameters.
>>> That's not actuall all that easy:
>>> - You pretty much do parse analysis to be able to do an accurate match.
>>>     How much overhead is parse analysis vs. planning in your cases?
>>> - The invalidation infrastructure for this, if not tied to to fully
>>>     parse-analyzed statements, is going to be hell.
>>> - Migrating to parameters can actually cause significant slowdowns, not
>>>     nice if that happens implicitly.
>> Well, first of all I want to share results I already get: pgbench with
>> default parameters,  scale 10 and one connection:
>>
>> protocol               TPS
>> simple                 3492
>> extended               2927
>> prepared               6865
>> simple + autoprepare   6844
> If this is string mashing on the unparsed query, as it appears to be,
> it's going to be a perennial source of security issues.

Sorry, maybe I missed something, but I cannot understand how security can be violated by extracting string literals
from a query. I am just copying bytes from one buffer to another. I do not try to somehow interpret these parameters.
What I am doing is very similar to standard prepared statements.
And moreover, the query is parsed! Only a query which was already parsed and executed (but with different values of
parameters) can be autoprepared.


>
> Best,
> David.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 04/25/2017 08:09 PM, Serge Rielau wrote:

On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.
Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
Can you provide me some example?
SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?


Also some OLAP syntax like “rows preceding”

It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.

Cheers
Serge


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Cached plans and statement generalization

From
Serge Rielau
Date:

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Cheers
Serge

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 04/25/2017 11:40 PM, Serge Rielau wrote:

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Well, actually my primary intention was not to make badly designed programs (ones not using prepared statements) work faster.
I wanted to address cases when it is not possible to use prepared statements.
If we want to substitute as many literals as possible with parameters, then the parse+deparse approach seems to be the only reasonable one.
I will try to implement it as well, just to estimate the parsing overhead.




Cheers
Serge



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Cached plans and statement generalization

From
Doug Doole
Date:
When I did this in DB2, I didn't use the parser - it was too expensive. I just tokenized the statement and used some simple rules to bypass the invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd disallow replacement until I hit the end of the current subquery or statement.

There are a few limitations to this approach. For example, DB2 allowed you to cast using function notation like VARCHAR(foo, 10). This meant I would never replace the second parameter of any VARCHAR function. Now it's possible that when the statement was fully compiled we'd find that VARCHAR(foo, 10) actually resolved to BOB.VARCHAR() instead of the built-in cast function. Our thinking was that these cases were rare enough that we wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax avoids this problem completely.)

Because SQL is so structured, the implementation ended up being quite simple (a few hundred lines of code) with no significant maintenance issues. (Other developers had no problem adding new cases where constants had to be preserved.)

The simple tokenizer was also fairly extensible. I'd prototyped using the same code to also normalize statements (uppercase all keywords, collapse whitespace to a single blank, etc.) but that feature was never added to the product.
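
The token-based approach might look roughly like this (a simplified Python sketch, not the DB2 code; only ORDER BY suppression and bare integer replacement are modeled, and the keyword rules are far from complete):

```python
import re

# Crude SQL tokenizer: integers, words, string literals, or any single char.
TOKEN = re.compile(r"\d+|\w+|'(?:[^']|'')*'|.", re.S)

def replace_numbers(sql):
    """Replace integer literals with $n, but suppress replacement from
    "ORDER BY" until the end of the current subquery or statement."""
    out, params = [], []
    suppress_depth = None   # None = replacement allowed
    prev = None
    for m in TOKEN.finditer(sql):
        tok = m.group(0)
        upper = tok.upper()
        if suppress_depth is None and prev == "ORDER" and upper == "BY":
            suppress_depth = 0              # entering an ORDER BY clause
        elif suppress_depth is not None:
            if tok == "(":
                suppress_depth += 1
            elif tok == ")":
                if suppress_depth == 0:
                    suppress_depth = None   # enclosing subquery ended
                else:
                    suppress_depth -= 1
        if tok.isdigit() and suppress_depth is None:
            params.append(tok)
            out.append("$%d" % len(params))
        else:
            out.append(tok)
        if not tok.isspace():
            prev = upper
    return "".join(out), params
```

For instance, `SELECT * FROM t WHERE x = 5 ORDER BY 1` keeps the `1` intact while parameterizing the `5`, and a closing parenthesis ends the suppression so literals after a subquery's ORDER BY are still replaced.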

- Doug

On Tue, Apr 25, 2017 at 1:47 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
On 04/25/2017 11:40 PM, Serge Rielau wrote:

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Well, actually my primary intention was not to make badly designed programs (not using prepared statements) work faster.
I wanted to address cases when it is not possible to use prepared statements.
If we want to substitute with parameters as much literals as possible, then parse+deparse tree seems to be the only reasonable approach.
I will try to implement it also, just to estimate parsing overhead.




Cheers
Serge



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Cached plans and statement generalization

From
Andres Freund
Date:
On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
> When I did this in DB2, I didn't use the parser - it was too expensive. I
> just tokenized the statement and used some simple rules to bypass the
> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> disallow replacement replacement until I hit the end of the current
> subquery or statement.

How did you manage plan invalidation and such?

- Andres



Re: [HACKERS] Cached plans and statement generalization

From
Doug Doole
Date:
Plan invalidation was no different than for any SQL statement. DB2 keeps a list of the objects the statement depends on. If any of the objects changes in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called the "compilation environment". This is a collection of everything that impacts how a statement is compiled (SQL path, optimization level, etc.). Plan lookup is done using both the statement text and the compilation environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your path is ANDRES, MYTEAM, SYSIBM we will have different compilation environments. If we both issue "SELECT * FROM T" we'll end up with different cache entries even if T in both of our statements resolves to MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then execute "SELECT * FROM T" again, I have a new compilation environment so the second invocation of the statement will create a new entry in the cache. The first entry is not kicked out - it will still be there for re-use if I change my SQL path back to my original value (modulo LRU for cache memory management of course).

With literal replacement, the cache entry is on the modified statement text. Given the modified statement text and the compilation environment, you're guaranteed to get the right plan entry.
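
The lookup scheme described above can be sketched as a cache keyed on both the (literal-replaced) statement text and the compilation environment (illustrative Python; the environment field names are invented):

```python
# Cache key = (normalized statement text, compilation environment).
# Different environments keep separate entries, so switching the SQL
# path back and forth reuses the earlier entry instead of evicting it.
class PlanCache:
    def __init__(self):
        self._cache = {}

    def _key(self, normalized_sql, env):
        # env: mapping of settings that affect compilation, e.g.
        # {"sql_path": "DOUG,MYTEAM,SYSIBM", "optimization_level": 5}
        return (normalized_sql, tuple(sorted(env.items())))

    def lookup(self, normalized_sql, env):
        return self._cache.get(self._key(normalized_sql, env))

    def store(self, normalized_sql, env, plan):
        self._cache[self._key(normalized_sql, env)] = plan
```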

On Tue, Apr 25, 2017 at 2:47 PM Andres Freund <andres@anarazel.de> wrote:
On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
> When I did this in DB2, I didn't use the parser - it was too expensive. I
> just tokenized the statement and used some simple rules to bypass the
> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> disallow replacement replacement until I hit the end of the current
> subquery or statement.

How did you manage plan invalidation and such?

- Andres

Re: [HACKERS] Cached plans and statement generalization

From
David Fetter
Date:
On Tue, Apr 25, 2017 at 11:35:21PM +0300, Konstantin Knizhnik wrote:
> On 04/25/2017 07:54 PM, David Fetter wrote:
> > On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> > > On 24.04.2017 21:43, Andres Freund wrote:
> > > > Hi,
> > > > 
> > > > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > > > So what I am thinking now is implicit query caching. If the same query with
> > > > > different literal values is repeated many times, then we can try to
> > > > > generalize this query and replace it with prepared query with
> > > > > parameters.
> > > > That's not actuall all that easy:
> > > > - You pretty much do parse analysis to be able to do an accurate match.
> > > >     How much overhead is parse analysis vs. planning in your cases?
> > > > - The invalidation infrastructure for this, if not tied to to fully
> > > >     parse-analyzed statements, is going to be hell.
> > > > - Migrating to parameters can actually cause significant slowdowns, not
> > > >     nice if that happens implicitly.
> > > Well, first of all I want to share results I already get: pgbench with
> > > default parameters,  scale 10 and one connection:
> > > 
> > > protocol               TPS
> > > simple                 3492
> > > extended               2927
> > > prepared               6865
> > > simple + autoprepare   6844
> > If this is string mashing on the unparsed query, as it appears to be,
> > it's going to be a perennial source of security issues.
> 
> Sorry, may be I missed something, but I can not understand how
> security can be violated by extracting string literals from query. I
> am just copying bytes from one buffer to another. I do not try to
> somehow interpret this parameters.  What I am doing is very similar
> with standard prepared statements.  And moreover query is parsed!
> Only query which was already parsed and executed (but with different
> values of parameters) can be autoprepared.

I don't have an exploit yet.  What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Cached plans and statement generalization

From
Andres Freund
Date:
Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +0000, Doug Doole wrote:
> Plan invalidation was no different than for any SQL statement. DB2 keeps a
> list of the objects the statement depends on. If any of the objects changes
> in an incompatible way the plan is invalidated and kicked out of the cache.
> 
> I suspect what is more interesting is plan lookup. DB2 has something called
> the "compilation environment". This is a collection of everything that
> impacts how a statement is compiled (SQL path, optimization level, etc.).
> Plan lookup is done using both the statement text and the compilation
> environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
> path is ANDRES, MYTEAM, SYSIBM we will have different compilation
> environments. If we both issue "SELECT * FROM T" we'll end up with
> different cache entries even if T in both of our statements resolves to
> MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
> execute "SELECT * FROM T" again, I have a new compilation environment so
> the second invocation of the statement will create a new entry in the
> cache. The first entry is not kicked out - it will still be there for
> re-use if I change my SQL path back to my original value (modulo LRU for
> cache memory management of course).

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund



Re: [HACKERS] Cached plans and statement generalization

From
"David G. Johnston"
Date:
On Tue, Apr 25, 2017 at 3:24 PM, David Fetter <david@fetter.org> wrote:
I don't have an exploit yet.  What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

If they want to see how it works they can read the source code.  In terms of runtime data it would be limited to whatever the session itself created.  In most cases the presence of the cache would be invisible.  I suppose it might appear if one were to explain a query, reset the session, explain another query and then re-explain the original.  If the chosen plan in the second pass differed because of the presence of the leading query, that would be noticeable but not revealing.  Albeit I'm a far cry from a security expert...

David J.

Re: [HACKERS] Cached plans and statement generalization

From
Doug Doole
Date:
(FWIW, on this list we don't do top-quotes)

I know. Forgot and just did "reply all". My bad.

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

DB2 does handle this case. Unfortunately I don't know the details of how it worked though.

A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.
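For illustration, this naive rule can be sketched in C; all types and names below are illustrative stand-ins, not DB2's or PostgreSQL's actual structures:

```c
#include <string.h>
#include <strings.h>

/* Sketch of the naive invalidation rule: when schema.name is created,
 * invalidate every cached plan that referenced an unqualified "name"
 * AND was compiled with "schema" somewhere in its search path. */
#define MAX_REFS 8
#define MAX_PATH 8

typedef struct CachedQuery
{
    const char *unqualified_refs[MAX_REFS]; /* names resolved via the path */
    int         nrefs;
    const char *path[MAX_PATH];             /* search path at compile time */
    int         npath;
    int         is_valid;                   /* cleared on invalidation */
} CachedQuery;

void
invalidate_on_create(CachedQuery *q, const char *schema, const char *name)
{
    int name_match = 0, schema_in_path = 0;

    for (int i = 0; i < q->nrefs; i++)
        if (strcasecmp(q->unqualified_refs[i], name) == 0)
            name_match = 1;
    for (int i = 0; i < q->npath; i++)
        if (strcasecmp(q->path[i], schema) == 0)
            schema_in_path = 1;
    if (name_match && schema_in_path)
        q->is_valid = 0;
}
```

During CREATE processing each cached query would be run through invalidate_on_create: a query referencing an unqualified foobar under search path 'b,a' would survive a CREATE in an unrelated schema but be invalidated by CREATE b.foobar.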

- Doug

Re: [HACKERS] Cached plans and statement generalization

From
"Finnerty, Jim"
Date:
On 4/25/17, 6:34 PM, "pgsql-hackers-owner@postgresql.org on behalf of Andres Freund" <andres@anarazel.de> wrote:

> It's not always that simple, at least in postgres, unless you disregard
> search_path.  Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_path = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?

You may need to store the reloid of each relation in the cached query to ensure that the schema bindings are the same, and also invalidate dependent cache entries if a referenced relation is dropped.

Regards,
   Jim Finnerty


Re: [HACKERS] Cached plans and statement generalization

From
Serge Rielau
Date:



On Tue, Apr 25, 2017 at 3:48 PM, Doug Doole <ddoole@salesforce.com> wrote:
It's not always that simple, at least in postgres, unless you disregard
search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

DB2 does handle this case. Unfortunately I don't know the details of how it worked though.

A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.
While this specific scenario does not arise in DB2, since it uses CURRENT SCHEMA only for tables (much to my dislike), your example holds for functions and types, which are resolved by path.
For encapsulated SQL (in views, functions) conservative semantics are enforced by including the timestamp. For dynamic SQL the problem you describe does exist though, and I think it is handled in the way Doug describes.

However, as noted by Doug, the topic of plan invalidation is really orthogonal to normalizing the queries. All it does is provide more opportunities to run into any pre-existing bugs.

Cheers
Serge

PS: I’m just starting to look at the plan invalidation code in PG because we are dealing with potentially tens of thousands of cached SQL statements. So these complete wipeouts, or walks of every plan in the cache, don’t scale.

Re: [HACKERS] Cached plans and statement generalization

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Konstantin
> Knizhnik
> Well, first of all I want to share results I already get: pgbench with default
> parameters,  scale 10 and one connection:
> 
> So autoprepare is as efficient as explicit prepare and can increase
> performance almost two times.

This sounds great.

BTW, when are the autoprepared statements destroyed?  Are you considering some upper limit on the number of prepared
statements?

Regards
Takayuki Tsunakawa





Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 26.04.2017 00:47, Andres Freund wrote:
> On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
>> When I did this in DB2, I didn't use the parser - it was too expensive. I
>> just tokenized the statement and used some simple rules to bypass the
>> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
>> disallow replacement replacement until I hit the end of the current
>> subquery or statement.
> How did you manage plan invalidation and such?

The same mechanism as for prepared statements.
Cached plans are linked in the list by SaveCachedPlan function and are 
invalidated by PlanCacheRelCallback.
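For illustration, a PlanCacheRelCallback-style walk can be sketched as follows; the structure and function names are simplified stand-ins, not plancache.c's real definitions (which track dependencies via sinval messages rather than a plain OID array):

```c
#include <stddef.h>

/* Hypothetical cached-plan entry, loosely modeled on a cached plan source. */
typedef struct CachedStmt
{
    struct CachedStmt *next;  /* link in the global list kept at save time */
    unsigned int relids[8];   /* OIDs of relations the plan depends on */
    int          nrelids;
    int          is_valid;    /* cleared when a dependency changes */
} CachedStmt;

static CachedStmt *cached_stmts; /* head of the saved-plan list */

/* Sketch of what an invalidation callback does: walk the list and mark
 * every plan depending on the changed relation invalid.  relid == 0
 * means "invalidate everything", as a full sinval reset would. */
static void
plan_cache_rel_callback(unsigned int relid)
{
    CachedStmt *s;

    for (s = cached_stmts; s != NULL; s = s->next)
    {
        if (relid == 0)
        {
            s->is_valid = 0;
            continue;
        }
        for (int i = 0; i < s->nrelids; i++)
        {
            if (s->relids[i] == relid)
            {
                s->is_valid = 0;
                break;
            }
        }
    }
}
```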



> - Andres
>
>

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 26.04.2017 01:34, Andres Freund wrote:
> Hi,
>
> (FWIW, on this list we don't do top-quotes)
>
> On 2017-04-25 22:21:22 +0000, Doug Doole wrote:
>> Plan invalidation was no different than for any SQL statement. DB2 keeps a
>> list of the objects the statement depends on. If any of the objects changes
>> in an incompatible way the plan is invalidated and kicked out of the cache.
>>
>> I suspect what is more interesting is plan lookup. DB2 has something called
>> the "compilation environment". This is a collection of everything that
>> impacts how a statement is compiled (SQL path, optimization level, etc.).
>> Plan lookup is done using both the statement text and the compilation
>> environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
>> path is ANDRES, MYTEAM, SYSIBM we will have different compilation
>> environments. If we both issue "SELECT * FROM T" we'll end up with
>> different cache entries even if T in both of our statements resolves to
>> MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
>> execute "SELECT * FROM T" again, I have a new compilation environment so
>> the second invocation of the statement will create a new entry in the
>> cache. The first entry is not kicked out - it will still be there for
>> re-use if I change my SQL path back to my original value (modulo LRU for
>> cache memory management of course).
> It's not always that simple, at least in postgres, unless you disregard
> search_path.  Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_path = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?

There is the same problem with explicitly prepared statements, isn't there?
Certainly, when using prepared statements it is the responsibility of the
programmer to avoid such collisions.
With autoprepare, this is hidden from the programmer.
But there is a GUC variable controlling the autoprepare feature, and by default
it is switched off.
So if a programmer or DBA enables it, they should take into account the
effects of such a decision.

By the way, isn't it a bug in PostgreSQL that altering the search path does
not invalidate cached plans?
As I already mentioned, the same problem can be reproduced with
explicitly prepared statements.



>
> Greetings,
>
> Andres Freund

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-owner@postgresql.org
>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Konstantin
>> Knizhnik
>> Well, first of all I want to share results I already get: pgbench with default
>> parameters,  scale 10 and one connection:
>>
>> So autoprepare is as efficient as explicit prepare and can increase
>> performance almost two times.
> This sounds great.
>
> BTW, when are the autoprepared statements destroyed?
Right now they are destroyed only on receiving an invalidation
message (when the catalog is changed).
Prepared statements are local to a backend and are located in the backend's
memory.
It is unlikely that there will be so many different queries that they
cause memory overflow.
But in theory such a situation is certainly possible.


>   Are you considering some upper limit on the number of prepared statements?
In this case we need some kind of LRU eviction for the cache of
autoprepared statements.
I think it is a good idea to have such a limited cache - it can avoid
the memory overflow problem.
I will try to implement it.
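A minimal sketch of such an LRU-limited cache (names and the fixed-size list are illustrative; the actual patch would use PostgreSQL's hash and memory-context infrastructure rather than this toy list):

```c
#include <stdlib.h>
#include <string.h>

/* Autoprepare cache capped at a small illustrative limit, with LRU
 * eviction.  The head of the doubly linked list is the most recently
 * used entry; the tail is the eviction victim. */
#define AUTOPREPARE_LIMIT 3

typedef struct AutoPrepEntry
{
    char  query[256];           /* generalized query text (the cache key) */
    struct AutoPrepEntry *prev;
    struct AutoPrepEntry *next;
} AutoPrepEntry;

static AutoPrepEntry *lru_head;
static int            lru_count;

/* Look up a generalized query; on hit, move it to the head.  On miss,
 * insert a new entry, evicting the tail if the limit is reached. */
AutoPrepEntry *
autoprep_lookup(const char *query)
{
    AutoPrepEntry *e, *tail = NULL;

    for (e = lru_head; e != NULL; e = e->next)
    {
        if (strcmp(e->query, query) == 0)
        {
            if (e->prev)            /* hit: unlink and re-insert at head */
            {
                e->prev->next = e->next;
                if (e->next)
                    e->next->prev = e->prev;
                e->prev = NULL;
                e->next = lru_head;
                lru_head->prev = e;
                lru_head = e;
            }
            return e;
        }
        tail = e;
    }

    if (lru_count >= AUTOPREPARE_LIMIT && tail != NULL)
    {
        /* evict the least recently used entry (the tail) */
        if (tail->prev)
            tail->prev->next = NULL;
        else
            lru_head = NULL;
        free(tail);
        lru_count--;
    }

    e = calloc(1, sizeof(AutoPrepEntry));
    strncpy(e->query, query, sizeof(e->query) - 1);
    e->next = lru_head;
    if (lru_head)
        lru_head->prev = e;
    lru_head = e;
    lru_count++;
    return e;
}
```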


> Regards
> Takayuki Tsunakawa
>
>

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:


On 26.04.2017 10:49, Konstantin Knizhnik wrote:


On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:
> Are you considering some upper limit on the number of prepared statements?
In this case we need some kind of LRU eviction for the cache of autoprepared statements.
I think it is a good idea to have such a limited cache - it can avoid the memory overflow problem.
I will try to implement it.

I attach new patch which allows to limit the number of autoprepared statements (autoprepare_limit GUC variable).
Also I did more measurements, now with several concurrent connections and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only statements are below:

Protocol             TPS
extended             87k
prepared             209k
simple+autoprepare   206k

As you can see, autoprepare provides more than 2 times speed improvement.

Also I tried to measure the overhead of parsing (needed to be able to substitute all literals, not only string literals).
I just added an extra call of pg_parse_query. Speed is reduced to 181k TPS.
So the overhead is noticeable, but the optimization is still useful.
This is why I want to ask: is it better to implement a slower but safer and more universal solution?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Pavel Stehule
Date:


2017-04-26 12:30 GMT+02:00 Konstantin Knizhnik <k.knizhnik@postgrespro.ru>:


On 26.04.2017 10:49, Konstantin Knizhnik wrote:


On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:
> Are you considering some upper limit on the number of prepared statements?
In this case we need some kind of LRU eviction for the cache of autoprepared statements.
I think it is a good idea to have such a limited cache - it can avoid the memory overflow problem.
I will try to implement it.

I attach new patch which allows to limit the number of autoprepared statements (autoprepare_limit GUC variable).
Also I did more measurements, now with several concurrent connections and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only statements are below:

Protocol             TPS
extended             87k
prepared             209k
simple+autoprepare   206k

As you can see, autoprepare provides more than 2 times speed improvement.

Also I tried to measure the overhead of parsing (needed to be able to substitute all literals, not only string literals).
I just added an extra call of pg_parse_query. Speed is reduced to 181k TPS.
So the overhead is noticeable, but the optimization is still useful.
This is why I want to ask: is it better to implement a slower but safer and more universal solution?

An unsafe solution makes no sense, and it is dangerous (80% of database users do not have the necessary knowledge). If somebody needs the maximum possible performance, then he can use explicit prepared statements.

Regards

Pavel
 


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 




Re: [HACKERS] Cached plans and statement generalization

From
Doug Doole
Date:
A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.

This has been rumbling around in my head. I wonder if you could solve this problem by registering dependencies on objects which don't yet exist. Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual dependencies since the table is explicitly qualified.

For T2, you'd register a hard dependency on C.T2 since that is the table that was selected for the query. You'd also register virtual dependencies on A.T2 and B.T2 since if either of those tables (or views) is created you need to recompile the statement. (Note that no virtual dependency is created on D.T2 since that table would never be selected by the compiler.)

The catch is that virtual dependencies would have to be recorded and searched as strings, not OIDs since the objects don't exist. Virtual dependencies only have to be checked during CREATE processing though, so that might not be too bad.
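A sketch of the idea, with all structures hypothetical - the hard dependencies would stay OID-based as today, while the virtual ones are plain "schema.name" strings checked only on CREATE:

```c
#include <string.h>

/* Sketch of string-keyed "virtual dependencies": a statement compiled
 * with search_path = A,B,C that resolved an unqualified T2 to C.T2
 * records dependencies on the not-yet-existing A.T2 and B.T2.  When a
 * CREATE happens, any plan holding a matching virtual dependency must
 * be recompiled. */
#define MAX_VDEPS 16

typedef struct PlanDeps
{
    const char *vdeps[MAX_VDEPS]; /* "SCHEMA.NAME" strings; objects absent */
    int         nvdeps;
    int         needs_recompile;
} PlanDeps;

void
register_virtual_dep(PlanDeps *p, const char *qualified_name)
{
    if (p->nvdeps < MAX_VDEPS)
        p->vdeps[p->nvdeps++] = qualified_name;
}

/* Checked only during CREATE processing, so the string comparisons are
 * paid on a rare path, not per query. */
void
on_create_object(PlanDeps *p, const char *qualified_name)
{
    for (int i = 0; i < p->nvdeps; i++)
        if (strcmp(p->vdeps[i], qualified_name) == 0)
            p->needs_recompile = 1;
}
```

In the example above the statement would register virtual dependencies "A.T2" and "B.T2"; a later CREATE TABLE B.T2 would trip the recompile flag, while CREATE TABLE D.T2 would not.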

But this is getting off topic - I just wanted to capture the idea while it was rumbling around.

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 04/26/2017 08:08 PM, Doug Doole wrote:
A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.

This has been rumbling around in my head. I wonder if you could solve this problem by registering dependencies on objects which don't yet exist. Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual dependencies since the table is explicitly qualified.

For T2, you'd register a hard dependency on C.T2 since that is the table that was selected for the query. You'd also register virtual dependencies on A.T2 and B.T2 since if either of those tables (or views) is created you need to recompile the statement. (Note that no virtual dependency is created on D.T2 since that table would never be selected by the compiler.)

The catch is that virtual dependencies would have to be recorded and searched as strings, not OIDs since the objects don't exist. Virtual dependencies only have to be checked during CREATE processing though, so that might not be too bad.

But this is getting off topic - I just wanted to capture the idea while it was rumbling around.

I think it will be enough to handle modification of the search path and invalidate the prepared statement cache in this case.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:


On 26.04.2017 13:46, Pavel Stehule wrote:

I attach new patch which allows to limit the number of autoprepared statements (autoprepare_limit GUC variable).
Also I did more measurements, now with several concurrent connections and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only statements are below:

Protocol             TPS
extended             87k
prepared             209k
simple+autoprepare   206k

As you can see, autoprepare provides more than 2 times speed improvement.

Also I tried to measure the overhead of parsing (needed to be able to substitute all literals, not only string literals).
I just added an extra call of pg_parse_query. Speed is reduced to 181k TPS.
So the overhead is noticeable, but the optimization is still useful.
This is why I want to ask: is it better to implement a slower but safer and more universal solution?

An unsafe solution makes no sense, and it is dangerous (80% of database users do not have the necessary knowledge). If somebody needs the maximum possible performance, then he can use explicit prepared statements.


I attached a new patch to this mail. I have completely reimplemented my original approach and now use parse tree transformation.
New pgbench (-S -c 10) results are the following:

Protocol             TPS
extended             87k
prepared             209k
simple+autoprepare   185k

So there is some slowdown compared with my original implementation and with explicitly prepared statements, but it still provides more than a two times speed-up compared with unprepared queries. And it doesn't require changing existing applications.
Since most real production applications work with the DBMS through some connection pool (pgbouncer, ...), I think such an optimization will be useful.
Isn't it interesting that we can increase system throughput almost two times by just setting one parameter in the configuration file?

I also tried to enable autoprepare by default and run the regression tests. 7 tests fail for the following reasons:
1. Slightly different error reporting (for example, the error location is not always identically specified).
2. Differences in query behavior caused by changed local settings (Andres gave an example with search_path, and the date test fails because of a changed datestyle).
3. Problems with indirect dependencies (when a table is altered, only cached plans directly depending on this relation are invalidated, but not plans with indirect dependencies).
4. Domain checks not being performed for null values.

I do not think these issues will cause problems for real applications.

It is also possible to limit the number of autoprepared statements using the autoprepare_limit parameter, avoiding possible backend memory overflow in case of a large number of unique queries sent by an application. LRU discipline is used to drop the least recently used plans.

Any comments and suggestions for future improvement of this patch are welcome.
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Robert Haas
Date:
On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

Any comments and suggestions for future improvement of this patch are welcome.

+        PG_TRY();
+        {
+            query = parse_analyze_varparams(parse_tree,
+                                            query_string,
+                                            &param_types,
+                                            &num_params);
+        }
+        PG_CATCH();
+        {
+            /*
+             * In case of analyze errors revert back to original query processing
+             * and disable autoprepare for this query to avoid such problems in future.
+             */
+            FlushErrorState();
+            if (snapshot_set) {
+                PopActiveSnapshot();
+            }
+            entry->disable_autoprepare = true;
+            undo_query_plan_changes(parse_tree, const_param_list);
+            MemoryContextSwitchTo(old_context);
+            return false;
+        }
+        PG_END_TRY();

This is definitely not a safe way of using TRY/CATCH.

+
+            /* Convert literal value to parameter value */
+            switch (const_param->literal->val.type)
+            {
+              /*
+               * Convert from integer literal
+               */
+              case T_Integer:
+                switch (ptype) {
+                  case INT8OID:
+                    params->params[paramno].value = Int64GetDatum((int64)const_param->literal->val.val.ival);
+                    break;
+                  case INT4OID:
+                    params->params[paramno].value = Int32GetDatum((int32)const_param->literal->val.val.ival);
+                    break;
+                  case INT2OID:
+                    if (const_param->literal->val.val.ival < SHRT_MIN
+                        || const_param->literal->val.val.ival > SHRT_MAX)
+                    {
+                        ereport(ERROR,
+                                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+                                 errmsg("smallint out of range")));
+                    }
+                    params->params[paramno].value = Int16GetDatum((int16)const_param->literal->val.val.ival);
+                    break;
+                  case FLOAT4OID:
+                    params->params[paramno].value = Float4GetDatum((float)const_param->literal->val.val.ival);
+                    break;
+                  case FLOAT8OID:
+                    params->params[paramno].value = Float8GetDatum((double)const_param->literal->val.val.ival);
+                    break;
+                  case INT4RANGEOID:
+                    sprintf(buf, "[%ld,%ld]", const_param->literal->val.val.ival, const_param->literal->val.val.ival);
+                    getTypeInputInfo(ptype, &typinput, &typioparam);
+                    params->params[paramno].value = OidInputFunctionCall(typinput, buf, typioparam, -1);
+                    break;
+                  default:
+                    pg_lltoa(const_param->literal->val.val.ival, buf);
+                    getTypeInputInfo(ptype, &typinput, &typioparam);
+                    params->params[paramno].value = OidInputFunctionCall(typinput, buf, typioparam, -1);
+                }
+                break;
+              case T_Null:
+                params->params[paramno].isnull = true;
+                break;
+              default:
+                /*
+                 * Convert from string literal
+                 */
+                getTypeInputInfo(ptype, &typinput, &typioparam);
+                params->params[paramno].value = OidInputFunctionCall(typinput, const_param->literal->val.val.str, typioparam, -1);
+            }

I don't see something with a bunch of hard-coded rules for particular type OIDs having any chance of being acceptable.

This patch seems to duplicate a large amount of existing code.  That would be a good thing to avoid.

It could use a visit from the style police and a spell-checker, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:


On 01.05.2017 18:52, Robert Haas wrote:
On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

Any comments and suggestions for future improvement of this patch are welcome.

+        PG_TRY();
+        {
+            query = parse_analyze_varparams(parse_tree,
+                                            query_string,
+                                            &param_types,
+                                            &num_params);
+        }
+        PG_CATCH();
+        {
+            /*
+             * In case of analyze errors revert back to original query processing
+             * and disable autoprepare for this query to avoid such problems in future.
+             */
+            FlushErrorState();
+            if (snapshot_set) {
+                PopActiveSnapshot();
+            }
+            entry->disable_autoprepare = true;
+            undo_query_plan_changes(parse_tree, const_param_list);
+            MemoryContextSwitchTo(old_context);
+            return false;
+        }
+        PG_END_TRY();

This is definitely not a safe way of using TRY/CATCH.

+
+            /* Convert literal value to parameter value */
+            switch (const_param->literal->val.type)
+            {
+              /*
+               * Convert from integer literal
+               */
+              case T_Integer:
+                switch (ptype) {
+                  case INT8OID:
+                    params->params[paramno].value = Int64GetDatum((int64)const_param->literal->val.val.ival);
+                    break;
+                  case INT4OID:
+                    params->params[paramno].value = Int32GetDatum((int32)const_param->literal->val.val.ival);
+                    break;
+                  case INT2OID:
+                    if (const_param->literal->val.val.ival < SHRT_MIN
+                        || const_param->literal->val.val.ival > SHRT_MAX)
+                    {
+                        ereport(ERROR,
+                                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+                                 errmsg("smallint out of range")));
+                    }
+                    params->params[paramno].value = Int16GetDatum((int16)const_param->literal->val.val.ival);
+                    break;
+                  case FLOAT4OID:
+                    params->params[paramno].value = Float4GetDatum((float)const_param->literal->val.val.ival);
+                    break;
+                  case FLOAT8OID:
+                    params->params[paramno].value = Float8GetDatum((double)const_param->literal->val.val.ival);
+                    break;
+                  case INT4RANGEOID:
+                    sprintf(buf, "[%ld,%ld]", const_param->literal->val.val.ival, const_param->literal->val.val.ival);
+                    getTypeInputInfo(ptype, &typinput, &typioparam);
+                    params->params[paramno].value = OidInputFunctionCall(typinput, buf, typioparam, -1);
+                    break;
+                  default:
+                    pg_lltoa(const_param->literal->val.val.ival, buf);
+                    getTypeInputInfo(ptype, &typinput, &typioparam);
+                    params->params[paramno].value = OidInputFunctionCall(typinput, buf, typioparam, -1);
+                }
+                break;
+              case T_Null:
+                params->params[paramno].isnull = true;
+                break;
+              default:
+                /*
+                 * Convert from string literal
+                 */
+                getTypeInputInfo(ptype, &typinput, &typioparam);
+                params->params[paramno].value = OidInputFunctionCall(typinput, const_param->literal->val.val.str, typioparam, -1);
+            }

I don't see something with a bunch of hard-coded rules for particular type OIDs having any chance of being acceptable.


Well, what I need is to convert a literal value represented in a Value struct to a parameter datum.
The Value struct contains a union with integer and text representations.
So this piece of code just provides efficient handling of the most common cases (integer parameters) and uses the type's input function in other cases.


This patch seems to duplicate a large amount of existing code.  That would be a good thing to avoid.

Yes, I had to copy a lot of code from the exec_parse_message + exec_bind_message + exec_execute_message functions.
Definitely, copying code is a bad flaw. It would be much better and easier to just call the three original functions instead of gathering their code into a new function.
But I failed to do it because:
1. Autoprepare should be integrated into exec_simple_query. Before executing a query in the normal way, I need to perform a cache lookup for a previously prepared plan for this generalized query.
And generalization of a query requires building a query tree (query parsing). In other words, parsing should be done before I can call exec_parse_message.
2. exec_bind_message works with parameters passed by the client through the libpq protocol, while autoprepare deals with parameter values extracted from literals.
3. I do not want to generate a dummy name for an autoprepared query to handle it as a normal prepared statement.
And I can not use unnamed statements because I want the lifetime of autoprepared statements to be longer than one statement.
4. I have to use a slightly different memory context policy than named or unnamed prepared statements.

Also, these three exec_* functions contain prolog/epilog code which is needed because they serve separate libpq requests.
But autoprepared statements need to be executed in the context of a single libpq message, so most of this code is redundant.


It could use a visit from the style police and a spell-checker, too.

I will definitely fix the style and misspellings - I have not yet submitted this patch to a commitfest and there is enough time before the next one.
My primary intention in publishing this patch is to receive feedback on the proposed approach.
I already got two very useful pieces of advice: limit the number of cached statements and pay more attention to safety.
This is why I have reimplemented my original approach, which substituted string literals with parameters without building a parse tree.

Right now I am mostly thinking about two problems:
1. Finding the best criteria for detecting which literals need to be replaced with parameters and which do not. It is clear that replacing "limit 10" with "limit $10"
makes little sense and can cause a worse execution plan. So right now I just ignore the sort, group by and limit parts. But maybe it is possible to find a more flexible approach.
2. Which type to choose for parameters. I can try to infer the type from context (the current solution), or try to use the type of the literal.
The problem with the first approach is that the query compiler is not always able to do it, and even if the type can be determined, it may be too generic (for example numeric instead of real,
or range instead of integer). The problem with the second approach is the opposite: a type inferred from the literal can be too restrictive - quite often integer literals are used to specify values of floating point constants. The best solution is to first try to determine the parameter type from context and then refine it based on the literal type. But that would require repeating query analysis.
Not sure if it is possible.
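As a toy illustration of the kind of substitution being discussed - assuming character-level scanning, whereas the real implementations work on tokens or the parse tree, and ignoring strings, comments and negative numbers - integer literals outside a LIMIT clause could be rewritten as parameters like this:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Very simplified sketch of query generalization: rewrite integer
 * literals as $1, $2, ... so that queries differing only in literal
 * values map to the same cache key.  A literal directly following the
 * keyword LIMIT is kept as-is, mirroring the idea of skipping clauses
 * where the constant itself affects planning. */
void
generalize_query(const char *src, char *dst, size_t dstlen)
{
    int paramno = 1;
    int after_limit = 0;          /* set right after the keyword LIMIT */
    size_t o = 0;
    const char *p = src;

    while (*p != '\0' && o + 16 < dstlen)
    {
        if (isalpha((unsigned char) *p) || *p == '_')
        {
            const char *start = p;
            while (isalnum((unsigned char) *p) || *p == '_')
                p++;
            after_limit = (p - start == 5 &&
                           strncasecmp(start, "limit", 5) == 0);
            memcpy(dst + o, start, (size_t) (p - start));
            o += (size_t) (p - start);
        }
        else if (isdigit((unsigned char) *p))
        {
            const char *start = p;
            while (isdigit((unsigned char) *p))
                p++;
            if (after_limit)
            {
                memcpy(dst + o, start, (size_t) (p - start));
                o += (size_t) (p - start);
            }
            else
                o += (size_t) snprintf(dst + o, dstlen - o, "$%d", paramno++);
            after_limit = 0;
        }
        else
        {
            if (!isspace((unsigned char) *p))
                after_limit = 0;  /* punctuation breaks the LIMIT context */
            dst[o++] = *p++;
        }
    }
    dst[o] = '\0';
}
```

With this, "SELECT * FROM t WHERE k = 10 LIMIT 5" and "SELECT * FROM t WHERE k = 20 LIMIT 5" both map to the key "SELECT * FROM t WHERE k = $1 LIMIT 5".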



--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Thanks for your feedback.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Cached plans and statement generalization

From
Robert Haas
Date:
On Tue, May 2, 2017 at 5:50 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
>> I don't see something with a bunch of hard-coded rules for particular type
>> OIDs having any chance of being acceptable.
>
> Well, what I need is ...

Regarding this...

> Definitely copying of code is bad flaw. It will be much better and easier
> just to call three original functions instead of mixing gathering their code
> into the new function.
> But I failed to do it because ...

...and also this:

I am sympathetic to the fact that this is a hard problem to solve.
I'm just telling you that the way you've got it is not acceptable and
nobody's going to commit it like that (or if they do, they will end up
having to revert it).  If you want to have a technical discussion
about what might be a way to change the patch to be more acceptable,
cool, but I don't want to get into a long debate about whether what
you have is acceptable or not; I've already said what I think about
that and I believe that opinion will be widely shared.  I am not
trying to beat you up here, just trying to be clear.

> Thanks for your feedback.

Sure thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 02.05.2017 21:26, Robert Haas wrote:
>
> I am sympathetic to the fact that this is a hard problem to solve.
> I'm just telling you that the way you've got it is not acceptable and
> nobody's going to commit it like that (or if they do, they will end up
> having to revert it).  If you want to have a technical discussion
> about what might be a way to change the patch to be more acceptable,
> cool, but I don't want to get into a long debate about whether what
> you have is acceptable or not; I've already said what I think about
> that and I believe that opinion will be widely shared.  I am not
> trying to beat you up here, just trying to be clear.
>
>

Based on Robert's feedback and Tom's proposal, I have implemented two 
new versions of the autoprepare patch.

The first version is just a refactoring of my original implementation: I have 
extracted the common code into prepare_cached_plan and exec_prepared_plan
functions to avoid code duplication. I also rewrote the assignment of values 
to parameters. Parameter types are now inferred from the types of the 
literals, so there may be several prepared plans that differ only in 
parameter types. Due to the problem with type coercion for parameters, 
I have to catch errors in parse_analyze_varparams.

Robert, can you please explain why using TRY/CATCH is not safe here:
> This is definitely not a safe way of using TRY/CATCH.

Second version is based on Tom's suggestion:
> Personally I'd think about
> replacing the entire literal-with-cast construct with a Param having
> already-known type.
So here I first patch the raw parse tree, replacing A_Const nodes with 
ParamRef nodes. This tree is needed to perform the cache lookup.
Then I restore the original raw parse tree and call pg_analyze_and_rewrite, 
and then mutate the analyzed tree, replacing Const nodes with Param nodes.
At this point type coercion has already been performed, so I know the precise 
types that should be used for the parameters.
It is a more sophisticated approach, and I cannot extract the common 
code into prepare_cached_plan,
because preparing a plan is currently a mix of steps done in 
exec_simple_query and exec_parse_message.
But there is no need to catch analyze errors.
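(To make the Const-to-Param mutation step concrete, here is a minimal sketch - in Python rather than the patch's C, with a dict-based node representation whose field names are invented for illustration, not PostgreSQL's actual Node structures. The point is only that the mutation happens after analysis, so each Param keeps the already-coerced type:)

```python
# Illustrative sketch, not PostgreSQL internals: walk an analyzed query
# tree (here nested dicts/lists), replace each Const node with a Param
# node carrying the same coerced type, and collect the extracted values.

def generalize(node, params):
    """Return a copy of `node` with Const nodes replaced by Param nodes."""
    if isinstance(node, dict):
        if node.get("tag") == "Const":
            params.append((node["type"], node["value"]))
            # The Param keeps the coerced type, so planning sees the
            # precise type instead of guessing from the literal.
            return {"tag": "Param", "paramid": len(params), "type": node["type"]}
        return {k: generalize(v, params) for k, v in node.items()}
    if isinstance(node, list):
        return [generalize(v, params) for v in node]
    return node

# Example: the expression "aid = 42" after analysis
tree = {"tag": "OpExpr", "op": "=",
        "args": [{"tag": "Var", "name": "aid", "type": "int4"},
                 {"tag": "Const", "type": "int4", "value": 42}]}
params = []
general = generalize(tree, params)
# general now has a Param of type int4 where the Const was; params == [("int4", 42)]
```

Two queries differing only in the literal value produce the same generalized tree, which is what makes it usable as a cache key.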

Finally, the performance of both approaches is the same: with pgbench both 
give 180k TPS on read-only queries, compared with 80k TPS for non-prepared 
queries.
In both cases 7 out of 177 regression tests fail (mostly because the 
session environment is changed between subsequent query executions).

I am going to continue work on this patch and will be glad to receive any 
feedback and suggestions for its improvement.
In most cases, applications do not access Postgres directly but go 
through some connection pooling layer, and so they are not able to use 
prepared statements.
Yet for simple OLTP workloads Postgres spends more time building the query 
plan than on execution itself, so it is possible to speed up Postgres 
about two times on such workloads!
Another alternative is a true shared plan cache. It may be an even more 
promising approach, but it is definitely much more invasive and harder to 
implement.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached plans and statement generalization

From
Bruce Momjian
Date:
On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:
> I am going to continue work on this patch I will be glad to receive any
> feedback and suggestions for its improvement.
> In most cases, applications are not accessing Postgres directly, but using
> some connection pooling layer and so them are not able to use prepared
> statements.
> But at simple OLTP Postgres spent more time on building query plan than on
> execution itself. And it is possible to speedup Postgres about two times at
> such workload!
> Another alternative is true shared plan cache.  May be it is even more
> perspective approach, but definitely much more invasive and harder to
> implement.

Can we back up and get an overview of what you are doing and how you are
doing it?  Our TODO list suggests this order for successful patches:
Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part.  I think the design questions are:

*  What information is stored about cached plans?
*  How are the cached plans invalidated?
*  How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache.  However, caching and checking at the same level offers no
benefit, so they are going to be different.  For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree to get to that point.

A more concrete example is prepared statements.  They are stored at the
end of planning and matched in the parser.  However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation.  However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options.  One interesting idea from Doug Doole
was to do it between the tokenizer and parser.  I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan.  The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do.  One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach.  Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved.  Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though I am not sure if
our prepared query system even handles that properly.

Anyway, that is my feedback.  I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Cached plans and statement generalization

From
Douglas Doole
Date:
One interesting idea from Doug Doole was to do it between the tokenizer and parser.  I think they are glued together so you would need a way to run the tokenizer separately and compare that to the tokens you stored for the cached plan.

When I did this, we had the same problem that the tokenizer and parser were tightly coupled. Fortunately, I was able to do as you suggest and run the tokenizer separately to do my analysis. 

So my model was to do statement generalization before entering the compiler at all. I would tokenize the statement to find the literals and generate a new statement string with placeholders. The new string would then be passed to the compiler, which would tokenize and parse the reworked statement.

This means we incurred the cost of tokenizing twice, but the tokenizer was lightweight enough that it wasn't a problem. In exchange I was able to do statement generalization without touching the compiler - the compiler saw the generalized statement text as any other statement and handled it in the exact same way. (There was just a bit of new code around variable binding.)
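(A toy illustration of the pre-parser generalization described above, in Python. A real lexer must handle comments, dollar-quoting, casts, and keyword contexts; the regex here is only a stand-in for a proper tokenizer, and the placeholder syntax is assumed:)

```python
import re

# Very rough literal recognizer: single-quoted strings (with '' escapes)
# and bare numbers.  Real SQL lexing is considerably more involved.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

def generalize_text(sql):
    """Replace literals with $n placeholders; return (new_text, values)."""
    values = []
    def repl(m):
        values.append(m.group(0))
        return "$%d" % len(values)
    return LITERAL.sub(repl, sql), values

stmt, vals = generalize_text("SELECT * FROM t WHERE a = 42 AND b = 'x''y'")
# stmt -> "SELECT * FROM t WHERE a = $1 AND b = $2"
# vals -> ["42", "'x''y'"]
```

The generalized text then serves both as the cache key and as the statement handed to the compiler, which is why the compiler needs no changes beyond variable binding.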

Re: [HACKERS] Cached plans and statement generalization

From
Bruce Momjian
Date:
On Thu, May 11, 2017 at 05:39:58PM +0000, Douglas Doole wrote:
>     One interesting idea from Doug Doole was to do it between the tokenizer and
>     parser.  I think they are glued together so you would need a way to run the
>     tokenizer separately and compare that to the tokens you stored for the
>     cached plan.
> 
> 
> When I did this, we had the same problem that the tokenizer and parser were
> tightly coupled. Fortunately, I was able to do as you suggest and run the
> tokenizer separately to do my analysis. 
> 
> So my model was to do statement generalization before entering the compiler at
> all. I would tokenize the statement to find the literals and generate a new
> statement string with placeholders. The new string would the be passed to the
> compiler which would then tokenize and parse the reworked statement.
> 
> This means we incurred the cost of tokenizing twice, but the tokenizer was
> lightweight enough that it wasn't a problem. In exchange I was able to do
> statement generalization without touching the compiler - the compiler saw the
> generalized statement text as any other statement and handled it in the exact
> same way. (There was just a bit of new code around variable binding.)

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

This split would also not work if the scanner feeds changes back into
the parser.  I know C does that for typedefs but I don't think we do.

Ideally I would like to see percentage-of-execution numbers for typical
queries for scan, parse, parse-analysis, plan, and execute to see where
the wins are.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Cached plans and statement generalization

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Good point.  I think we need to do some measurements to see if the
> parser-only stage is actually significant.  I have a hunch that
> commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.
        regards, tom lane



Re: [HACKERS] Cached plans and statement generalization

From
Andres Freund
Date:

On May 11, 2017 11:31:02 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Bruce Momjian <bruce@momjian.us> writes:
>> Good point.  I think we need to do some measurements to see if the
>> parser-only stage is actually significant.  I have a hunch that
>> commercial databases have much heavier parsers than we do.
>
>FWIW, gram.y does show up as significant in many of the profiles I
>take.
>I speculate that this is not so much that it eats many CPU cycles, as
>that
>the constant tables are so large as to incur lots of cache misses.
>scan.l
>is not quite as big a deal for some reason, even though it's also
>large.

And that there's a lot of unpredictable branches, leading to a lot of pipeline stalls.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 05/11/2017 06:12 PM, Bruce Momjian wrote:
> [...]
> I think everyone agrees on the Desirability of the feature, but the
> Design is the tricky part.  I think the design questions are:
>
> *  What information is stored about cached plans?
> *  How are the cached plans invalidated?
> *  How is a query matched against a cached plan?
> [...]
> Anyway, that is my feedback.  I would like to get an overview of what
> you are trying to do and the costs/benefits of each option so we can
> best guide you.
>
Sorry for the lack of an overview.
I started with a small prototype just to investigate whether such an optimization makes sense or not.
When I got a more than two-times advantage in performance on standard pgbench, I came to the conclusion that this
optimization can be really useful, and now I am trying to find the best way to implement it.

I started with the simplest approach, where string literals are replaced with parameters. It is done before parsing
and can be done very fast: we just need to locate data in quotes.
But this approach is not safe or universal: you will not be able to speed up most existing queries without
rewriting them.

This is why I provided a second implementation, which replaces literals with parameters after raw parsing.
It is certainly slower than the first approach, but it still provides a significant performance advantage: more than
two times at pgbench.
Then I tried to run the regression tests and found several situations where type analysis is not performed correctly
when literals are replaced with parameters.

So my third attempt replaces constant nodes with parameters in the already-analyzed tree.

Now answering your questions:

*  What information is stored about cached plans?

The key used to locate a cached plan is the raw parse tree; the value is the saved CachedPlanSource.

*  How are the cached plans invalidated?

In the same way as plans for explicitly prepared statements.

*  How is a query matched against a cached plan?

A hash function is calculated over the raw parse tree, and the equal() function is used to compare raw trees whose
literal nodes have been replaced with parameters.
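(A toy sketch of this hash-plus-equal lookup, using Python tuples so that the built-in dict's hash-then-equality probe stands in for the tree hash function and equal(); the tuple shapes are invented for illustration:)

```python
plan_cache = {}

def lookup_or_store(generalized_tree, make_plan):
    """Reuse a cached plan for this tree shape, planning at most once."""
    # dict lookup = hash probe followed by an equality check, mirroring
    # "hash of raw parse tree" + equal() comparison described above.
    plan = plan_cache.get(generalized_tree)
    if plan is None:
        plan = make_plan()                 # parse-analyze + plan once
        plan_cache[generalized_tree] = plan
    return plan

# Two queries differing only in a literal generalize to the same tree:
q_shape = ("SELECT", ("t",), ("=", "aid", ("Param", 1)))
built = []
p1 = lookup_or_store(q_shape, lambda: built.append(1) or "plan")
p2 = lookup_or_store(q_shape, lambda: built.append(1) or "plan")
# built == [1]: planning happened once; the second query reused the plan
```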
 


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 05/11/2017 09:31 PM, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Good point.  I think we need to do some measurements to see if the
>> parser-only stage is actually significant.  I have a hunch that
>> commercial databases have much heavier parsers than we do.
> FWIW, gram.y does show up as significant in many of the profiles I take.
> I speculate that this is not so much that it eats many CPU cycles, as that
> the constant tables are so large as to incur lots of cache misses.  scan.l
> is not quite as big a deal for some reason, even though it's also large.
>
>             regards, tom lane
Yes, my results show that pg_parse_query does not add much overhead:
206k TPS for my first variant (string literal substitution, with the modified query text used as the hash key) vs.
181k TPS for the version that patches the raw parse tree constructed by pg_parse_query.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Andres Freund
Date:
On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:
> On 05/11/2017 09:31 PM, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Good point.  I think we need to do some measurements to see if the
> > > parser-only stage is actually significant.  I have a hunch that
> > > commercial databases have much heavier parsers than we do.
> > FWIW, gram.y does show up as significant in many of the profiles I take.
> > I speculate that this is not so much that it eats many CPU cycles, as that
> > the constant tables are so large as to incur lots of cache misses.  scan.l
> > is not quite as big a deal for some reason, even though it's also large.
> > 
> >             regards, tom lane
> Yes, my results shows that pg_parse_query adds not so much overhead:
> 206k TPS for my first variant with string literal substitution and modified query text used as hash key vs.
> 181k. TPS for version with patching raw parse tree constructed by pg_parse_query.

Those numbers and your statement seem to contradict each other?

- Andres



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 05/11/2017 10:52 PM, Andres Freund wrote:
> On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:
>> On 05/11/2017 09:31 PM, Tom Lane wrote:
>>> Bruce Momjian <bruce@momjian.us> writes:
>>>> Good point.  I think we need to do some measurements to see if the
>>>> parser-only stage is actually significant.  I have a hunch that
>>>> commercial databases have much heavier parsers than we do.
>>> FWIW, gram.y does show up as significant in many of the profiles I take.
>>> I speculate that this is not so much that it eats many CPU cycles, as that
>>> the constant tables are so large as to incur lots of cache misses.  scan.l
>>> is not quite as big a deal for some reason, even though it's also large.
>>>
>>>             regards, tom lane
>> Yes, my results shows that pg_parse_query adds not so much overhead:
>> 206k TPS for my first variant with string literal substitution and modified query text used as hash key vs.
>> 181k. TPS for version with patching raw parse tree constructed by pg_parse_query.
> Those numbers and your statement seem to contradict each other?

Oops, my parse error :( I read Tom's statement incorrectly.
Actually, I also was afraid that the cost of parsing was large, which is why my first attempt was to avoid
parsing.
But then I found out that most of the time is spent in analysis and planning (see attached profile):

pg_parse_query: 4.23%
pg_analyze_and_rewrite: 8.45%
pg_plan_queries: 15.49%

>
> - Andres


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Cached plans and statement generalization

From
Bruce Momjian
Date:
On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:
> This is why I have provided second implementation which replace
> literals with parameters after raw parsing.  Certainly it is slower
> than first approach. But still provide significant advantage in
> performance: more than two times at pgbench.  Then I tried to run
> regression tests and find several situations where type analysis is
> not correctly performed in case of replacing literals with parameters.

So the issue is that per-command output from the parser, SelectStmt,
only has strings for identifiers, e.g. table and column names, so you
can't be sure it is the same as the cached entry you matched.  I suppose
if you cleared the cache every time someone created an object or changed
search_path, it might work.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 12.05.2017 03:58, Bruce Momjian wrote:
> On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:
>> This is why I have provided second implementation which replace
>> literals with parameters after raw parsing.  Certainly it is slower
>> than first approach. But still provide significant advantage in
>> performance: more than two times at pgbench.  Then I tried to run
>> regression tests and find several situations where type analysis is
>> not correctly performed in case of replacing literals with parameters.
> So the issue is that per-command output from the parser, SelectStmt,
> only has strings for identifers, e.g. table and column names, so you
> can't be sure it is the same as the cached entry you matched.  I suppose
> if you cleared the cache every time someone created an object or changed
> search_path, it might work.
>
Definitely, changing the session context (search_path, date/time format, ...) 
may cause incorrect behavior of cached statements.
Actually, you may get the same problem with explicitly prepared 
statements (though in that case you better understand what is 
going on, and it is your choice whether or not to use a prepared 
statement).

The fact that 7 regression tests fail means that autoprepare can 
really change the behavior of existing programs. This is why my suggestion is 
to switch this feature off by default.
But in 99.9% of real cases (my estimation, plucked out of thin air:) there 
will be no such problems with autoprepare. And it can significantly 
improve the performance of OLTP applications
which are not able to use prepared statements (because they work 
through pgbouncer, or for other reasons).

Can autoprepare slow down the system?
Yes, it can. This can happen if an application performs a large number of 
unique queries and the autoprepare cache size is not limited.
In that case a large (and infinitely growing) number of stored plans can 
consume a lot of memory and, what is even worse, slow down cache lookup.
This is why I limit the number of cached statements 
(the autoprepare_limit parameter) to 100 by default.
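(The bounded cache can be sketched as a small LRU; this Python sketch uses invented names - AutoprepareCache and its methods are not the patch's actual data structures - but shows how a fixed limit keeps both memory use and lookup cost bounded:)

```python
from collections import OrderedDict

class AutoprepareCache:
    """LRU-bounded plan cache, capped like autoprepare_limit."""
    def __init__(self, limit=100):
        self.limit = limit
        self.plans = OrderedDict()

    def get(self, key):
        plan = self.plans.get(key)
        if plan is not None:
            self.plans.move_to_end(key)      # mark as most recently used
        return plan

    def put(self, key, plan):
        self.plans[key] = plan
        self.plans.move_to_end(key)
        if len(self.plans) > self.limit:
            self.plans.popitem(last=False)   # evict least recently used

cache = AutoprepareCache(limit=2)
cache.put("q1", "plan1"); cache.put("q2", "plan2")
cache.get("q1")             # touch q1, so q2 becomes the eviction victim
cache.put("q3", "plan3")    # exceeds the limit: q2 is evicted
```

With the limit in place, an application issuing many unique statements simply churns the cache instead of growing it without bound.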

I am almost sure that there will be other issues with autoprepare 
which I have not encountered yet (because I have mostly tested it on pgbench 
and the Postgres regression tests).
But I am also sure that the benefit of doubling system performance is good 
motivation to continue work in this direction.

My main concern is whether to continue improving the current approach with a 
local (per-backend) cache of prepared statements,
or to create a shared cache (as in Oracle). It is much more difficult to 
implement a shared cache (the same problems with session context, different 
catalog snapshots, cache invalidation, ...),
but it also provides more opportunities for query optimization and 
tuning.





-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Bruce Momjian
Date:
On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> Definitely changing session context (search_path, date/time format, ...) may
> cause incorrect behavior of cached statements.

I wonder if we should clear the cache whenever any SET command is
issued.

> Actually you may get the same problem with explicitly prepared statements
> (certainly, in the last case, you better understand what going on and it is
> your choice whether to use or not to use prepared statement).
> 
> The fact of failure of 7 regression tests means that autoprepare can really
> change behavior of existed program. This is why my suggestion is  to switch
> off this feature by default.

I would like to see us target something that can be enabled by default. 
Even if it only improves performance by 5%, it would be better overall
than a feature that improves performance by 90% but is only used by 1%
of our users.

> But in 99.9% real cases (my estimation plucked out of thin air:) there will
> be no such problems with autoprepare. And it can significantly improve
> performance of OLTP applications
> which are not able to use prepared statements (because of working through
> pgbouncer or any other reasons).

Right, but we can't ship something that is 99.9% accurate when the
inaccuracy is indeterminate.  The bottom line is that Postgres has a
very high bar, and I realize you are just prototyping at this point, but
the final product is going to have to address all the intricacies of the
issue for it to be added.

> Can autoprepare slow down the system?
> Yes, it can. It can happen if application perform larger number of unique
> queries and autoprepare cache size is not limited.
> In this case large (and infinitely growing) number of stored plans can
> consume a lot of memory and, what is even worse, slowdown cache lookup.
> This is why I by default limit number of cached statements
> (autoprepare_limit parameter) by 100.

Yes, good idea.

> I am almost sure that there will be some other issues with autoprepare which
> I have not encountered yet (because I mostly tested it on pgbench and
> Postgres regression tests).
> But I am also sure that benefit of doubling system performance is good
> motivation to continue work in this direction.

Right, you are still testing to see where the edges are.

> My main concern is whether to continue to improve current approach with
> local (per-backend) cache of prepared statements.
> Or create shared cache (as in Oracle). It is much more difficult to
> implement shared cache (the same problem with session context, different
> catalog snapshots, cache invalidation,...)
> but it also provides more opportunities for queries optimization and tuning.

I would continue in the per-backend cache direction at this point
because we don't even have that solved yet.  The global cache is going
to be even more complex.

Ultimately I think we are going to want global and local caches because
the plans of the local cache are much more likely to be accurate.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 12.05.2017 18:23, Bruce Momjian wrote:
> On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
>> Definitely changing session context (search_path, date/time format, ...) may
>> cause incorrect behavior of cached statements.
> I wonder if we should clear the cache whenever any SET command is
> issued.

Well, with the autoprepare cache disabled on each SET variable, ALTER SYSTEM, 
and any slow utility statement,
only one regression test fails, and only because of a different 
error message context:

*** /home/knizhnik/postgresql.master/src/test/regress/expected/date.out  2017-04-11 18:07:56.497461208 +0300
--- /home/knizhnik/postgresql.master/src/test/regress/results/date.out   2017-05-12 20:21:19.767566302 +0300
***************
*** 1443,1452 ****
  --
  SELECT EXTRACT(MICROSEC  FROM DATE 'infinity');     -- ERROR:  timestamp units "microsec" not recognized
  ERROR:  timestamp units "microsec" not recognized
- CONTEXT:  SQL function "date_part" statement 1
  SELECT EXTRACT(UNDEFINED FROM DATE 'infinity');     -- ERROR:  timestamp units "undefined" not supported
  ERROR:  timestamp units "undefined" not supported
- CONTEXT:  SQL function "date_part" statement 1
  -- test constructors
  select make_date(2013, 7, 15);
   make_date
--- 1443,1450 ----

======================================================================



>> Actually you may get the same problem with explicitly prepared statements
>> (certainly, in the last case, you better understand what going on and it is
>> your choice whether to use or not to use prepared statement).
>>
>> The fact of failure of 7 regression tests means that autoprepare can really
>> change behavior of existed program. This is why my suggestion is  to switch
>> off this feature by default.
> I would like to see us target something that can be enabled by default.
> Even if it only improves performance by 5%, it would be better overall
> than a feature that improves performance by 90% but is only used by 1%
> of our users.

I have to agree with you here.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Andres Freund
Date:

On May 12, 2017 12:50:41 AM PDT, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

>Definitely changing session context (search_path, date/time format,
>...)
>may cause incorrect behavior of cached statements.
>Actually you may get the same problem with explicitly prepared
>statements (certainly, in the last case, you better understand what
>going on and it is your choice whether to use or not to use prepared
>statement).
>
>The fact of failure of 7 regression tests means that autoprepare can
>really change behavior of existed program. This is why my suggestion is
>
>to switch off this feature by default.

I can't see us agreeing with this feature unless it actually reliably works, even if disabled by default.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [HACKERS] Cached plans and statement generalization

From
Bruce Momjian
Date:
On Fri, May 12, 2017 at 08:35:26PM +0300, Konstantin Knizhnik wrote:
> 
> 
> On 12.05.2017 18:23, Bruce Momjian wrote:
> >On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> >>Definitely changing session context (search_path, date/time format, ...) may
> >>cause incorrect behavior of cached statements.
> >I wonder if we should clear the cache whenever any SET command is
> >issued.
> 
> Well, with the autoprepare cache invalidated on each SET variable, ALTER
> SYSTEM, and any slow utility statement, only one regression test fails,
> and only because of a different error message context:

Great.  We have to think of how we would keep this reliable when future
changes happen.  Simple is often better because it limits the odds of
breakage.

> >>Actually you may get the same problem with explicitly prepared statements
> >>(though in that case you better understand what is going on, and it is
> >>your choice whether or not to use prepared statements).
> >>
> >>The fact that 7 regression tests fail means that autoprepare can really
> >>change the behavior of existing programs. This is why my suggestion is to
> >>switch off this feature by default.
> >I would like to see us target something that can be enabled by default.
> >Even if it only improves performance by 5%, it would be better overall
> >than a feature that improves performance by 90% but is only used by 1%
> >of our users.
> 
> I have to agree with you here.

Good.  I know there are database systems that will try to get the most
performance possible no matter how complex it is for users to configure
or to maintain, but that isn't us.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Cached plans and statement generalization

From
Robert Haas
Date:
On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> Robert, can you please explain why using TRY/CATCH is not safe here:
>>
>> This is definitely not a safe way of using TRY/CATCH.

This has been discussed many, many times on this mailing list before,
and I don't really want to go over it again here.  We really need a
README or some documentation about this so that we don't have to keep
answering this same question again and again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 15.05.2017 18:31, Robert Haas wrote:
> On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Robert, can you please explain why using TRY/CATCH is not safe here:
>>> This is definitely not a safe way of using TRY/CATCH.
> This has been discussed many, many times on this mailing list before,
> and I don't really want to go over it again here.  We really need a
> README or some documentation about this so that we don't have to keep
> answering this same question again and again.
>
First of all, I want to note that the new version of my patch does not use 
PG_TRY/PG_CATCH.
But I still want to clarify for myself what is wrong with this construction.
I searched both the hackers mailing list archive and the web using Google, 
but failed to find any references except about
sharing non-volatile variables between try and catch blocks.
Can you please point me at the thread where this problem was discussed, 
or just explain in a few words the source of the problem?
From my own experience I found that the PG_TRY/PG_CATCH mechanism does 
not provide proper cleanup (unlike C++ exceptions).
If there are opened relations, catalog cache entries, etc., then throwing 
an error will not release them.
It causes no problems if the error is handled in PostgresMain, which 
aborts the current transaction and releases all resources in any case.
But if I want to ignore this error and continue query execution, then 
warnings about resource leaks are reported.
Is this what you mean by the unsafety of PG_TRY/PG_CATCH constructions?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Andres Freund
Date:
On 2017-05-18 11:57:57 +0300, Konstantin Knizhnik wrote:
> From my own experience I found that the PG_TRY/PG_CATCH mechanism does
> not provide proper cleanup (unlike C++ exceptions).

Right, simply because there's no portable way to transparently do so.
Would be possible on elf glibc platforms, but ...


> If there are opened relations, catalog cache entries, etc., then throwing an
> error will not release them.
> It causes no problems if the error is handled in PostgresMain, which aborts
> the current transaction and releases all resources in any case.
> But if I want to ignore this error and continue query execution, then
> warnings about resource leaks are reported.
> Is this what you mean by the unsafety of PG_TRY/PG_CATCH constructions?

There are worse problems than just leaked resources.  Everything touching the
database might cause persistent corruption if you don't roll back.
Consider an insert that failed with a foreign key exception, done from
some function.  If you ignore that error, the row will still be visible,
but the foreign key will be violated.  If you want to continue after a
PG_CATCH you have to use a subtransaction/savepoint for the PG_TRY
contents, like several PLs do.

- Andres



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
On 10.05.2017 19:11, Konstantin Knizhnik wrote:
>
> Based on Robert's feedback and Tom's proposal I have implemented
> two new versions of the autoprepare patch.
>
> The first version is just a refactoring of my original implementation: I
> have extracted common code into prepare_cached_plan and
> exec_prepared_plan functions to avoid code duplication. Also I rewrote
> the assignment of values to parameters. Now the types of parameters are
> inferred from the types of the literals, so there may be several prepared
> plans which differ only in the types of their parameters. Due to the
> problem with type coercion for parameters, I have to catch errors in
> parse_analyze_varparams.
>

Attached please find a rebased version of the autoprepare patch based on 
Tom's proposal (perform analysis of the tree with constant literals and 
then replace them with parameters).
Also I have submitted this patch for the Autumn commitfest.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Thomas Munro
Date:
On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> Attached please find rebased version of the autoprepare patch based on Tom's
> proposal (perform analyze for tree with constant literals and then replace
> them with parameters).
> Also I submitted this patch for the Autumn commitfest.

The patch didn't survive the Summer bitrotfest.  Could you please rebase it?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:


On 09.09.2017 06:35, Thomas Munro wrote:
> On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Attached please find rebased version of the autoprepare patch based on Tom's
>> proposal (perform analyze for tree with constant literals and then replace
>> them with parameters).
>> Also I submitted this patch for the Autumn commitfest.
> The patch didn't survive the Summer bitrotfest.  Could you please rebase it?

Attached please find a rebased version of the patch.
Here are the updated performance results (pgbench -s 100 -c 1, in TPS):

protocol (-M)         read-write   read-only (-S)
simple                      3327            19325
extended                    2256            16908
prepared                    6145            39326
simple+autoprepare          4950            34841


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:


On 11.09.2017 12:24, Konstantin Knizhnik wrote:
> Attached please find a rebased version of the patch.
> Here are the updated performance results (pgbench -s 100 -c 1, in TPS):
>
> protocol (-M)         read-write   read-only (-S)
> simple                      3327            19325
> extended                    2256            16908
> prepared                    6145            39326
> simple+autoprepare          4950            34841


One more patch, passing all regression tests with autoprepare_threshold=1.
I still do not think that it should be switched on by default...

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Michael Paquier
Date:
On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> One more patch passing all regression tests with autoprepare_threshold=1.
> I still do not think that it should be switched on by default...

This patch does not apply, and did not get any reviews. So I am moving
it to next CF with waiting on author as status. Please provide a
rebased version. Tsunakawa-san, you are listed as a reviewer of this
patch. If you are not planning to look at it anymore, you may want to
remove your name from the related CF entry
https://commitfest.postgresql.org/16/1150/.
-- 
Michael


RE: [HACKERS] Cached plans and statement generalization

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael.paquier@gmail.com]
> This patch does not apply, and did not get any reviews. So I am moving it
> to next CF with waiting on author as status. Please provide a rebased version.
> Tsunakawa-san, you are listed as a reviewer of this patch. If you are not
> planning to look at it anymore, you may want to remove your name from the
> related CF entry https://commitfest.postgresql.org/16/1150/.

Sorry, but I'm planning to review this next month, because this proposal could alleviate a problem we have faced.

Regards
Takayuki Tsunakawa



Re: [HACKERS] Cached plans and statement generalization

From
Michael Paquier
Date:
On Thu, Nov 30, 2017 at 11:07 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: Michael Paquier [mailto:michael.paquier@gmail.com]
>> This patch does not apply, and did not get any reviews. So I am moving it
>> to next CF with waiting on author as status. Please provide a rebased version.
>> Tsunakawa-san, you are listed as a reviewer of this patch. If you are not
>> planning to look at it anymore, you may want to remove your name from the
>> related CF entry https://commitfest.postgresql.org/16/1150/.
>
> Sorry, but I'm planning to review this next month because this proposal could alleviate some problem we faced.

No problem. Thanks for the update.
-- 
Michael


Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 30.11.2017 04:59, Michael Paquier wrote:
> On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> One more patch passing all regression tests with autoprepare_threshold=1.
>> I still do not think that it should be switched on by default...
> This patch does not apply, and did not get any reviews. So I am moving
> it to next CF with waiting on author as status. Please provide a
> rebased version. Tsunakawa-san, you are listed as a reviewer of this
> patch. If you are not planning to look at it anymore, you may want to
> remove your name from the related CF entry
> https://commitfest.postgresql.org/16/1150/.

Updated version of the patch is attached.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Stephen Frost
Date:
Greetings,

* Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote:
> On 30.11.2017 04:59, Michael Paquier wrote:
> >On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik
> ><k.knizhnik@postgrespro.ru> wrote:
> >>One more patch passing all regression tests with autoprepare_threshold=1.
> >>I still do not think that it should be switched on by default...
> >This patch does not apply, and did not get any reviews. So I am moving
> >it to next CF with waiting on author as status. Please provide a
> >rebased version. Tsunakawa-san, you are listed as a reviewer of this
> >patch. If you are not planning to look at it anymore, you may want to
> >remove your name from the related CF entry
> >https://commitfest.postgresql.org/16/1150/.
>
> Updated version of the patch is attached.

This patch appears to apply with just a bit of fuzz and make check
passes, so I'm not sure why this is currently marked as 'Waiting for
author'.

I've updated it to be 'Needs review'.  If that's incorrect, feel free to
change it back with an explanation.

Thanks!

Stephen

Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Thomas Munro
Date:
On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote:
>> Updated version of the patch is attached.
>
> This patch appears to apply with just a bit of fuzz and make check
> passes, so I'm not sure why this is currently marked as 'Waiting for
> author'.
>
> I've updated it to be 'Needs review'.  If that's incorrect, feel free to
> change it back with an explanation.

Hi Konstantin,

/home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
undefined reference to `PortalGetHeapMemory'

That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
PortalGetHeapMemory.  Looks like it needs to be replaced with
portal->portalContext.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 12.01.2018 03:40, Thomas Munro wrote:
> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote:
>>> Updated version of the patch is attached.
>> This patch appears to apply with just a bit of fuzz and make check
>> passes, so I'm not sure why this is currently marked as 'Waiting for
>> author'.
>>
>> I've updated it to be 'Needs review'.  If that's incorrect, feel free to
>> change it back with an explanation.
> Hi Konstantin,
>
> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
> undefined reference to `PortalGetHeapMemory'
>
> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
> PortalGetHeapMemory.  Looks like it needs to be replaced with
> portal->portalContext.
>
Hi Thomas,

Thank you very much for reporting the problem.
Rebased version of the patch is attached.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Re: [HACKERS] Cached plans and statement generalization

From
David Steele
Date:
Hi Konstantin,

On 1/12/18 7:53 AM, Konstantin Knizhnik wrote:
> 
> 
> On 12.01.2018 03:40, Thomas Munro wrote:
>> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost@snowman.net>
>> wrote:
>>> * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote:
>>>> Updated version of the patch is attached.
>>> This patch appears to apply with just a bit of fuzz and make check
>>> passes, so I'm not sure why this is currently marked as 'Waiting for
>>> author'.
>>>
>>> I've updated it to be 'Needs review'.  If that's incorrect, feel free to
>>> change it back with an explanation.
>> Hi Konstantin,
>>
>> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
>>
>> undefined reference to `PortalGetHeapMemory'
>>
>> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
>> PortalGetHeapMemory.  Looks like it needs to be replaced with
>> portal->portalContext.
>>
> Hi  Thomas,
> 
> Thank you very much for reporting the problem.
> Rebased version of the patch is attached.

This patch has received no review or comments since last May and appears
too complex and invasive for the final CF of PG11.

I don't think it makes sense to keep pushing a patch through CFs when it
is not getting reviewed.  I'm planning to mark this as Returned with
Feedback unless there are solid arguments to the contrary.

Thanks,
-- 
-David
david@pgmasters.net


Re: Re: Re: [HACKERS] Cached plans and statement generalization

From
David Steele
Date:
On 3/2/18 9:26 AM, David Steele wrote:
> On 1/12/18 7:53 AM, Konstantin Knizhnik wrote:
>>
>>
>> On 12.01.2018 03:40, Thomas Munro wrote:
>>> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost@snowman.net>
>>> wrote:
>>>> * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote:
>>>>> Updated version of the patch is attached.
>>>> This patch appears to apply with just a bit of fuzz and make check
>>>> passes, so I'm not sure why this is currently marked as 'Waiting for
>>>> author'.
>>>>
>>>> I've updated it to be 'Needs review'.  If that's incorrect, feel free to
>>>> change it back with an explanation.
>>> Hi Konstantin,
>>>
>>> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
>>>
>>> undefined reference to `PortalGetHeapMemory'
>>>
>>> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
>>> PortalGetHeapMemory.  Looks like it needs to be replaced with
>>> portal->portalContext.
>>>
>> Hi  Thomas,
>>
>> Thank you very much for reporting the problem.
>> Rebased version of the patch is attached.
> 
> This patch has received no review or comments since last May and appears
> too complex and invasive for the final CF of PG11.
> 
> I don't think it makes sense to keep pushing a patch through CFs when it
> is not getting reviewed.  I'm planning to mark this as Returned with
> Feedback unless there are solid arguments to the contrary.

Marked as Returned with Feedback.

Regards,
-- 
-David
david@pgmasters.net


RE: [HACKERS] Cached plans and statement generalization

From
"Yamaji, Ryo"
Date:
> -----Original Message-----
> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
> Sent: Friday, January 12, 2018 9:53 PM
> To: Thomas Munro <thomas.munro@enterprisedb.com>; Stephen Frost
> <sfrost@snowman.net>
> Cc: Michael Paquier <michael.paquier@gmail.com>; PostgreSQL mailing
> lists <pgsql-hackers@postgresql.org>; Tsunakawa, Takayuki/綱川 貴之
> <tsunakawa.takay@jp.fujitsu.com>
> Subject: Re: [HACKERS] Cached plans and statement generalization
>
> Thank you very much for reporting the problem.
> Rebased version of the patch is attached.

Hi Konstantin.

I think that this patch is excellent. Our company has customers who want
to migrate from other DBs to PostgreSQL, and in such cases the problem
arises that the application programs must be modified to use prepared
statements. This patch makes it possible to solve that problem. I want to
help this patch get committed. Will you participate in the following CF?

To review this patch, I verified it. The environment used was
PostgreSQL 11beta2. Because of updates to PostgreSQL, it is necessary to
add "executor/spi.h" and "jit/jit.h" includes to postgres.c in the patch.
Please rebase.

1. I confirmed the performance impact of applying this patch.
The results showed a tendency similar to Konstantin's (-s 100, -c 8,
-t 10000, read-only):
simple:               20251 TPS
prepare:              29502 TPS
simple(autoprepare):  28001 TPS

2. I confirmed the influence of query length on the memory utilization of
autoprepare. Short queries have 1 constant; long queries have 100 constants.
The results show that preparing long queries uses more memory:
before prepare:                    plan cache context: 1032 used
prepare 10 short query statements: plan cache context: 15664 used
prepare 10 long query statements:  plan cache context: 558032 used

In this patch, autoprepare_limit sets the maximum number of queries that can
be autoprepared. However, is that the right knob? I can imagine the following
scenarios:
- Application-side users: to get the best performance, they want to specify
the number of prepared queries.
- Operations-side users: to prevent memory from overflowing, they want to set
a maximum on memory utilization.
Therefore, I propose adding a parameter to specify the maximum memory
utilization.

3. I confirmed how the amount of memory changes when preparing more queries
than the value specified for autoprepare_limit.
[autoprepare_limit=1 and execute 10 different queries]
    plan cache context: 1032 used
    plan cache context: 39832 used
    plan cache context: 78552 used
    plan cache context: 117272 used
    plan cache context: 155952 used
    plan cache context: 194632 used
    plan cache context: 233312 used
    plan cache context: 272032 used
    plan cache context: 310712 used
    plan cache context: 349392 used
    plan cache context: 388072 used

I am doubtful about the increase in memory utilization when I execute many
different queries even though only one query is cached (autoprepare_limit=1).
Is this behavior correct?

Best regards, Yamaji

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
Hi Yamaji,


On 31.07.2018 12:12, Yamaji, Ryo wrote:
>> -----Original Message-----
>> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
>> Sent: Friday, January 12, 2018 9:53 PM
>> To: Thomas Munro <thomas.munro@enterprisedb.com>; Stephen Frost
>> <sfrost@snowman.net>
>> Cc: Michael Paquier <michael.paquier@gmail.com>; PostgreSQL mailing
>> lists <pgsql-hackers@postgresql.org>; Tsunakawa, Takayuki/綱川 貴之
>> <tsunakawa.takay@jp.fujitsu.com>
>> Subject: Re: [HACKERS] Cached plans and statement generalization
>>
>> Thank you very much for reporting the problem.
>> Rebased version of the patch is attached.
> Hi Konstantin.
>
> I think that this patch excel very much. Because the customer of our
> company has the demand that migrates from other DB to PostgreSQL, and
> the problem to have to modify the application program to do prepare in
> that case occurs. It is possible to solve it by the problem's using this
> patch. I want to be helping this patch to be committed. Will you participate
> in the following CF?

This patch will be included in the next release of PgPro EE.
Concerning the next commitfest, I am not sure.
At the previous commitfest it was returned with the feedback that it "has 
received no review or comments since last May".
Maybe your review will help to change this situation.


>
> To review this patch, I verified it. The verified environment is
> PostgreSQL 11beta2. It is necessary to add "executor/spi.h" and "jit/jit.h"
> to postgres.c of the patch by the updating of PostgreSQL. Please rebase.
>
> 1. I confirmed the influence on the performance by having applied this patch.
> The result showed the tendency similar to Konstantin.
> -s:100    -c:8    -t: 10000    read-only
> simple:                    20251 TPS
> prepare:                29502 TPS
> simple(autoprepare):            28001 TPS
>
> 2. I confirmed the influence on the memory utilization by the length of query that did
> autoprepare. Short queries have 1 constant. Long queries have 100 constants.
> This result was shown that preparing long query used the memory more.
> before prepare:plan cache context: 1032 used
> prepare 10 short query statement:plan cache context: 15664 used
> prepare 10 long query statement:plan cache context: 558032 used
>
> In this patch, the maximum number of query that can do prepare can be set to autoprepare_limit.
> However, is it good in this? I think that I can assume the scene in the following.
> - Application side user: To elicit the performance, they want to specify the number of prepared
> query.
> - Operation side user: To prevent the memory from overflowing, they want to set the maximum value
> of the memory utilization.
> Therefore, I propose to add the parameter to specify the maximum memory utilization.

I agree it may be more useful to limit the amount of memory used by prepared 
queries, rather than the number of prepared statements.
But it is more difficult to calculate and maintain (I am not sure that just 
looking at CacheMemoryContext is enough for it).
Also, if the working set of queries (frequently repeated queries) doesn't 
fit in memory, then autoprepare will be almost useless, because with high 
probability a prepared query will be thrown away from the cache before it 
can be reused. So limiting queries from the "application side" seems to be 
more practical.

> 3. I confirmed the transition of the amount of the memory when it tried to prepare query
> of the number that exceeded the value specified for autoprepare_limit.
> [autoprepare_limit=1 and execute 10 different queries]
>     plan cache context: 1032 used
>     plan cache context: 39832 used
>     plan cache context: 78552 used
>     plan cache context: 117272 used
>     plan cache context: 155952 used
>     plan cache context: 194632 used
>     plan cache context: 233312 used
>     plan cache context: 272032 used
>     plan cache context: 310712 used
>     plan cache context: 349392 used
>     plan cache context: 388072 used
>
> I feel the doubt in an increase of the memory utilization when I execute a lot of
> query though cached query is one (autoprepare_limit=1).
> This behavior is correct?

I will check it.



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 01.08.2018 00:30, Konstantin Knizhnik wrote:
> Hi Yamaji,
>
>
> On 31.07.2018 12:12, Yamaji, Ryo wrote:
>>> -----Original Message-----
>>> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
>>> Sent: Friday, January 12, 2018 9:53 PM
>>> To: Thomas Munro <thomas.munro@enterprisedb.com>; Stephen Frost
>>> <sfrost@snowman.net>
>>> Cc: Michael Paquier <michael.paquier@gmail.com>; PostgreSQL mailing
>>> lists <pgsql-hackers@postgresql.org>; Tsunakawa, Takayuki/綱川 貴之
>>> <tsunakawa.takay@jp.fujitsu.com>
>>> Subject: Re: [HACKERS] Cached plans and statement generalization
>>>
>>> Thank you very much for reporting the problem.
>>> Rebased version of the patch is attached.
>> Hi Konstantin.
>>
>> I think that this patch excel very much. Because the customer of our
>> company has the demand that migrates from other DB to PostgreSQL, and
>> the problem to have to modify the application program to do prepare in
>> that case occurs. It is possible to solve it by the problem's using this
>> patch. I want to be helping this patch to be committed. Will you 
>> participate
>> in the following CF?
>
> This patch will be included in next release of PgPro EE.
> Concerning next commit fest - I am not sure.
> At previous commitfest it was returned with feedback that it "has 
> received no review or comments since last May".
> May be your review will help to change this situation.
>
>
>>
>> To review this patch, I verified it. The verified environment is
>> PostgreSQL 11beta2. It is necessary to add "executor/spi.h" and 
>> "jit/jit.h"
>> to postgres.c of the patch by the updating of PostgreSQL. Please rebase.

New version of the patch is attached.

Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 31.07.2018 12:12, Yamaji, Ryo wrote:
>
> 3. I confirmed the transition of the amount of the memory when it tried to prepare query
> of the number that exceeded the value specified for autoprepare_limit.
> [autoprepare_limit=1 and execute 10 different queries]
>     plan cache context: 1032 used
>     plan cache context: 39832 used
>     plan cache context: 78552 used
>     plan cache context: 117272 used
>     plan cache context: 155952 used
>     plan cache context: 194632 used
>     plan cache context: 233312 used
>     plan cache context: 272032 used
>     plan cache context: 310712 used
>     plan cache context: 349392 used
>     plan cache context: 388072 used
>
> I feel the doubt in an increase of the memory utilization when I execute a lot of
> query though cached query is one (autoprepare_limit=1).
> This behavior is correct?

I failed to reproduce the problem.
I used the following non-default configuration parameters:

autoprepare_limit=1
autoprepare_threshold=1

create dummy database:

create table foo(x integer primary key, y integer);
insert into foo values (generate_series(1,10000), 0);

and run different queries,  like:

postgres=# select *  from foo where x=1;
postgres=# select *  from foo where x+x=1;
postgres=# select *  from foo where x+x+x=1;
postgres=# select *  from foo where x+x+x+x=1;
...

and check size of CacheMemoryContext using gdb - it is not increased.

Can you please send me your test?




RE: [HACKERS] Cached plans and statement generalization

From
"Yamaji, Ryo"
Date:
> -----Original Message-----
> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
> Sent: Wednesday, August 1, 2018 4:53 PM
> To: Yamaji, Ryo/山地 亮 <yamaji.ryo@jp.fujitsu.com>
> Cc: PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Cached plans and statement generalization
> 
> I failed to reproduce the problem.
> I used the following non-default configuration parameters:
> 
> autoprepare_limit=1
> autoprepare_threshold=1
> 
> create dummy database:
> 
> create table foo(x integer primary key, y integer); insert into foo values
> (generate_series(1,10000), 0);
> 
> and run different queries,  like:
> 
> postgres=# select *  from foo where x=1; postgres=# select *  from foo
> where x+x=1; postgres=# select *  from foo where x+x+x=1; postgres=#
> select *  from foo where x+x+x+x=1; ...
> 
> and check size of CacheMemoryContext using gdb - it is not increased.
> 
> Can you please send me your test?


I checked not CacheMemoryContext but "plan cache context", 
because I think that the memory context autoprepare mainly uses is "plan cache context". 

The same non-default configuration parameters were used as Konstantin's: 
autoprepare_limit=1
autoprepare_threshold=1

The test procedure I used is shown below. 

Create a dummy database:
create table test (key1 integer, key2 integer, ... , key100 integer);
insert into test values (1, 2, ... , 100);

Then execute different queries:
select key1 from test where key1=1 and key2=2 and ... and key100=100;
select key2 from test where key1=1 and key2=2 and ... and key100=100;
select key3 from test where key1=1 and key2=2 and ... and key100=100;
...

Then "plan cache context" was checked using gdb.



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 02.08.2018 08:25, Yamaji, Ryo wrote:
>> -----Original Message-----
>> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
>> Sent: Wednesday, August 1, 2018 4:53 PM
>> To: Yamaji, Ryo/山地 亮 <yamaji.ryo@jp.fujitsu.com>
>> Cc: PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
>> Subject: Re: [HACKERS] Cached plans and statement generalization
>>
>> I failed to reproduce the problem.
>> I used the following non-default configuration parameters:
>>
>> autoprepare_limit=1
>> autoprepare_threshold=1
>>
>> create dummy database:
>>
>> create table foo(x integer primary key, y integer); insert into foo values
>> (generate_series(1,10000), 0);
>>
>> and run different queries,  like:
>>
>> postgres=# select *  from foo where x=1; postgres=# select *  from foo
>> where x+x=1; postgres=# select *  from foo where x+x+x=1; postgres=#
>> select *  from foo where x+x+x+x=1; ...
>>
>> and check size of CacheMemoryContext using gdb - it is not increased.
>>
>> Can you please send me your test?
>
> I checked not CacheMemoryContext but "plan cache context".
> Because I think that the memory context that autoprepare mainly uses is "plan cache context".
>
> Non-default configuration parameter was used as well as Konstantin.
> autoprepare_limit=1
> autoprepare_threshold=1
>
> The procedure of the test that I did is shown below.
>
> create dummy database
> create table test (key1 integer, key2 integer, ... , key100 integer);
> insert into test values (1, 2, ... , 100);
>
> And, different queries are executed.
> select key1 from test where key1=1 and key2=2 and ... and key100=100;
> select key2 from test where key1=1 and key2=2 and ... and key100=100;
> select key3 from test where key1=1 and key2=2 and ... and key100=100;...
>
> And, "plan cache context" was confirmed by using gdb.
>
>


Thank you.
Unfortunately expression_tree_walker is not consistent with copyObject:
I tried to use this walker to destroy the raw parse tree of an autoprepared
statement, but it looks like some nodes are not visited by
expression_tree_walker. I have to create a separate memory context for each
autoprepared statement.
A new version of the patch is attached.

Attachment

RE: [HACKERS] Cached plans and statement generalization

From
"Yamaji, Ryo"
Date:
> -----Original Message-----
> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
> Sent: Friday, August 3, 2018 7:02 AM
> To: Yamaji, Ryo/山地 亮 <yamaji.ryo@jp.fujitsu.com>
> Cc: PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Cached plans and statement generalization
>
> Thank you.
> Unfortunately expression_tree_walker is not consistent with copyObject:
> I tried to use this walker to destroy raw parse tree of autoprepared
> statement, but looks like some nodes are not visited by
> expression_tree_walker. I have to create separate memory context for each
> autoprepared statement.
> New version  of the patch is attached.

Thank you for attaching the patch.
I confirmed that the new patch improves "plan cache context" memory usage.


> This patch will be included in next release of PgPro EE.
> Concerning next commit fest - I am not sure.
> At previous commitfest it was returned with feedback that it "has received no review or comments since last May".
> May be your review will help to change this situation.

I want to confirm one point.
If I review the autoprepare patch, are you ready to register it at a
commitfest in the near future? I am afraid that otherwise the patch might
never be registered at a commitfest (for example, because you are busy) and
never be applied to PostgreSQL. If you are not ready to register the patch,
I would like to register it at a commitfest instead of you.


> I agree it may be more useful to limit amount of memory used by prepare 
> queries, rather than number of prepared statements.
> But it is just more difficult to calculate and maintain (I am not sure 
> that just looking at CacheMemoryContext is enough for it).
> Also, if the working set of queries (frequently repeated queries) doesn't 
> fit in memory, then autoprepare will be almost useless (because with 
> high probability
> prepared query will be thrown away from the cache before it can be 
> reused). So limiting queries from "application side" seems to be more 
> practical.

I see. But I am afraid that autoprepare may use an unpredictable amount of
memory when autoprepare_limit only caps the number of prepared statements.
There may be cases where autoprepare uses a lot of memory (for example, when
it has to prepare many long queries), so that other processes (other backend
processes in PostgreSQL, or processes outside PostgreSQL) cannot get memory.
I hope it becomes possible to specify a memory limit in the future.

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 07.08.2018 13:02, Yamaji, Ryo wrote:
>
> I want to confirm one point.
> If I will have reviewed the autoprepare patch, then are you ready to register
> the patch at commit fest in the near future? I fear that autoprepare patch do
> not registered at commit fest in the future (for example, you are so busy), and
> do not applied to PostgreSQL. If you are not ready to register the patch, I think
> I want to register at commit fest instead of you.

I have registered the patch for the next commitfest.
For some reason it doesn't find the latest autoprepare-10.patch and 
still refers to autoprepare-6.patch as the latest attachment.

>
>
>> I agree it may be more useful to limit amount of memory used by prepare
>> queries, rather than number of prepared statements.
>> But it is just more difficult to calculate and maintain (I am not sure
>> that just looking at CacheMemoryContext is enough for it).
>> Also, if the working set of queries (frequently repeated queries) doesn't
>> fit in memory, then autoprepare will be almost useless (because with
>> high probability
>> prepared query will be thrown away from the cache before it can be
>> reused). So limiting queries from "application side" seems to be more
>> practical.
> I see. But I fear that autoprepare process uses irregularity amount of memory
> when autoprepare_limit is specified number of prepared statements. I think
> that there is scene that autoprepare process use a lot of memory (ex. it
> need to prepare a lot of long queries), then other processes (ex. other
> backend process in PostgreSQL or process other than PostgreSQL) cannot use
> memory. I hope to specify limit amount of memory in the future.

Right now each prepared statement has two memory contexts: one for the raw 
parse tree used as the hash table key and another for the cached plan itself.
Maybe it will be possible to combine them. To calculate the memory consumed 
by cached plans, it would be necessary to gather memory usage 
statistics for all these contexts (which requires traversing all of each 
context's chunks) and sum them. It is much more complex and expensive 
than the current check: (++autoprepare_cached_plans > autoprepare_limit),
although I do not think that it would have a measurable impact on 
performance...
Maybe there is some faster way to estimate the memory consumed by 
prepared statements.

So, the current autoprepare_limit makes it possible to limit the number of
autoprepared statements and prevent memory overflow caused by execution of a
large number of different statements.
The question is whether we need a more precise mechanism that takes into
account the difference between small and large queries. A simple query can
certainly require 10-100 times less memory than a complex one. But memory
contexts themselves (even with a small block size) somewhat reduce the
difference in memory footprint between queries, because of chunked
allocation.
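The trade-off discussed above — a cheap counter check versus per-statement memory accounting, both with LRU-style eviction — can be sketched with a toy Python model. This is purely illustrative: the class name, structure, and sizes are invented, and the patch's actual C implementation uses memory contexts and a hash table instead.

```python
from collections import OrderedDict

class AutoprepareCache:
    """Toy model of the two eviction policies discussed: a cap on the
    number of cached statements (like autoprepare_limit) and a cap on
    their total memory (like a memory limit). Entries are evicted in
    least-recently-used order."""

    def __init__(self, limit=0, memory_limit=0):
        self.limit = limit                # max statement count; 0 = unlimited
        self.memory_limit = memory_limit  # max total bytes; 0 = unlimited
        self.used_memory = 0              # running total of cached plan sizes
        self.entries = OrderedDict()      # generalized query text -> plan size

    def put(self, query, plan_size):
        if query in self.entries:
            self.entries.move_to_end(query)   # cache hit: refresh LRU order
            return
        self.entries[query] = plan_size
        self.used_memory += plan_size
        # The count check is a cheap counter comparison; the memory check
        # needs per-statement size accounting, which is the extra
        # bookkeeping cost discussed in the thread.
        while ((self.limit and len(self.entries) > self.limit) or
               (self.memory_limit and self.used_memory > self.memory_limit)):
            _, evicted_size = self.entries.popitem(last=False)
            self.used_memory -= evicted_size

cache = AutoprepareCache(limit=2)
cache.put("select * from foo where x=$1", 1000)
cache.put("select * from foo where x+x=$1", 1500)
cache.put("select * from foo where x+x+x=$1", 2000)
# the oldest statement is evicted once the count limit is exceeded
```

Setting `memory_limit` instead of `limit` would evict by total size, mirroring the memory-based variant considered here.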



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



RE: [HACKERS] Cached plans and statement generalization

From
"Yamaji, Ryo"
Date:
On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote: 

> I have registered the patch for next commitfest.
> For some reasons it doesn't find the latest autoprepare-10.patch and still
> refer to autoprepare-6.patch as the latest attachement.

I am sorry for the long delay in my response.
I confirmed it and have become a reviewer of the patch.
I am going to review it.


> Right now each prepared statement has two memory contexts: one for raw
> parse tree used as hash table key and another for cached plan itself.
> May be it will be possible to combine them. To calculate memory consumed
> by cached plans, it will be necessary to calculate memory usage statistic
> for all this contexts (which requires traversal of all context's chunks)
> and sum them. It is much more complex and expensive than current check:
> (++autoprepare_cached_plans > autoprepare_limit) although I so not think
> that it will have measurable impact on performance...
> May be there should be some faster way to estimate memory consumed by
> prepared statements.
> 
> So, the current autoprepare_limit allows to limit number of autoprepared
> statements and prevent memory overflow caused by execution of larger
> number of different statements.
> The question is whether we need more precise mechanism which will take
> in account difference between small and large queries. Definitely simple
> query can require 10-100 times less memory than complex query. But memory
> contexts themselves (even with small block size) somehow minimize
> difference in memory footprint of different queries, because of chunked
> allocation.

Thank you for explaining the possible implementation and its problems.
In many cases, when actually designing a system, we estimate the amount of
memory to be used and provision memory accordingly. Therefore, since memory
usage has to be estimated anyway whether we limit the amount of memory used
by prepared queries or the number of prepared statements, I think there is
no problem with the current implementation.

Apart from the patch, if there were a tool that outputs an estimate of
memory usage for a given query, autoprepare could be used more comfortably.
But that sounds difficult...

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 22.08.2018 07:54, Yamaji, Ryo wrote:
> On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote:
>
>> I have registered the patch for next commitfest.
>> For some reasons it doesn't find the latest autoprepare-10.patch and still
>> refer to autoprepare-6.patch as the latest attachement.
> I am sorry for the long delay in my response.
> I confirmed it and become reviewer the patch.
> I am going to review.
>
>
>> Right now each prepared statement has two memory contexts: one for raw
>> parse tree used as hash table key and another for cached plan itself.
>> May be it will be possible to combine them. To calculate memory consumed
>> by cached plans, it will be necessary to calculate memory usage statistic
>> for all this contexts (which requires traversal of all context's chunks)
>> and sum them. It is much more complex and expensive than current check:
>> (++autoprepare_cached_plans > autoprepare_limit) although I so not think
>> that it will have measurable impact on performance...
>> May be there should be some faster way to estimate memory consumed by
>> prepared statements.
>>
>> So, the current autoprepare_limit allows to limit number of autoprepared
>> statements and prevent memory overflow caused by execution of larger
>> number of different statements.
>> The question is whether we need more precise mechanism which will take
>> in account difference between small and large queries. Definitely simple
>> query can require 10-100 times less memory than complex query. But memory
>> contexts themselves (even with small block size) somehow minimize
>> difference in memory footprint of different queries, because of chunked
>> allocation.
> Thank you for telling me how to implement and its problem.
> In many cases when actually designing a system, we estimate the amount of memory
> to use and prepare a memory accordingly. Therefore, if we need to estimate memory
> usage in both patterns that are limit amount of memory used by prepare queries or
> number of prepared statements, I think that there is no problem with the current
> implementation.
>
> Apart from the patch, if it can implement an application that outputs estimates of
> memory usage when we enter a query, you may be able to use autoprepare more comfortably.
> But it sounds difficult..

I have added an autoprepare_memory_limit parameter, which limits the
memory used by autoprepared statements rather than the number of such
statements.
If the value of this parameter is non-zero, then the total size of all
memory contexts used for autoprepared statements is maintained (it can be
inspected in a debugger via the autoprepare_used_memory variable).
As I already noted, calculating memory context sizes adds extra
overhead, but I did not notice any influence on performance. A new version
of the patch is attached.



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

RE: [HACKERS] Cached plans and statement generalization

From
"Yamaji, Ryo"
Date:
On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote:

> I have registered the patch for next commitfest.
> For some reasons it doesn't find the latest autoprepare-10.patch and still
> refer to autoprepare-6.patch as the latest attachement.

I'm sorry for the late reply. I'm currently reviewing autoprepare.
I could not make it in time for the commitfest in September,
but I will continue to review for the next commitfest.

The performance tests show good results, listed below.
pgbench -s 100 -c 8 -t 10000 -S (average of 3 trials)
- all autoprepared statements use the same memory context:
    18052.22706 TPS
- each autoprepared statement uses a separate memory context:
    18607.95889 TPS
- memory usage is calculated (autoprepare_memory_limit):
    19171.60457 TPS

From the above results, I think that the added/changed functions
do not affect performance. I am currently checking the behavior
when autoprepare_memory_limit is specified.

Re: [HACKERS] Cached plans and statement generalization

From
Dmitry Dolgov
Date:
> On Fri, Sep 28, 2018 at 10:46 AM Yamaji, Ryo <yamaji.ryo@jp.fujitsu.com> wrote:
>
> On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote:
>
> > I have registered the patch for next commitfest.
> > For some reasons it doesn't find the latest autoprepare-10.patch and still
> > refer to autoprepare-6.patch as the latest attachement.
>
> I'm sorry for the late reply. I'm currently reviewing autoprepare.
> I could not make it in time for the commit fests in September,
> but I will continue to review for the next commitfest.
>
> Performance tests are good results. The results are shown below.
> pgbench -s 100 -c 8 -t 10000 -S (average of 3 trials)
> - all autoprepare statements use same memory context.
>     18052.22706 TPS
> - each autoprepare statement use separate memory context.
>     18607.95889 TPS
> - calculate memory usage (autoprepare_memory_limit)
>     19171.60457 TPS
>
> From the above results, I think that adding/changing functions
> will not affect performance. I am currently checking the behavior
> when autoprepare_memory_limit is specified.

Hi,

Thanks for reviewing. Since another CF is about to end, maybe you can post the
full review feedback?

> On Tue, Jul 31, 2018 at 11:30 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
>
> Concerning next commit fest - I am not sure.
> At previous commitfest it was returned with feedback that it "has
> received no review or comments since last May".
> May be your review will help to change this situation.

Well, I think that wasn't the only reason, and having some balance here would
be nice. Anyway, thanks for working on this patch!

Unfortunately, the patch needs to be rebased one more time. Also, I see there
were concerns about how reliable this patch is. Just out of curiosity, can you
tell now that from v4 to v11 reliability is not a concern anymore?

(If you don't mind, I'll also CC some of the reviewers who saw previous
versions).


RE: [HACKERS] Cached plans and statement generalization

From
"Yamaji, Ryo"
Date:
On Fri, Nov 30, 2018 at 3:48 AM, Dmitry Dolgov wrote:

> Hi,
> 
> Thanks for reviewing. Since another CF is about to end, maybe you can
> post the full review feedback?

Since I have been busy with other work, I could not review it.
I want to review it in the next commitfest. I am sorry for postponing so many times.

Re: [HACKERS] Cached plans and statement generalization

From
Dmitry Dolgov
Date:
> On Fri, Nov 30, 2018 at 3:06 AM Yamaji, Ryo <yamaji.ryo@jp.fujitsu.com> wrote:
>
> On Fri, Nov 30, 2018 at 3:48 AM, Dmitry Dolgov wrote:
>
> > Hi,
> >
> > Thanks for reviewing. Since another CF is about to end, maybe you can
> > post the full review feedback?
>
> Since I had been busy with my private work, I couldn't review.
> I want to review next commit fest. I am sorry I postponed many times.

No worries, just don't forget to post a review when it will be ready.


RE: [HACKERS] Cached plans and statement generalization

From
"Nagaura, Ryohei"
Date:
Hi,

Although I became your reviewer, it seems difficult to give feedback in this CF.
I will continue to review, so would you please update your patch?
Until then I will review your current patch.

There is one question.
date_1.out, which seems to be a copy of date.out, includes trailing spaces and inconsistent indentation;
e.g., lines 3368 and 3380 in your current patch have
    a space at the end of each line
    different indentation.
The same is seen in date.out.
I'm not sure whether it is OK to add a new file with these issues just because an existing file has them.
Is it OK?

Best regards,
---------------------
Ryohei Nagaura

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 29.01.2019 4:38, Nagaura, Ryohei wrote:
> Hi,
>
> Although I became your reviewer, it seems to be difficult to feedback in this CF.
> I continue to review, so would you update your patch please?
> Until then I review your current patch.
>
> There is one question.
> date_1.out which maybe is copy of date.out includes trailing space and gaps of indent
> e.g., line 3368 and 3380 in your current patch have
>     space in each end of line
>     different indent.
> This is also seen in date.out.
> I'm not sure whether it is ok to add new file including the above features just because a existing file includes.
> Is it ok?
>
> Best regards,
> ---------------------
> Ryohei Nagaura
A rebased version of the patch is attached.
Concerning spaces in date.out and date_1.out: it is OK.
These files are just the output of test execution, and it is not possible to 
expect the output of test scripts to be free of trailing spaces.



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

RE: [HACKERS] Cached plans and statement generalization

From
"Nagaura, Ryohei"
Date:
Hi,

On Mon, Jan 28, 2019 at 10:46 PM, Konstantin Knizhnik wrote:
> Rebased version of the patch is attached.
Thank you for your quick rebase.

> This files are just output of test execution and it is not possible to expect lack of
> trailing spaces in output of test scripts execution.
I understand this is because regression tests use exact matching.
I learned a lot. Thanks.

Best regards,
---------------------
Ryohei Nagaura


RE: [HACKERS] Cached plans and statement generalization

From
"Yamaji, Ryo"
Date:
On Tue, Jan 29, 2019 at 10:46 AM, Konstantin Knizhnik wrote:
> Rebased version of the patch is attached.

I'm sorry for the late review.
I confirmed the behavior of autoprepare-12.patch. It is summarized below.

・parameters
The expected behavior was observed for each setting value.
However, I think it would be kinder for the manual to state that
autoprepare holds an unlimited number of statements when
autoprepare_(memory_)limit is set to 0.
There is no problem with the behavior itself.

・pg_autoprepared_statements
I confirmed that it can be queried properly.

・autoprepare cache retention period
I confirmed that autoprepared statements are deleted when a SET
statement or a DDL statement is executed. Although this differs from
explicitly prepared statements, it is not a problem for autoprepare.

・performance
I did not re-check the basic performance of autoprepare for this patch,
because the previous patch (autoprepare-11.patch) already showed no
performance problem. However, because it was argued that performance
degradation occurs when prepared statements are executed against a
partitioned table, I expected that autoprepare might exhibit similar
behavior and measured it. I also suspected that the plan_cache_mode
setting might not apply to autoprepare, so I measured each
plan_cache_mode setting separately.
The results are below (in TPS):

    plan_cache_mode     simple  simple(autoprepare) prepare
    auto                130     121                 121.5
    force_custom_plan   132.5   90.7                122.7
    force_generic_plan  126.7   14.7                24.7

Performance degradation was observed when plan_cache_mode was specified
for autoprepare. Is this behavior expected? I do not know why these
results occur.

The performance test procedure is below:

drop table if exists rt;
create table rt (a int, b int, c int) partition by range (a);
\o /dev/null
select 'create table rt' || x::text || ' partition of rt for values from (' ||
 (x)::text || ') to (' || (x+1)::text || ');' from generate_series(0, 1024) x;
\gexec
\o

pgbench -p port -T 60 -c 1 -n -f test.sql (-M prepared) postgres

test.sql
\set a random (0, 1023)
select * from rt where a = :a;

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
Thank you very much for the review!

On 19.03.2019 5:56, Yamaji, Ryo wrote:
> On Tue, Jan 29, 2019 at 10:46 AM, Konstantin Knizhnik wrote:
>> Rebased version of the patch is attached.
> I'm sorry for the late review.
> I confirmed behavior of autoprepare-12.patch. It is summarized below.
>
> ・parameter
> Expected behavior was shown according to the set value.
> However, I think that it is be more kind to describe that autoprepare
> hold infinite statements when the setting value of
> autoprepare_ (memory_) limit is 0 in the manual.
> There is no problem as operation.

Sorry, I do not completely understand your concern.
The description of autoprepare_(memory_)limit includes an explanation of the
zero value:

autoprepare_limit:
0 means an unlimited number of autoprepared queries. Too large a number of 
prepared queries can cause backend memory overflow and slow down 
execution speed (because of increased lookup time).

autoprepare_memory_limit:
0 means that there is no memory limit. Calculating the memory used by 
prepared queries adds some extra overhead,
so a non-zero value of this parameter may cause some slowdown. 
autoprepare_limit is a much faster way to limit the number of autoprepared 
statements.


Do you think that these descriptions are unclear and should be rewritten?

> ・pg_autoprepared_statements
> I confirmed that I could refer properly.
>
> ・autoprepare cache retention period
> I confirmed that autoprepared statements were deleted when the set
> statement or the DDL statement was executed. Although it differs from
> the explicit prepare statements, it does not matter as a autoprepare.
>
> ・performance
> This patch does not confirm the basic performance of autoprepare because
> it confirmed that there was no performance problem with the previous
> patch (autoprepare-11.patch). However, because it was argued that
> performance degradation would occur when prepared statements execute to
> a partition table, I expected that autoprepare might exhibit similar
> behavior, and measured the performance. I also predicted that the
> plan_cache_mode setting does not apply to autoprepare, and we also
> measured the plan_cache_mode by conditions.
> Below results (this result is TPS)
>
>      plan_cache_mode     simple  simple(autoprepare) prepare
>      auto                130     121                 121.5
>      force_custom_plan   132.5   90.7                122.7
>      force_generic_plan  126.7   14.7                24.7
>
> Performance degradation was observed when plan_cache_mode was specified
> for autoprepare. Is this behavior correct? I do not know why this is the
> results.
>
> Below performance test procedure
>
> drop table if exists rt;
> create table rt (a int, b int, c int) partition by range (a);
> \o /dev/null
> select 'create table rt' || x::text || ' partition of rt for values from (' ||
>   (x)::text || ') to (' || (x+1)::text || ');' from generate_series(0, 1024) x;
> \gexec
> \o
>
> pgbench -p port -T 60 -c 1 -n -f test.sql (-M prepared) postgres
>
> test.sql
> \set a random (0, 1023)
> select * from rt where a = :a;
Autoprepare uses the same functions from plancache.c, so plan_cache_mode 
settings affect autoprepare as well as explicitly prepared statements.

Below are my results of select-only pgbench:

     plan_cache_mode     simple  simple(autoprepare) prepare
     auto                23k     42k                 50k
     force_custom_plan   23k     24k                 26k
     force_generic_plan  23k     44k                 50k


As you can see, force_custom_plan slows down both explicitly prepared and 
autoprepared statements.
Unfortunately, generic plans do not work well with partitioned tables 
because they disable partition pruning.
On my system, the results of your query execution are the following:

     plan_cache_mode     simple  simple(autoprepare) prepare
     auto                232     220                 219
     force_custom_plan   234     175                 211
     force_generic_plan  230     48                  48


The conclusion is that forcing a generic plan can cause slowdown of 
queries on partitioned tables.
If no plan cache mode is enforced, then the standard Postgres strategy of 
comparing the efficiency of generic and custom plans works well.
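The effect described here can be modeled with a toy sketch (hypothetical Python, not PostgreSQL code): a custom plan sees the literal partitioning-key value at plan time and can prune to a single range partition, while a generic plan must stay valid for any parameter value, so with plan-time pruning only, every partition remains in the plan.

```python
NUM_PARTITIONS = 1024  # matches the test setup above: rt0 .. rt1024 by range

def partitions_to_scan(plan, key=None):
    """Return the list of partition indexes kept in the plan.
    'custom': the literal key is known, prune to the matching partition.
    'generic': the parameter is unknown at plan time, keep all partitions."""
    if plan == "custom":
        return [key]                      # prune to the one matching partition
    return list(range(NUM_PARTITIONS))    # generic: keep every partition

custom = partitions_to_scan("custom", key=42)
generic = partitions_to_scan("generic")
# len(custom) == 1 while len(generic) == 1024
```

This is only a model of the behavior described in the message; it ignores run-time pruning and everything else the real planner does.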

Attached please find rebased version of the patch.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
Here is a new version of the patch, fixing an error when autopreparing CTE queries.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:
Here is a new version of the patch, disabling autoprepare for rules and
handling planner errors.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Thomas Munro
Date:
On Wed, Apr 10, 2019 at 12:52 AM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> New version of the patching disabling autoprepare for rules and handling
> planner error.

Hi Konstantin,

This doesn't apply. Could we please have a fresh rebase for the new Commitfest?

Thanks,

-- 
Thomas Munro
https://enterprisedb.com



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 01.07.2019 12:51, Thomas Munro wrote:
> On Wed, Apr 10, 2019 at 12:52 AM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> New version of the patching disabling autoprepare for rules and handling
>> planner error.
> Hi Konstantin,
>
> This doesn't apply. Could we please have a fresh rebase for the new Commitfest?
>
> Thanks,
>
Attached please find rebased version of the patch.
Also this version can be found in autoprepare branch of this repository 
https://github.com/postgrespro/postgresql.builtin_pool.git
on github.

Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Thomas Munro
Date:
On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> Attached please find rebased version of the patch.
> Also this version can be found in autoprepare branch of this repository
> https://github.com/postgrespro/postgresql.builtin_pool.git
> on github.

Thanks.  I haven't looked at the code but this seems like interesting
work and I hope it will get some review.  I guess this is bound to use
a lot of memory.  I guess we'd eventually want to figure out how to
share the autoprepared plan cache between sessions, which is obviously
a whole can of worms.

A couple of trivial comments with my CF manager hat on:

1.  Can you please fix the documentation?  It doesn't build.
Obviously reviewing the goals, design and implementation are more
important than the documentation at this point, but if that is fixed
then the CF bot will be able to run check-world every day and we might
learn something about the code.
2.  Accidental editor junk included:  src/include/catalog/pg_proc.dat.~1~


--
Thomas Munro
https://enterprisedb.com



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 08.07.2019 2:23, Thomas Munro wrote:
> On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Attached please find rebased version of the patch.
>> Also this version can be found in autoprepare branch of this repository
>> https://github.com/postgrespro/postgresql.builtin_pool.git
>> on github.
> Thanks.  I haven't looked at the code but this seems like interesting
> work and I hope it will get some review.  I guess this is bound to use
> a lot of memory.  I guess we'd eventually want to figure out how to
> share the autoprepared plan cache between sessions, which is obviously
> a whole can of worms.
>
> A couple of trivial comments with my CF manager hat on:
>
> 1.  Can you please fix the documentation?  It doesn't build.
> Obviously reviewing the goals, design and implementation are more
> important than the documentation at this point, but if that is fixed
> then the CF bot will be able to run check-world every day and we might
> learn something about the code.
> 2.  Accidental editor junk included:  src/include/catalog/pg_proc.dat.~1~


Sorry, are you testing the autoprepare-16.patch I sent in my last e-mail?
I cannot reproduce the problem with building the documentation:

knizhnik@xps:~/postgresql/doc$ make
make -C ../src/backend generated-headers
make[1]: Entering directory '/home/knizhnik/postgresql/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory '/home/knizhnik/postgresql/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/knizhnik/postgresql/src/backend/catalog'
make -C utils distprep generated-header-symlinks
make[2]: Entering directory '/home/knizhnik/postgresql/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/knizhnik/postgresql/src/backend/utils'
make[1]: Leaving directory '/home/knizhnik/postgresql/src/backend'
make -C src all
make[1]: Entering directory '/home/knizhnik/postgresql/doc/src'
make -C sgml all
make[2]: Entering directory '/home/knizhnik/postgresql/doc/src/sgml'
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '12devel' 
stylesheet.xsl postgres.sgml
cp ./stylesheet.css html/
touch html-stamp
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '12devel' 
stylesheet-man.xsl postgres.sgml
touch man-stamp
make[2]: Leaving directory '/home/knizhnik/postgresql/doc/src/sgml'
make[1]: Leaving directory '/home/knizhnik/postgresql/doc/src'


Also, autoprepare-16.patch doesn't include any junk such as

src/include/catalog/pg_proc.dat.~1~





Re: [HACKERS] Cached plans and statement generalization

From
Thomas Munro
Date:
On Tue, Jul 9, 2019 at 7:32 AM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail?
> I can not reproduce the problem with building documentation:

+      <term><varname>autoprepare_threshold</varname> (<type>integer/type>)

The problem is that "<type>integer/type>".  (Missing "<").  There is more
than one of those.  Not sure why that doesn't fail for you.

> Also autoporepare-16.patch doesn't include any junk
>
> src/include/catalog/pg_proc.dat.~1~

Oops, yeah, sorry for the noise.  I did something stupid in my apply script.

-- 
Thomas Munro
https://enterprisedb.com



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 09.07.2019 15:16, Thomas Munro wrote:
> On Tue, Jul 9, 2019 at 7:32 AM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail?
>> I can not reproduce the problem with building documentation:
> +      <term><varname>autoprepare_threshold</varname> (<type>integer/type>)
>
> The problem is that "<type>integer/type>".  (Missing "<").  There more
> than one of those.  Not sure why that doesn't fail for you.
> or the noise.  I did something stupid in my apply script.
>

Sorry, there really were several syntax problems.
I do not understand why I had not noticed them before.
A fixed version of the patch is attached.

Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Heikki Linnakangas
Date:
On 09/07/2019 23:59, Konstantin Knizhnik wrote:
> Fixed patch version of the path is attached.

Much of the patch and the discussion has been around the raw parsing, 
and guessing which literals are actually parameters that have been 
inlined into the SQL text. Do we really need to do that, though? The 
documentation mentions pgbouncer and other connection poolers, where you 
don't have session state, as a use case for this. But such connection 
poolers could and should still be using the extended query protocol, 
with Parse+Bind messages and query parameters, even if they don't use 
named prepared statements. I'd want to encourage applications and 
middleware to use out-of-band query parameters, to avoid SQL injection 
attacks, regardless of whether they use prepared statements or cache 
query plans. So how about dropping all the raw parse tree stuff, and 
doing the automatic caching of plans based on the SQL string, somewhere 
in the exec_parse_message? Check the autoprepare cache in 
exec_parse_message(), if it was an "unnamed" prepared statement, i.e. if 
the prepared statement name is an empty string.

(I'm actually not sure if exec_parse/bind_message is the right place for 
this, but I saw that your current patch did it in exec_simple_query, and 
exec_parse/bind_message are the equivalent of that for the extended 
query protocol).

- Heikki



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 31.07.2019 19:56, Heikki Linnakangas wrote:
> On 09/07/2019 23:59, Konstantin Knizhnik wrote:
>> Fixed patch version of the path is attached.
>
> Much of the patch and the discussion has been around the raw parsing, 
> and guessing which literals are actually parameters that have been 
> inlined into the SQL text. Do we really need to do that, though? The 
> documentation mentions pgbouncer and other connection poolers, where 
> you don't have session state, as a use case for this. But such 
> connection poolers could and should still be using the extended query 
> protocol, with Parse+Bind messages and query parameters, even if they 
> don't use named prepared statements. I'd want to encourage 
> applications and middleware to use out-of-band query parameters, to 
> avoid SQL injection attacks, regardless of whether they use prepared 
> statements or cache query plans. So how about dropping all the raw 
> parse tree stuff, and doing the automatic caching of plans based on 
> the SQL string, somewhere in the exec_parse_message? Check the 
> autoprepare cache in exec_parse_message(), if it was an "unnamed" 
> prepared statement, i.e. if the prepared statement name is an empty 
> string.
>
> (I'm actually not sure if exec_parse/bind_message is the right place 
> for this, but I saw that your current patch did it in 
> exec_simple_query, and exec_parse/bind_message are the equivalent of 
> that for the extended query protocol).

If the assumption that all client applications use the extended query 
protocol holds, it would significantly simplify this patch and eliminate 
all the complexity and trouble caused by replacing string literals with 
parameters.
But I am not sure we can expect that. At least I myself see many 
applications which construct queries by embedding literal values.
Maybe that is not good or safe (SQL injection attacks), but many 
applications do it. And the idea was to improve the execution speed
of existing applications without changing them.


Also please notice that the extended protocol requires passing more 
messages, which has a negative effect on performance.
On my notebook I get about 21k TPS with "pgbench -S" and 18k TPS with 
"pgbench -M extended -S".
And that is with a unix socket connection! I think that in the case of 
remote connections the difference will be even larger.

So maybe committing a simple version of this patch which does not need to 
solve any challenging problems is a good idea.
But I am afraid that it will significantly reduce the positive effect of 
this patch.



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 31.07.2019 19:56, Heikki Linnakangas wrote:
> On 09/07/2019 23:59, Konstantin Knizhnik wrote:
>> Fixed patch version of the path is attached.
>
> Much of the patch and the discussion has been around the raw parsing, 
> and guessing which literals are actually parameters that have been 
> inlined into the SQL text. Do we really need to do that, though? The 
> documentation mentions pgbouncer and other connection poolers, where 
> you don't have session state, as a use case for this. But such 
> connection poolers could and should still be using the extended query 
> protocol, with Parse+Bind messages and query parameters, even if they 
> don't use named prepared statements. I'd want to encourage 
> applications and middleware to use out-of-band query parameters, to 
> avoid SQL injection attacks, regardless of whether they use prepared 
> statements or cache query plans. So how about dropping all the raw 
> parse tree stuff, and doing the automatic caching of plans based on 
> the SQL string, somewhere in the exec_parse_message? Check the 
> autoprepare cache in exec_parse_message(), if it was an "unnamed" 
> prepared statement, i.e. if the prepared statement name is an empty 
> string.
>
> (I'm actually not sure if exec_parse/bind_message is the right place 
> for this, but I saw that your current patch did it in 
> exec_simple_query, and exec_parse/bind_message are the equivalent of 
> that for the extended query protocol).
>
> - Heikki

I decided to implement your proposal. A much simpler version of the 
autoprepare patch is attached.
At my computer I got the following results:

  pgbench -M simple -S      22495 TPS
  pgbench -M extended -S    47633 TPS
  pgbench -M prepared -S    47683 TPS


So autoprepare speeds up execution of queries sent using the extended 
protocol more than twice, and it is almost the same as with explicitly 
prepared statements.
I failed to create a regression test for it because I do not know how to 
force psql to use the extended protocol. Any advice is welcome.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Daniel Migowski
Date:
Am 01.08.2019 um 18:56 schrieb Konstantin Knizhnik:
> I decided to implement your proposal. Much simple version of 
> autoprepare patch is attached.
> At my computer I got the following results:
>
>  pgbench -M simple -S         22495 TPS
>  pgbench -M extended -S    47633 TPS
>  pgbench -M prepared -S    47683 TPS
>
>
> So autoprepare speedup execution of queries sent using extended 
> protocol more than twice and it is almost the same as with explicitly 
> prepared statements.
> I failed to create regression test for it because I do not know how to 
> force psql to use extended protocol. Any advice is welcome.

I am very interested in such a patch, because currently I use the same 
functionality within my JDBC driver and having this directly in 
PostgreSQL would surely speed things up.

I have two suggestions however:

1. Please allow gathering information about the autoprepared statements 
by returning them in the pg_prepared_statements view. I would love to 
monitor their usage as well as the memory consumption that occurs. I 
suggested a patch to display that in 
https://www.postgresql.org/message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de

2. Please don't use only an LRU list, but maybe something which would 
prefer statements that get reused at all. We often create ad hoc 
statements with parameters which don't really need to be kept. Maybe I 
can suggest an implementation of an LRU list where reuse of a statement 
not only pulls it to the top of the list but also increments a reuse 
counter. When a statement would drop off the end of the list, one checks 
whether the reuse count is non-zero, and only really drops it if the 
reuse count is zero. Otherwise the reuse count is decremented (or 
halved) and the element is placed back at the start of the LRU list, so 
it is kept a bit longer.

Thanks,

Daniel Migowski





Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 02.08.2019 11:25, Daniel Migowski wrote:
> Am 01.08.2019 um 18:56 schrieb Konstantin Knizhnik:
>> I decided to implement your proposal. Much simple version of 
>> autoprepare patch is attached.
>> At my computer I got the following results:
>>
>>  pgbench -M simple -S         22495 TPS
>>  pgbench -M extended -S    47633 TPS
>>  pgbench -M prepared -S    47683 TPS
>>
>>
>> So autoprepare speedup execution of queries sent using extended 
>> protocol more than twice and it is almost the same as with explicitly 
>> prepared statements.
>> I failed to create regression test for it because I do not know how 
>> to force psql to use extended protocol. Any advice is welcome.
>
> I am very interested in such a patch, because currently I use the same 
> functionality within my JDBC driver and having this directly in 
> PostgreSQL would surely speed things up.
>
> I have two suggestions however:
>
> 1. Please allow to gather information about the autoprepared 
> statements by returning them in pg_prepared_statements view. I would 
> love to monitor usage of them as well as the memory consumption that 
> occurs. I suggested a patch to display that in 
> https://www.postgresql.org/message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de
>
Sorry, but there already is a pg_autoprepared_statements view. I think it 
is better to distinguish explicitly and implicitly prepared statements, 
isn't it?
Do you want to add more information to this view? Right now it shows the 
query string, the types of parameters, and the number of executions of 
this query.

> 2. Please not only use a LRU list, but maybe something which would 
> prefer statements that get reused at all. We often create ad hoc 
> statements with parameters which don't really need to be kept. Maybe I 
> can suggest an implementation of an LRU list where a reusal of a 
> statement not only pulls it to the top of the list but also increases 
> a reuse counter. When then a statement would drop off the end of the 
> list one checks if the reusal count is non-zero, and only really drops 
> it if the resual count is zero. Else the reusal count is decremented 
> (or halved) and the element is also placed at the start of the LRU 
> list, so it is kept a bit longer.
>
There are many variations of LRU and alternatives: clock, segmented LRU, ...
I agree that classical LRU may not be the best replacement policy for 
autoprepare.
An application either uses static queries or constructs them dynamically. 
It would be better to distinguish queries executed at least twice from 
one-shot queries.
So maybe SLRU would be the best choice. But actually the situation you 
have described (we often create ad hoc statements with parameters which 
don't really need to be kept)
is not applicable to autoprepare. Autoprepare deals only with 
statements which are actually executed.

We have another patch which limits the number of stored prepared plans to 
avoid memory overflow (in case of using stored procedures which 
implicitly prepare all issued queries).
Here the choice of replacement policy is more critical.



> Thanks,
>
> Daniel Migowski
>
>
>
>

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Cached plans and statement generalization

From
Daniel Migowski
Date:
Am 02.08.2019 um 10:57 schrieb Konstantin Knizhnik:
> On 02.08.2019 11:25, Daniel Migowski wrote:
>>
>> I have two suggestions however:
>>
>> 1. Please allow to gather information about the autoprepared 
>> statements by returning them in pg_prepared_statements view. I would 
>> love to monitor usage of them as well as the memory consumption that 
>> occurs. I suggested a patch to display that in 
>> https://www.postgresql.org/message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de 
>>
>>
> Sorry, but there is pg_autoprepared_statements view. I think that it 
> will be better to distinguish explicitly and implicitly prepared 
> statements, will not it?
> Do you want to add more information to this view? Right now it shows 
> query string, types of parameters and number of this query execution.
I'm sorry, I didn't notice it in the patch. If there is a separate view 
for those, that's perfectly fine.
>> 2. Please not only use a LRU list, but maybe something which would 
>> prefer statements that get reused at all. We often create ad hoc 
>> statements with parameters which don't really need to be kept. Maybe 
>> I can suggest an implementation of an LRU list where a reusal of a 
>> statement not only pulls it to the top of the list but also increases 
>> a reuse counter. When then a statement would drop off the end of the 
>> list one checks if the reusal count is non-zero, and only really 
>> drops it if the resual count is zero. Else the reusal count is 
>> decremented (or halved) and the element is also placed at the start 
>> of the LRU list, so it is kept a bit longer.
>>
> There are many variations of LRU and alternatives: clock, segmented 
> LRU, ...
> I agree that classical LRU may be not the best replacement policy for 
> autoprepare.
> Application is either use static queries, either constructing them 
> dynamically. It will be better to distinguish queries executed at 
> least twice from one-shot queries.
> So may be SLRU will be the best choice. But actually situation you 
> have described (We often create ad hoc statements with parameters 
> which don't really need to be kept)
> is not applicable to atutoprepare. Autoprepare deals only with 
> statements which are actually executed.
>
> We have another patch which limits number of stored prepared plans to 
> avoid memory overflow (in case of using stored procedures which 
> implicitly prepare all issued queries).
> Here choice of replacement policy is more critical.

Alright, I just wanted to have something better than an LRU, so it is not 
a step back from the implementation in the JDBC driver for us.

Kind regards,
Daniel Migowski




Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 01.08.2019 19:56, Konstantin Knizhnik wrote:
>
>
> On 31.07.2019 19:56, Heikki Linnakangas wrote:
>> On 09/07/2019 23:59, Konstantin Knizhnik wrote:
>>> Fixed patch version of the path is attached.
>>
>> Much of the patch and the discussion has been around the raw parsing, 
>> and guessing which literals are actually parameters that have been 
>> inlined into the SQL text. Do we really need to do that, though? The 
>> documentation mentions pgbouncer and other connection poolers, where 
>> you don't have session state, as a use case for this. But such 
>> connection poolers could and should still be using the extended query 
>> protocol, with Parse+Bind messages and query parameters, even if they 
>> don't use named prepared statements. I'd want to encourage 
>> applications and middleware to use out-of-band query parameters, to 
>> avoid SQL injection attacks, regardless of whether they use prepared 
>> statements or cache query plans. So how about dropping all the raw 
>> parse tree stuff, and doing the automatic caching of plans based on 
>> the SQL string, somewhere in the exec_parse_message? Check the 
>> autoprepare cache in exec_parse_message(), if it was an "unnamed" 
>> prepared statement, i.e. if the prepared statement name is an empty 
>> string.
>>
>> (I'm actually not sure if exec_parse/bind_message is the right place 
>> for this, but I saw that your current patch did it in 
>> exec_simple_query, and exec_parse/bind_message are the equivalent of 
>> that for the extended query protocol).
>>
>> - Heikki
>
> I decided to implement your proposal. Much simple version of 
> autoprepare patch is attached.
> At my computer I got the following results:
>
>  pgbench -M simple -S         22495 TPS
>  pgbench -M extended -S    47633 TPS
>  pgbench -M prepared -S    47683 TPS
>
>
> So autoprepare speedup execution of queries sent using extended 
> protocol more than twice and it is almost the same as with explicitly 
> prepared statements.
> I failed to create regression test for it because I do not know how to 
> force psql to use extended protocol. Any advice is welcome.
>

A slightly improved and rebased version of the patch is attached.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Alvaro Herrera
Date:
This patch fails the "rules" tests. Please fix.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 25.09.2019 23:06, Alvaro Herrera wrote:
> This patch fails the "rules" tests. Please fix.
>
>
Sorry,
A new version of the patch with corrected expected output for the rules 
test is attached.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Michael Paquier
Date:
On Thu, Sep 26, 2019 at 10:23:38AM +0300, Konstantin Knizhnik wrote:
> Sorry,
> New version of the patch with corrected expected output for rules test is
> attached.

It looks like the documentation is failing to build.  Could you fix
that?  There may be other issues as well.  I have moved the patch to
next CF, waiting on author.
--
Michael

Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Konstantin Knizhnik
Date:

On 01.12.2019 6:26, Michael Paquier wrote:
> On Thu, Sep 26, 2019 at 10:23:38AM +0300, Konstantin Knizhnik wrote:
>> Sorry,
>> New version of the patch with corrected expected output for rules test is
>> attached.
> It looks like the documentation is failing to build.  Could you fix
> that?  There may be other issues as well.  I have moved the patch to
> next CF, waiting on author.
> --
> Michael

Sorry, the documentation issues are fixed.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] Cached plans and statement generalization

From
Tom Lane
Date:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> [ autoprepare-extended-4.patch ]

The cfbot is showing that this doesn't apply anymore; there's
some doubtless-trivial conflict in prepare.c.

However ... TBH I've been skeptical of this whole proposal from the
beginning, on the grounds that it would probably hurt more use-cases
than it helps.  The latest approach doesn't really do anything to
assuage that fear, because now that you've restricted it to extended
query mode, the feature amounts to nothing more than deliberately
overriding the application's choice to use unnamed rather than named
(prepared) statements.  How often is that going to be a good idea?
And when it is, wouldn't it be better to fix the app?  The client is
likely to have a far better idea of which statements would benefit
from this treatment than the server will; and in this context,
the client-side changes needed would really be trivial.  The original
proposal, scary as it was, at least supported the argument that you
could use it to improve applications that were too dumb/hoary to
parameterize commands for themselves.

I'm also unimpressed by benchmark testing that relies entirely on
pgbench's default scenario, because that scenario consists entirely
of queries that are perfectly adapted to plan-only-once treatment.
In the real world, we constantly see complaints about cases where the
plancache mistakenly decides that a generic plan is acceptable.  I think
that extending that behavior to more cases is going to be a net loss,
until we find some way to make it smarter and more reliable.  At the
very least, you need to show some worst-case numbers alongside these
best-case numbers --- what happens with queries where we conclude we
must replan every time, so that the plancache becomes pure overhead?

The pgbench test case is laughably unrealistic in another way, in that
there are so few distinct queries it issues, so that there's no
stress at all on the cache-management aspect of this problem.

In short, I really think we ought to reject this patch and move on.
Maybe it could be resurrected sometime in the future when we have a
better handle on when to cache plans and when not.

If you want to press forward with it anyway, certainly the lack of
any tests in this patch is another big objection.  Perhaps you
could create a pgbench TAP script that exercises the logic.

            regards, tom lane



Re: [HACKERS] Cached plans and statement generalization

From
Daniel Gustafsson
Date:
> On 1 Mar 2020, at 20:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> In short, I really think we ought to reject this patch and move on.
> Maybe it could be resurrected sometime in the future when we have a
> better handle on when to cache plans and when not.
>
> If you want to press forward with it anyway, certainly the lack of
> any tests in this patch is another big objection.  Perhaps you
> could create a pgbench TAP script that exercises the logic.

Based on Tom's review, and that nothing has been submitted since, I've marked
this entry as returned with feedback.  Feel to open a new entry if you want to
address Tom's comments and take this further.

cheers ./daniel


Re: [HACKERS] Cached plans and statement generalization

From
Юрий Соколов
Date:
On Sun, 1 Mar 2020 at 22:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> > [ autoprepare-extended-4.patch ]
>
> The cfbot is showing that this doesn't apply anymore; there's
> some doubtless-trivial conflict in prepare.c.
>
> However ... TBH I've been skeptical of this whole proposal from the
> beginning, on the grounds that it would probably hurt more use-cases
> than it helps.  The latest approach doesn't really do anything to
> assuage that fear, because now that you've restricted it to extended
> query mode, the feature amounts to nothing more than deliberately
> overriding the application's choice to use unnamed rather than named
> (prepared) statements.  How often is that going to be a good idea?
> And when it is, wouldn't it be better to fix the app?  The client is
> likely to have a far better idea of which statements would benefit
> from this treatment than the server will; and in this context,
> the client-side changes needed would really be trivial.  The original
> proposal, scary as it was, at least supported the argument that you
> could use it to improve applications that were too dumb/hoary to
> parameterize commands for themselves.

The theme of this thread:
- it is not possible to reliably "fix the app" when pgbouncer or internal
  driver connection multiplexing is used.
- another widespread case is frameworks (ORMs and others).
  There is no way to prepare a concrete query because it is buried under
  levels of abstraction.

The whole "autoprepare" thing is a workaround for the absence of "really
trivial client-side changes" under these conditions.

regards,
Yura