Thread: [HACKERS] Cached plans and statement generalization
Hi hackers,

There were a lot of discussions about query plan caching on the hackers mailing list, but I failed to find a clear answer to my question and the current consensus of the Postgres community on it. As far as I understand, the current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements. For example, if pgbouncer is used to perform connection pooling. Another use case (which is actually the problem I am trying to solve now) is partitioning. Efficient execution of a query on a partitioned table requires a hardcoded value for the partitioning key. Only in this case is the optimizer able to construct an efficient query plan which accesses only the affected tables (partitions). My small benchmark for a distributed partitioned table based on pg_pathman + postgres_fdw shows a three times performance degradation when prepared statements are used. But without prepared statements a substantial amount of time is spent in query compilation and planning. I was able to speed up the benchmark more than two times by sending prepared queries directly to the remote nodes.

So what I am thinking about now is implicit query caching. If the same query with different literal values is repeated many times, then we can try to generalize this query and replace it with a prepared query with parameters. I am not considering a shared query cache now: it seems to be much harder to implement. But local caching of generalized queries seems to be not so difficult to implement and requires not so many changes in the Postgres code. And it can be useful not only for sharding, but for many other cases where prepared statements cannot be used.

I wonder if such an option has already been considered and, if it was rejected for some reason, can you point me to those reasons?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi, Konstantin!
On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> My small benchmark for a distributed partitioned table based on pg_pathman + postgres_fdw shows a three times performance degradation when prepared statements are used.
> But without prepared statements a substantial amount of time is spent in query compilation and planning. I was able to speed up the benchmark more than two times by
> sending prepared queries directly to the remote nodes.
I don't think it's correct to ask PostgreSQL hackers about a problem which arises with pg_pathman, since pg_pathman is an extension supported by Postgres Pro.
Since we have declarative partitioning committed to 10, I think that the community should address this issue in the context of declarative partitioning.
However, it's unlikely we can spot this issue with declarative partitioning because it still uses the very inefficient constraint exclusion mechanism. Thus, the issues you are writing about would become visible with declarative partitioning only when constraint exclusion is replaced with something more efficient.
Long story short, could you reproduce this issue without pg_pathman?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 24.04.2017 13:24, Alexander Korotkov wrote:
> Long story short, could you reproduce this issue without pg_pathman?
Sorry, I mentioned pg_pathman just as an example.
The same problems take place with partitioning based on the standard Postgres inheritance mechanism (when I manually create derived tables and specify constraints for them).
I have not yet tested the declarative partitioning committed to 10, but I expect that it will also suffer from this problem (because it is based on inheritance).
But as I wrote, I think that the problem with plan caching is wider and is not limited to partitioning.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> So what I am thinking now is implicit query caching. If the same query with
> different literal values is repeated many times, then we can try to
> generalize this query and replace it with prepared query with
> parameters.

That's not actually all that easy:
- You pretty much have to do parse analysis to be able to do an accurate match.
  How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to fully
  parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
  nice if that happens implicitly.

Greetings,

Andres Freund
On 24.04.2017 21:43, Andres Freund wrote:
Well, first of all I want to share the results I already have: pgbench with default parameters, scale 10 and one connection:

protocol | TPS |
simple | 3492 |
extended | 2927 |
prepared | 6865 |
simple + autoprepare | 6844 |
So autoprepare is as efficient as explicit prepare and can increase performance almost two times.
My current implementation replaces only string literals in the query with parameters, i.e. select * from T where x='123'; -> select * from T where x=$1;
It greatly simplifies matching of parameters - it is just necessary to locate the '\'' character and then correctly handle pairs of quotes.
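Just for illustration, a minimal standalone sketch of this kind of string literal substitution (not the actual patch code; the function name and the fixed-size buffers are made-up simplifications) could look like the following. It copies everything outside single-quoted literals verbatim, replaces each literal with $1, $2, ... and treats a doubled quote inside a literal as an escaped quote:

#include <stdio.h>
#include <string.h>

/*
 * Replace every single-quoted string literal in "query" with $1, $2, ...
 * and collect the literal values.  A doubled quote ('') inside a literal
 * is treated as an escaped quote, as in SQL.  Sketch only: it ignores
 * E'...' escapes, dollar quoting, comments and quoted identifiers.
 */
static int
generalize_query(const char *query, char *out, size_t outsize,
                 char params[][64], int max_params)
{
    int    nparams = 0;
    size_t o = 0;
    const char *p = query;

    while (*p && o + 16 < outsize && nparams < max_params)
    {
        if (*p == '\'')
        {
            size_t l = 0;

            p++;                                 /* skip opening quote */
            while (*p)
            {
                if (p[0] == '\'' && p[1] == '\'')
                {                                /* escaped quote */
                    if (l < 63) params[nparams][l++] = '\'';
                    p += 2;
                }
                else if (*p == '\'')
                {
                    p++;                         /* closing quote */
                    break;
                }
                else
                {
                    if (l < 63) params[nparams][l++] = *p;
                    p++;
                }
            }
            params[nparams][l] = '\0';
            o += snprintf(out + o, outsize - o, "$%d", ++nparams);
        }
        else
            out[o++] = *p++;
    }
    out[o] = '\0';
    return nparams;
}

int
main(void)
{
    char out[256];
    char params[8][64];
    int  n = generalize_query("select * from T where x='12''3' and y='abc'",
                              out, sizeof(out), params, 8);

    printf("generalized: %s\n", out);    /* ... where x=$1 and y=$2 */
    for (int i = 0; i < n; i++)
        printf("$%d = %s\n", i + 1, params[i]);
    return 0;
}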
Handling of integer and real literals is a really challenging task.
One source of problems is negation: it is not so easy to correctly understand whether a minus should be treated as part of a literal or as an operator:
(-1), (1-1), (1-1)-1
Another problem is caused by using integer literals in contexts where parameters cannot be used, for example "order by 1".
Fully correct substitution can be done by first parsing the query, then transforming the parse tree, replacing literal nodes with parameter nodes, and finally deparsing the tree into a generalized query. postgres_fdw already contains such deparse code. It could be moved to the Postgres core and reused for autoprepare (and maybe elsewhere).
But in this case the overhead will be much higher.
I still think that query parsing time is significantly smaller than the time needed for building and optimizing the query execution plan.
But it should be measured if the community is interested in such an approach.
There is an obvious question: how did I manage to get these pgbench results if currently only substitution of string literals is supported and the queries constructed by pgbench don't contain string literals? I just made a small patch in pgbench's replaceVariable method, wrapping the value's representation in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but it allows autoprepare to deal with pgbench queries.
I attached my patch to this mail. It is just a first version of the patch (based on the REL9_6_STABLE branch) to illustrate the proposed approach.
I will be glad to receive any comments, and if such an optimization is considered useful, I will continue work on this patch.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> Another problem is caused by using integer literals in contexts where parameters cannot be used, for example "order by 1".
Doug Doole did this work in DB2 LUW and he may be able to point to more places to watch out for semantically.
Generally, in my experience, this feature is very valuable when dealing with (poorly designed) web apps that just glue together strings.
Protecting it under a GUC would allow doing the work only if it's deemed likely to help.
Another rule I find useful is to abort any efforts to substitute literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining literals in on purpose.
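As a rough illustration of that rule (an assumed check, not something taken from the proposed patch), detecting an explicit bind variable can be as simple as looking for a '$' followed by a digit outside of string literals:

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if the query text appears to contain an explicit bind
 * variable ($1, $2, ...) outside of single-quoted string literals.
 * Sketch only: dollar quoting and comments are ignored.
 */
static bool
has_bind_variable(const char *q)
{
    bool in_literal = false;

    for (; *q; q++)
    {
        if (*q == '\'')
            in_literal = !in_literal;            /* naive quote toggling */
        else if (!in_literal && *q == '$' && isdigit((unsigned char) q[1]))
            return true;
    }
    return false;
}

int
main(void)
{
    /* 1: has a bind variable, so literal substitution would be skipped */
    printf("%d\n", has_bind_variable("select * from t where a = $1 and b = 'x'"));
    /* 0: the only '$' is inside a string literal */
    printf("%d\n", has_bind_variable("select * from t where b = 'costs $5'"));
    return 0;
}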
A follow-up feature would be to formalize different flavors of peeking.
I.e. can you produce a generic plan, but still recruit the initial set of bind values/substituted literals to do costing?
Cheers
Serge Rielau
PS: FWIW, I like this feature.
On 25.04.2017 19:12, Serge Rielau wrote:
> You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.
Sorry, I do not completely understand how the presence of type modifiers can affect string literals used in a query.
Can you provide me with an example?
> Doug Doole did this work in DB2 LUW and he may be able to point to more places to watch out for semantically.
> Generally, in my experience, this feature is very valuable when dealing with (poorly designed) web apps that just glue together strings.
I do not think that this optimization will be useful only for poorly designed applications.
I have already pointed at two use cases where prepared statements cannot be used:
1. pgbouncer without session-level pooling.
2. partitioning
Here the situation is the same as for explicitly prepared statements, isn't it?

> Protecting it under a GUC would allow doing the work only if it's deemed likely to help.
> Another rule I find useful is to abort any efforts to substitute literals if any bind variable is found in the query.
> That can be used as a cue that the author of the SQL left the remaining literals in on purpose.
> A follow-up feature would be to formalize different flavors of peeking.
> I.e. can you produce a generic plan, but still recruit the initial set of bind values/substituted literals to do costing?
Sometimes it is preferable to use a specialized plan rather than a generic plan.
I am not sure whether Postgres is currently able to do it.
> Cheers
> Serge
> PS: FWIW, I like this feature.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> Well, first of all I want to share the results I already have: pgbench with
> default parameters, scale 10 and one connection:
>
> protocol | TPS |
> simple | 3492 |
> extended | 2927 |
> prepared | 6865 |
> simple + autoprepare | 6844 |

If this is string mashing on the unparsed query, as it appears to be, it's going to be a perennial source of security issues.

Best,
David.

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter  Skype: davidfetter
XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> Sorry, I do not completely understand how the presence of type modifiers can affect string literals used in a query.
> Can you provide me with an example?
SELECT 'hello'::CHAR(10) || 'World', 5 + 6;
You can substitute 'hello', 'World', 5, and 6. But not 10.
Also some OLAP syntax like “rows preceding”
It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.
Cheers
Serge
On 04/25/2017 07:54 PM, David Fetter wrote:
> If this is string mashing on the unparsed query, as it appears to be,
> it's going to be a perennial source of security issues.

Sorry, maybe I missed something, but I cannot understand how security can be violated by extracting string literals from a query. I am just copying bytes from one buffer to another. I do not try to somehow interpret these parameters. What I am doing is very similar to standard prepared statements. And moreover, the query is parsed! Only a query which was already parsed and executed (but with different values of parameters) can be autoprepared.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 04/25/2017 08:09 PM, Serge Rielau wrote:
> SELECT 'hello'::CHAR(10) || 'World', 5 + 6;
> You can substitute 'hello', 'World', 5, and 6. But not 10.
I am substituting only string literals. So the query above will be transformed to
SELECT $1::CHAR(10) || $2, 5 + 6;
What's wrong with it?
> Also some OLAP syntax like "rows preceding"
> It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> I am substituting only string literals. So the query above will be transformed to
> SELECT $1::CHAR(10) || $2, 5 + 6;
> What's wrong with it?
Oh, well that leaves a lot of opportunities on the table, doesn’t it?
Cheers
Serge
On 04/25/2017 11:40 PM, Serge Rielau wrote:
> Oh, well that leaves a lot of opportunities on the table, doesn't it?
Well, actually my primary intention was not to make badly designed programs (not using prepared statements) work faster.
I wanted to address cases when it is not possible to use prepared statements.
If we want to substitute as many literals as possible with parameters, then the parse+deparse approach seems to be the only reasonable one.
I will also try to implement it, just to estimate the parsing overhead.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
When I did this in DB2, I didn't use the parser - it was too expensive. I just tokenized the statement and used some simple rules to bypass the invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd disallow replacement until I hit the end of the current subquery or statement.
There are a few limitations to this approach. For example, DB2 allowed you to cast using function notation like VARCHAR(foo, 10). This meant I would never replace the second parameter of any VARCHAR function. Now it's possible that when the statement was fully compiled we'd find that VARCHAR(foo, 10) actually resolved to BOB.VARCHAR() instead of the built-in cast function. Our thinking was that these cases were rare enough that we wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax avoids this problem completely.)
Because SQL is so structured, the implementation ended up being quite simple (a few hundred lines of code) with no significant maintenance issues. (Other developers had no problem adding in new cases where constants had to be preserved.)
The simple tokenizer was also fairly extensible. I'd prototyped using the same code to also normalize statements (uppercase all keywords, collapse whitespace to a single blank, etc.) but that feature was never added to the product.
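For illustration only, a minimal standalone sketch of such a token-level rule (a made-up simplification, not the DB2 implementation) could mark integer literals as non-replaceable once ORDER BY has been seen, until the current subquery or statement ends:

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Walk the query and report which integer literals could be replaced
 * with parameters.  Literals that appear after ORDER BY, up to the end
 * of the current subquery or the statement, are left alone.
 * Sketch only: string literals, comments and operators are not handled.
 */
static void
scan_replaceable_literals(const char *q)
{
    bool after_order_by = false;
    bool prev_was_order = false;
    const char *p = q;

    while (*p)
    {
        if (isspace((unsigned char) *p)) { p++; continue; }

        if (isalpha((unsigned char) *p))
        {
            char word[32];
            int  n = 0;

            while (isalnum((unsigned char) *p) && n < 31)
                word[n++] = (char) toupper((unsigned char) *p++);
            word[n] = '\0';
            if (prev_was_order && strcmp(word, "BY") == 0)
                after_order_by = true;
            prev_was_order = (strcmp(word, "ORDER") == 0);
        }
        else if (isdigit((unsigned char) *p))
        {
            const char *start = p;

            while (isdigit((unsigned char) *p))
                p++;
            printf("literal %.*s: %s\n", (int) (p - start), start,
                   after_order_by ? "keep" : "replace");
            prev_was_order = false;
        }
        else
        {
            if (*p == ')' || *p == ';')
                after_order_by = false;          /* end of subquery/statement */
            prev_was_order = false;
            p++;
        }
    }
}

int
main(void)
{
    /* 10 is reported as replaceable; 1, 2 and (in this crude sketch)
     * the trailing 5 are kept because they follow ORDER BY. */
    scan_replaceable_literals(
        "select * from t where a = 10 and b in (select c from u order by 1) "
        "order by 2 limit 5;");
    return 0;
}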
- Doug
On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
> When I did this in DB2, I didn't use the parser - it was too expensive. I
> just tokenized the statement and used some simple rules to bypass the
> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> disallow replacement until I hit the end of the current
> subquery or statement.

How did you manage plan invalidation and such?

- Andres
Plan invalidation was no different than for any SQL statement. DB2 keeps a list of the objects the statement depends on. If any of the objects changes in an incompatible way the plan is invalidated and kicked out of the cache.
I suspect what is more interesting is plan lookup. DB2 has something called the "compilation environment". This is a collection of everything that impacts how a statement is compiled (SQL path, optimization level, etc.). Plan lookup is done using both the statement text and the compilation environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your path is ANDRES, MYTEAM, SYSIBM we will have different compilation environments. If we both issue "SELECT * FROM T" we'll end up with different cache entries even if T in both of our statements resolves to MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then execute "SELECT * FROM T" again, I have a new compilation environment so the second invocation of the statement will create a new entry in the cache. The first entry is not kicked out - it will still be there for re-use if I change my SQL path back to my original value (modulo LRU for cache memory management of course).
With literal replacement, the cache entry is on the modified statement text. Given the modified statement text and the compilation environment, you're guaranteed to get the right plan entry.
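A standalone sketch of that lookup idea (the struct and function names are invented for illustration and are neither DB2's nor PostgreSQL's): the cache key hashes the statement text together with everything in the compilation environment, so the same text under a different SQL path maps to a different entry:

#include <stdio.h>
#include <string.h>

/* Everything that influences how a statement is compiled. */
typedef struct CompilationEnv
{
    const char *sql_path;
    int         optimization_level;
} CompilationEnv;

/* FNV-1a over a byte range; any reasonable hash would do in a sketch. */
static unsigned long long
hash_bytes(unsigned long long h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Cache key = statement text + compilation environment. */
static unsigned long long
plan_cache_key(const char *stmt, const CompilationEnv *env)
{
    unsigned long long h = 14695981039346656037ULL;

    h = hash_bytes(h, stmt, strlen(stmt));
    h = hash_bytes(h, env->sql_path, strlen(env->sql_path));
    h = hash_bytes(h, &env->optimization_level, sizeof(env->optimization_level));
    return h;
}

int
main(void)
{
    CompilationEnv doug   = { "DOUG, MYTEAM, SYSIBM", 5 };
    CompilationEnv andres = { "ANDRES, MYTEAM, SYSIBM", 5 };

    /* Same statement text, different environments: different cache entries. */
    printf("%llx\n%llx\n",
           plan_cache_key("SELECT * FROM T", &doug),
           plan_cache_key("SELECT * FROM T", &andres));
    return 0;
}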
On Tue, Apr 25, 2017 at 11:35:21PM +0300, Konstantin Knizhnik wrote:
> Sorry, maybe I missed something, but I cannot understand how security
> can be violated by extracting string literals from a query. I am just
> copying bytes from one buffer to another. I do not try to somehow
> interpret these parameters. What I am doing is very similar to standard
> prepared statements. And moreover, the query is parsed! Only a query
> which was already parsed and executed (but with different values of
> parameters) can be autoprepared.

I don't have an exploit yet. What concerns me is attackers' access to what is in essence the ability to poke at RULEs when they only have privileges to read.

Best,
David.

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter  Skype: davidfetter
XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +0000, Doug Doole wrote:
> Plan invalidation was no different than for any SQL statement. DB2 keeps a
> list of the objects the statement depends on. If any of the objects changes
> in an incompatible way the plan is invalidated and kicked out of the cache.
>
> I suspect what is more interesting is plan lookup. DB2 has something called
> the "compilation environment". This is a collection of everything that
> impacts how a statement is compiled (SQL path, optimization level, etc.).
> Plan lookup is done using both the statement text and the compilation
> environment.

It's not always that simple, at least in postgres, unless you disregard search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_path = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much - like similar issues could arise in DB2 too?

Greetings,

Andres Freund
> I don't have an exploit yet. What concerns me is attackers' access to
> what is in essence the ability to poke at RULEs when they only have
> privileges to read.
If they want to see how it works they can read the source code. In terms of runtime data it would limited to whatever the session itself created. In most cases the presence of the cache would be invisible. I suppose it might appear if one were to explain a query, reset the session, explain another query and then re-explain the original. If the chosen plan in the second pass differed because of the presence of the leading query it would be noticeable but not revealing. Albeit I'm a far cry from a security expert...
David J.
> (FWIW, on this list we don't do top-quotes)
I know. Forgot and just did "reply all". My bad.
> It's not always that simple, at least in postgres, unless you disregard
> search_path. Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_path = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?
DB2 does handle this case. Unfortunately I don't know the details of how it worked though.
A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.
- Doug
On 4/25/17, 6:34 PM, "pgsql-hackers-owner@postgresql.org on behalf of Andres Freund" <andres@anarazel.de> wrote:
> It's not always that simple, at least in postgres, unless you disregard
> search_path.

You may need to store the reloid of each relation in the cached query to ensure that the schema bindings are the same, and also invalidate dependent cache entries if a referenced relation is dropped.

Regards,

Jim Finnerty
On Tue, Apr 25, 2017 at 3:48 PM, Doug Doole <ddoole@salesforce.com> wrote:
> DB2 does handle this case. Unfortunately I don't know the details of how it worked though.
> A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.
While this specific scenario does not arise in DB2, since it uses CURRENT SCHEMA only for tables (much to my dislike), your example holds for functions and types, which are resolved by path.
For encapsulated SQL (in views, functions) conservative semantics are enforced via including the timestamp. For dynamic SQL the problem you describe does exist though and I think it is handled in the way Doug describes.
However, as noted by Doug the topic of plan invalidation is really orthogonal to normalizing the queries. All it does is provide more opportunities to run into any pre-existing bugs.
Cheers
Serge
PS: I’m just starting to look at the plan invalidation code in PG because we are dealing with potentially 10s of thousands of cached SQL statements. So these complete wipe outs or walks of every plan in the cache don’t scale.
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Konstantin Knizhnik
> Well, first of all I want to share the results I already have: pgbench with default
> parameters, scale 10 and one connection:
>
> So autoprepare is as efficient as explicit prepare and can increase
> performance almost two times.

This sounds great.

BTW, when are the autoprepared statements destroyed? Are you considering some upper limit on the number of prepared statements?

Regards
Takayuki Tsunakawa
On 26.04.2017 00:47, Andres Freund wrote:
> On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
>> When I did this in DB2, I didn't use the parser - it was too expensive. I
>> just tokenized the statement and used some simple rules to bypass the
>> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
>> disallow replacement until I hit the end of the current
>> subquery or statement.
> How did you manage plan invalidation and such?

The same mechanism as for prepared statements: cached plans are linked into the list by the SaveCachedPlan function and are invalidated by PlanCacheRelCallback.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 26.04.2017 01:34, Andres Freund wrote:
> It's not always that simple, at least in postgres, unless you disregard
> search_path. Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_path = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?

There is the same problem with explicitly prepared statements, isn't there? Certainly, in the case of explicitly prepared statements it is the responsibility of the programmer to avoid such collisions, while in the case of autoprepare it is hidden from the programmer. But there is a GUC variable controlling the autoprepare feature, and by default it is switched off. So if a programmer or DBA enables it, then they should take into account the effects of such a decision.

By the way, isn't it a bug in PostgreSQL that altering the search path does not invalidate cached plans? As I already mentioned, the same problem can be reproduced with explicitly prepared statements.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:
> This sounds great.
>
> BTW, when are the autoprepared statements destroyed?

Right now they are destroyed only when an invalidation message is received (when the catalog is changed). Prepared statements are local to a backend and are located in the backend's memory. It is unlikely that there will be so many different queries that they cause memory overflow, but in theory such a situation is certainly possible.

> Are you considering some upper limit on the number of prepared statements?

In this case we need some kind of LRU for maintaining the cache of autoprepared statements. I think that it is a good idea to have such a limited cache - it can avoid the memory overflow problem. I will try to implement it.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 26.04.2017 10:49, Konstantin Knizhnik wrote:
>> Are you considering some upper limit on the number of prepared statements?
>
> In this case we need some kind of LRU for maintaining the cache of autoprepared statements.
> I think that it is a good idea to have such a limited cache - it can avoid the memory overflow problem.
> I will try to implement it.
I attach new patch which allows to limit the number of autoprepared statements (autoprepare_limit GUC variable).
Also I did more measurements, now with several concurrent connections and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only statements are below:
Protocol | TPS |
extended | 87k |
prepared | 209k |
simple+autoprepare | 206k |
As you can see, autoprepare provides more than 2 times speed improvement.
Also I tried to measure the overhead of parsing (to be able to substitute all literals, not only string literals).
I just added an extra call of pg_parse_query. Speed is reduced to 181k.
So the overhead is noticeable, but the optimization still remains useful.
This is why I want to ask the question: is it better to implement a slower but safer and more universal solution?
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
2017-04-26 12:30 GMT+02:00 Konstantin Knizhnik <k.knizhnik@postgrespro.ru>:
> This is why I want to ask the question: is it better to implement a slower but safer and more universal solution?
An unsafe solution does not make any sense, and it is dangerous (80% of database users do not have the necessary knowledge). If somebody needs the maximum possible performance, then he can use explicit prepared statements.
Regards
Pavel
> A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.
This has been rumbling around in my head. I wonder if you could solve this problem by registering dependencies on objects which don't yet exist. Consider:
CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;
For T1, you'd register a hard dependency on C.T1 and no virtual dependencies since the table is explicitly qualified.
For T2, you'd register a hard dependency on C.T2 since that is the table that was selected for the query. You'd also register virtual dependencies on A.T2 and B.T2 since if either of those tables (or views) is created you need to recompile the statement. (Note that no virtual dependency is created on D.T2 since that table would never be selected by the compiler.)
The catch is that virtual dependencies would have to be recorded and searched as strings, not OIDs since the objects don't exist. Virtual dependencies only have to be checked during CREATE processing though, so that might not be too bad.
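A small standalone sketch of the idea (all names and structures are made up for illustration): each cached plan records, as plain strings, the qualified names that would have shadowed the tables it actually resolved to, and CREATE processing invalidates any plan that lists the newly created name:

#include <stdio.h>
#include <string.h>

#define MAX_DEPS 8

/* A cached plan plus the "virtual" (string) dependencies it registered. */
typedef struct CachedPlan
{
    const char *stmt;
    const char *virtual_deps[MAX_DEPS];   /* e.g. "A.T2", "B.T2" */
    int         ndeps;
    int         valid;
} CachedPlan;

/* Called from CREATE processing with the qualified name of the new object. */
static void
on_create_object(CachedPlan *plans, int nplans, const char *qualified_name)
{
    for (int i = 0; i < nplans; i++)
        for (int j = 0; j < plans[i].ndeps; j++)
            if (strcmp(plans[i].virtual_deps[j], qualified_name) == 0)
                plans[i].valid = 0;        /* must be recompiled */
}

int
main(void)
{
    /* SELECT * FROM C.T1, T2 with search_path = A, B, C, D:
     * T2 resolved to C.T2, but a future A.T2 or B.T2 would shadow it. */
    CachedPlan plan = { "SELECT * FROM C.T1, T2", { "A.T2", "B.T2" }, 2, 1 };

    on_create_object(&plan, 1, "D.T2");    /* never selected: stays valid */
    printf("after CREATE D.T2: valid = %d\n", plan.valid);

    on_create_object(&plan, 1, "B.T2");    /* shadows C.T2: invalidated */
    printf("after CREATE B.T2: valid = %d\n", plan.valid);
    return 0;
}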
But this is getting off topic - I just wanted to capture the idea while it was rumbling around.
On 04/26/2017 08:08 PM, Doug Doole wrote:
> I wonder if you could solve this problem by registering dependencies on objects which don't yet exist.
I think it will be enough to handle modification of the search path and invalidate the prepared statement cache in this case.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 26.04.2017 13:46, Pavel Stehule wrote:
> An unsafe solution does not make any sense, and it is dangerous (80% of database users do not have the necessary knowledge). If somebody needs the maximum possible performance, then he can use explicit prepared statements.
I attached a new patch to this mail. I have completely reimplemented my original approach and now use parse tree transformation.
New pgbench (-S -c 10) results are the following:
Protocol | TPS |
extended | 87k |
prepared | 209k |
simple+autoprepare | 185k |
So there is some slowdown compared with my original implementation and with explicitly prepared statements, but it still provides more than a two times speed-up compared with unprepared queries. And it doesn't require changing existing applications.
As most real production applications work with the DBMS through some connection pool (pgbouncer, ...), I think that such an optimization will be useful.
Isn't it interesting if we can increase system throughput almost two times just by setting one parameter in the configuration file?
I also tried to enable autoprepare by default and run the regression tests. 7 tests fail for the following reasons:
1. Slightly different error reporting (for example, the error location is not always reported identically).
2. Difference in query behavior caused by changed local settings (Andres gave an example with search_path, and the date test fails because of a changed datestyle).
3. Problems with indirect dependencies (when a table is altered, only cached plans directly depending on this relation are invalidated, but not plans with indirect dependencies).
4. Not performing domain checks for null values.
I do not think that these issues can cause problems for real applications.
Also it is possible to limit the number of autoprepared statements using the autoprepare_limit parameter, avoiding possible backend memory overflow in case of a large number of unique queries sent by an application. An LRU discipline is used to drop the least recently used plans.
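For illustration, a standalone sketch of such an LRU discipline (not the patch's code; the entry structure and the hard-coded limit stand in for autoprepare_limit) keeps a doubly linked list ordered by recency and drops the oldest entry once the limit is exceeded:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Entry
{
    char          text[64];                /* generalized statement text */
    struct Entry *prev, *next;
} Entry;

static Entry *head, *tail;                 /* head = most recently used */
static int    nentries, limit = 3;         /* stand-in for autoprepare_limit */

static void
touch(Entry *e)                            /* move an entry to the head */
{
    if (e == head) return;
    if (e->prev) e->prev->next = e->next;
    if (e->next) e->next->prev = e->prev;
    if (e == tail) tail = e->prev;
    e->prev = NULL;
    e->next = head;
    head->prev = e;
    head = e;
}

static Entry *
lookup_or_add(const char *text)
{
    for (Entry *e = head; e; e = e->next)
        if (strcmp(e->text, text) == 0) { touch(e); return e; }

    Entry *e = calloc(1, sizeof(Entry));
    strncpy(e->text, text, sizeof(e->text) - 1);
    e->next = head;
    if (head) head->prev = e; else tail = e;
    head = e;

    if (++nentries > limit)                /* evict least recently used */
    {
        Entry *old = tail;

        tail = old->prev;
        if (tail) tail->next = NULL;
        printf("evicting: %s\n", old->text);
        free(old);
        nentries--;
    }
    return e;
}

int
main(void)
{
    const char *queries[] = { "q1", "q2", "q3", "q1", "q4" };

    for (int i = 0; i < 5; i++)
        lookup_or_add(queries[i]);         /* q2 is evicted when q4 arrives */
    for (Entry *e = head; e; e = e->next)
        printf("cached: %s\n", e->text);
    return 0;
}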
Any comments and suggestions for future improvement of this patch are welcome.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> Any comments and suggestions for future improvement of this patch are welcome.
+ PG_TRY();
+ {
+ query = parse_analyze_varparams(parse_tree,
+ query_string,
+ ¶m_types,
+ &num_params);
+ }
+ PG_CATCH();
+ {
+ /*
+ * In case of analyze errors revert back to original query processing
+ * and disable autoprepare for this query to avoid such problems in future.
+ */
+ FlushErrorState();
+ if (snapshot_set) {
+ PopActiveSnapshot();
+ }
+ entry->disable_autoprepare = true;
+ undo_query_plan_changes(parse_tree, const_param_list);
+ MemoryContextSwitchTo(old_context);
+ return false;
+ }
+ PG_END_TRY();
This is definitely not a safe way of using TRY/CATCH.
+
+ /* Convert literal value to parameter value */
+ switch (const_param->literal->val.type)
+ {
+ /*
+ * Convert from integer literal
+ */
+ case T_Integer:
+ switch (ptype) {
+ case INT8OID:
+ params->params[paramno].value = Int64GetDatum((int64)const_param->literal->val.val.ival);
+ break;
+ case INT4OID:
+ params->params[paramno].value = Int32GetDatum((int32)const_param->literal->val.val.ival);
+ break;
+ case INT2OID:
+ if (const_param->literal->val.val.ival < SHRT_MIN
+ || const_param->literal->val.val.ival > SHRT_MAX)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("smallint out of range")));
+ }
+ params->params[paramno].value = Int16GetDatum((int16)const_param->literal->val.val.ival);
+ break;
+ case FLOAT4OID:
+ params->params[paramno].value = Float4GetDatum((float)const_param->literal->val.val.ival);
+ break;
+ case FLOAT8OID:
+ params->params[paramno].value = Float8GetDatum((double)const_param->literal->val.val.ival);
+ break;
+ case INT4RANGEOID:
+ sprintf(buf, "[%ld,%ld]", const_param->literal->val.val.ival, const_param->literal->val.val.ival);
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value = OidInputFunctionCall(typinput, buf, typioparam, -1);
+ break;
+ default:
+ pg_lltoa(const_param->literal->val.val.ival, buf);
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value = OidInputFunctionCall(typinput, buf, typioparam, -1);
+ }
+ break;
+ case T_Null:
+ params->params[paramno].isnull = true;
+ break;
+ default:
+ /*
+ * Convert from string literal
+ */
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value = OidInputFunctionCall(typinput, const_param->literal->val.val.str, typioparam, -1);
+ }
I don't see something with a bunch of hard-coded rules for particular type OIDs having any chance of being acceptable.
This patch seems to duplicate a large amount of existing code. That would be a good thing to avoid.
It could use a visit from the style police and a spell-checker, too.
--
On 01.05.2017 18:52, Robert Haas wrote:
> This is definitely not a safe way of using TRY/CATCH.
+
+ /* Convert literal value to parameter value */
+ switch (const_param->literal->val.type)
+ {
+ /*
+ * Convert from integer literal
+ */
+ case T_Integer:
+ switch (ptype) {
+ case INT8OID:
+ params->params[paramno].value = Int64GetDatum((int64)const_param->literal->val.val.ival);
+ break;
+ case INT4OID:
+ params->params[paramno].value = Int32GetDatum((int32)const_param->literal->val.val.ival);
+ break;
+ case INT2OID:
+ if (const_param->literal->val.val.ival < SHRT_MIN
+ || const_param->literal->val.val.ival > SHRT_MAX)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("smallint out of range")));
+ }
+ params->params[paramno].value = Int16GetDatum((int16)const_param->literal->val.val.ival);
+ break;
+ case FLOAT4OID:
+ params->params[paramno].value = Float4GetDatum((float)const_param->literal->val.val.ival);
+ break;
+ case FLOAT8OID:
+ params->params[paramno].value = Float8GetDatum((double)const_param->literal->val.val.ival);
+ break;
+ case INT4RANGEOID:
+ sprintf(buf, "[%ld,%ld]", const_param->literal->val.val.ival, const_param->literal->val.val.ival);
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value = OidInputFunctionCall(typinput, buf, typioparam, -1);
+ break;
+ default:
+ pg_lltoa(const_param->literal->val.val.ival, buf);
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value = OidInputFunctionCall(typinput, buf, typioparam, -1);
+ }
+ break;
+ case T_Null:
+ params->params[paramno].isnull = true;
+ break;
+ default:
+ /*
+ * Convert from string literal
+ */
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value = OidInputFunctionCall(typinput, const_param->literal->val.val.str, typioparam, -1);
+ }
I don't see something with a bunch of hard-coded rules for particular type OIDs having any chance of being acceptable.
Well, what I need is to convert a literal value represented in a Value struct to a parameter datum value.
The Value struct contains a union with integer and string variants.
So this piece of code just provides efficient handling of the most common cases (integer parameters) and falls back to the type's input function in the other cases.
This patch seems to duplicate a large amount of existing code. That would be a good thing to avoid.
Yes, I had to copy a lot of code from the exec_parse_message + exec_bind_message + exec_execute_message functions.
Definitely, copying code is a bad flaw. It would be much better and easier to just call the three original functions instead of gathering their code into a new function.
But I failed to do that because:
1. Autoprepare should be integrated into exec_simple_query. Before executing the query in the normal way, I need to perform a cache lookup for a previously prepared plan for this generalized query.
And generalization of the query requires building the query tree (query parsing). In other words, parsing has to be done before I can call exec_parse_message.
2. exec_bind_message works with parameters passed by the client through the libpq protocol, while autoprepare deals with parameter values extracted from literals.
3. I do not want to generate a dummy name for an autoprepared query and handle it as a normal prepared statement.
And I can not use unnamed statements because I want the lifetime of autoprepared statements to be longer than one statement.
4. I have to use a slightly different memory context policy than named or unnamed prepared statements.
Also, these three exec_* functions contain prolog/epilog code which is needed because they serve separate libpq requests.
But in the case of autoprepared statements they need to be executed in the context of a single libpq message, so most of this code is redundant. A rough sketch of what this integration looks like is shown below.
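To make that concrete, here is a simplified sketch of the control flow, not the actual patch text; AutoprepareEntry, autoprepare_lookup() and autoprepare_bind_and_execute() are placeholder names:

/*
 * Simplified sketch of the autoprepare path inside exec_simple_query().
 * AutoprepareEntry, autoprepare_lookup() and autoprepare_bind_and_execute()
 * are placeholder names, not the actual patch code.
 */
List     *parsetree_list = pg_parse_query(query_string);
ListCell *lc;

foreach(lc, parsetree_list)
{
    Node *parsetree = (Node *) lfirst(lc);
    AutoprepareEntry *entry;

    /* replace literals with parameters and look the result up in the cache */
    entry = autoprepare_lookup(parsetree);

    if (entry != NULL && !entry->disable_autoprepare)
    {
        /* bind the extracted literal values and execute the cached plan */
        autoprepare_bind_and_execute(entry);
        continue;
    }

    /* otherwise fall through to the usual simple-query processing */
    /* ... parse analysis, planning and execution as before ... */
}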
It could use a visit from the style police and a spell-checker, too.
I will definitely fix the style and misspellings - I have not yet submitted this patch to a commitfest and there is enough time before the next one.
My primary intention in publishing this patch is to receive feedback on the proposed approach.
I have already received two very useful pieces of advice: limit the number of cached statements and pay more attention to safety.
This is why I have reimplemented my original approach, which substituted string literals with parameters without building a parse tree.
Right now I am mostly thinking about two problems:
1. Finding the best criteria for detecting which literals need to be replaced with parameters and which do not. It is clear that replacing "limit 10" with "limit $10"
does not make much sense and can cause a worse execution plan. So right now I just ignore the sort, group by and limit parts. But maybe it is possible to find a more flexible approach.
2. Which type to choose for parameters. I can try to infer the type from context (the current solution), or use the type of the literal.
The problem with the first approach is that the query compiler is not always able to do it, and even when the type can be determined, it may be too generic (for example numeric instead of real,
or range instead of integer). The problem with the second approach is the opposite: a type inferred from the literal can be too restrictive - quite often integer literals are used to specify values of floating-point constants (for example, in "where float_col = 1" the literal is an integer, but the parameter really needs to be float8). The best solution is to first try to determine the parameter type from context and then refine it based on the literal type. But that would require repeating the query analysis.
Not sure if it is possible.
Thanks for your feedback.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, May 2, 2017 at 5:50 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: >> I don't see something with a bunch of hard-coded rules for particular type >> OIDs having any chance of being acceptable. > > Well, what I need is ... Regarding this... > Definitely copying of code is bad flaw. It will be much better and easier > just to call three original functions instead of mixing gathering their code > into the new function. > But I failed to do it because ... ...and also this: I am sympathetic to the fact that this is a hard problem to solve. I'm just telling you that the way you've got it is not acceptable and nobody's going to commit it like that (or if they do, they will end up having to revert it). If you want to have a technical discussion about what might be a way to change the patch to be more acceptable, cool, but I don't want to get into a long debate about whether what you have is acceptable or not; I've already said what I think about that and I believe that opinion will be widely shared. I am not trying to beat you up here, just trying to be clear. > Thanks for your feedback. Sure thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02.05.2017 21:26, Robert Haas wrote: > > I am sympathetic to the fact that this is a hard problem to solve. > I'm just telling you that the way you've got it is not acceptable and > nobody's going to commit it like that (or if they do, they will end up > having to revert it). If you want to have a technical discussion > about what might be a way to change the patch to be more acceptable, > cool, but I don't want to get into a long debate about whether what > you have is acceptable or not; I've already said what I think about > that and I believe that opinion will be widely shared. I am not > trying to beat you up here, just trying to be clear. > > Based on the Robert's feedback and Tom's proposal I have implemented two new versions of autoprepare patch. First version is just refactoring of my original implementation: I have extracted common code into prepare_cached_plan and exec_prepared_plan function to avoid code duplication. Also I rewrote assignment of values to parameters. Now types of parameters are inferred from types of literals, so there may be several prepared plans which are different only by types of parameters. Due to the problem with type coercion for parameters, I have to catch errors in parse_analyze_varparams. Robert, can you please explain why using TRY/CATCH is not safe here: > This is definitely not a safe way of using TRY/CATCH. Second version is based on Tom's suggestion: > Personally I'd think about > replacing the entire literal-with-cast construct with a Param having > already-known type. So here I first patch raw parse tree, replacing A_Const with ParamRef. Such plan is needed to perform cache lookup. Then I restore original raw parse tree and call pg_analyze_and_rewrite. Then I mutate analyzed tree, replacing Const with Param nodes. In this case type coercion is already performed and I know precise types which should be used for parameters. It seems to be more sophisticated approach. And I can not extract common code in prepare_cached_plan, because preparing of plan is currently mix of steps done in exec_simple_query and exec_parse_message. But there is no need to catch analyze errors. Finally performance of both approaches is the same: at pgbench it is 180k TPS on read-only queries, comparing with 80k TPS for not prepared queries. In both cases 7 out of 177 regression tests are not passed (mostly because session environment is changed between subsequent query execution). I am going to continue work on this patch I will be glad to receive any feedback and suggestions for its improvement. In most cases, applications are not accessing Postgres directly, but using some connection pooling layer and so them are not able to use prepared statements. But at simple OLTP Postgres spent more time on building query plan than on execution itself. And it is possible to speedup Postgres about two times at such workload! Another alternative is true shared plan cache. May be it is even more perspective approach, but definitely much more invasive and harder to implement. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote: > I am going to continue work on this patch I will be glad to receive any > feedback and suggestions for its improvement. > In most cases, applications are not accessing Postgres directly, but using > some connection pooling layer and so them are not able to use prepared > statements. > But at simple OLTP Postgres spent more time on building query plan than on > execution itself. And it is possible to speedup Postgres about two times at > such workload! > Another alternative is true shared plan cache. May be it is even more > perspective approach, but definitely much more invasive and harder to > implement. Can we back up and get an overview of what you are doing and how you are doing it? Our TODO list suggests this order for successful patches: Desirability -> Design -> Implement -> Test -> Review -> Commit You kind of started at the Implementation/patch level, which makes it hard to evaluate. I think everyone agrees on the Desirability of the feature, but the Design is the tricky part. I think the design questions are: * What information is stored about cached plans? * How are the cached plans invalidated? * How is a query matched against a cached plan? Looking at the options, ideally the plan would be cached at the same query stage as the stage where the incoming query is checked against the cache. However, caching and checking at the same level offers no benefit, so they are going to be different. For example, caching a parse tree at the time it is created, then checking at the same point if the incoming query is the same doesn't help you because you already had to create the parse tree get to that point. A more concrete example is prepared statements. They are stored at the end of planning and matched in the parser. However, you can easily do that since the incoming query specifies the name of the prepared query, so there is no trick to matching. The desire is to cache as late as possible so you cache more work and you have more detail about the referenced objects, which helps with cache invalidation. However, you also want to do cache matching as early as possible to improve performance. So, let's look at some options. One interesting idea from Doug Doole was to do it between the tokenizer and parser. I think they are glued together so you would need a way to run the tokenizer separately and compare that to the tokens you stored for the cached plan. The larger issue is that prepared plans already are checked after parsing, and we know they are a win, so matching any earlier than that just seems like overkill and likely to lead to lots of problems. So, you could do it after parsing but before parse-analysis, which is kind of what prepared queries do. One tricky problem is that we don't bind the query string tokens to database objects until after parse analysis. Doing matching before parse-analysis is going to be tricky, which is why there are so many comments about the approach. Changing search_path can certainly affect it, but creating objects in earlier-mentioned schemas can also change how an object reference in a query is resolved. Even obscure things like the creation of a new operator that has higher precedence in the query could change the plan, though am not sure if our prepared query system even handles that properly. Anyway, that is my feedback. I would like to get an overview of what you are trying to do and the costs/benefits of each option so we can best guide you. 
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
One interesting idea from Doug Doole was to do it between the tokenizer and parser. I think they are glued together so you would need a way to run the tokenizer separately and compare that to the tokens you stored for the cached plan.
When I did this, we had the same problem that the tokenizer and parser were tightly coupled. Fortunately, I was able to do as you suggest and run the tokenizer separately to do my analysis.
So my model was to do statement generalization before entering the compiler at all. I would tokenize the statement to find the literals and generate a new statement string with placeholders. The new string would the be passed to the compiler which would then tokenize and parse the reworked statement.
This means we incurred the cost of tokenizing twice, but the tokenizer was lightweight enough that it wasn't a problem. In exchange I was able to do statement generalization without touching the compiler - the compiler saw the generalized statement text as any other statement and handled it in the exact same way. (There was just a bit of new code around variable binding.)
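To illustrate the idea, here is a toy generalizer along those lines (written just for this explanation, not taken from any real implementation; it ignores dollar quoting, E'' strings, comments, casts, negative numbers and everything else a real scanner has to handle):

#include <ctype.h>
#include <stdio.h>

/*
 * Toy statement generalizer: replace quoted strings and numeric literals
 * with $n placeholders.
 */
static void
generalize(const char *sql, char *out, size_t outlen)
{
    int     paramno = 0;
    size_t  o = 0;

    for (size_t i = 0; sql[i] != '\0' && o + 16 < outlen;)
    {
        if (sql[i] == '\'')
        {
            /* skip the string literal, honouring doubled quotes ('') */
            i++;
            while (sql[i] != '\0' && !(sql[i] == '\'' && sql[i + 1] != '\''))
                i += (sql[i] == '\'') ? 2 : 1;
            if (sql[i] == '\'')
                i++;
            o += snprintf(out + o, outlen - o, "$%d", ++paramno);
        }
        else if (isdigit((unsigned char) sql[i]) &&
                 (i == 0 || (!isalnum((unsigned char) sql[i - 1]) &&
                             sql[i - 1] != '_' && sql[i - 1] != '$')))
        {
            /* numeric literal that is not part of an identifier or placeholder */
            while (isdigit((unsigned char) sql[i]) || sql[i] == '.')
                i++;
            o += snprintf(out + o, outlen - o, "$%d", ++paramno);
        }
        else
            out[o++] = sql[i++];
    }
    out[o] = '\0';
}

int
main(void)
{
    char    buf[256];

    generalize("select * from t where x = 10 and s = 'abc'", buf, sizeof(buf));
    puts(buf);          /* prints: select * from t where x = $1 and s = $2 */
    return 0;
}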
On Thu, May 11, 2017 at 05:39:58PM +0000, Douglas Doole wrote: > One interesting idea from Doug Doole was to do it between the tokenizer and > parser. I think they are glued together so you would need a way to run the > tokenizer separately and compare that to the tokens you stored for the > cached plan. > > > When I did this, we had the same problem that the tokenizer and parser were > tightly coupled. Fortunately, I was able to do as you suggest and run the > tokenizer separately to do my analysis. > > So my model was to do statement generalization before entering the compiler at > all. I would tokenize the statement to find the literals and generate a new > statement string with placeholders. The new string would the be passed to the > compiler which would then tokenize and parse the reworked statement. > > This means we incurred the cost of tokenizing twice, but the tokenizer was > lightweight enough that it wasn't a problem. In exchange I was able to do > statement generalization without touching the compiler - the compiler saw the > generalized statement text as any other statement and handled it in the exact > same way. (There was just a bit of new code around variable binding.) Good point. I think we need to do some measurements to see if the parser-only stage is actually significant. I have a hunch that commercial databases have much heavier parsers than we do. This split would also not work if the scanner feeds changes back into the parser. I know C does that for typedefs but I don't think we do. Ideally I would like to see percentage-of-execution numbers for typical queries for scan, parse, parse-analysis, plan, and execute to see where the wins are. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce Momjian <bruce@momjian.us> writes: > Good point. I think we need to do some measurements to see if the > parser-only stage is actually significant. I have a hunch that > commercial databases have much heavier parsers than we do. FWIW, gram.y does show up as significant in many of the profiles I take. I speculate that this is not so much that it eats many CPU cycles, as that the constant tables are so large as to incur lots of cache misses. scan.l is not quite as big a deal for some reason, even though it's also large. regards, tom lane
On May 11, 2017 11:31:02 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Bruce Momjian <bruce@momjian.us> writes: >> Good point. I think we need to do some measurements to see if the >> parser-only stage is actually significant. I have a hunch that >> commercial databases have much heavier parsers than we do. > >FWIW, gram.y does show up as significant in many of the profiles I >take. >I speculate that this is not so much that it eats many CPU cycles, as >that >the constant tables are so large as to incur lots of cache misses. >scan.l >is not quite as big a deal for some reason, even though it's also >large. And that there's a lot of unpredictable branches, leading to a lot if pipeline stalls. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 05/11/2017 06:12 PM, Bruce Momjian wrote: > On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote: >> I am going to continue work on this patch I will be glad to receive any >> feedback and suggestions for its improvement. >> In most cases, applications are not accessing Postgres directly, but using >> some connection pooling layer and so them are not able to use prepared >> statements. >> But at simple OLTP Postgres spent more time on building query plan than on >> execution itself. And it is possible to speedup Postgres about two times at >> such workload! >> Another alternative is true shared plan cache. May be it is even more >> perspective approach, but definitely much more invasive and harder to >> implement. > Can we back up and get an overview of what you are doing and how you are > doing it? Our TODO list suggests this order for successful patches: > > Desirability -> Design -> Implement -> Test -> Review -> Commit > > You kind of started at the Implementation/patch level, which makes it > hard to evaluate. > > I think everyone agrees on the Desirability of the feature, but the > Design is the tricky part. I think the design questions are: > > * What information is stored about cached plans? > * How are the cached plans invalidated? > * How is a query matched against a cached plan? > > Looking at the options, ideally the plan would be cached at the same > query stage as the stage where the incoming query is checked against the > cache. However, caching and checking at the same level offers no > benefit, so they are going to be different. For example, caching a > parse tree at the time it is created, then checking at the same point if > the incoming query is the same doesn't help you because you already had > to create the parse tree get to that point. > > A more concrete example is prepared statements. They are stored at the > end of planning and matched in the parser. However, you can easily do > that since the incoming query specifies the name of the prepared query, > so there is no trick to matching. > > The desire is to cache as late as possible so you cache more work and > you have more detail about the referenced objects, which helps with > cache invalidation. However, you also want to do cache matching as > early as possible to improve performance. > > So, let's look at some options. One interesting idea from Doug Doole > was to do it between the tokenizer and parser. I think they are glued > together so you would need a way to run the tokenizer separately and > compare that to the tokens you stored for the cached plan. The larger > issue is that prepared plans already are checked after parsing, and we > know they are a win, so matching any earlier than that just seems like > overkill and likely to lead to lots of problems. > > So, you could do it after parsing but before parse-analysis, which is > kind of what prepared queries do. One tricky problem is that we don't > bind the query string tokens to database objects until after parse > analysis. > > Doing matching before parse-analysis is going to be tricky, which is why > there are so many comments about the approach. Changing search_path can > certainly affect it, but creating objects in earlier-mentioned schemas > can also change how an object reference in a query is resolved. Even > obscure things like the creation of a new operator that has higher > precedence in the query could change the plan, though am not sure if > our prepared query system even handles that properly. 
> > Anyway, that is my feedback. I would like to get an overview of what > you are trying to do and the costs/benefits of each option so we can > best guide you.

Sorry for the lack of an overview. I started with a small prototype just to investigate whether such an optimization makes sense or not. When I got a more than two times advantage in performance on standard pgbench, I came to the conclusion that this optimization can be really very useful, and now I am trying to find the best way to implement it.
I started with the simplest approach, where string literals are replaced with parameters. It is done before parsing and can be done very fast - I just need to locate the data in quotes. But this approach is not safe and universal: you will not be able to speed up most existing queries without rewriting them.
This is why I have provided second implementation which replace literals with parameters after raw parsing. Certainly it is slower than first approach. But still provide significant advantage in performance: more than two times at pgbench. Then I tried to run regression tests and find several situations where type analysis is not correctly performed in case of replacing literals with parameters.
So my third attempt is to replace constant nodes with parameters in the already analyzed tree.

Now answering your questions:

* What information is stored about cached plans?
The key used to locate a cached plan is the raw parse tree. The value is the saved CachedPlanSource.

* How are the cached plans invalidated?
In the same way as plans for explicitly prepared statements.

* How is a query matched against a cached plan?
A hash function is calculated over the raw parse tree, and the equal() function is used to compare raw trees with literal nodes replaced with parameters.

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
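Just to illustrate the shape of what is cached, an entry along these lines might look roughly like this (struct and field names are illustrative, not taken from the patch):

/* Illustrative only: rough shape of a per-backend autoprepare cache entry. */
typedef struct AutoprepareEntry
{
    uint32      hash;                /* hash of the generalized raw parse tree */
    Node       *generalized_tree;    /* raw tree with literals replaced by params */
    CachedPlanSource *psrc;          /* saved plan source, invalidated as usual */
    int64       exec_count;          /* how many times this statement was seen */
    bool        disable_autoprepare; /* set if analysis or binding once failed */
} AutoprepareEntry;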
On 05/11/2017 09:31 PM, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> Good point. I think we need to do some measurements to see if the >> parser-only stage is actually significant. I have a hunch that >> commercial databases have much heavier parsers than we do. > FWIW, gram.y does show up as significant in many of the profiles I take. > I speculate that this is not so much that it eats many CPU cycles, as that > the constant tables are so large as to incur lots of cache misses. scan.l > is not quite as big a deal for some reason, even though it's also large. > > regards, tom lane Yes, my results shows that pg_parse_query adds not so much overhead: 206k TPS for my first variant with string literal substitution and modified query text used as hash key vs. 181k. TPS for version with patching raw parse tree constructed by pg_parse_query. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote: > On 05/11/2017 09:31 PM, Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > Good point. I think we need to do some measurements to see if the > > > parser-only stage is actually significant. I have a hunch that > > > commercial databases have much heavier parsers than we do. > > FWIW, gram.y does show up as significant in many of the profiles I take. > > I speculate that this is not so much that it eats many CPU cycles, as that > > the constant tables are so large as to incur lots of cache misses. scan.l > > is not quite as big a deal for some reason, even though it's also large. > > > > regards, tom lane > Yes, my results shows that pg_parse_query adds not so much overhead: > 206k TPS for my first variant with string literal substitution and modified query text used as hash key vs. > 181k. TPS for version with patching raw parse tree constructed by pg_parse_query. Those numbers and your statement seem to contradict each other? - Andres
On 05/11/2017 10:52 PM, Andres Freund wrote: > On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote: >> On 05/11/2017 09:31 PM, Tom Lane wrote: >>> Bruce Momjian <bruce@momjian.us> writes: >>>> Good point. I think we need to do some measurements to see if the >>>> parser-only stage is actually significant. I have a hunch that >>>> commercial databases have much heavier parsers than we do. >>> FWIW, gram.y does show up as significant in many of the profiles I take. >>> I speculate that this is not so much that it eats many CPU cycles, as that >>> the constant tables are so large as to incur lots of cache misses. scan.l >>> is not quite as big a deal for some reason, even though it's also large. >>> >>> regards, tom lane >> Yes, my results shows that pg_parse_query adds not so much overhead: >> 206k TPS for my first variant with string literal substitution and modified query text used as hash key vs. >> 181k. TPS for version with patching raw parse tree constructed by pg_parse_query. > Those numbers and your statement seem to contradict each other? Oops, my parse error:( I incorrectly read Tom's statement. Actually, I also was afraid that price of parsing is large enough and this is why my first attempt was to avoid parsing. But then I find out that most of the time is spent in analyze and planning (see attached profile): pg_parse_query: 4.23% pg_analyze_and_rewrite 8.45% pg_plan_queries: 15.49% > > - Andres -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote: > This is why I have provided second implementation which replace > literals with parameters after raw parsing. Certainly it is slower > than first approach. But still provide significant advantage in > performance: more than two times at pgbench. Then I tried to run > regression tests and find several situations where type analysis is > not correctly performed in case of replacing literals with parameters. So the issue is that per-command output from the parser, SelectStmt, only has strings for identifers, e.g. table and column names, so you can't be sure it is the same as the cached entry you matched. I suppose if you cleared the cache every time someone created an object or changed search_path, it might work. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 12.05.2017 03:58, Bruce Momjian wrote: > On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote: >> This is why I have provided second implementation which replace >> literals with parameters after raw parsing. Certainly it is slower >> than first approach. But still provide significant advantage in >> performance: more than two times at pgbench. Then I tried to run >> regression tests and find several situations where type analysis is >> not correctly performed in case of replacing literals with parameters. > So the issue is that per-command output from the parser, SelectStmt, > only has strings for identifers, e.g. table and column names, so you > can't be sure it is the same as the cached entry you matched. I suppose > if you cleared the cache every time someone created an object or changed > search_path, it might work. > Definitely changing session context (search_path, date/time format, ...) may cause incorrect behavior of cached statements. Actually you may get the same problem with explicitly prepared statements (certainly, in the last case, you better understand what going on and it is your choice whether to use or not to use prepared statement). The fact of failure of 7 regression tests means that autoprepare can really change behavior of existed program. This is why my suggestion is to switch off this feature by default. But in 99.9% real cases (my estimation plucked out of thin air:) there will be no such problems with autoprepare. And it can significantly improve performance of OLTP applications which are not able to use prepared statements (because of working through pgbouncer or any other reasons). Can autoprepare slow down the system? Yes, it can. It can happen if application perform larger number of unique queries and autoprepare cache size is not limited. In this case large (and infinitely growing) number of stored plans can consume a lot of memory and, what is even worse, slowdown cache lookup. This is why I by default limit number of cached statements (autoprepare_limit parameter) by 100. I am almost sure that there will be some other issues with autoprepare which I have not encountered yet (because I mostly tested it on pgbench and Postgres regression tests). But I am also sure that benefit of doubling system performance is good motivation to continue work in this direction. My main concern is whether to continue to improve current approach with local (per-backend) cache of prepared statements. Or create shared cache (as in Oracle). It is much more difficult to implement shared cache (the same problem with session context, different catalog snapshots, cache invalidation,...) but it also provides more opportunities for queries optimization and tuning. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote: > Definitely changing session context (search_path, date/time format, ...) may > cause incorrect behavior of cached statements. I wonder if we should clear the cache whenever any SET command is issued. > Actually you may get the same problem with explicitly prepared statements > (certainly, in the last case, you better understand what going on and it is > your choice whether to use or not to use prepared statement). > > The fact of failure of 7 regression tests means that autoprepare can really > change behavior of existed program. This is why my suggestion is to switch > off this feature by default. I would like to see us target something that can be enabled by default. Even if it only improves performance by 5%, it would be better overall than a feature that improves performance by 90% but is only used by 1% of our users. > But in 99.9% real cases (my estimation plucked out of thin air:) there will > be no such problems with autoprepare. And it can significantly improve > performance of OLTP applications > which are not able to use prepared statements (because of working through > pgbouncer or any other reasons). Right, but we can't ship something that is 99.9% accurate when the inaccuracy is indeterminate. The bottom line is that Postgres has a very high bar, and I realize you are just prototyping at this point, but the final product is going to have to address all the intricacies of the issue for it to be added. > Can autoprepare slow down the system? > Yes, it can. It can happen if application perform larger number of unique > queries and autoprepare cache size is not limited. > In this case large (and infinitely growing) number of stored plans can > consume a lot of memory and, what is even worse, slowdown cache lookup. > This is why I by default limit number of cached statements > (autoprepare_limit parameter) by 100. Yes, good idea. > I am almost sure that there will be some other issues with autoprepare which > I have not encountered yet (because I mostly tested it on pgbench and > Postgres regression tests). > But I am also sure that benefit of doubling system performance is good > motivation to continue work in this direction. Right, you are still testing to see where the edges are. > My main concern is whether to continue to improve current approach with > local (per-backend) cache of prepared statements. > Or create shared cache (as in Oracle). It is much more difficult to > implement shared cache (the same problem with session context, different > catalog snapshots, cache invalidation,...) > but it also provides more opportunities for queries optimization and tuning. I would continue in the per-backend cache direction at this point because we don't even have that solved yet. The global cache is going to be even more complex. Ultimately I think we are going to want global and local caches because the plans of the local cache are much more likely to be accurate. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 12.05.2017 18:23, Bruce Momjian wrote: > On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote: >> Definitely changing session context (search_path, date/time format, ...) may >> cause incorrect behavior of cached statements. > I wonder if we should clear the cache whenever any SET command is > issued. Well, with autoprepare cache disabled on each set variable, alter system and any slow utility statement only one regression test is not passed. And only because of different error message context: *** /home/knizhnik/postgresql.master/src/test/regress/expected/date.out 2017-04-11 18:07:56.497461208 +0300 --- /home/knizhnik/postgresql.master/src/test/regress/results/date.out 2017-05-12 20:21:19.767566302 +0300 *************** *** 1443,1452 **** -- SELECT EXTRACT(MICROSEC FROM DATE 'infinity'); -- ERROR: timestamp units "microsec" not recognized ERROR: timestamp units "microsec" not recognized - CONTEXT: SQL function "date_part" statement 1 SELECT EXTRACT(UNDEFINED FROM DATE 'infinity'); -- ERROR: timestamp units "undefined" not supported ERROR: timestamp units "undefined" not supported - CONTEXT: SQL function "date_part" statement 1 -- test constructors select make_date(2013, 7, 15); make_date --- 1443,1450 ---- ====================================================================== >> Actually you may get the same problem with explicitly prepared statements >> (certainly, in the last case, you better understand what going on and it is >> your choice whether to use or not to use prepared statement). >> >> The fact of failure of 7 regression tests means that autoprepare can really >> change behavior of existed program. This is why my suggestion is to switch >> off this feature by default. > I would like to see us target something that can be enabled by default. > Even if it only improves performance by 5%, it would be better overall > than a feature that improves performance by 90% but is only used by 1% > of our users. I have to agree with you here. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
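For what it's worth, a minimal sketch of that kind of invalidation could look like this (autoprepare_reset() is a hypothetical helper, not something the patch currently provides):

/*
 * Illustrative sketch: drop all autoprepared plans whenever a statement
 * that can change session behaviour is executed.
 */
if (IsA(parsetree, VariableSetStmt) ||
    IsA(parsetree, AlterSystemStmt) ||
    IsA(parsetree, DiscardStmt))
    autoprepare_reset();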
On May 12, 2017 12:50:41 AM PDT, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: >Definitely changing session context (search_path, date/time format, >...) >may cause incorrect behavior of cached statements. >Actually you may get the same problem with explicitly prepared >statements (certainly, in the last case, you better understand what >going on and it is your choice whether to use or not to use prepared >statement). > >The fact of failure of 7 regression tests means that autoprepare can >really change behavior of existed program. This is why my suggestion is > >to switch off this feature by default. I can't see us agreeing with this feature unless it actually reliably works, even if disabled by default. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Fri, May 12, 2017 at 08:35:26PM +0300, Konstantin Knizhnik wrote: > > > On 12.05.2017 18:23, Bruce Momjian wrote: > >On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote: > >>Definitely changing session context (search_path, date/time format, ...) may > >>cause incorrect behavior of cached statements. > >I wonder if we should clear the cache whenever any SET command is > >issued. > > Well, with autoprepare cache disabled on each set variable, alter system and > any slow utility statement > only one regression test is not passed. And only because of different error > message context: Great. We have to think of how we would keep this reliable when future changes happen. Simple is often better because it limits the odds of breakage. > >>Actually you may get the same problem with explicitly prepared statements > >>(certainly, in the last case, you better understand what going on and it is > >>your choice whether to use or not to use prepared statement). > >> > >>The fact of failure of 7 regression tests means that autoprepare can really > >>change behavior of existed program. This is why my suggestion is to switch > >>off this feature by default. > >I would like to see us target something that can be enabled by default. > >Even if it only improves performance by 5%, it would be better overall > >than a feature that improves performance by 90% but is only used by 1% > >of our users. > > I have to agree with you here. Good. I know there are database systems that will try to get the most performance possible no matter how complex it is for users to configure or to maintain, but that isn't us. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > Robert, can you please explain why using TRY/CATCH is not safe here: >> >> This is definitely not a safe way of using TRY/CATCH. This has been discussed many, many times on this mailing list before, and I don't really want to go over it again here. We really need a README or some documentation about this so that we don't have to keep answering this same question again and again. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 15.05.2017 18:31, Robert Haas wrote:
> On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Robert, can you please explain why using TRY/CATCH is not safe here:
>>> This is definitely not a safe way of using TRY/CATCH.
> This has been discussed many, many times on this mailing list before,
> and I don't really want to go over it again here. We really need a
> README or some documentation about this so that we don't have to keep
> answering this same question again and again.

First of all I want to note that the new version of my patch is not using PG_TRY/PG_CATCH. But I still want to clarify for myself what is wrong with this construction. I searched both the hackers mailing list archive and the web using Google, but failed to find any references except about sharing non-volatile variables between try and catch blocks. Can you please point me at the thread where this problem was discussed, or just explain in a few words the source of the problem?
From my own experience I found out that PG_TRY/PG_CATCH mechanism is not providing proper cleanup (unlike C++ exceptions). If there are opened relations, catalog cache entries,... then throwing error will not release them. It will cause no problems if error is handled in PostgresMain which aborts current transaction and releases all resources in any case. But if I want to ignore this error and continue query execution, then warnings about resources leaks can be reported. Is it want you mean by unsafety of PG_TRY/PG_CATCH constructions?
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2017-05-18 11:57:57 +0300, Konstantin Knizhnik wrote: > From my own experience I found out that PG_TRY/PG_CATCH mechanism is not > providing proper cleanup (unlike C++ exceptions). Right, simply because there's no portable way to transparently do so. Would be possible on elf glibc platforms, but ... > If there are opened relations, catalog cache entries,... then throwing error > will not release them. > It will cause no problems if error is handled in PostgresMain which aborts > current transaction and releases all resources in any case. > But if I want to ignore this error and continue query execution, then > warnings about resources leaks can be reported. > Is it want you mean by unsafety of PG_TRY/PG_CATCH constructions? There's worse than just leaking resources. Everything touching the database might cause persistent corruption if you don't roll back. Consider an insert that failed with a foreign key exception, done from some function. If you ignore that error, the row will still be visible, but the foreign key will be violated. If you want to continue after a PG_CATCH you have to use a subtransaction/savepoint for the PG_TRY contents, like several PLs do. - Andres
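For reference, the subtransaction pattern the PLs use looks roughly like this (abbreviated sketch based on plpgsql's exec_stmt_block; error-data copying and reporting omitted):

/*
 * Abbreviated version of the subtransaction pattern used by the PLs
 * for continuing after an error.
 */
MemoryContext   oldcontext = CurrentMemoryContext;
ResourceOwner   oldowner = CurrentResourceOwner;

BeginInternalSubTransaction(NULL);

PG_TRY();
{
    /* ... work that may throw an error, e.g. parse analysis ... */

    /* success: commit the inner transaction and restore state */
    ReleaseCurrentSubTransaction();
    MemoryContextSwitchTo(oldcontext);
    CurrentResourceOwner = oldowner;
}
PG_CATCH();
{
    MemoryContextSwitchTo(oldcontext);
    /* CopyErrorData() here if the error needs to be reported later */
    FlushErrorState();

    /* abort the inner transaction and restore state */
    RollbackAndReleaseCurrentSubTransaction();
    MemoryContextSwitchTo(oldcontext);
    CurrentResourceOwner = oldowner;

    /* now it is safe to fall back to the non-autoprepared path */
}
PG_END_TRY();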
On 10.05.2017 19:11, Konstantin Knizhnik wrote: > > Based on the Robert's feedback and Tom's proposal I have implemented > two new versions of autoprepare patch. > > First version is just refactoring of my original implementation: I > have extracted common code into prepare_cached_plan and > exec_prepared_plan > function to avoid code duplication. Also I rewrote assignment of > values to parameters. Now types of parameters are inferred from types > of literals, so there may be several > prepared plans which are different only by types of parameters. Due to > the problem with type coercion for parameters, I have to catch errors > in parse_analyze_varparams. > Attached please find rebased version of the autoprepare patch based on Tom's proposal (perform analyze for tree with constant literals and then replace them with parameters). Also I submitted this patch for the Autum commitfest. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > Attached please find rebased version of the autoprepare patch based on Tom's > proposal (perform analyze for tree with constant literals and then replace > them with parameters). > Also I submitted this patch for the Autum commitfest. The patch didn't survive the Summer bitrotfest. Could you please rebase it? -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 09.09.2017 06:35, Thomas Munro wrote:
> On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
>> Attached please find rebased version of the autoprepare patch based on Tom's proposal (perform analyze for tree with constant literals and then replace them with parameters).
>> Also I submitted this patch for the Autum commitfest.
> The patch didn't survive the Summer bitrotfest. Could you please rebase it?

Attached please find rebased version of the patch.
There are the updated performance results (pgbench -s 100 -c 1):
protocol (-M) | read-write | read-only (-S) |
simple | 3327 | 19325 |
extended | 2256 | 16908 |
prepared | 6145 | 39326 |
simple+autoprepare | 4950 | 34841 |
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 11.09.2017 12:24, Konstantin Knizhnik wrote:
> Attached please find rebased version of the patch.
> There are the updated performance results (pgbench -s 100 -c 1):
> protocol (-M) | read-write | read-only (-S) |
> simple | 3327 | 19325 |
> extended | 2256 | 16908 |
> prepared | 6145 | 39326 |
> simple+autoprepare | 4950 | 34841 |
One more patch passing all regression tests with autoprepare_threshold=1.
I still do not think that it should be switched on by default...
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > One more patch passing all regression tests with autoprepare_threshold=1. > I still do not think that it should be switch on by default... This patch does not apply, and did not get any reviews. So I am moving it to next CF with waiting on author as status. Please provide a rebased version. Tsunakawa-san, you are listed as a reviewer of this patch. If you are not planning to look at it anymore, you may want to remove your name from the related CF entry https://commitfest.postgresql.org/16/1150/. -- Michael
From: Michael Paquier [mailto:michael.paquier@gmail.com] > This patch does not apply, and did not get any reviews. So I am moving it > to next CF with waiting on author as status. Please provide a rebased version. > Tsunakawa-san, you are listed as a reviewer of this patch. If you are not > planning to look at it anymore, you may want to remove your name from the > related CF entry https://commitfest.postgresql.org/16/1150/. Sorry, but I'm planning to review this next month because this proposal could alleviate some problem we faced. Regards Takayuki Tsunakawa
On Thu, Nov 30, 2017 at 11:07 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: Michael Paquier [mailto:michael.paquier@gmail.com] >> This patch does not apply, and did not get any reviews. So I am moving it >> to next CF with waiting on author as status. Please provide a rebased version. >> Tsunakawa-san, you are listed as a reviewer of this patch. If you are not >> planning to look at it anymore, you may want to remove your name from the >> related CF entry https://commitfest.postgresql.org/16/1150/. > > Sorry, but I'm planning to review this next month because this proposal could alleviate some problem we faced. No problem. Thanks for the update. -- Michael
On 30.11.2017 04:59, Michael Paquier wrote: > On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> One more patch passing all regression tests with autoprepare_threshold=1. >> I still do not think that it should be switch on by default... > This patch does not apply, and did not get any reviews. So I am moving > it to next CF with waiting on author as status. Please provide a > rebased version. Tsunakawa-san, you are listed as a reviewer of this > patch. If you are not planning to look at it anymore, you may want to > remove your name from the related CF entry > https://commitfest.postgresql.org/16/1150/. Updated version of the patch is attached. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Greetings, * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote: > On 30.11.2017 04:59, Michael Paquier wrote: > >On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik > ><k.knizhnik@postgrespro.ru> wrote: > >>One more patch passing all regression tests with autoprepare_threshold=1. > >>I still do not think that it should be switch on by default... > >This patch does not apply, and did not get any reviews. So I am moving > >it to next CF with waiting on author as status. Please provide a > >rebased version. Tsunakawa-san, you are listed as a reviewer of this > >patch. If you are not planning to look at it anymore, you may want to > >remove your name from the related CF entry > >https://commitfest.postgresql.org/16/1150/. > > Updated version of the patch is attached. This patch appears to apply with just a bit of fuzz and make check passes, so I'm not sure why this is currently marked as 'Waiting for author'. I've updated it to be 'Needs review'. If that's incorrect, feel free to change it back with an explanation. Thanks! Stephen
Attachment
On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote: >> Updated version of the patch is attached. > > This patch appears to apply with just a bit of fuzz and make check > passes, so I'm not sure why this is currently marked as 'Waiting for > author'. > > I've updated it to be 'Needs review'. If that's incorrect, feel free to > change it back with an explanation. Hi Konstantin, /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249: undefined reference to `PortalGetHeapMemory' That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed PortalGetHeapMemory. Looks like it needs to be replaced with portal->portalContext. -- Thomas Munro http://www.enterprisedb.com
On 12.01.2018 03:40, Thomas Munro wrote: > On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost@snowman.net> wrote: >> * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote: >>> Updated version of the patch is attached. >> This patch appears to apply with just a bit of fuzz and make check >> passes, so I'm not sure why this is currently marked as 'Waiting for >> author'. >> >> I've updated it to be 'Needs review'. If that's incorrect, feel free to >> change it back with an explanation. > Hi Konstantin, > > /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249: > undefined reference to `PortalGetHeapMemory' > > That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed > PortalGetHeapMemory. Looks like it needs to be replaced with > portal->portalContext. > Hi Thomas, Thank you very much for reporting the problem. Rebased version of the patch is attached. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi Konstantin, On 1/12/18 7:53 AM, Konstantin Knizhnik wrote: > > > On 12.01.2018 03:40, Thomas Munro wrote: >> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost@snowman.net> >> wrote: >>> * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote: >>>> Updated version of the patch is attached. >>> This patch appears to apply with just a bit of fuzz and make check >>> passes, so I'm not sure why this is currently marked as 'Waiting for >>> author'. >>> >>> I've updated it to be 'Needs review'. If that's incorrect, feel free to >>> change it back with an explanation. >> Hi Konstantin, >> >> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249: >> >> undefined reference to `PortalGetHeapMemory' >> >> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed >> PortalGetHeapMemory. Looks like it needs to be replaced with >> portal->portalContext. >> > Hi Thomas, > > Thank you very much for reporting the problem. > Rebased version of the patch is attached. This patch has received no review or comments since last May and appears too complex and invasive for the final CF of PG11. I don't think it makes sense to keep pushing a patch through CFs when it is not getting reviewed. I'm planning to mark this as Returned with Feedback unless there are solid arguments to the contrary. Thanks, -- -David david@pgmasters.net
On 3/2/18 9:26 AM, David Steele wrote: > On 1/12/18 7:53 AM, Konstantin Knizhnik wrote: >> >> >> On 12.01.2018 03:40, Thomas Munro wrote: >>> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost@snowman.net> >>> wrote: >>>> * Konstantin Knizhnik (k.knizhnik@postgrespro.ru) wrote: >>>>> Updated version of the patch is attached. >>>> This patch appears to apply with just a bit of fuzz and make check >>>> passes, so I'm not sure why this is currently marked as 'Waiting for >>>> author'. >>>> >>>> I've updated it to be 'Needs review'. If that's incorrect, feel free to >>>> change it back with an explanation. >>> Hi Konstantin, >>> >>> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249: >>> >>> undefined reference to `PortalGetHeapMemory' >>> >>> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed >>> PortalGetHeapMemory. Looks like it needs to be replaced with >>> portal->portalContext. >>> >> Hi Thomas, >> >> Thank you very much for reporting the problem. >> Rebased version of the patch is attached. > > This patch has received no review or comments since last May and appears > too complex and invasive for the final CF of PG11. > > I don't think it makes sense to keep pushing a patch through CFs when it > is not getting reviewed. I'm planning to mark this as Returned with > Feedback unless there are solid arguments to the contrary. Marked as Returned with Feedback. Regards, -- -David david@pgmasters.net
> -----Original Message----- > From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru] > Sent: Friday, January 12, 2018 9:53 PM > To: Thomas Munro <thomas.munro@enterprisedb.com>; Stephen Frost > <sfrost@snowman.net> > Cc: Michael Paquier <michael.paquier@gmail.com>; PostgreSQL mailing > lists <pgsql-hackers@postgresql.org>; Tsunakawa, Takayuki/綱川 貴之 > <tsunakawa.takay@jp.fujitsu.com> > Subject: Re: [HACKERS] Cached plans and statement generalization > > Thank you very much for reporting the problem. > Rebased version of the patch is attached. Hi Konstantin. I think that this patch excel very much. Because the customer of our company has the demand that migrates from other DB to PostgreSQL, and the problem to have to modify the application program to do prepare in that case occurs. It is possible to solve it by the problem's using this patch. I want to be helping this patch to be committed. Will you participate in the following CF? To review this patch, I verified it. The verified environment is PostgreSQL 11beta2. It is necessary to add "executor/spi.h" and "jit/jit.h" to postgres.c of the patch by the updating of PostgreSQL. Please rebase. 1. I confirmed the influence on the performance by having applied this patch. The result showed the tendency similar to Konstantin. -s:100 -c:8 -t: 10000 read-only simple: 20251 TPS prepare: 29502 TPS simple(autoprepare): 28001 TPS 2. I confirmed the influence on the memory utilization by the length of query that did autoprepare. Short queries have 1 constant. Long queries have 100 constants. This result was shown that preparing long query used the memory more. before prepare:plan cache context: 1032 used prepare 10 short query statement:plan cache context: 15664 used prepare 10 long query statement:plan cache context: 558032 used In this patch, the maximum number of query that can do prepare can be set to autoprepare_limit. However, is it good in this? I think that I can assume the scene in the following. - Application side user: To elicit the performance, they want to specify the number of prepared query. - Operation side user: To prevent the memory from overflowing, they want to set the maximum value of the memory utilization. Therefore, I propose to add the parameter to specify the maximum memory utilization. 3. I confirmed the transition of the amount of the memory when it tried to prepare query of the number that exceeded the value specified for autoprepare_limit. [autoprepare_limit=1 and execute 10 different queries] plan cache context: 1032 used plan cache context: 39832 used plan cache context: 78552 used plan cache context: 117272 used plan cache context: 155952 used plan cache context: 194632 used plan cache context: 233312 used plan cache context: 272032 used plan cache context: 310712 used plan cache context: 349392 used plan cache context: 388072 used I feel the doubt in an increase of the memory utilization when I execute a lot of query though cached query is one (autoprepare_limit=1). This behavior is correct? Best regards, Yamaji
Hi Yamaji, On 31.07.2018 12:12, Yamaji, Ryo wrote: >> -----Original Message----- >> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru] >> Sent: Friday, January 12, 2018 9:53 PM >> To: Thomas Munro <thomas.munro@enterprisedb.com>; Stephen Frost >> <sfrost@snowman.net> >> Cc: Michael Paquier <michael.paquier@gmail.com>; PostgreSQL mailing >> lists <pgsql-hackers@postgresql.org>; Tsunakawa, Takayuki/綱川 貴之 >> <tsunakawa.takay@jp.fujitsu.com> >> Subject: Re: [HACKERS] Cached plans and statement generalization >> >> Thank you very much for reporting the problem. >> Rebased version of the patch is attached. > Hi Konstantin. > > I think that this patch excel very much. Because the customer of our > company has the demand that migrates from other DB to PostgreSQL, and > the problem to have to modify the application program to do prepare in > that case occurs. It is possible to solve it by the problem's using this > patch. I want to be helping this patch to be committed. Will you participate > in the following CF? This patch will be included in next release of PgPro EE. Concerning next commit fest - I am not sure. At previous commitfest it was returned with feedback that it "has received no review or comments since last May". May be your review will help to change this situation. > > To review this patch, I verified it. The verified environment is > PostgreSQL 11beta2. It is necessary to add "executor/spi.h" and "jit/jit.h" > to postgres.c of the patch by the updating of PostgreSQL. Please rebase. > > 1. I confirmed the influence on the performance by having applied this patch. > The result showed the tendency similar to Konstantin. > -s:100 -c:8 -t: 10000 read-only > simple: 20251 TPS > prepare: 29502 TPS > simple(autoprepare): 28001 TPS > > 2. I confirmed the influence on the memory utilization by the length of query that did > autoprepare. Short queries have 1 constant. Long queries have 100 constants. > This result was shown that preparing long query used the memory more. > before prepare:plan cache context: 1032 used > prepare 10 short query statement:plan cache context: 15664 used > prepare 10 long query statement:plan cache context: 558032 used > > In this patch, the maximum number of query that can do prepare can be set to autoprepare_limit. > However, is it good in this? I think that I can assume the scene in the following. > - Application side user: To elicit the performance, they want to specify the number of prepared > query. > - Operation side user: To prevent the memory from overflowing, they want to set the maximum value > of the memory utilization. > Therefore, I propose to add the parameter to specify the maximum memory utilization. I agree it may be more useful to limit amount of memory used by prepare queries, rather than number of prepared statements. But it is just more difficult to calculate and maintain (I am not sure that just looking at CacheMemoryContext is enough for it). Also, if working set of queries (frequently repeated queries) doesn't fir in memory, then autoprepare will be almost useless (because with high probability prepared query will be thrown away from the cache before it can be reused). So limiting queries from "application side" seems to be more practical. > 3. I confirmed the transition of the amount of the memory when it tried to prepare query > of the number that exceeded the value specified for autoprepare_limit. 
> [autoprepare_limit=1 and execute 10 different queries] > plan cache context: 1032 used > plan cache context: 39832 used > plan cache context: 78552 used > plan cache context: 117272 used > plan cache context: 155952 used > plan cache context: 194632 used > plan cache context: 233312 used > plan cache context: 272032 used > plan cache context: 310712 used > plan cache context: 349392 used > plan cache context: 388072 used > > I feel the doubt in an increase of the memory utilization when I execute a lot of > query though cached query is one (autoprepare_limit=1). > This behavior is correct? I will check it.
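To make the mechanism under discussion easier to follow, here is a rough C sketch of a per-backend autoprepare cache with an execution-count threshold (autoprepare_threshold) and an LRU bound (autoprepare_limit). All names here (AutoprepareEntry, autoprepare_hash, autoprepare_evict_lru, autoprepare_generalize) are invented for illustration and do not come from the posted patch; the patch at this point in the thread also generalizes the raw parse tree rather than keying the cache on the query text.

#include "postgres.h"
#include "lib/ilist.h"
#include "utils/hsearch.h"
#include "utils/plancache.h"

typedef struct AutoprepareEntry
{
    const char *query_text;        /* query text, assumed to be the hash key */
    int64       exec_count;        /* how many times this text was seen */
    CachedPlanSource *plansource;  /* NULL until the query is autoprepared */
    dlist_node  lru_node;          /* position in the LRU list */
} AutoprepareEntry;

static HTAB      *autoprepare_hash;
static dlist_head autoprepare_lru;
static int        autoprepare_cached_queries;

/* GUCs discussed in this thread */
static int autoprepare_limit;      /* max number of autoprepared statements */
static int autoprepare_threshold;  /* executions before a query is autoprepared */

/* hypothetical helpers */
static void autoprepare_evict_lru(void);
static CachedPlanSource *autoprepare_generalize(const char *query_text);

static AutoprepareEntry *
autoprepare_lookup(const char *query_text)
{
    bool        found;
    AutoprepareEntry *entry;

    entry = (AutoprepareEntry *) hash_search(autoprepare_hash, query_text,
                                             HASH_ENTER, &found);
    if (!found)
    {
        entry->exec_count = 0;
        entry->plansource = NULL;
        dlist_push_head(&autoprepare_lru, &entry->lru_node);

        /* keep at most autoprepare_limit entries (0 means unlimited) */
        if (autoprepare_limit > 0 &&
            ++autoprepare_cached_queries > autoprepare_limit)
            autoprepare_evict_lru();
    }
    else
        dlist_move_head(&autoprepare_lru, &entry->lru_node);

    /* generalize and prepare only after the query proved to be repeated */
    if (entry->plansource == NULL &&
        ++entry->exec_count >= autoprepare_threshold)
        entry->plansource = autoprepare_generalize(query_text);

    return entry;
}

With such a scheme, a statement is prepared only after it has been seen autoprepare_threshold times, and the memory-growth question above comes down to what exactly is freed when an entry is evicted.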
On 01.08.2018 00:30, Konstantin Knizhnik wrote: > Hi Yamaji, > > > On 31.07.2018 12:12, Yamaji, Ryo wrote: >>> -----Original Message----- >>> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru] >>> Sent: Friday, January 12, 2018 9:53 PM >>> To: Thomas Munro <thomas.munro@enterprisedb.com>; Stephen Frost >>> <sfrost@snowman.net> >>> Cc: Michael Paquier <michael.paquier@gmail.com>; PostgreSQL mailing >>> lists <pgsql-hackers@postgresql.org>; Tsunakawa, Takayuki/綱川 貴之 >>> <tsunakawa.takay@jp.fujitsu.com> >>> Subject: Re: [HACKERS] Cached plans and statement generalization >>> >>> Thank you very much for reporting the problem. >>> Rebased version of the patch is attached. >> Hi Konstantin. >> >> I think that this patch excel very much. Because the customer of our >> company has the demand that migrates from other DB to PostgreSQL, and >> the problem to have to modify the application program to do prepare in >> that case occurs. It is possible to solve it by the problem's using this >> patch. I want to be helping this patch to be committed. Will you >> participate >> in the following CF? > > This patch will be included in next release of PgPro EE. > Concerning next commit fest - I am not sure. > At previous commitfest it was returned with feedback that it "has > received no review or comments since last May". > May be your review will help to change this situation. > > >> >> To review this patch, I verified it. The verified environment is >> PostgreSQL 11beta2. It is necessary to add "executor/spi.h" and >> "jit/jit.h" >> to postgres.c of the patch by the updating of PostgreSQL. Please rebase. New version of the patch is attached.
Attachment
On 31.07.2018 12:12, Yamaji, Ryo wrote:
> 3. I confirmed the transition of the amount of the memory when it tried to prepare query
> of the number that exceeded the value specified for autoprepare_limit.
> [autoprepare_limit=1 and execute 10 different queries]
> plan cache context: 1032 used
> plan cache context: 39832 used
> plan cache context: 78552 used
> plan cache context: 117272 used
> plan cache context: 155952 used
> plan cache context: 194632 used
> plan cache context: 233312 used
> plan cache context: 272032 used
> plan cache context: 310712 used
> plan cache context: 349392 used
> plan cache context: 388072 used
>
> I feel the doubt in an increase of the memory utilization when I execute a lot of
> query though cached query is one (autoprepare_limit=1).
> This behavior is correct?

I failed to reproduce the problem.
I used the following non-default configuration parameters:

autoprepare_limit=1
autoprepare_threshold=1

create dummy database:

create table foo(x integer primary key, y integer);
insert into foo values (generate_series(1,10000), 0);

and run different queries, like:

postgres=# select * from foo where x=1;
postgres=# select * from foo where x+x=1;
postgres=# select * from foo where x+x+x=1;
postgres=# select * from foo where x+x+x+x=1;
...

and check size of CacheMemoryContext using gdb - it is not increased.

Can you please send me your test?
> -----Original Message-----
> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
> Sent: Wednesday, August 1, 2018 4:53 PM
> To: Yamaji, Ryo/山地 亮 <yamaji.ryo@jp.fujitsu.com>
> Cc: PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Cached plans and statement generalization
>
> I failed to reproduce the problem.
> I used the following non-default configuration parameters:
>
> autoprepare_limit=1
> autoprepare_threshold=1
>
> create dummy database:
>
> create table foo(x integer primary key, y integer);
> insert into foo values (generate_series(1,10000), 0);
>
> and run different queries, like:
>
> postgres=# select * from foo where x=1;
> postgres=# select * from foo where x+x=1;
> postgres=# select * from foo where x+x+x=1;
> postgres=# select * from foo where x+x+x+x=1;
> ...
>
> and check size of CacheMemoryContext using gdb - it is not increased.
>
> Can you please send me your test?

I checked not CacheMemoryContext but "plan cache context", because I think "plan cache context" is the memory context that autoprepare mainly uses.

I used the same non-default configuration parameters as Konstantin:
autoprepare_limit=1
autoprepare_threshold=1

The test procedure was as follows.

Create a dummy database:
create table test (key1 integer, key2 integer, ... , key100 integer);
insert into test values (1, 2, ... , 100);

Then execute different queries:
select key1 from test where key1=1 and key2=2 and ... and key100=100;
select key2 from test where key1=1 and key2=2 and ... and key100=100;
select key3 from test where key1=1 and key2=2 and ... and key100=100;
...

And check "plan cache context" with gdb.
On 02.08.2018 08:25, Yamaji, Ryo wrote: >> -----Original Message----- >> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru] >> Sent: Wednesday, August 1, 2018 4:53 PM >> To: Yamaji, Ryo/山地 亮 <yamaji.ryo@jp.fujitsu.com> >> Cc: PostgreSQL mailing lists <pgsql-hackers@postgresql.org> >> Subject: Re: [HACKERS] Cached plans and statement generalization >> >> I failed to reproduce the problem. >> I used the following non-default configuration parameters: >> >> autoprepare_limit=1 >> autoprepare_threshold=1 >> >> create dummy database: >> >> create table foo(x integer primary key, y integer); insert into foo values >> (generate_series(1,10000), 0); >> >> and run different queries, like: >> >> postgres=# select * from foo where x=1; postgres=# select * from foo >> where x+x=1; postgres=# select * from foo where x+x+x=1; postgres=# >> select * from foo where x+x+x+x=1; ... >> >> and check size of CacheMemoryContext using gdb - it is not increased. >> >> Can you please send me your test? > > I checked not CacheMemoryContext but "plan cache context". > Because I think that the memory context that autoprepare mainly uses is "plan cache context". > > Non-default configuration parameter was used as well as Konstantin. > autoprepare_limit=1 > autoprepare_threshold=1 > > The procedure of the test that I did is shown below. > > create dummy database > create table test (key1 integer, key2 integer, ... , key100 integer); > insert into test values (1, 2, ... , 100); > > And, different queries are executed. > select key1 from test where key1=1 and key2=2 and ... and key100=100; > select key2 from test where key1=1 and key2=2 and ... and key100=100; > select key3 from test where key1=1 and key2=2 and ... and key100=100;... > > And, "plan cache context" was confirmed by using gdb. > > Thank you. Unfortunately expression_tree_walker is not consistent with copyObject: I tried to use this walker to destroy raw parse tree of autoprepared statement, but looks like some nodes are not visited by expression_tree_walker. I have to create separate memory context for each autoprepared statement. New version of the patch is attached.
Attachment
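For readers who want to see what "a separate memory context for each autoprepared statement" means in practice, here is a minimal sketch using the standard backend memory-context API. It is an illustration, not the patch's code: the raw parse tree is deep-copied into a dedicated context, so that on eviction the whole tree can be released with a single MemoryContextDelete() instead of trying to walk and free individual nodes (which, as noted above, expression_tree_walker cannot do reliably).

#include "postgres.h"
#include "nodes/nodes.h"
#include "utils/memutils.h"

static Node *
autoprepare_store_parse_tree(Node *raw_parse_tree, MemoryContext *entry_context)
{
    MemoryContext cxt;
    MemoryContext oldcxt;
    Node       *copy;

    /* child of CacheMemoryContext, so it survives across transactions */
    cxt = AllocSetContextCreate(CacheMemoryContext,
                                "autoprepared statement",
                                ALLOCSET_SMALL_SIZES);

    oldcxt = MemoryContextSwitchTo(cxt);
    copy = copyObject(raw_parse_tree);      /* deep copy into the new context */
    MemoryContextSwitchTo(oldcxt);

    *entry_context = cxt;
    return copy;
}

/* On eviction the entry's memory is released in one call: */
/*     MemoryContextDelete(entry_context); */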
> -----Original Message-----
> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
> Sent: Friday, August 3, 2018 7:02 AM
> To: Yamaji, Ryo/山地 亮 <yamaji.ryo@jp.fujitsu.com>
> Cc: PostgreSQL mailing lists <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Cached plans and statement generalization
>
> Thank you.
> Unfortunately expression_tree_walker is not consistent with copyObject:
> I tried to use this walker to destroy raw parse tree of autoprepared
> statement, but looks like some nodes are not visited by
> expression_tree_walker. I have to create separate memory context for each
> autoprepared statement.
> New version of the patch is attached.

Thank you for attaching the patch. I confirmed that the new patch fixes the growth of the plan cache context.

> This patch will be included in next release of PgPro EE.
> Concerning next commit fest - I am not sure.
> At previous commitfest it was returned with feedback that it "has received no review or comments since last May".
> May be your review will help to change this situation.

I want to confirm one point. If I review the autoprepare patch, are you ready to register it in a commitfest in the near future? I am worried that the patch might never be registered in a commitfest (for example, because you are too busy) and therefore never be applied to PostgreSQL. If you are not ready to register it, I would like to register it in the commitfest instead of you.

> I agree it may be more useful to limit amount of memory used by prepare
> queries, rather than number of prepared statements.
> But it is just more difficult to calculate and maintain (I am not sure
> that just looking at CacheMemoryContext is enough for it).
> Also, if working set of queries (frequently repeated queries) doesn't
> fir in memory, then autoprepare will be almost useless (because with
> high probability prepared query will be thrown away from the cache
> before it can be reused). So limiting queries from "application side"
> seems to be more practical.

I see. But I am afraid that the autoprepare machinery may use an unpredictable amount of memory when autoprepare_limit only restricts the number of prepared statements. I can imagine a situation where a backend uses a lot of memory for autoprepare (for example, when it has to prepare many long queries), so that other processes (other PostgreSQL backends, or processes outside PostgreSQL) cannot use that memory. I hope a limit on the amount of memory can be specified in the future.
On 07.08.2018 13:02, Yamaji, Ryo wrote:
> I want to confirm one point. If I review the autoprepare patch, are you ready
> to register it in a commitfest in the near future? I am worried that the patch
> might never be registered in a commitfest (for example, because you are too
> busy) and therefore never be applied to PostgreSQL. If you are not ready to
> register it, I would like to register it in the commitfest instead of you.

I have registered the patch for the next commitfest. For some reason the commitfest entry doesn't pick up the latest autoprepare-10.patch and still refers to autoprepare-6.patch as the latest attachment.

>> I agree it may be more useful to limit amount of memory used by prepare
>> queries, rather than number of prepared statements.
>> But it is just more difficult to calculate and maintain (I am not sure
>> that just looking at CacheMemoryContext is enough for it).
>> Also, if working set of queries (frequently repeated queries) doesn't
>> fir in memory, then autoprepare will be almost useless (because with
>> high probability prepared query will be thrown away from the cache
>> before it can be reused). So limiting queries from "application side"
>> seems to be more practical.
>
> I see. But I am afraid that the autoprepare machinery may use an unpredictable
> amount of memory when autoprepare_limit only restricts the number of prepared
> statements. I can imagine a situation where a backend uses a lot of memory for
> autoprepare (for example, when it has to prepare many long queries), so that
> other processes (other PostgreSQL backends, or processes outside PostgreSQL)
> cannot use that memory. I hope a limit on the amount of memory can be
> specified in the future.

Right now each prepared statement has two memory contexts: one for the raw parse tree used as the hash table key and another for the cached plan itself. Maybe it will be possible to combine them. To calculate the memory consumed by cached plans, it is necessary to gather memory usage statistics for all these contexts (which requires traversal of all of a context's chunks) and sum them. That is much more complex and expensive than the current check (++autoprepare_cached_plans > autoprepare_limit), although I do not think it would have a measurable impact on performance... Maybe there is some faster way to estimate the memory consumed by prepared statements.

So the current autoprepare_limit limits the number of autoprepared statements and prevents memory overflow caused by execution of a large number of different statements. The question is whether we need a more precise mechanism which takes into account the difference between small and large queries. A simple query can certainly require 10-100 times less memory than a complex query. But the memory contexts themselves (even with a small block size) somewhat even out the memory footprint of different queries, because of chunked allocation.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote: > I have registered the patch for next commitfest. > For some reasons it doesn't find the latest autoprepare-10.patch and still > refer to autoprepare-6.patch as the latest attachement. I am sorry for the long delay in my response. I confirmed it and become reviewer the patch. I am going to review. > Right now each prepared statement has two memory contexts: one for raw > parse tree used as hash table key and another for cached plan itself. > May be it will be possible to combine them. To calculate memory consumed > by cached plans, it will be necessary to calculate memory usage statistic > for all this contexts (which requires traversal of all context's chunks) > and sum them. It is much more complex and expensive than current check: > (++autoprepare_cached_plans > autoprepare_limit) although I so not think > that it will have measurable impact on performance... > May be there should be some faster way to estimate memory consumed by > prepared statements. > > So, the current autoprepare_limit allows to limit number of autoprepared > statements and prevent memory overflow caused by execution of larger > number of different statements. > The question is whether we need more precise mechanism which will take > in account difference between small and large queries. Definitely simple > query can require 10-100 times less memory than complex query. But memory > contexts themselves (even with small block size) somehow minimize > difference in memory footprint of different queries, because of chunked > allocation. Thank you for telling me how to implement and its problem. In many cases when actually designing a system, we estimate the amount of memory to use and prepare a memory accordingly. Therefore, if we need to estimate memory usage in both patterns that are limit amount of memory used by prepare queries or number of prepared statements, I think that there is no problem with the current implementation. Apart from the patch, if it can implement an application that outputs estimates of memory usage when we enter a query, you may be able to use autoprepare more comfortably. But it sounds difficult..
On 22.08.2018 07:54, Yamaji, Ryo wrote: > On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote: > >> I have registered the patch for next commitfest. >> For some reasons it doesn't find the latest autoprepare-10.patch and still >> refer to autoprepare-6.patch as the latest attachement. > I am sorry for the long delay in my response. > I confirmed it and become reviewer the patch. > I am going to review. > > >> Right now each prepared statement has two memory contexts: one for raw >> parse tree used as hash table key and another for cached plan itself. >> May be it will be possible to combine them. To calculate memory consumed >> by cached plans, it will be necessary to calculate memory usage statistic >> for all this contexts (which requires traversal of all context's chunks) >> and sum them. It is much more complex and expensive than current check: >> (++autoprepare_cached_plans > autoprepare_limit) although I so not think >> that it will have measurable impact on performance... >> May be there should be some faster way to estimate memory consumed by >> prepared statements. >> >> So, the current autoprepare_limit allows to limit number of autoprepared >> statements and prevent memory overflow caused by execution of larger >> number of different statements. >> The question is whether we need more precise mechanism which will take >> in account difference between small and large queries. Definitely simple >> query can require 10-100 times less memory than complex query. But memory >> contexts themselves (even with small block size) somehow minimize >> difference in memory footprint of different queries, because of chunked >> allocation. > Thank you for telling me how to implement and its problem. > In many cases when actually designing a system, we estimate the amount of memory > to use and prepare a memory accordingly. Therefore, if we need to estimate memory > usage in both patterns that are limit amount of memory used by prepare queries or > number of prepared statements, I think that there is no problem with the current > implementation. > > Apart from the patch, if it can implement an application that outputs estimates of > memory usage when we enter a query, you may be able to use autoprepare more comfortably. > But it sounds difficult.. I have added autoprepare_memory_limit parameter which allows to limit memory used by autoprepared statements rather than number of such statements. If value of this parameter is non zero, then total size of all memory contexts used for autoprepared statement is maintained (can be innsoected in debugger in autoprepare_used_memory variable). As I already noticed, calculating memory context size adds extra overhead but I do not notice any influence in performance. New version of the patch is attached. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
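To illustrate the enforcement side of the new autoprepare_memory_limit parameter, here is a hedged sketch. The helper names are invented, and the assumption that the GUC is expressed in kilobytes is mine, not taken from the patch; the patch tracks the total size of the per-statement contexts (visible in the autoprepare_used_memory variable), and some loop of this shape would evict least recently used entries until the total drops back under the limit.

#include "postgres.h"
#include "utils/memutils.h"

extern int autoprepare_memory_limit;       /* GUC; assumed kilobytes, 0 = no limit */

static int64 autoprepare_used_memory = 0;  /* bytes used by all per-statement contexts */

/* hypothetical helper: returns the LRU statement's context and its recorded
 * size, or NULL when the cache is empty */
static MemoryContext autoprepare_pick_victim(Size *size);

static void
autoprepare_account_and_evict(Size just_added)
{
    autoprepare_used_memory += just_added;

    while (autoprepare_memory_limit > 0 &&
           autoprepare_used_memory > (int64) autoprepare_memory_limit * 1024L)
    {
        Size        victim_size;
        MemoryContext victim_cxt = autoprepare_pick_victim(&victim_size);

        if (victim_cxt == NULL)
            break;                          /* nothing left to evict */

        MemoryContextDelete(victim_cxt);    /* frees parse tree and cached plan */
        autoprepare_used_memory -= victim_size;
    }
}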
On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote: > I have registered the patch for next commitfest. > For some reasons it doesn't find the latest autoprepare-10.patch and still > refer to autoprepare-6.patch as the latest attachement. I'm sorry for the late reply. I'm currently reviewing autoprepare. I could not make it in time for the commit fests in September, but I will continue to review for the next commitfest. Performance tests are good results. The results are shown below. pgbench -s 100 -c 8 -t 10000 -S (average of 3 trials) - all autoprepare statements use same memory context. 18052.22706 TPS - each autoprepare statement use separate memory context. 18607.95889 TPS - calculate memory usage (autoprepare_memory_limit) 19171.60457 TPS From the above results, I think that adding/changing functions will not affect performance. I am currently checking the behavior when autoprepare_memory_limit is specified.
> On Fri, Sep 28, 2018 at 10:46 AM Yamaji, Ryo <yamaji.ryo@jp.fujitsu.com> wrote: > > On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote: > > > I have registered the patch for next commitfest. > > For some reasons it doesn't find the latest autoprepare-10.patch and still > > refer to autoprepare-6.patch as the latest attachement. > > I'm sorry for the late reply. I'm currently reviewing autoprepare. > I could not make it in time for the commit fests in September, > but I will continue to review for the next commitfest. > > Performance tests are good results. The results are shown below. > pgbench -s 100 -c 8 -t 10000 -S (average of 3 trials) > - all autoprepare statements use same memory context. > 18052.22706 TPS > - each autoprepare statement use separate memory context. > 18607.95889 TPS > - calculate memory usage (autoprepare_memory_limit) > 19171.60457 TPS > > From the above results, I think that adding/changing functions > will not affect performance. I am currently checking the behavior > when autoprepare_memory_limit is specified. Hi, Thanks for reviewing. Since another CF is about to end, maybe you can post the full review feedback? > On Tue, Jul 31, 2018 at 11:30 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > Concerning next commit fest - I am not sure. > At previous commitfest it was returned with feedback that it "has > received no review or comments since last May". > May be your review will help to change this situation. Well, I think that wasn't the only reason, and having some balance here would be nice. Anyway, thanks for working on this patch! Unfortunately, the patch needs to be rebased one more time. Also, I see there were concerns about how reliable this patch is. Just out of curiosity, can you tell now that from v4 to v11 reliability is not a concern anymore? (If you don't mind, I'll also CC some of the reviewers who saw previous versions).
On Fri, Nov 30, 2018 at 3:48 AM, Dmitry Dolgov wrote: > Hi, > > Thanks for reviewing. Since another CF is about to end, maybe you can > post the full review feedback? Since I had been busy with my private work, I couldn't review. I want to review next commit fest. I am sorry I postponed many times.
> On Fri, Nov 30, 2018 at 3:06 AM Yamaji, Ryo <yamaji.ryo@jp.fujitsu.com> wrote: > > On Fri, Nov 30, 2018 at 3:48 AM, Dmitry Dolgov wrote: > > > Hi, > > > > Thanks for reviewing. Since another CF is about to end, maybe you can > > post the full review feedback? > > Since I had been busy with my private work, I couldn't review. > I want to review next commit fest. I am sorry I postponed many times. No worries, just don't forget to post a review when it will be ready.
Hi,

Although I became your reviewer, it seems I will not be able to give full feedback in this CF. I will continue reviewing, so would you please update your patch? Until then I am reviewing the current one.

There is one question. date_1.out, which seems to be a copy of date.out, contains trailing spaces and inconsistent indentation; e.g., lines 3368 and 3380 of your current patch have
- a space at the end of the line
- different indentation.
This is also seen in date.out. I'm not sure whether it is OK to add a new file with these issues just because an existing file has them. Is it OK?

Best regards,
---------------------
Ryohei Nagaura
On 29.01.2019 4:38, Nagaura, Ryohei wrote:
> Hi,
>
> Although I became your reviewer, it seems I will not be able to give full feedback in this CF.
> I will continue reviewing, so would you please update your patch? Until then I am reviewing the current one.
>
> There is one question. date_1.out, which seems to be a copy of date.out, contains trailing spaces
> and inconsistent indentation; e.g., lines 3368 and 3380 of your current patch have
> - a space at the end of the line
> - different indentation.
> This is also seen in date.out. I'm not sure whether it is OK to add a new file with these issues
> just because an existing file has them. Is it OK?
>
> Best regards,
> ---------------------
> Ryohei Nagaura

Rebased version of the patch is attached.

Concerning the spaces in date.out and date_1.out - it is OK. These files are just the expected output of test execution, and we cannot expect the output of test scripts to be free of trailing spaces.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
Hi, On Mon, Jan 28, 2019 at 10:46 PM, Konstantin Knizhnik wrote: > Rebased version of the patch is attached. Thank you for your quick rebase. > This files are just output of test execution and it is not possible to expect lack of > trailing spaces in output of test scripts execution. I understand this is because regression tests use exact matching. I learned a lot. Thanks. Best regards, --------------------- Ryohei Nagaura
On Tue, Jan 29, 2019 at 10:46 AM, Konstantin Knizhnik wrote:
> Rebased version of the patch is attached.

I'm sorry for the late review. I checked the behavior of autoprepare-12.patch; my findings are summarized below.

・parameters
The behavior matched the configured values. However, I think it would be kinder to state in the manual that autoprepare keeps an unlimited number of statements when autoprepare_(memory_)limit is set to 0. There is no problem with the behavior itself.

・pg_autoprepared_statements
I confirmed that the view can be queried correctly.

・autoprepare cache retention period
I confirmed that autoprepared statements are deleted when a SET statement or a DDL statement is executed. Although this differs from explicitly prepared statements, it is acceptable for autoprepare.

・performance
I did not re-measure the basic performance of autoprepare, because the previous patch (autoprepare-11.patch) already showed no performance problem. However, since it has been argued that prepared statements can cause performance degradation on partitioned tables, I expected that autoprepare might show similar behavior and measured it. I also expected that the plan_cache_mode setting might not apply to autoprepare, so I measured each plan_cache_mode setting separately.
Results below (TPS):

plan_cache_mode     simple   simple(autoprepare)   prepare
auto                 130          121               121.5
force_custom_plan    132.5         90.7             122.7
force_generic_plan   126.7         14.7              24.7

Performance degradation was observed when plan_cache_mode was set with autoprepare. Is this behavior correct? I do not understand why these are the results.

Test procedure:

drop table if exists rt;
create table rt (a int, b int, c int) partition by range (a);
\o /dev/null
select 'create table rt' || x::text || ' partition of rt for values from (' ||
(x)::text || ') to (' || (x+1)::text || ');' from generate_series(0, 1024) x;
\gexec
\o

pgbench -p port -T 60 -c 1 -n -f test.sql (-M prepared) postgres

test.sql:
\set a random (0, 1023)
select * from rt where a = :a;
Thank you very much for the review! On 19.03.2019 5:56, Yamaji, Ryo wrote: > On Tue, Jan 29, 2019 at 10:46 AM, Konstantin Knizhnik wrote: >> Rebased version of the patch is attached. > I'm sorry for the late review. > I confirmed behavior of autoprepare-12.patch. It is summarized below. > > ・parameter > Expected behavior was shown according to the set value. > However, I think that it is be more kind to describe that autoprepare > hold infinite statements when the setting value of > autoprepare_ (memory_) limit is 0 in the manual. > There is no problem as operation. Sorry, I do not completely understand your concern. Description of autoprepare_ (memory_) limit includes explanation of zero value: autoprepare_limit: 0 means unlimited number of autoprepared queries. Too large number of prepared queries can cause backend memory overflow and slowdown execution speed (because of increased lookup time. autoprepare_memory_limit: 0 means that there is no memory limit. Calculating memory used by prepared queries adds some extra overhead, so non-zero value of this parameter may cause some slowdown. autoprepare_limit is much faster way to limit number of autoprepared statements Do you think that this descriptions are unclear and should be rewritten? > ・pg_autoprepared_statements > I confirmed that I could refer properly. > > ・autoprepare cache retention period > I confirmed that autoprepared statements were deleted when the set > statement or the DDL statement was executed. Although it differs from > the explicit prepare statements, it does not matter as a autoprepare. > > ・performance > This patch does not confirm the basic performance of autoprepare because > it confirmed that there was no performance problem with the previous > patch (autoprepare-11.patch). However, because it was argued that > performance degradation would occur when prepared statements execute to > a partition table, I expected that autoprepare might exhibit similar > behavior, and measured the performance. I also predicted that the > plan_cache_mode setting does not apply to autoprepare, and we also > measured the plan_cache_mode by conditions. > Below results (this result is TPS) > > plan_cache_mode simple simple(autoprepare) prepare > auto 130 121 121.5 > force_custom_plan 132.5 90.7 122.7 > force_generic_plan 126.7 14.7 24.7 > > Performance degradation was observed when plan_cache_mode was specified > for autoprepare. Is this behavior correct? I do not know why this is the > results. > > Below performance test procedure > > drop table if exists rt; > create table rt (a int, b int, c int) partition by range (a); > \o /dev/null > select 'create table rt' || x::text || ' partition of rt for values from (' || > (x)::text || ') to (' || (x+1)::text || ');' from generate_series(0, 1024) x; > \gexec > \o > > pgbench -p port -T 60 -c 1 -n -f test.sql (-M prepared) postgres > > test.sql > \set a random (0, 1023) > select * from rt where a = :a; Autoprepare is using the same functions from plancache.c so plan_cache_mode settings affect autoprepare as well as explicitly preprepared statements. Below are my results of select-only pgbench: plan_cache_mode simple simple(autoprepare) prepare auto 23k 42k 50k force_custom_plan 23k 24k 26k force_generic_plan 23k 44k 50k As you can see force_custom_plan slowdowns both explicitly and autoprepared statements. Unfortunately generic plans are not working well with partitioned table because disabling partition pruning. 
On my system the results of your test are the following:

plan_cache_mode     simple   simple(autoprepare)   prepare
auto                 232          220               219
force_custom_plan    234          175               211
force_generic_plan   230           48                48

The conclusion is that forcing a generic plan can cause a slowdown of queries on partitioned tables. If the plan cache mode is not forced, the standard Postgres strategy of comparing the efficiency of generic and custom plans works well.

Attached please find a rebased version of the patch.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
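For context, a simplified sketch of the kind of heuristic plancache.c applies when plan_cache_mode is auto (this is an illustration, not the actual source code): the generic plan is only kept if its estimated cost is not worse than the average cost of the custom plans built so far. Because a custom plan for a partitioned table is pruned to a single partition once the key value is known, its cost stays far below the generic plan's, and the planner keeps choosing custom plans - which is why force_generic_plan is the slow row in the tables above.

#include <stdbool.h>

typedef enum
{
    CACHE_MODE_AUTO,
    CACHE_MODE_FORCE_GENERIC,
    CACHE_MODE_FORCE_CUSTOM
} CacheMode;

static bool
choose_custom_plan_sketch(CacheMode mode,
                          double generic_cost,
                          double total_custom_cost,
                          int num_custom_plans)
{
    if (mode == CACHE_MODE_FORCE_GENERIC)
        return false;
    if (mode == CACHE_MODE_FORCE_CUSTOM)
        return true;

    /* build a few custom plans first so the average cost is meaningful */
    if (num_custom_plans < 5)
        return true;

    /* keep planning per execution while custom plans stay cheaper */
    return generic_cost > total_custom_cost / num_custom_plans;
}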
New version of the patch with fixed error of autopreparing CTE queries. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
New version of the patching disabling autoprepare for rules and handling planner error. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Apr 10, 2019 at 12:52 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > New version of the patching disabling autoprepare for rules and handling > planner error. Hi Konstantin, This doesn't apply. Could we please have a fresh rebase for the new Commitfest? Thanks, -- Thomas Munro https://enterprisedb.com
On 01.07.2019 12:51, Thomas Munro wrote: > On Wed, Apr 10, 2019 at 12:52 AM Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> New version of the patching disabling autoprepare for rules and handling >> planner error. > Hi Konstantin, > > This doesn't apply. Could we please have a fresh rebase for the new Commitfest? > > Thanks, > Attached please find rebased version of the patch. Also this version can be found in autoprepare branch of this repository https://github.com/postgrespro/postgresql.builtin_pool.git on github.
Attachment
On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > Attached please find rebased version of the patch. > Also this version can be found in autoprepare branch of this repository > https://github.com/postgrespro/postgresql.builtin_pool.git > on github. Thanks. I haven't looked at the code but this seems like interesting work and I hope it will get some review. I guess this is bound to use a lot of memory. I guess we'd eventually want to figure out how to share the autoprepared plan cache between sessions, which is obviously a whole can of worms. A couple of trivial comments with my CF manager hat on: 1. Can you please fix the documentation? It doesn't build. Obviously reviewing the goals, design and implementation are more important than the documentation at this point, but if that is fixed then the CF bot will be able to run check-world every day and we might learn something about the code. 2. Accidental editor junk included: src/include/catalog/pg_proc.dat.~1~ -- Thomas Munro https://enterprisedb.com
On 08.07.2019 2:23, Thomas Munro wrote: > On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> Attached please find rebased version of the patch. >> Also this version can be found in autoprepare branch of this repository >> https://github.com/postgrespro/postgresql.builtin_pool.git >> on github. > Thanks. I haven't looked at the code but this seems like interesting > work and I hope it will get some review. I guess this is bound to use > a lot of memory. I guess we'd eventually want to figure out how to > share the autoprepared plan cache between sessions, which is obviously > a whole can of worms. > > A couple of trivial comments with my CF manager hat on: > > 1. Can you please fix the documentation? It doesn't build. > Obviously reviewing the goals, design and implementation are more > important than the documentation at this point, but if that is fixed > then the CF bot will be able to run check-world every day and we might > learn something about the code. > 2. Accidental editor junk included: src/include/catalog/pg_proc.dat.~1~ Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail? I can not reproduce the problem with building documentation: knizhnik@xps:~/postgresql/doc$ make make -C ../src/backend generated-headers make[1]: Entering directory '/home/knizhnik/postgresql/src/backend' make -C catalog distprep generated-header-symlinks make[2]: Entering directory '/home/knizhnik/postgresql/src/backend/catalog' make[2]: Nothing to be done for 'distprep'. make[2]: Nothing to be done for 'generated-header-symlinks'. make[2]: Leaving directory '/home/knizhnik/postgresql/src/backend/catalog' make -C utils distprep generated-header-symlinks make[2]: Entering directory '/home/knizhnik/postgresql/src/backend/utils' make[2]: Nothing to be done for 'distprep'. make[2]: Nothing to be done for 'generated-header-symlinks'. make[2]: Leaving directory '/home/knizhnik/postgresql/src/backend/utils' make[1]: Leaving directory '/home/knizhnik/postgresql/src/backend' make -C src all make[1]: Entering directory '/home/knizhnik/postgresql/doc/src' make -C sgml all make[2]: Entering directory '/home/knizhnik/postgresql/doc/src/sgml' /usr/bin/xmllint --path . --noout --valid postgres.sgml /usr/bin/xsltproc --path . --stringparam pg.version '12devel' stylesheet.xsl postgres.sgml cp ./stylesheet.css html/ touch html-stamp /usr/bin/xmllint --path . --noout --valid postgres.sgml /usr/bin/xsltproc --path . --stringparam pg.version '12devel' stylesheet-man.xsl postgres.sgml touch man-stamp make[2]: Leaving directory '/home/knizhnik/postgresql/doc/src/sgml' make[1]: Leaving directory '/home/knizhnik/postgresql/doc/src' Also autoporepare-16.patch doesn't include any junk src/include/catalog/pg_proc.dat.~1~
On Tue, Jul 9, 2019 at 7:32 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail? > I can not reproduce the problem with building documentation: + <term><varname>autoprepare_threshold</varname> (<type>integer/type>) The problem is that "<type>integer/type>". (Missing "<"). There more than one of those. Not sure why that doesn't fail for you. > Also autoporepare-16.patch doesn't include any junk > > src/include/catalog/pg_proc.dat.~1~ Oops, yeah, sorry for the noise. I did something stupid in my apply script. -- Thomas Munro https://enterprisedb.com
On 09.07.2019 15:16, Thomas Munro wrote: > On Tue, Jul 9, 2019 at 7:32 AM Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail? >> I can not reproduce the problem with building documentation: > + <term><varname>autoprepare_threshold</varname> (<type>integer/type>) > > The problem is that "<type>integer/type>". (Missing "<"). There more > than one of those. Not sure why that doesn't fail for you. > or the noise. I did something stupid in my apply script. > Sorry, there were really several syntax problems. I didn't understand why I have not noticed them before. Fixed patch version of the path is attached.
Attachment
On 09/07/2019 23:59, Konstantin Knizhnik wrote: > Fixed patch version of the path is attached. Much of the patch and the discussion has been around the raw parsing, and guessing which literals are actually parameters that have been inlined into the SQL text. Do we really need to do that, though? The documentation mentions pgbouncer and other connection poolers, where you don't have session state, as a use case for this. But such connection poolers could and should still be using the extended query protocol, with Parse+Bind messages and query parameters, even if they don't use named prepared statements. I'd want to encourage applications and middleware to use out-of-band query parameters, to avoid SQL injection attacks, regardless of whether they use prepared statements or cache query plans. So how about dropping all the raw parse tree stuff, and doing the automatic caching of plans based on the SQL string, somewhere in the exec_parse_message? Check the autoprepare cache in exec_parse_message(), if it was an "unnamed" prepared statement, i.e. if the prepared statement name is an empty string. (I'm actually not sure if exec_parse/bind_message is the right place for this, but I saw that your current patch did it in exec_simple_query, and exec_parse/bind_message are the equivalent of that for the extended query protocol). - Heikki
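To make Heikki's suggestion concrete, here is a hedged sketch of what the lookup could look like on the server side. The helper names (autoprepare_find, autoprepare_remember) and the function shown are invented for illustration; they are not the actual exec_parse_message() code. The idea is simply that an unnamed Parse message is looked up by its exact SQL text (plus declared parameter types), and on a hit the previously built CachedPlanSource is reused.

#include "postgres.h"
#include "utils/plancache.h"

extern int autoprepare_threshold;

/* hypothetical cache helpers */
static CachedPlanSource *autoprepare_find(const char *query_string,
                                          Oid *paramTypes, int numParams);
static void autoprepare_remember(const char *query_string,
                                 Oid *paramTypes, int numParams,
                                 CachedPlanSource *psrc);

static CachedPlanSource *
parse_message_lookup_cache(const char *query_string,
                           const char *stmt_name,
                           Oid *paramTypes, int numParams)
{
    /*
     * Only unnamed statements are candidates: a named prepared statement is
     * already cached at the client's request, so there is nothing to gain.
     */
    if (stmt_name[0] != '\0' || autoprepare_threshold == 0)
        return NULL;

    /* Key the cache on the exact SQL text plus the declared parameter types. */
    return autoprepare_find(query_string, paramTypes, numParams);
}

On a hit, exec_parse_message() could skip raw parsing and analysis and register the cached plan source as the unnamed statement; on a miss it would follow the normal path and call something like autoprepare_remember() at the end, so the next identical Parse message finds the entry.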
On 31.07.2019 19:56, Heikki Linnakangas wrote:
> On 09/07/2019 23:59, Konstantin Knizhnik wrote:
>> Fixed patch version of the path is attached.
>
> Much of the patch and the discussion has been around the raw parsing,
> and guessing which literals are actually parameters that have been
> inlined into the SQL text. Do we really need to do that, though? The
> documentation mentions pgbouncer and other connection poolers, where
> you don't have session state, as a use case for this. But such
> connection poolers could and should still be using the extended query
> protocol, with Parse+Bind messages and query parameters, even if they
> don't use named prepared statements. I'd want to encourage
> applications and middleware to use out-of-band query parameters, to
> avoid SQL injection attacks, regardless of whether they use prepared
> statements or cache query plans. So how about dropping all the raw
> parse tree stuff, and doing the automatic caching of plans based on
> the SQL string, somewhere in the exec_parse_message? Check the
> autoprepare cache in exec_parse_message(), if it was an "unnamed"
> prepared statement, i.e. if the prepared statement name is an empty
> string.
>
> (I'm actually not sure if exec_parse/bind_message is the right place
> for this, but I saw that your current patch did it in
> exec_simple_query, and exec_parse/bind_message are the equivalent of
> that for the extended query protocol).

If the assumption that all client applications use the extended query protocol were true, this would significantly simplify the patch and eliminate all the complexity and trouble caused by replacing string literals with parameters. But I am not sure we can expect that. I myself see many applications which construct queries by embedding literal values. Maybe it is not good or safe (SQL injection attacks), but many applications do it, and the idea was to improve the execution speed of existing applications without changing them.

Please also notice that the extended protocol requires passing more messages, which has a negative effect on performance. On my notebook I get about 21k TPS with "pgbench -S" and 18k TPS with "pgbench -M extended -S". And that is over a unix socket connection! I think the difference will be even larger for remote connections.

So maybe committing a simple version of this patch, which does not need to solve any challenging problems, is a good idea. But I am afraid it will significantly reduce the positive effect of the patch.
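The difference between "embedding literal values" and "out-of-band query parameters" that this exchange keeps returning to is easy to see from the client side. Below is a small libpq example (assuming a pgbench database with pgbench_accounts and a locally reachable server): PQexec() sends the query through the simple protocol with the literal inlined, so every distinct value produces a different query text, while PQexecParams() uses the extended protocol with a $1 parameter, so the text stays identical and a server-side cache keyed on the SQL string can reuse the plan.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");
    const char *values[1] = {"42"};
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Simple protocol: the literal is part of the query text. */
    res = PQexec(conn, "SELECT abalance FROM pgbench_accounts WHERE aid = 42");
    PQclear(res);

    /* Extended protocol: the value is sent out of band as parameter $1. */
    res = PQexecParams(conn,
                       "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                       1,       /* one parameter */
                       NULL,    /* let the server infer the parameter type */
                       values,
                       NULL, NULL,
                       0);      /* text result format */
    PQclear(res);

    PQfinish(conn);
    return 0;
}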
On 31.07.2019 19:56, Heikki Linnakangas wrote: > On 09/07/2019 23:59, Konstantin Knizhnik wrote: >> Fixed patch version of the path is attached. > > Much of the patch and the discussion has been around the raw parsing, > and guessing which literals are actually parameters that have been > inlined into the SQL text. Do we really need to do that, though? The > documentation mentions pgbouncer and other connection poolers, where > you don't have session state, as a use case for this. But such > connection poolers could and should still be using the extended query > protocol, with Parse+Bind messages and query parameters, even if they > don't use named prepared statements. I'd want to encourage > applications and middleware to use out-of-band query parameters, to > avoid SQL injection attacks, regardless of whether they use prepared > statements or cache query plans. So how about dropping all the raw > parse tree stuff, and doing the automatic caching of plans based on > the SQL string, somewhere in the exec_parse_message? Check the > autoprepare cache in exec_parse_message(), if it was an "unnamed" > prepared statement, i.e. if the prepared statement name is an empty > string. > > (I'm actually not sure if exec_parse/bind_message is the right place > for this, but I saw that your current patch did it in > exec_simple_query, and exec_parse/bind_message are the equivalent of > that for the extended query protocol). > > - Heikki I decided to implement your proposal. Much simple version of autoprepare patch is attached. At my computer I got the following results: pgbench -M simple -S 22495 TPS pgbench -M extended -S 47633 TPS pgbench -M prepared -S 47683 TPS So autoprepare speedup execution of queries sent using extended protocol more than twice and it is almost the same as with explicitly prepared statements. I failed to create regression test for it because I do not know how to force psql to use extended protocol. Any advice is welcome. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Am 01.08.2019 um 18:56 schrieb Konstantin Knizhnik: > I decided to implement your proposal. Much simple version of > autoprepare patch is attached. > At my computer I got the following results: > > pgbench -M simple -S 22495 TPS > pgbench -M extended -S 47633 TPS > pgbench -M prepared -S 47683 TPS > > > So autoprepare speedup execution of queries sent using extended > protocol more than twice and it is almost the same as with explicitly > prepared statements. > I failed to create regression test for it because I do not know how to > force psql to use extended protocol. Any advice is welcome. I am very interested in such a patch, because currently I use the same functionality within my JDBC driver and having this directly in PostgreSQL would surely speed things up. I have two suggestions however: 1. Please allow to gather information about the autoprepared statements by returning them in pg_prepared_statements view. I would love to monitor usage of them as well as the memory consumption that occurs. I suggested a patch to display that in https://www.postgresql.org/message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de 2. Please not only use a LRU list, but maybe something which would prefer statements that get reused at all. We often create ad hoc statements with parameters which don't really need to be kept. Maybe I can suggest an implementation of an LRU list where a reusal of a statement not only pulls it to the top of the list but also increases a reuse counter. When then a statement would drop off the end of the list one checks if the reusal count is non-zero, and only really drops it if the resual count is zero. Else the reusal count is decremented (or halved) and the element is also placed at the start of the LRU list, so it is kept a bit longer. Thanks, Daniel Migowski
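Daniel's second suggestion amounts to an LRU list with a second-chance rule. A short C sketch of that eviction policy follows, using PostgreSQL's dlist from lib/ilist.h; the entry type and the cleanup helper are invented names for illustration, not code from the patch or from the JDBC driver.

#include "postgres.h"
#include "lib/ilist.h"

typedef struct CacheEntry
{
    dlist_node  lru_node;
    int         reuse_count;
    /* ... statement text, plan, etc. ... */
} CacheEntry;

static void cache_entry_free(CacheEntry *entry);   /* hypothetical cleanup */

static void
cache_entry_used(dlist_head *lru, CacheEntry *entry)
{
    entry->reuse_count++;
    dlist_move_head(lru, &entry->lru_node);
}

static void
cache_evict_one(dlist_head *lru)
{
    if (dlist_is_empty(lru))
        return;

    for (;;)
    {
        CacheEntry *tail = dlist_container(CacheEntry, lru_node,
                                           dlist_tail_node(lru));

        if (tail->reuse_count == 0)
        {
            /* never reused since it last reached the tail: drop it */
            dlist_delete(&tail->lru_node);
            cache_entry_free(tail);
            return;
        }

        /* reused entries get another pass with a reduced counter */
        tail->reuse_count /= 2;
        dlist_move_head(lru, &tail->lru_node);
    }
}

Compared with plain LRU, one-shot ad hoc statements fall out quickly because their counter never rises, while genuinely repeated statements survive a temporary burst of unrelated queries.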
On 02.08.2019 11:25, Daniel Migowski wrote:
> Am 01.08.2019 um 18:56 schrieb Konstantin Knizhnik:
>> I decided to implement your proposal. Much simple version of
>> autoprepare patch is attached.
>> At my computer I got the following results:
>>
>> pgbench -M simple -S 22495 TPS
>> pgbench -M extended -S 47633 TPS
>> pgbench -M prepared -S 47683 TPS
>>
>> So autoprepare speedup execution of queries sent using extended
>> protocol more than twice and it is almost the same as with explicitly
>> prepared statements.
>> I failed to create regression test for it because I do not know how
>> to force psql to use extended protocol. Any advice is welcome.
>
> I am very interested in such a patch, because currently I use the same
> functionality within my JDBC driver and having this directly in
> PostgreSQL would surely speed things up.
>
> I have two suggestions however:
>
> 1. Please allow to gather information about the autoprepared
> statements by returning them in pg_prepared_statements view. I would
> love to monitor usage of them as well as the memory consumption that
> occurs. I suggested a patch to display that in
> https://www.postgresql.org/message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de

Sorry, but there is a pg_autoprepared_statements view. I think it is better to distinguish explicitly and implicitly prepared statements, isn't it? Do you want to add more information to this view? Right now it shows the query string, the types of the parameters and the number of executions of the query.

> 2. Please not only use a LRU list, but maybe something which would
> prefer statements that get reused at all. We often create ad hoc
> statements with parameters which don't really need to be kept. Maybe I
> can suggest an implementation of an LRU list where a reusal of a
> statement not only pulls it to the top of the list but also increases
> a reuse counter. When then a statement would drop off the end of the
> list one checks if the reusal count is non-zero, and only really drops
> it if the resual count is zero. Else the reusal count is decremented
> (or halved) and the element is also placed at the start of the LRU
> list, so it is kept a bit longer.

There are many variations of LRU and alternatives: clock, segmented LRU, ...
I agree that classical LRU may not be the best replacement policy for autoprepare. An application either uses static queries or constructs them dynamically, and it would be better to distinguish queries executed at least twice from one-shot queries. So maybe SLRU would be the best choice. But the situation you describe (ad hoc statements with parameters which don't really need to be kept) does not really apply to autoprepare: autoprepare deals only with statements which are actually executed.

We have another patch which limits the number of stored prepared plans to avoid memory overflow (in case stored procedures implicitly prepare all issued queries). There the choice of replacement policy is more critical.

> Thanks,
>
> Daniel Migowski

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Am 02.08.2019 um 10:57 schrieb Konstantin Knizhnik: > On 02.08.2019 11:25, Daniel Migowski wrote: >> >> I have two suggestions however: >> >> 1. Please allow to gather information about the autoprepared >> statements by returning them in pg_prepared_statements view. I would >> love to monitor usage of them as well as the memory consumption that >> occurs. I suggested a patch to display that in >> https://www.postgresql.org/message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de >> >> > Sorry, but there is pg_autoprepared_statements view. I think that it > will be better to distinguish explicitly and implicitly prepared > statements, will not it? > Do you want to add more information to this view? Right now it shows > query string, types of parameters and number of this query execution. My sorry, I didn't notice it in the patch. If there is another view for those that's perfectly fine. >> 2. Please not only use a LRU list, but maybe something which would >> prefer statements that get reused at all. We often create ad hoc >> statements with parameters which don't really need to be kept. Maybe >> I can suggest an implementation of an LRU list where a reusal of a >> statement not only pulls it to the top of the list but also increases >> a reuse counter. When then a statement would drop off the end of the >> list one checks if the reusal count is non-zero, and only really >> drops it if the resual count is zero. Else the reusal count is >> decremented (or halved) and the element is also placed at the start >> of the LRU list, so it is kept a bit longer. >> > There are many variations of LRU and alternatives: clock, segmented > LRU, ... > I agree that classical LRU may be not the best replacement policy for > autoprepare. > Application is either use static queries, either constructing them > dynamically. It will be better to distinguish queries executed at > least twice from one-shot queries. > So may be SLRU will be the best choice. But actually situation you > have described (We often create ad hoc statements with parameters > which don't really need to be kept) > is not applicable to atutoprepare. Autoprepare deals only with > statements which are actually executed. > > We have another patch which limits number of stored prepared plans to > avoid memory overflow (in case of using stored procedures which > implicitly prepare all issued queries). > Here choice of replacement policy is more critical. Alright, just wanted to have something better than a LRU, so it is not a stepback from the implementation in the JDBC driver for us. Kind regards, Daniel Migowski
On 01.08.2019 19:56, Konstantin Knizhnik wrote: > > > On 31.07.2019 19:56, Heikki Linnakangas wrote: >> On 09/07/2019 23:59, Konstantin Knizhnik wrote: >>> Fixed patch version of the path is attached. >> >> Much of the patch and the discussion has been around the raw parsing, >> and guessing which literals are actually parameters that have been >> inlined into the SQL text. Do we really need to do that, though? The >> documentation mentions pgbouncer and other connection poolers, where >> you don't have session state, as a use case for this. But such >> connection poolers could and should still be using the extended query >> protocol, with Parse+Bind messages and query parameters, even if they >> don't use named prepared statements. I'd want to encourage >> applications and middleware to use out-of-band query parameters, to >> avoid SQL injection attacks, regardless of whether they use prepared >> statements or cache query plans. So how about dropping all the raw >> parse tree stuff, and doing the automatic caching of plans based on >> the SQL string, somewhere in the exec_parse_message? Check the >> autoprepare cache in exec_parse_message(), if it was an "unnamed" >> prepared statement, i.e. if the prepared statement name is an empty >> string. >> >> (I'm actually not sure if exec_parse/bind_message is the right place >> for this, but I saw that your current patch did it in >> exec_simple_query, and exec_parse/bind_message are the equivalent of >> that for the extended query protocol). >> >> - Heikki > > I decided to implement your proposal. Much simple version of > autoprepare patch is attached. > At my computer I got the following results: > > pgbench -M simple -S 22495 TPS > pgbench -M extended -S 47633 TPS > pgbench -M prepared -S 47683 TPS > > > So autoprepare speedup execution of queries sent using extended > protocol more than twice and it is almost the same as with explicitly > prepared statements. > I failed to create regression test for it because I do not know how to > force psql to use extended protocol. Any advice is welcome. > Slightly improved and rebased version of the patch. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
This patch fails the "rules" tests. Please fix. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25.09.2019 23:06, Alvaro Herrera wrote: > This patch fails the "rules" tests. Please fix. > > Sorry, New version of the patch with corrected expected output for rules test is attached. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Thu, Sep 26, 2019 at 10:23:38AM +0300, Konstantin Knizhnik wrote: > Sorry, > New version of the patch with corrected expected output for rules test is > attached. It looks like the documentation is failing to build. Could you fix that? There may be other issues as well. I have moved the patch to next CF, waiting on author. -- Michael
Attachment
On 01.12.2019 6:26, Michael Paquier wrote: > On Thu, Sep 26, 2019 at 10:23:38AM +0300, Konstantin Knizhnik wrote: >> Sorry, >> New version of the patch with corrected expected output for rules test is >> attached. > It looks like the documentation is failing to build. Could you fix > that? There may be other issues as well. I have moved the patch to > next CF, waiting on author. > -- > Michael Sorry, issues with documentation are fixed. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes: > [ autoprepare-extended-4.patch ] The cfbot is showing that this doesn't apply anymore; there's some doubtless-trivial conflict in prepare.c. However ... TBH I've been skeptical of this whole proposal from the beginning, on the grounds that it would probably hurt more use-cases than it helps. The latest approach doesn't really do anything to assuage that fear, because now that you've restricted it to extended query mode, the feature amounts to nothing more than deliberately overriding the application's choice to use unnamed rather than named (prepared) statements. How often is that going to be a good idea? And when it is, wouldn't it be better to fix the app? The client is likely to have a far better idea of which statements would benefit from this treatment than the server will; and in this context, the client-side changes needed would really be trivial. The original proposal, scary as it was, at least supported the argument that you could use it to improve applications that were too dumb/hoary to parameterize commands for themselves. I'm also unimpressed by benchmark testing that relies entirely on pgbench's default scenario, because that scenario consists entirely of queries that are perfectly adapted to plan-only-once treatment. In the real world, we constantly see complaints about cases where the plancache mistakenly decides that a generic plan is acceptable. I think that extending that behavior to more cases is going to be a net loss, until we find some way to make it smarter and more reliable. At the very least, you need to show some worst-case numbers alongside these best-case numbers --- what happens with queries where we conclude we must replan every time, so that the plancache becomes pure overhead? The pgbench test case is laughably unrealistic in another way, in that there are so few distinct queries it issues, so that there's no stress at all on the cache-management aspect of this problem. In short, I really think we ought to reject this patch and move on. Maybe it could be resurrected sometime in the future when we have a better handle on when to cache plans and when not. If you want to press forward with it anyway, certainly the lack of any tests in this patch is another big objection. Perhaps you could create a pgbench TAP script that exercises the logic. regards, tom lane
> On 1 Mar 2020, at 20:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In short, I really think we ought to reject this patch and move on.
> Maybe it could be resurrected sometime in the future when we have a
> better handle on when to cache plans and when not.
>
> If you want to press forward with it anyway, certainly the lack of
> any tests in this patch is another big objection. Perhaps you
> could create a pgbench TAP script that exercises the logic.

Based on Tom's review, and that nothing has been submitted since, I've marked this entry as returned with feedback. Feel free to open a new entry if you want to address Tom's comments and take this further.

cheers ./daniel
вс, 1 мар. 2020 г. в 22:26, Tom Lane <tgl@sss.pgh.pa.us>: > > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes: > > [ autoprepare-extended-4.patch ] > > The cfbot is showing that this doesn't apply anymore; there's > some doubtless-trivial conflict in prepare.c. > > However ... TBH I've been skeptical of this whole proposal from the > beginning, on the grounds that it would probably hurt more use-cases > than it helps. The latest approach doesn't really do anything to > assuage that fear, because now that you've restricted it to extended > query mode, the feature amounts to nothing more than deliberately > overriding the application's choice to use unnamed rather than named > (prepared) statements. How often is that going to be a good idea? > And when it is, wouldn't it be better to fix the app? The client is > likely to have a far better idea of which statements would benefit > from this treatment than the server will; and in this context, > the client-side changes needed would really be trivial. The original > proposal, scary as it was, at least supported the argument that you > could use it to improve applications that were too dumb/hoary to > parameterize commands for themselves. The theme of this thread: - it is not possible to reliably "fix the app" when pgbouncer or internal driver connection multiplexing are used. - another widely spread case is frameworks (ORM or other). There is no way to prepare a concrete query because it is buried under levels of abstractions. Whole "autoprepare" thing is a workaround for absence of "really trivial client-side changes" in these conditions. regards, Yura