Re: [HACKERS] Cached plans and statement generalization - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [HACKERS] Cached plans and statement generalization
Date
Msg-id 304dfa09-6135-cbcf-d6e5-1ea6c36e6eec@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] Cached plans and statement generalization  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: [HACKERS] Cached plans and statement generalization  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On 26.04.2017 13:46, Pavel Stehule wrote:

I attach new patch which allows to limit the number of autoprepared statements (autoprepare_limit GUC variable).
Also I did more measurements, now with several concurrent connections and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only statements are below:

Protocol
TPS
extended
87k
prepared
209k
simple+autoprepare
206k

As you can see, autoprepare provides more than 2 times speed improvement.

Also I tried to measure overhead of parsing (to be able to substitute all literals, not only string literals).
I just added extra call of pg_parse_query. Speed is reduced to 181k.
So overhead is noticeable, but still making such optimization useful.
This is why I want to ask question:  is it better to implement slower but safer and more universal solution?

Unsafe solution has not any sense, and it is dangerous (80% of database users has not necessary knowledge). If somebody needs the max possible performance, then he use explicit prepared statements.


I attached new patch to this mail. I completely reimplement my original approach and now use parse tree transformation.
New pgbench (-S -c 10) results are the following:

Protocol
TPS
extended
87k
prepared
209k
simple+autoprepare
185k

So there is some slowdown comparing with my original implementation and explicitly prepared statements, but still it provide more than two times speed-up comparing with unprepared queries. And it doesn't require to change existed applications.
As far as most of real production application are working with DBMS through some connection pool (pgbouncer,...), I think that such optimization will be useful.
Isn't it interesting if If we can increase system throughput almost two times by just setting one parameter in configuration file?

I also tried to enable autoprepare by default and run regression tests. 7 tests are not passed because of the following reasons:
1. Slightly different error reporting (for example error location is not always identically specified).
2. Difference in query behavior caused by  changed local settings (Andres gives an example with search_path,  and date test is failed because of changing datestyle).
3. Problems with indirect dependencies (when table is altered only cached plans directly depending on this relation and invalidated, but not plans with indirect dependencies).
4. Not performing domain checks for null values.
 
I do not think that this issues can cause problems for real application.

Also it is possible to limit number of autoprepared statements using autoprepare_limit parameter, avoid possible backend memory overflow in case of larger number of unique queries sent by application. LRU discipline is used to drop least recently used plans.

Any comments and suggestions for future improvement of this patch are welcome.
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: [HACKERS] convert EXSITS to inner join gotcha and bug
Next
From: 高增琦
Date:
Subject: Re: [HACKERS] Dropping a partitioned table takes too long