Re: [HACKERS] Cached plans and statement generalization - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [HACKERS] Cached plans and statement generalization
Date
Msg-id f0bfcdd6-dba3-e9a8-2108-146e1f880839@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] Cached plans and statement generalization  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Cached plans and statement generalization  (Serge Rielau <serge@rielau.com>)
Re: [HACKERS] Cached plans and statement generalization  (David Fetter <david@fetter.org>)
Re: [HACKERS] Cached plans and statement generalization  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
List pgsql-hackers
On 24.04.2017 21:43, Andres Freund wrote:
Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.
That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match. How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with default parameters,  scale 10 and one connection:

protocol
TPS
simple
3492
extended
2927
prepared
6865
simple + autoprepare
6844

So autoprepare is as efficient as explicit prepare and can increase performance almost two times.

My current implementation is replacing with parameters only string literals in the query, i.e. select * from T where x='123'; -> select * from T where x=$1;
It greatly simplifies matching of parameters - it is just necessary to locate '\'' character and then correctly handle pairs  of quotes.
Handling of integer and real literals is really challenged task.
One source of problems is negation: it is not so easy to correctly understand whether minus should be treated as part of literal or as operator:
(-1), (1-1), (1-1)-1
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1".

Fully correct substitution can be done by first performing parsing the query, then transform parse tree, replacing literal nodes with parameter nodes and finally deparse tree into generalized query. postgres_fdw already contains such deparse code. It can be moved to postgres core and reused for autoprepare (and may be somewhere else).
But in this case overhead will be much higher.
I still think that query parsing time is significantly smaller than time needed for building and optimizing query execution plan.
But it should be measured if community will be interested in such approach.

There is obvious question: how I managed to get this pgbench results if currently only substitution of string literals is supported and queries constructed by pgbench don't contain string literals? I just made small patch in pgbench replaceVariable method wrapping value's representation in quotes. It has almost no impact on performance (3482 TPS  vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries.

I attached my patch to this mail. It is just first version of the patch (based on REL9_6_STABLE branch) just to illustrate proposed approach.
I will be glad to receive any comments and if such optimization is considered to be useful, I will continue work on this patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [HACKERS] TAP tests - installcheck vs check
Next
From: Fujii Masao
Date:
Subject: Re: [HACKERS] Separation walsender & normal backends