Re: [HACKERS] Cached plans and statement generalization - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [HACKERS] Cached plans and statement generalization
Date
Msg-id 5538dbd6-fcbb-33a7-49d8-28c5cd2df435@postgrespro.ru
In response to Re: [HACKERS] Cached plans and statement generalization  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers

On 31.07.2019 19:56, Heikki Linnakangas wrote:
> On 09/07/2019 23:59, Konstantin Knizhnik wrote:
>> Fixed patch version of the path is attached.
>
> Much of the patch and the discussion has been around the raw parsing, 
> and guessing which literals are actually parameters that have been 
> inlined into the SQL text. Do we really need to do that, though? The 
> documentation mentions pgbouncer and other connection poolers, where 
> you don't have session state, as a use case for this. But such 
> connection poolers could and should still be using the extended query 
> protocol, with Parse+Bind messages and query parameters, even if they 
> don't use named prepared statements. I'd want to encourage 
> applications and middleware to use out-of-band query parameters, to 
> avoid SQL injection attacks, regardless of whether they use prepared 
> statements or cache query plans. So how about dropping all the raw 
> parse tree stuff, and doing the automatic caching of plans based on 
> the SQL string, somewhere in the exec_parse_message? Check the 
> autoprepare cache in exec_parse_message(), if it was an "unnamed" 
> prepared statement, i.e. if the prepared statement name is an empty 
> string.
>
> (I'm actually not sure if exec_parse/bind_message is the right place 
> for this, but I saw that your current patch did it in 
> exec_simple_query, and exec_parse/bind_message are the equivalent of 
> that for the extended query protocol).

That would significantly simplify the patch and eliminate all the
complexity and trouble caused by replacing string literals with
parameters, provided the assumption holds that all client applications
use the extended query protocol.
But I am not sure that we can expect that. I myself see many
applications that construct queries by embedding literal values.
Maybe that is neither good nor safe (SQL injection attacks), but many
applications do it. And the idea was to improve the execution speed
of existing applications without changing them.
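
To make the difference between the two caching strategies concrete, here
is a minimal Python sketch. It is only an illustration, not the code in
the patch: the `generalize` function, the `PlanCache` class and the
regular expression are hypothetical, and the patch itself works on the
raw parse tree rather than on text patterns. The sample queries are
modeled on the select-only pgbench query with the literal inlined.

```python
import re

# Hypothetical literal pattern: single-quoted strings (with '' escapes)
# or standalone integer/decimal numbers. A real implementation would
# inspect the raw parse tree instead of using a regular expression.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def generalize(sql):
    """Replace each literal with $1, $2, ... and return (text, values)."""
    values = []

    def substitute(match):
        values.append(match.group(0))
        return "$%d" % len(values)

    return LITERAL.sub(substitute, sql), values


class PlanCache:
    """Toy cache that counts how often a 'plan' has to be built."""

    def __init__(self, key_func):
        self.key_func = key_func
        self.plans = {}
        self.misses = 0

    def lookup(self, sql):
        key = self.key_func(sql)
        if key not in self.plans:
            self.misses += 1
            self.plans[key] = "plan for " + key  # stand-in for planning
        return self.plans[key]


queries = [
    "SELECT abalance FROM pgbench_accounts WHERE aid = 42",
    "SELECT abalance FROM pgbench_accounts WHERE aid = 4711",
    "SELECT abalance FROM pgbench_accounts WHERE aid = 7",
]

by_text = PlanCache(lambda sql: sql)                  # key: exact SQL string
by_shape = PlanCache(lambda sql: generalize(sql)[0])  # key: generalized text

for q in queries:
    by_text.lookup(q)
    by_shape.lookup(q)

print(by_text.misses, by_shape.misses)
```

This prints `3 1`: with the exact SQL string as the key, every query
that embeds a different literal is planned again, while the generalized
text maps all three onto one cached entry. That is the case an
application with inlined literals would hit.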


Also please note that the extended protocol requires passing more
messages, which has a negative effect on performance.
On my notebook I get about 21k TPS with "pgbench -S" and 18k TPS with
"pgbench -M extended -S".
And that is with a Unix socket connection! I think that for remote
connections the difference will be even larger.
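
For a sense of scale, the two figures can be turned into a relative drop
and a per-transaction cost. This is just arithmetic on the rounded
numbers above; the per-transaction estimate additionally assumes a
single client connection (the pgbench default when -c is not given), so
that TPS is simply the inverse of the time per transaction.

```python
# Rounded figures quoted above; derived values are no more exact than these.
simple_tps = 21_000     # "pgbench -S"
extended_tps = 18_000   # "pgbench -M extended -S"

# Relative throughput loss when switching to the extended protocol.
drop = (simple_tps - extended_tps) / simple_tps

# Extra time per transaction in microseconds (assumes one client).
extra_us = (1 / extended_tps - 1 / simple_tps) * 1e6

print(f"throughput drop: {drop:.1%}")
print(f"extra time per transaction: {extra_us:.1f} us")
```

This gives a drop of about 14.3% and roughly 7.9 microseconds of extra
time per transaction under the stated assumption.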

So maybe committing a simple version of this patch, which does not need
to solve any challenging problems, is a good idea.
But I am afraid that it will significantly reduce the positive effect of
this patch.


