Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Slowness of extended protocol
Date
Msg-id CADT4RqDOhgjOgKEK4qQUrJctar5extfpqfmPuxxDqd_q76=fUQ@mail.gmail.com
Whole thread Raw
In response to Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers
On Tue, Aug 9, 2016 at 3:42 PM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
Shay>But here's the more important general point. We're driver developers, not application developers. I don't really know what performance is "just fine" for each of my users, and what is not worth optimizing further. Users may follow best practices, or they may not for various reasons.

Of course we cannot benchmark all the existing applications, however we should at lest try to use "close to production" benchmarks.

Let me recap: even "select 1" shows clear advantage of reusing server-prepared statements.
My machine shows the following results for "select 1 pgbench":
simple: 38K ops/sec (~26us/op)
extended: 32K ops/sec (~31us/op)
prepared: 47K ops/sec (~21us/op)

Note: reusing server-prepared statements shaves 10us (out of 31us), while "brand new ParseBindExecDeallocate" message would not able to perform better than 26us/op (that is 5 us worse than the prepared one). So it makes much more sense investing time in "server-prepared statement reuse" at the client side and "improving Bind/Exec performance" at the backend side.

For more complex queries the gap (prepared vs simple) would be much bigger since parse/validate/plan for a complex query is much harder operation than the one for "select 1"

You seem to be misunderstanding the fundamental point here. Nobody is saying that prepared statements aren't a huge performance booster - they are. I recommend them to anyone who asks. But you're saying "let's not optimize anything else", whereas there are many programs out there *not* using prepared statements for various reasons (e.g. pgbouncer, or simply an existing codebase). If your opinion is that nothing should be done for these users, fine - nobody's forcing you to do anything. I simply don't see why *not* optimize the very widely-used single-statement execution path.
 
Note: I do not mean "look, prepared always win". I mean: "if your environment does not allow reuse of prepared statements for some reason, you lose huge amount of time on re-parsing the queries, and it is worth fixing that obvious issue first".

Maybe it's worth fixing it, maybe not - that's going to depend on the application. Some applications may be huge/legacy and hard to change, others may depend on something like pgbouncer which doesn't allow it. Other drivers out there probably don't persist prepared statements across close/open, making prepared statements useless for short-lived scenarios. Does the Python driver persist prepared statements? Does the Go driver do so? If not the single-execution flow is very relevant for optimization.
 
Shay>I don't see how reusing SQL text affects security in any way.

Reusing SQL text makes application more secure as "build SQL on the fly" is prone to SQL injection security issues.
So reusing SQL text makes application more secure and it enables server-prepared statements that improve performance considerably. It is a win-win.

We've all understood that server-prepared statements are good for performance. But they aren't more or less vulnerable to SQL injection - developers can just as well concatenate user-provided strings into a prepared SQL text.

[deleting more comments trying to convince that prepared statements are great for performance, which they are]

Shay>Again, in a world where prepared statements aren't persisted across connections (e.g. pgbouncer)

pgbouncer does not properly support named statements, and that is pbgouncer's issue.

The response from pgbouncer team is "all the protocol bits are there, it is just implementation from pgbouncer that is missing".

By the way: I do not use pgbouncer, thus there's no much interest for me to invest time in fixing pgbouncer's issues.

Um, OK... So you're not at all bothered by the fact that the (probably) most popular PostgreSQL connection pool is incompatible with your argument? I'm trying to think about actual users and the actual software they use, so pgbouncer is very relevant.

Shay>Any scenario where you open a relatively short-lived connection and execute something once is problematic - imagine a simple web service which needs to insert a single record into a table.

I would assume the application does not use random string for a table name (and columns/aliases), thus it would result in typical SQL text reuse, thus it should trigger "server-side statement prepare" logic. In other way, that kind of application does not need the "new ParseBindExecDeallocate message we are talking about".

I wouldn't assume anything. Maybe the application does want to change column names (which can't be parameterized). Maybe it needs the extended protocol simply because it wants to do binary encoding, which isn't possible with the simple protocol (that's the case with Npgsql). There are many types of applications out there, I try in general not to assume things about them.

It would be great to get some external opinions on this debate (and apologies about the email volume here too).

pgsql-hackers by date:

Previous
From: Ryan Pedela
Date:
Subject: Re: 9.6 phrase search distance specification
Next
From: "David G. Johnston"
Date:
Subject: phrase search TS_phrase_execute code readability patch