Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Slowness of extended protocol
Date
Msg-id CADT4RqBphvxfZ70pGNNQNT3XLHh9A-QSks6kabg_DzT_506nBg@mail.gmail.com
Whole thread Raw
In response to Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers


On Tue, Aug 9, 2016 at 8:50 AM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
Shay>There are many scenarios where connections are very short-lived (think about webapps where a pooled connection is allocated per-request and reset in between)

Why the connection is reset in between in the first place?
In pgjdbc we do not reset per-connection statement cache, thus we easily reuse named statements for pooled connections.

A DISCARD ALL is sent when the connection is returned to the pool, the prevent state leakage etc. You make a valid comment though - there's already a new feature in the Npgsql dev branch which allows prepared statements to be persisted across open/close on pooled connections, it's interesting to learn that it is standard behavior in pgjdbc. At least up to now, the logic was not to implcitly keep holding server-side resources across a pooled connection close, because these may become a drain on the server etc.

More important, unless I'm mistaken pgbouncer also sends DISCARD ALL to clean the connection state, as will other pooling solutions. That unfortunately means that you can't always depend on prepared statements to persist after closing the connection.
 
Shay>There are also many scenarios where you're not necessarily going to send the same query multiple times in a single connection lifespan, so preparing is again out of the question.

Can you list at least one scenario of that kind, so we can code it into pgbench (or alike) and validate "simple vs prepared" performance?

Again, in a world where prepared statements aren't persisted across connections (e.g. pgbouncer), this scenario is extremely common. Any scenario where you open a relatively short-lived connection and execute something once is problematic - imagine a simple web service which needs to insert a single record into a table.

Shay>And more generally, there's no reason for a basic, non-prepared execution to be slower than it can be.

That's too generic. If the performance for "end-to-end cases" is just fine, then it is not worth optimizing further. Typical application best practice is to reuse SQL text (for both security and performance point of views), so in typical applications I've seen, query text was reused, thus it naturally was handled by server-prepared logic.

I don't see how reusing SQL text affects security in any way.

But here's the more important general point. We're driver developers, not application developers. I don't really know what performance is "just fine" for each of my users, and what is not worth optimizing further. Users may follow best practices, or they may not for various reasons. They may be porting code over from SqlServer, where prepare is almost never used (because SqlServer caches plans implicitly), or they may simply not be very good programmers. The API for executing non-prepared statements is there, we support it and PostgreSQL supports it - it just happens to not be very fast. Of course we can say "screw everyone not preparing statements", but that doesn't seem like a very good way to treat users. Especially since the fix isn't that hard.

Let me highlight another direction: current execution of server-prepared statement requires some copying of "parse tree" (or whatever). I bet it would be much better investing in removal of that copying rather than investing into "make one-time queries faster" thing. If we could make "Exec" processing faster, it would immediately improve tons of applications.

I don't understand what exactly you're proposing here, but if you have some unrelated optimization that would speed up prepared statements, by all means that's great. It's just unrelated to this thread.
 
Shay>Of course we can choose a different query to benchmark instead of SELECT 1 - feel free to propose one (or several).

I've tried pgbench -M prepared, and it is way faster than pgbench -M simple.

Once again: all cases I have in mind would benefit from reusing server-prepared statements. In other words, after some warmup the appication would use just Bind-Execute-Sync kind of messages, and it would completely avoid Parse/Describe ones.

Of course that's the ideal scenario. It's just not the *only* scenario for all users - they may either not have prepared statements persisting across open/close as detailed above, or their code may simply not be preparing statements at the moment. Why not help them out for free?

Shay>FYI in Npgsql specifically describe isn't used to get any knowledge about parameters - users must populate the correct parameters or query execution fails.

I think the main reason to describe for pgjdbc is to get result oids. pgjdbc is not "full binary", thus it has to be careful which fields it requests in binary format.
That indeed slows down "unknown queries", but as the query gets reused, pgjdbc switches to server-prepared execution, and eliminates parse-describe overheads completely.

Npgsql has a very different architecture. It does not attempt to decide on its own when to prepare statements and when not to - it's a very simple driver that does exactly what the user requests. If the user requests that a statement is prepared, it's prepared, otherwise it's not prepared.

FYI, if it's *result* OIDs you're concerned with, the Npgsql solution is to always request binary results, unless the user explicitly requests certain fields in text (via a special API). This means Npgsql never breaks the query into two roundtrips - Parse/Bind/Describe/Execute/Sync are always sent in a single roundtrip.

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Proposal for CSN based snapshots
Next
From: Amit Kapila
Date:
Subject: Re: [sqlsmith] Failed assertion in joinrels.c