Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Slowness of extended protocol
Date
Msg-id CADT4RqDaim7Jx-Ob+qf8n6eoTUER9xD8v3K9PCJOht7ahobUTA@mail.gmail.com
Whole thread Raw
In response to Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers


On Mon, Aug 8, 2016 at 6:44 PM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
The problem with "empty statement name" is statements with empty name can be reused (for instance, for batch insert executions), so the server side has to do a defensive copy (it cannot predict how many times this unnamed statement will be used).

That seems right.

Also, part of the point here is to reduce the number of protocol messages needed in order to send a parameterized query - not to have to do Parse/Describe/Bind/Execute/Sync - since part of the slowdown comes from that (although I'm not sure how much). Your proposal keeps the 5 messages.

As my benchmarks show, notable overhead is due to "defensive copying of the execution plan". So I would measure first, and only then would claim where the overhead is. 

Some more profiling is required to tell which part is a main time consumer.

Tom also pointed to the caching as the main culprit, although there seems to be some message-related overhead as well. It seems that things like process title changes may be fixable easily - do we really need a process title change on every message (as opposed to, say, execute messages only). This profiling and optimization effort can happen in parallel to the discussion on what to do with the execution plan caching.
 
Technically speaking, I would prefer to have a more real-life looking example instead of SELECT 1.
Do you have something in mind?
For instance, for more complex queries, "Parse/Plan" could cost much more than we shave by adding "a special non-cached statement name" or by reducing "5 messages into 1".

There's a side problem: describe message requires full roundtrip since there are cases when client needs to know how to encode parameters. Additional roundtrip hurts much worse than just an additional message that is pipelined (e.g. sent in the same TCP packet).

This is true, but there doesn't seem to be anything we can do about it - if your usage relies on describe to get information on parameters (as opposed to results), you're stuck with an extra roundtrip no matter what. So it seems you have to use the extended protocol anyway.

FYI in Npgsql specifically describe isn't used to get any knowledge about parameters - users must populate the correct parameters or query execution fails.
 
Note: it is quite easy to invent a name that is not yet used in the wild, so it is safe.

That's problematic, how do you know what's being used in the wild and what isn't? The protocol has a specification, it's very problematic to get up one day and to change it retroactively. But again, the empty statement seems to already be there for that.

Empty statement has different semantics, and it is wildly used.
For instance, pgjdbc uses unnamed statements a lot.
On the other hand, statement name of "!pq@#!@#42" is rather safe to use as a special case.
Note: statement names are not typically created by humans (statement name is not a SQL), and very little PG clients do support named statements.

IMHO this simply isn't the kind of thing one does in a serious protocol of a widely-used product, others seem to agree on this.

> Sir, any new SQL keyword is what you call a "retroactively defining special semantics".
> It's obvious that very little current clients do use named server-prepared statements.
> Statement names are not something that is provided by the end-user in a web page, so it is not a rocket science to come up with a
> statement name that is both short and "never ever used in the wild".

The difference is that before the new SQL keyword is added, trying to use it results in an error. What you're proposing is taking something that already works in one way and changing its behavior.

> Shay, can you come up with a real-life use case when those "I claim the statement will be used only once" is would indeed improve performance?
> Or, to put it in another way: "do you have a real-life case when simple protocol is faster than extended protocol with statement reuse"?

> I do have a couple of java applications and it turns out there's a huge win of reusing server-prepared statements.
> There's a problem that "generic plan after 5th execution might be much worse than a specific one", however those statements are not often
> and I just put hints to the SQL (limit 0, +0, CTE, those kind of things).

I maintain Npgsql, which is a general-purpose database driver and not a specific application. The general .NET database API (ADO.NET), like most (all?) DB APIs, allows users to send a simple statement to the database (ExecuteReader, ExecuteScalar, ExecuteNonQuery). Every time a user uses these APIs without preparing, they pay a performance penalty because the extended protocol has more overhead than the simple one.

Obviously smart use of prepared statements is a great idea, but it doesn't work everywhere. There are many scenarios where connections are very short-lived (think about webapps where a pooled connection is allocated per-request and reset in between), and the extra roundtrip that preparing entails is too much. There are also many scenarios where you're not necessarily going to send the same query multiple times in a single connection lifespan, so preparing is again out of the question. And more generally, there's no reason for a basic, non-prepared execution to be slower than it can be.

Of course we can choose a different query to benchmark instead of SELECT 1 - feel free to propose one (or several).

FWIW I do agree at this stage we should probably benchmark a bit more and see where we can get by optimizing the extended protocol itself (which is a good idea regardless). If it becomes clear that it's impossible to reach performance that's similar to the simple query, we can look into adding a new protocol message (as I think the special-statement-name solution has generally been rejected). I'm not sure I'll have the time to go into this in the very near future (and it would be my first time really diving into PostgreSQL internals) but if nobody else wants to I'll try.

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Logical Replication WIP
Next
From: Craig Ringer
Date:
Subject: Re: Logical Replication WIP