Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Slowness of extended protocol
Date
Msg-id CADT4RqALgnR0WaomyNWiJ0hZS+C=YXQ6u5aOjKsaUkRX_JmoGg@mail.gmail.com
Whole thread Raw
In response to Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers
Some comments...

For the record, I do find implicit/transparent driver-level query preparation interesting and potentially useful, and have opened https://github.com/npgsql/npgsql/issues/1237 to think about it - mostly based on arguments on this thread. One big question mark I have is whether this behavior should be opt-out as in pgjdbc, rather than opt-in. Like others in this thread (I think), I prefer my libraries and drivers very simple, following my exact API calls to the letter and doing minimal magic under the hood. As Tom said, if an application can benefit from preparing, the developer has the responsibility (and also the best knowledge) to manage preparation, not the driver. Magical behavior under the hood causes surprises, hard-to-diagnose bugs etc.

One interesting case where implicit preparation is unbeatable, though, is when users aren't coding against the driver directly but are using some layer, e.g. an ORM. For example, the Entity Framework ORM simply does not prepare statements (see https://github.com/aspnet/EntityFramework/issues/5459 which I opened). This is one reason I find it a valuable feature to have, although probably as opt-in and not opt-out.

Regarding driver-level SQL parsing, Vladimir wrote:

> Unfortunately, client-side SQL parsing is inevitable evil (see below on '?'), so any sane driver would cache queries any way. The ones that do not have query cache will perform badly anyway.

First, it's a very different thing to have a query cache in the driver, and to implicitly prepare statements.

Second, I personally hate the idea that drivers parse SQL and rewrite queries. It's true that in many languages database APIs have "standardized" parameter placeholders, and therefore drivers have no choice but to rewrite. ADO.NET (the .NET database API) actually does not do this - parameter placeholders are completely database-dependent (see https://msdn.microsoft.com/en-us/library/yy6y35y8%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396). Npgsql does rewrite queries to provide named parameters, but the next version is going to have a "raw query" mode in which no parsing/rewriting happens whatsoever. It simply takes your string, packs it into a Parse (or Query), and sends it off. This is not necessarily going to be the main way users use Npgsql, but it's important to note that query parsing and rewriting isn't an "inevitable evil".

Vladimir wrote:

> This new message would be slower than proper query cache, so why should we all spend time on a half-baked solution?

The cases where the prepared statement path doesn't work have already been listed many times - pgbouncer (and other pools), drivers which don't support persisting prepared statement across pooled connection open/close, and people (like many in this conversation) who don't appreciate their drivers doing magic under the hood. This is why I thing both proposals are great - there's nothing wrong with query caching (or more precisely, implicit statement preparation by the driver), especially if it's opt-in, but that doesn't mean we shouldn't optimize things for people who don't, can't or won't go for that.

To be fair, implementing a cache is a trivial thing when compared with hard-coding binary/text formats for all the datatypes in each and every language.
> Remember, each driver has to implement its own set of procedures to input/output values in text/binary format, and that is a way harder than implementing the cache we are talking about.

I agree with that, but it's not a question of how easy it is to implement implicit preparation. It's a question of whether driver developers choose to do this, based on other considerations as well - many people here think it's not the job of the driver to decide for you whether to prepare or not, for example. It has nothing to do with implementation complexity.

> "cached plan cannot change result type" -- PostgreSQL just fails to execute the server-prepared statement if a table was altered.

That alone seems to be a good reason to make implicit preparation opt-in at best. How exactly do users cope with this in pgjdbc? Do they have some special API to flush (i.e. deallocate) prepared statements which they're supposed to use after a DDL? Do you look at the command tag in the CommandComplete to know whether a command was DDL or not?

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Heap WARM Tuples - Design Draft
Next
From: Claudio Freire
Date:
Subject: Re: Heap WARM Tuples - Design Draft