Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Slowness of extended protocol
Date
Msg-id CADT4RqAvEJrJRxWNfcw=99mAmM+NQ0QEpV9CYC8+oJ2FP5-+WQ@mail.gmail.com
Whole thread Raw
In response to Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers
Halfway through this mail I suddenly understood something, please read all the way down before responding...

On Tue, Aug 16, 2016 at 2:16 PM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
Shay> your analogy breaks down. Of course L2 was invented to improve
performance,
Shay> but that doesn't mean that all caches are the same. More precisely, what I
Shay> find objectionable about your approach is not any caching - it's the
Shay> implicit or automatic preparation of statements. This practice isn't
Shay> invisible in that a) it may cause errors that wouldn't have been there
Shay> otherwise (e.g. because of DDL),

Long-lived named server-prepared statements cause problems even if
server-prepared statements are created manually by developers.

Could you please stop saying "automatic preparation causes ~DDL issues"?

I never said that... As I've said many times, the problem is errors caused by something the user never asked for. If I server-prepare a statement and then get an error, it's a result of my own action.

Shay> As I said above, I think this is a critical point of misunderstand between
Shay> us. The developers tells the driver which statements should be
Shay> server-prepared by calling .prepareStatement(). I'm guessing you have a
Shay> totally different understanding here.

Please, quote the document you got that "developers tell the driver which
statements should be server-prepared by calling ..." from. It never
works like that.
Neither in Java, nor in C#. I would admit I've no C# experience, but I did
find documentation on IDbCommand.Prepare() and examined it.

The proper way to say is "by calling .Prepare() developer passes the
intention that
he might be using the same query multiple times".
That is it. It never means "driver must absolutely use server-prepare
in the response
to .Prepare() call".

The same goes for Java's PreparedStatement.
It never means "the driver must use server-prepared features".

As Microsoft lists in the .Prepare() documentation, modern versions of
MSSQL just ignore .Prepare() and cache statements automatically.

It is not a developer's business which statements should be in the
database cache.
Neither developer should care which statements reside in the driver cache.

I'm really baffled here.

First, I never said prepared statements *must* be server-prepared. You're completely correct that databases APIs don't *require* this, because they by definition cover many databases and drivers. In Sqlite there's no server, so there can be no server-prepared statement.

However, where there *is* a server which supports prepared statements as an optimization, it's completely unthinkable to me that a driver wouldn't implement prepare as server-prepare. Nobody forces you to do it - it just seems unthinkable to do otherwise. This reason for this is that if server-prepared statements are supported by your database, we expect them to be a significant optimization (otherwise why would they exist), and therefore not using them when the user calls "prepare" seems like... foolishness. In other words, whatever client-side "precompilation" or other optimization is possible is probably going to be negligible when compared to server-preparation (this seems to be the case with PostgreSQL at the very least), so why *not* map the database API's prepare method to server-prepared statements?

I'm going to requote the API note which you quoted above on Connection.prepareStatement (https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html):

> This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation.

Again, my understanding of English may be flawed, or maybe my logic circuits are malfunctioning. But I read this the following way:
1. preparedStatement is about precompilation.
2. If a driver supports precompilation (i.e. preparation), "prepareStatement will send the statement to the *database* for precompilation". Note that the API explicitly mentioned sending *to the database* - server preparation...
3. A driver may not support precompilation (i.e. preparation). This could be because it simply hasn't implemented it yet, or because the backend doesn't support it, or for any other reason. In this case it's a noop, which doesn't really change anything in this discussion.

A compliant implementation (that is a driver) could just assemble full SQL
by concatenating the parameters on each execution and send it via 'Q' simple
execute message.

I think I may have understood the problem here - there's definitely a Java vs. C# issue difference this conversation.

From reading the Java docs, I now realize that JDBC only seems to support parameters in prepared statements. In other words, the parameters capability is coupled with the capability of precompilation, i.e. the ability to execute the same query multiple times with better performance. In ADO.NET things are different. You can use parameters on a statement regardless of whether it's prepared or not. In other words, the *only* use of preparation is to speed up frequently-used statements - this makes it very natural to understand preparation as "server preparation".

So I now understand why in JDBC it could make sense for some prepared statements to be server-prepared, and for others not to. The JDBC API design seems very problematic to me - there are perfectly good reasons to use parameters without preparation (prevent SQL injection) and also the other way around, so coupling them doesn't seem like a good idea. But that isn't really relevant to this conversation.

I hope this also makes you understand why, at least in ADO.NET, it makes perfect sense to simply understand preparation as server-preparation - the only reason prepare exists is to improve performance. 
 
I'm going to skip most of the CPU reordering/caching arguments here as I don't think they're relevant. Not everything is the same as CPU registers, and we can find different behaviors depending on the components and layer we choose to compare with. There are many scenarios in which programmers are expected to specify caching explicitly. I'm going to leave this behind.
 
0) Microsoft's documentation on ".Prepare()" says "prepare call does not have
any effect since sql server optimizes the statements automatically".
So applications ported from MSSQL might miss ".Prepare()" altogether,
thus it will
under-perform and blame PostgreSQL for no reason.
Does it sound fair? I do not think so.

Of course it's fair. They're porting code from one database to another, it's their responsibility to do the work. There are many other aspects besides preparation which would perform better in one database and worst in another. SQL itself differs from database to database, so it's totally unrealistic to expect the same code to simply run. The same goes for your Oracle example.

Regarding "regardless...by CPU or not", you are off.
I've listed a link to a concurrency bug in Linux kernel.
It manifested itself only on some recent Intel CPUs.
In other words, the kernel was fine otherwise, but if running modern
CPU, it might hang.
Of course it was bug in the kernel, but the trigger condition was
"smarter CPU" that
was able to do more "reorderings" that were possible with older ones.

Again, as I wrote, the bug was already there. Reordering just made it manifest - it could in theory have been manifested otherwise. You need to separate between a problem the programmer is responsible for, and a problem someone else is responsible for, regardless of why it's manifested.

Shay> It does only if you do it in a roundtrip of its own. When you close a
Shay> pooled connection in Npgsql, the reset query is written to an internal
Shay> buffer but not sent. The first query that actually gets sent by the user
Shay> after opening will therefore have the reset query prepended to it
Shay> (basically the reset query is batched)

That is clever.
Is it something specific to Npgsql pool? Does it work with all poolers
that operate
on top of Npgsql connections?

Thanks. It's specific to the Npgsql pool, which is also in-process. ADO.NET tends heavily to in-process pools, and also couples the pool to the driver - I'm not aware of driver-independent pools which you can use with any driver, like in Java. Basically if you use Npgsql, you use the Npgsql pool.

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Intermittent "cache lookup failed for type" buildfarm failures
Next
From: Dmitry Igrishin
Date:
Subject: Re: [GENERAL] C++ port of Postgres