Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Slowness of extended protocol
Date
Msg-id CADT4RqA7Np5ZwcDKNzDjxN9PJYpdiPaXaSy_K=W9+1HLHgKREQ@mail.gmail.com
In response to Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers

Vladimir wrote:

Shay>I don't know much about the Java world, but both pgbouncer and pgpool (the major pools?)

In the Java world, https://github.com/brettwooldridge/HikariCP is a very good connection pool.
Neither pgbouncer nor pgpool is required.
The architecture is:  application <=> HikariCP <=> pgjdbc <=> PostgreSQL

The idea is that HikariCP pools pgjdbc connections, and server-prepared statements live per pgjdbc connection, so there is no reason to issue DISCARD ALL or DEALLOCATE ALL or anything of the kind.
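
A minimal sketch of that stack, assuming the standard HikariCP and pgjdbc APIs (the URL, credentials and pool size are illustrative):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PoolSketch {
        public static void main(String[] args) throws Exception {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://localhost:5432/test"); // illustrative
            config.setUsername("app");
            config.setPassword("secret");
            config.setMaximumPoolSize(10);

            try (HikariDataSource ds = new HikariDataSource(config)) {
                // Closing the connection below only returns it to the pool;
                // the physical pgjdbc connection, and with it any
                // server-prepared statements, stays alive. No DISCARD ALL.
                try (Connection conn = ds.getConnection();
                     PreparedStatement ps = conn.prepareStatement("SELECT ?")) {
                    ps.setInt(1, 42);
                    try (ResultSet rs = ps.executeQuery()) {
                        rs.next();
                    }
                }
            }
        }
    }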

Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do with any bugs or issues pgbouncer may have.

That is configurable. If pgbouncer/pgpool defaults to a wrong configuration, why should we (driver developers, backend developers) try to accommodate that?
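
(For reference, the pgbouncer setting in question is server_reset_query, which defaults to DISCARD ALL in session-pooling mode; a pgbouncer.ini sketch:)

    [pgbouncer]
    pool_mode = session
    ; default is "DISCARD ALL"; setting it empty keeps server-prepared
    ; statements alive across client sessions
    server_reset_query =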

In a nutshell, that sentence represents much of the problem I have with this conversation: you consider anything that differs from your approach to be simply "wrong".

Shay>If you do a savepoint on every single command, that surely would impact performance even without extra roundtrips...?

My voltmeter tells me that the overhead is just 3µs for a simple "SELECT" statement (see https://github.com/pgjdbc/pgjdbc/pull/477#issuecomment-239579547).
I think it is acceptable to have it enabled by default; however, it would be nice if the database did not require a savepoint dance to work around its "not implemented" "cached plan must not change result type" error.
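
The savepoint dance in question, as exposed by pgjdbc's autosave connection property (the subject of the PR linked above), looks roughly like this; a sketch whose URL and credentials are made up, and whose exact semantics may vary by driver version:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AutosaveSketch {
        public static void main(String[] args) throws Exception {
            // With autosave=always the driver sets an implicit SAVEPOINT
            // before each statement in a transaction, and rolls back to it
            // if the server raises "cached plan must not change result type",
            // instead of letting the whole transaction abort.
            String url = "jdbc:postgresql://localhost:5432/test?autosave=always";
            try (Connection conn = DriverManager.getConnection(url, "app", "secret")) {
                conn.setAutoCommit(false);
                try (Statement st = conn.createStatement()) {
                    st.execute("SELECT 1"); // driver: SAVEPOINT ...; SELECT 1
                    st.execute("SELECT 2"); // driver: SAVEPOINT ...; SELECT 2
                }
                conn.commit();
            }
        }
    }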

That's interesting (I don't mean that ironically). Aside from the client-observed latency you're measuring, are you sure there aren't "hidden" costs on the PostgreSQL side to generating so many implicit savepoints? I don't know anything about PostgreSQL transaction/savepoint internals, but adding a savepoint to each and every statement in a transaction seems like it would have some impact. It would be great to have confirmation from someone who really knows the internals.
 
Shay>I'm not aware of other drivers that implicitly prepare statements,
Shay>and definitely of no drivers that internally create savepoints and
Shay>roll them back without explicit user APIs

Glad to hear that.
Are you aware of other drivers that translate "insert into table(a,b,c) values(?,?,?)" into "insert into table(a,b,c) values(?,?,?),(?,?,?),...,(?,?,?)" on the fly?

That gives a 2-3x performance boost (and that includes client-side result processing!) for batched inserts, since "bind/exec" message processing is not that fast. That is why I say that investing in "bind/exec performance" makes more sense than investing in "improved execution of non-prepared statements".
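
(In pgjdbc this rewrite is exposed as the reWriteBatchedInserts connection property; a sketch of its use, with an illustrative URL and table name:)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchRewriteSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/test"
                    + "?reWriteBatchedInserts=true"; // enables the rewrite
            try (Connection conn = DriverManager.getConnection(url, "app", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                         "insert into table_a(a,b,c) values(?,?,?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setInt(1, i);
                    ps.setInt(2, 2 * i);
                    ps.setInt(3, 3 * i);
                    ps.addBatch();
                }
                // The driver sends multi-row "values(...),(...),..." statements
                // instead of 1000 separate bind/exec message pairs.
                ps.executeBatch();
            }
        }
    }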

I'm not aware of any, but I also don't consider that part of the driver's job. There's nothing your driver is doing that the application developer can't do themselves - so your driver isn't faster than other drivers. It's faster only when used by lazy programmers.
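
To be concrete, here is roughly what that looks like when done in application code; a sketch with hypothetical table and column names:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class ManualMultiRowInsert {
        // Hand-built equivalent of the driver-side rewrite: one statement
        // with an explicit multi-row VALUES list.
        static void insertRows(Connection conn, int[][] rows) throws SQLException {
            StringBuilder sql = new StringBuilder("insert into table_a(a,b,c) values");
            for (int i = 0; i < rows.length; i++) {
                sql.append(i == 0 ? " (?,?,?)" : ", (?,?,?)");
            }
            try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
                int p = 1;
                for (int[] row : rows) {
                    for (int v : row) {
                        ps.setInt(p++, v);
                    }
                }
                ps.executeUpdate();
            }
        }
    }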

What you're doing is optimizing developer code, with the assumption that developers can't be trusted to code efficiently - they're going to write bad SQL and forget to prepare their statements. That's your approach, and that's fine. The other approach, which is also fine, is that each software component has its own role, and that clearly-defined boundaries of responsibility should exist - that writing SQL is the developer's job, and that delivering it to the database is the driver's job. To me this separation of layers/roles is a key part of software engineering: it results in simpler, leaner components which interact in predictable ways, and when there's a problem it's easier to know exactly where it originated. To be honest, the mere idea of having an SQL parser inside my driver makes me shiver.

The HikariCP page you sent (thanks, didn't know it) contains a great Dijkstra quote that sums up what I think about this: "Simplicity is prerequisite for reliability." IMHO you're going the opposite way by implicitly rewriting SQL, preparing statements and creating savepoints.

But at the end of the day it's two different philosophies. All I ask is that you respect the one that isn't yours, which means accepting the optimization of non-prepared statements. There's no mutual exclusion.
 
2) I don't say "it is the only true way". I would change my mind if someone suggested a better solution. Everybody makes mistakes, and I have no problem with admitting "I made a mistake" and moving forward.
They say: "Don't cling to a mistake just because you spent a lot of time making it".

:)

So your way isn't the only true way, but others are just clinging to their mistakes... :)
 
However, no one has suggested a case where an application issues lots of unique SQL statements.

We did; you just dismissed or ignored them.

3) For "performance related" issues, a business case and a voltmeter is required to prove there's an issue.


Shay>But the second query, which should be invalidated, has already been
Shay>sent to the server (because of batching), and boom 
 
When doing batched SQL, some of the queries might fail with "duplicate key" or "constraint violation". There's already an API that covers those kinds of cases and reports which statements succeeded (if any).
In the case you described (one query in a batch somehow invalidates the subsequent one), it would just fall back to generic error handling.
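
(The standard JDBC mechanism referred to here is BatchUpdateException; a minimal sketch, with a hypothetical table t:)

    import java.sql.BatchUpdateException;
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class BatchErrorSketch {
        static void runBatch(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.addBatch("insert into t(id) values (1)");
                st.addBatch("insert into t(id) values (1)"); // duplicate key
                st.addBatch("insert into t(id) values (2)");
                st.executeBatch();
            } catch (BatchUpdateException e) {
                // Per-statement update counts report which statements
                // succeeded; failures show up as Statement.EXECUTE_FAILED.
                for (int count : e.getUpdateCounts()) {
                    System.out.println(count);
                }
            }
        }
    }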

You claimed that this condition could be handled transparently between the driver and the backend, without disrupting the application with an error. You proposed a new invalidation protocol between the client and the server, but for batches this fails and the application does get an error. That's all I wanted to say: transparent handling of this error is difficult, and probably impossible without serious changes to the PostgreSQL protocol...

This is very different from a failure caused by "duplicate key" or "constraint violation": your error is the result of something the driver did implicitly, after 5 executions, without the user's request or knowledge.
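
To make that concrete, here is a sketch of how the error can surface, assuming pgjdbc's default prepareThreshold of 5; the connection details and table are made up:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class CachedPlanSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/test";
            try (Connection conn = DriverManager.getConnection(url, "app", "secret")) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement("select * from t")) {
                    for (int i = 0; i < 5; i++) {
                        // the 5th execution is promoted to a named
                        // server-prepared statement
                        ps.executeQuery().close();
                    }
                    try (Statement st = conn.createStatement()) {
                        st.execute("alter table t add column extra int");
                    }
                    // The cached plan's result shape no longer matches; the
                    // server raises "cached plan must not change result type"
                    // and, inside a transaction, the transaction is aborted.
                    ps.executeQuery().close();
                }
                conn.commit();
            }
        }
    }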

From my side, I think this conversation has definitely reached a dead end (although I can continue responding if you think it would be of any value). There are two software design philosophies here, and the discussion isn't really even about databases or drivers. For the record, I do recognize that your approach has some merit (even if in the grand scheme of things I think it isn't the right way to design software components).

Also for the record, I do intend to implement implicit query preparation, but as an opt-in mechanism. I'm going to do so not because I think developers shouldn't be trusted to code efficiently, but because there are situations outside the developer's control which could greatly benefit from it (e.g. ORMs, or a large existing codebase that can't easily be changed to use prepare). I'm going to recommend that developers keep this feature off unless they have no other choice.
