
From Shay Rojansky
Subject Re: Slowness of extended protocol
Msg-id CADT4RqCkc6cbRfvTQCqgy2N75Z-DA-LHZhPfTJj1e74wxcnr7g@mail.gmail.com
In response to Re: Slowness of extended protocol  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers


On Thu, Aug 11, 2016 at 1:22 PM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:

Vladimir> 2) The driver can use savepoints and autorollback to the good "right before failure" state in case of a known failure. Here's the implementation: https://github.com/pgjdbc/pgjdbc/pull/477
Vladimir> As far as I can remember, the performance overhead is close to zero (no extra roundtrips to create a savepoint)

What? Do you mean you do implicit savepoints and autorollback too? How does the driver decide when to create a savepoint? Is it before every single command? If not, can commands get lost when an error is raised and you automatically roll back? And if you do create a savepoint before every single command, surely that would impact performance even without extra roundtrips...?

You seem to have a very "unique" idea of what a database driver should do under the hood for its users. At the very least I can say that your concept is very far from almost any database driver I've seen up to now (PostgreSQL JDBC, psycopg, Npgsql, libpq...). I'm not aware of other drivers that implicitly prepare statements, and I know of none that internally create savepoints and roll them back without an explicit user API. At the very least you should be aware (and also clearly admit!) that you're doing something very different - not necessarily wrong - and not attempt to impose your ideas on everyone as if it's the only true way to write a db driver.
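
For contrast, here's what explicit, application-controlled savepoint use looks like through the standard JDBC API - a minimal sketch (table name, connection details and the failing statement are made up for illustration). This is the model all the drivers above follow: the *user* decides where the recovery points go:

    import java.sql.*;

    public class ExplicitSavepointDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "pass")) {
                conn.setAutoCommit(false);
                // The application, not the driver, decides where the recovery point is.
                Savepoint sp = conn.setSavepoint();
                try (Statement st = conn.createStatement()) {
                    st.execute("UPDATE t SET x = 1");
                } catch (SQLException e) {
                    // Explicit recovery: the rest of the transaction stays usable.
                    conn.rollback(sp);
                }
                conn.commit();
            }
        }
    }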

Vladimir> 3) Backend could somehow invalidate prepared statements, and notify clients accordingly. Note: the problem is hard in the generic case, however it might not be that hard if we fix well-known, often-used cases like "a column is added". That, however, would add memory overhead to store additional maps like "table_oid -> statement_names[]"

Assuming your driver supports batching/pipelining (and I hope it does), that doesn't make sense. Say I send a two-statement batch, where the first statement is a DDL change and the second is a prepared query that the first invalidates. When PostgreSQL executes the DDL, your hypothetical notification is sent. But the second query, which should be invalidated, has already been sent to the server (because of batching) - and boom.
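
To make the failure mode concrete, here's a minimal sketch (not code from this thread; table name and connection details are made up). It assumes pgjdbc's implicit server-prepare kicking in after prepareThreshold executions (5 by default):

    import java.sql.*;

    public class InvalidationDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "pass");
                 PreparedStatement ps = conn.prepareStatement("SELECT * FROM t")) {
                // Reach the prepareThreshold so the driver server-prepares the query.
                for (int i = 0; i < 5; i++) {
                    ps.executeQuery().close();
                }
                try (Statement ddl = conn.createStatement()) {
                    ddl.execute("ALTER TABLE t ADD COLUMN extra int");
                }
                // Fails with "cached plan must not change result type". Pipelined,
                // this Execute would already be in the send buffer alongside the
                // ALTER, so no server-side notification could arrive in time.
                ps.executeQuery();
            }
        }
    }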

Vladimir> 4) Other. For instance, a new message flow so frontend and backend could re-negotiate "binary vs text formats for the new resulting type". Or a "ValidatePreparedStatement" message that would just validate the statement and avoid killing the transaction if the statement is invalid. Or whatever else can be invented.

When would you send this ValidatePreparedStatement? Before each execution, as a separate roundtrip? That would kill performance. Would you send it in the same packet, before Execute? In that case you still get the error when Execute is evaluated...
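
Back-of-the-envelope arithmetic for the separate-roundtrip option (the numbers are illustrative assumptions, not measurements from this thread):

    public class ValidateCostSketch {
        public static void main(String[] args) {
            double rttMs = 0.5;                     // assumed client<->server round trip
            double executeOnly = rttMs;             // Bind/Execute/Sync in one trip
            double validateThenExecute = 2 * rttMs; // extra validation round trip first
            System.out.printf(
                "execute only: %.1f ms, validate then execute: %.1f ms (2x latency)%n",
                executeOnly, validateThenExecute);
        }
    }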

There really is no solution to this problem within the current PostgreSQL way of doing things - although of course we could reinvent the entire PostgreSQL protocol here to accommodate your special driver...

The basic truth is this... In every db driver I'm familiar with, programmers are expected to manage query preparation on their own. They're supposed to do it based on knowledge only they have, weighing the pros and cons. They have responsibility over their own code, and they don't outsource major decisions like this to their driver. When they get an error from PostgreSQL, it's triggered by something they did in a very explicit and clear way, so they have a good chance of understanding what's going on. They generally don't get errors triggered by some under-the-hood magic their driver decided to do for them, errors which are hard to diagnose and understand. Their drivers are simple, predictable and lean, and don't have to include complex healing logic to deal with errors they themselves triggered with under-the-hood logic (e.g. implicit savepoints).
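
A sketch of that explicit model in JDBC terms (table, column and query names are made up): the programmer decides which statement is hot enough to prepare once and reuse, and which one-off statements to run unprepared:

    import java.sql.*;

    public class ExplicitPreparationDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "pass")) {
                // Hot query: the programmer prepares it once and reuses it.
                try (PreparedStatement hot = conn.prepareStatement(
                        "SELECT name FROM users WHERE id = ?")) {
                    for (int id = 1; id <= 1000; id++) {
                        hot.setInt(1, id);
                        try (ResultSet rs = hot.executeQuery()) {
                            while (rs.next()) {
                                rs.getString("name");
                            }
                        }
                    }
                }
                // One-off maintenance statement: deliberately not prepared.
                try (Statement oneOff = conn.createStatement()) {
                    oneOff.execute("ANALYZE users");
                }
            }
        }
    }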
 
Vladimir> Shay> So the general point is that the existence of pools is problematic for the argument "always prepare for recurring statements".

Vladimir> So what?
Vladimir> Don't use pools that issue "discard all", or configure them accordingly. That's it.
Vladimir> In the Java world, no widely used pool defaults to a "discard everything" strategy.

I don't know much about the Java world, but both pgbouncer and pgpool (the major pools?) send DISCARD ALL by default. That is a fact, and it has nothing to do with any bugs or issues pgbouncer may have. I'm tempted to go look at other pools in other languages, but frankly I don't think that would have any effect on this conversation...
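
For anyone who wants to check or change this: in pgbouncer the behavior comes from the server_reset_query setting in pgbouncer.ini, which defaults to DISCARD ALL. A sketch, abbreviated to the relevant lines:

    [pgbouncer]
    pool_mode = session
    ; the default; drops server-side prepared statements when a
    ; server connection is handed to the next client
    server_reset_query = DISCARD ALL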
