Re: Slowness of extended protocol - Mailing list pgsql-hackers

From Vladimir Sitnikov
Subject Re: Slowness of extended protocol
Msg-id CAB=Je-GV1eAZyK=196WhCEgYKL0JouCRgCrujmu7NntTusR1Lw@mail.gmail.com
In response to Re: Slowness of extended protocol  (Shay Rojansky <roji@roji.org>)
Responses Re: Slowness of extended protocol  (Shay Rojansky <roji@roji.org>)
Re: Slowness of extended protocol  (Tatsuo Ishii <ishii@sraoss.co.jp>)
List pgsql-hackers
Shay>I don't know much about the Java world, but both pgbouncer and pgpool (the major pools?)

In the Java world, https://github.com/brettwooldridge/HikariCP is a very good connection pool.
Neither pgbouncer nor pgpool is required.
The architecture is: application <=> HikariCP <=> pgjdbc <=> PostgreSQL

The idea is that HikariCP pools pgjdbc connections, and server-prepared statements live per pgjdbc connection, so there is no reason to "discard all" or "deallocate all" or whatever.
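For illustration, here is a minimal sketch of that setup (the JDBC URL, credentials and pool size below are made-up values):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PoolExample {
    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/test");
        config.setUsername("test");
        config.setPassword("test");
        config.setMaximumPoolSize(10);

        try (HikariDataSource ds = new HikariDataSource(config);
             Connection con = ds.getConnection(); // closing it just returns it to the pool,
                                                  // so its server-prepared statements stay alive
             PreparedStatement ps = con.prepareStatement("select 1");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}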

Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do with any bugs or issues pgbouncer may have.

That is configurable. If pgbouncer/pgpool default to the wrong configuration, why should we (driver developers, backend developers) try to accommodate that?

Shay>What? Do you mean you do implicit savepoints and autorollback too?

I mean exactly that.
On top of that, it enables opt-in psql-like ON_ERROR_ROLLBACK functionality, so the driver can automatically roll back just the latest statement if it fails.
For instance, that might simplify migration from other DBs that default to the same "roll back just one statement on error" semantics.

Shay>How does the driver decide when to do a savepoint?

By default it sets a savepoint only when there is an open transaction and the query to be executed has previously been described (server-prepared).

In other words, the default mode is to protect the user from the "cached plan must not change result type" error.

Shay>Is it on every single command?

Exactly (modulo the above). If the user manually sets "autosave=always", every command is preceded by a savepoint, and the statement is rolled back to that savepoint if it fails.
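Concretely, that is driven by a connection property; a sketch (host, database and credentials are made up, and the spelling follows pgjdbc's "autosave" option):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

static void insertWithAutosave() throws SQLException {
    // Sketch: opting into driver-managed savepoints via the connection URL.
    // "conservative" matches the behaviour described above (savepoint only for
    // previously-described statements inside an open transaction);
    // "always" puts a savepoint in front of every statement.
    String url = "jdbc:postgresql://localhost:5432/test?autosave=always";
    try (Connection con = DriverManager.getConnection(url, "test", "test")) {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement("insert into t(a) values (?)")) {
            ps.setInt(1, 42);
            ps.executeUpdate(); // the driver issues a SAVEPOINT right before this execute
        }
        con.commit();
    }
}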

Shay>If you do a savepoint on every single command, that surely would impact performance even without extra roundtrips...?

My voltmeter tells me that the overhead is just 3us for a simple "SELECT" statement (see https://github.com/pgjdbc/pgjdbc/pull/477#issuecomment-239579547 ).
I think it is acceptable to have it enabled by default; however, it would be nice if the database did not require a savepoint dance to heal its "not implemented" "cached plan must not change result type" error.


Shay>I'm not aware of other drivers that implicitly prepare statements,
Shay>and definitely of no drivers that internally create savepoints and
Shay>roll them back without explicit user APIs

Glad to hear that.
Are you aware of other drivers that translate "insert into table(a,b,c) values(?,?,?)" =into=> "insert into table(a,b,c) values(?,?,?),(?,?,?),...,(?,?,?)" statement on the fly?

That gives a 2-3x performance boost (which includes client-side result processing!) for batched inserts, since "bind/exec" message processing is not that fast. That is why I say that investing in "bind/exec performance" makes more sense than investing in "improved execution of non-prepared statements".
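To make that concrete, a sketch of such a batched insert (connection details and table name are made up; reWriteBatchedInserts is the pgjdbc property that enables the rewrite):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

static void batchedInsert() throws SQLException {
    // Sketch: with reWriteBatchedInserts=true the driver folds the batch into
    // multi-row inserts ("values (?,?,?),(?,?,?),...") instead of sending
    // 1000 separate bind/exec pairs.
    String url = "jdbc:postgresql://localhost:5432/test?reWriteBatchedInserts=true";
    try (Connection con = DriverManager.getConnection(url, "test", "test");
         PreparedStatement ps =
                 con.prepareStatement("insert into table1(a, b, c) values (?, ?, ?)")) {
        for (int i = 0; i < 1000; i++) {
            ps.setInt(1, i);
            ps.setInt(2, 2 * i);
            ps.setInt(3, 3 * i);
            ps.addBatch();
        }
        ps.executeBatch();
    }
}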

Shay>you're doing something very different - not necessarily wrong - and not
Shay>attempt to impose your ideas on everyone as if it's the only true way
Shay>to write a db driver.

1) Feel free to pick ideas you like

2) I don't say "it is the only true way". I would change my mind if someone suggested a better solution. Everybody makes mistakes, and I have no problem with admitting "I made a mistake" and moving forward.
As they say: "Don't cling to a mistake just because you spent a lot of time making it".

However, no one has suggested a case in which an application issues lots of unique SQL statements.
The suggested alternative, "a new message for non-prepared extended query", might shave 5-10us per query for those who don't bother to implement a query cache. However, that is just 5-10ms per 1000 queries. Would that be noticeable by the end users? I don't think so.

Having a query cache can easily shave 5-10+ms off each query, which is why I suggest driver developers implement a query cache first, and only then ask for new performance-related messages.
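Such a cache does not have to be elaborate. A hypothetical per-connection sketch (real drivers would also key on parameter types and handle invalidation):

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: map SQL text to a server-side statement name, with LRU
// eviction, so repeated queries reuse their server-prepared statement.
final class QueryCache {
    private final int maxEntries;
    private int counter;
    private final LinkedHashMap<String, String> sqlToName =
            new LinkedHashMap<>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    // when the cache is full, the driver would also send a Close
                    // message for the evicted statement name
                    return size() > maxEntries;
                }
            };

    QueryCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    String statementNameFor(String sql) {
        // reuse the existing server-prepared statement, or Parse/Describe the SQL
        // once under a freshly generated name
        return sqlToName.computeIfAbsent(sql, s -> "S_" + (++counter));
    }
}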

3) For "performance related" issues, a business case and a voltmeter are required to prove there's an issue.


Shay>But the second query, which should be invalidated, has already been
Shay>sent to the server (because of batching), and boom

-- Doctor, it hurts me when I do that
-- Don't do that

When doing batched SQL, some of the queries might fail with "duplicate key" or "constraint violation". There's already an API that covers those kinds of cases and reports which statements did succeed (if any).
In the case you describe (one query in a batch somehow invalidates a subsequent one), the driver would just resort to that generic error handling.
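In JDBC that is BatchUpdateException; a sketch of what that generic handling looks like:

import java.sql.BatchUpdateException;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

static void runBatch(Connection con, Statement st) throws SQLException {
    try {
        int[] counts = st.executeBatch();
        // every statement succeeded; counts[i] is the update count of statement i
    } catch (BatchUpdateException e) {
        // getUpdateCounts() reports which statements succeeded before the failure;
        // an invalidated server-prepared statement surfaces here just like a
        // duplicate key or constraint violation would
        int[] counts = e.getUpdateCounts();
        con.rollback();
    }
}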


Shay>When would you send this ValidatePreparedStatement?
Shay>Before each execution as a separate roundtrip?
Shay>That would kill performance. 

Why do you think the performance would degrade? Once again: the current problem is that the client thinks it knows which columns will be returned by a particular server-prepared statement, but the table might get changed behind the scenes (e.g. an online schema upgrade), so the error occurs. That "return type" is already validated by the database (the time is already spent on validation), so there is virtually no additional work.

So it would be fine if a separate validate message were invented, or just some connection property that made "cached plan must not change result type" a non-fatal error (as in, "it would not kill the transaction").

For instance: "in case of a cached plan must not change result type error, the backend would skip processing the subsequent bind/exec kind of messages for the problematic statement, and it would keep the transaction open. The client would then have to re-send the relevant parse/bind/execute". That would prevent the user from receiving the error, and that approach does not require savepoints.
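For comparison, here is roughly the client-side dance that proposal would make unnecessary (a sketch; SQLState 0A000 / feature_not_supported is what the backend reports for that error):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

static void queryWithReprepare(Connection con, String sql) throws SQLException {
    try {
        runOnce(con, sql); // may use a stale server-prepared statement
    } catch (SQLException e) {
        // "cached plan must not change result type" is reported with SQLState 0A000
        if ("0A000".equals(e.getSQLState())) {
            // with today's backend this retry only helps outside an open
            // transaction (or behind a savepoint); with the proposal above the
            // driver could simply re-send parse/bind/execute instead
            runOnce(con, sql);
            return;
        }
        throw e;
    }
}

static void runOnce(Connection con, String sql) throws SQLException {
    try (PreparedStatement ps = con.prepareStatement(sql);
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // process the row
        }
    }
}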

This particular topic, "how to fix the <<cannot change result type>> error", might be better split off into its own thread.

Vladimir
