Re: [HACKERS] Pipelining executions to postgresql server - Mailing list pgsql-jdbc

From Craig Ringer
Subject Re: [HACKERS] Pipelining executions to postgresql server
Date
Msg-id 545790D8.9090201@2ndquadrant.com
In response to Re: [HACKERS] Pipelining executions to postgresql server  (Mikko Tiihonen <Mikko.Tiihonen@nitorcreations.com>)
Responses Re: [HACKERS] Pipelining executions to postgresql server  (Mikko Tiihonen <Mikko.Tiihonen@nitorcreations.com>)
List pgsql-jdbc
On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> Is the following summary correct:
> - the network protocol supports pipelining

Yes.

All you have to do is *not* send a Sync message after each query, and be
aware that after an error the server will skip all subsequent messages
until the next Sync, so pipelining + autocommit doesn't make a ton of
sense for error handling reasons.

> - the server handles operations in order, starting the processing of the next operation only after fully processing the previous one - thus pipelining is invisible to the server

As far as I know, yes. The server just doesn't care.

> - libpq driver does not support pipelining, but that is due to internal limitations

Yep.

> - if proper error handling is done by the client then there is no reason why pipelining could not be supported by any pg client

Indeed, and most should support it. Sending batches of related queries
would make things a LOT faster.
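To make that concrete, here is a minimal sketch of what batching already looks like through the standard JDBC batch API that PgJDBC implements. The table and column names are invented for illustration; the point is that `addBatch()` only queues work client-side and `executeBatch()` sends the whole batch in one network exchange instead of one round trip per row.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchExample {
    // Pure helper: build the parameterized INSERT used for the batch.
    static String buildInsertSql(String table, String... cols) {
        String placeholders = "?" + ", ?".repeat(cols.length - 1);
        return "INSERT INTO " + table + " (" + String.join(", ", cols)
                + ") VALUES (" + placeholders + ")";
    }

    // Queue many rows and send them to the server as one batch.
    // PgJDBC pipelines the statements rather than waiting for a
    // response after each one.
    static int[] insertBatch(Connection conn, List<String> names)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                buildInsertSql("users", "name"))) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();          // queued client-side, nothing sent yet
            }
            return ps.executeBatch();   // one exchange for the whole batch
        }
    }
}
```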

PgJDBC's batch support is currently write-oriented. There is no
fundamental reason it can't be expanded for reads. I've already written
a patch to do just that for the case of returning generated keys.

https://github.com/ringerc/pgjdbc/tree/batch-returning-support

and just need to rebase it so I can send a pull request to upstream
PgJDBC. It's already linked from the issues documenting the limitations
in batch support.


If you want to have more general support for batches that return rowsets
there's no fundamental technical reason why it can't be added. It just
requires some tedious refactoring of the driver to either:

- Sync and wait before it fills its *send* buffer, rather than trying
  to manage its receive buffer (the server send buffer), so it can
  reliably avoid deadlocks; or

- Do async I/O in a select()-like loop over a protocol state machine,
  so it can simultaneously read and write on the wire.
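The second option's shape can be shown without any PostgreSQL code at all. The sketch below pumps data through an in-process `java.nio` pipe with a single `select()` loop that is prepared to read and write in the same iteration; a real driver would replace the pipe with the server socket and drive a protocol state machine from the ready keys. All names here are invented for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class SelectLoop {
    // Pump `payload` through a pipe, interleaving reads and writes in one
    // select() loop -- the structure that avoids the buffer-fill deadlock.
    static String pump(String payload) {
        try {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            pipe.sink().configureBlocking(false);

            ByteBuffer out = ByteBuffer.wrap(
                    payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder in = new StringBuilder();
            ByteBuffer buf = ByteBuffer.allocate(64);

            try (Selector selector = Selector.open()) {
                pipe.sink().register(selector, SelectionKey.OP_WRITE);
                pipe.source().register(selector, SelectionKey.OP_READ);
                while (in.length() < payload.length()) {
                    selector.select();
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isWritable() && out.hasRemaining()) {
                            pipe.sink().write(out);          // send side
                            if (!out.hasRemaining()) {
                                key.interestOps(0);          // done writing
                            }
                        }
                        if (key.isReadable()) {
                            buf.clear();
                            pipe.source().read(buf);         // receive side
                            buf.flip();
                            in.append(StandardCharsets.UTF_8.decode(buf));
                        }
                    }
                    selector.selectedKeys().clear();
                }
            }
            return in.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```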

I might need to do some of that myself soon, but it's a big (and
therefore error-prone) job I've so far avoided by making smaller, more
targeted changes.

Doing async I/O using Java nio channels is by far the better approach,
but also the more invasive one. The driver currently writes data to the
wire at the point where it generates it, then blocks waiting to receive
the expected reply.
Switching to send-side buffer management doesn't have the full
performance gains that doing bidirectional I/O via channels does,
though, and may be a significant performance _loss_ if you're sending
big queries but getting small replies.
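The first option, send-side buffer management, is essentially bookkeeping. The hypothetical sketch below (names and the byte budget are invented, and real message sizes would be computed from the encoded protocol messages) caps how much is queued before a Sync, so the driver can always stop and drain server replies before both directions' buffers fill, which is the deadlock described above.

```java
import java.util.ArrayList;
import java.util.List;

public class SendBudget {
    private final int maxQueuedBytes;   // budget before we must Sync+drain
    private int queuedBytes = 0;
    private final List<String> events = new ArrayList<>();

    SendBudget(int maxQueuedBytes) {
        this.maxQueuedBytes = maxQueuedBytes;
    }

    // Queue one message; if it would overflow the budget, emit a Sync
    // first. A real driver would block here, reading server replies
    // until ReadyForQuery, before queueing more.
    void queue(String msg) {
        int size = msg.length();
        if (queuedBytes + size > maxQueuedBytes) {
            events.add("Sync+drain");   // flush point: consume replies
            queuedBytes = 0;
        }
        events.add("send:" + msg);
        queuedBytes += size;
    }

    // End the pipeline with a final Sync and return the event trace.
    List<String> finish() {
        events.add("Sync+drain");
        return events;
    }
}
```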

For JDBC the JDBC batch interface is the right place to do this, and you
should not IMO attempt to add pipelining outside that interface.
(Multiple open resultsets from portals, yes, but not pipelining of queries).


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

