Thread: Pipelining executions to postgresql server

Pipelining executions to postgresql server

From
Mikko Tiihonen
Date:
Hi,

I created a proof of concept patch for the postgresql JDBC driver that allows the caller to do pipelining of requests
within a transaction. The pipelining here means the same as for HTTP: the client can send the next execution already
before waiting for the response of the previous request to be fully processed.

The goal is to reduce the effects of latency between server and client. The pipelining allowed my test with localhost
postgresql and a jdbc test that queries a single value from the database 200 times to get a more than 20% speed-up.
The pipelining version processes the responses every 10 queries. With actual latency between the server and client
larger speed-ups are of course possible.

I think pipelining + jsonb support would make postgresql an even more competitive key/value and document store.

Example use case (case 1 is sketched just after this list):
1) insert to the shopping cart table, and 3 inserts to the shopping cart rows table, in one pipeline
  - only one round-trip penalty instead of 4
2) query the shopping cart row and query the shopping cart rows in one pipeline
  - only one round-trip penalty instead of 2
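A minimal sketch of case 1 using the standard JDBC batch API (table and column names are invented for illustration,
and url is assumed to be a valid JDBC URL; a plain Statement batch may mix target tables, and the driver can send the
whole batch before reading any replies):

import java.sql.*;

try (Connection conn = DriverManager.getConnection(url);
     Statement st = conn.createStatement()) {
    conn.setAutoCommit(false);
    st.addBatch("INSERT INTO shopping_cart (id, customer) VALUES (1, 'alice')");
    st.addBatch("INSERT INTO shopping_cart_rows (cart_id, item) VALUES (1, 'a')");
    st.addBatch("INSERT INTO shopping_cart_rows (cart_id, item) VALUES (1, 'b')");
    st.addBatch("INSERT INTO shopping_cart_rows (cart_id, item) VALUES (1, 'c')");
    int[] counts = st.executeBatch(); // all four inserts sent before any reply is read
    conn.commit();
}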

The only alternative way to reduce the round-trip latency is to make every operation a single round-trip, and that can
only be done with pl functions and by passing in more complex objects, for example the whole shopping cart with its
rows as json data.

What kind of problems could pipelining cause (assuming we limit it to rather simple use cases only)?

-Mikko
Attachment

Re: [HACKERS] Pipelining executions to postgresql server

From
Tom Lane
Date:
Mikko Tiihonen <Mikko.Tiihonen@nitorcreations.com> writes:
> I created a proof of concept patch for the postgresql JDBC driver that allows the caller to do pipelining of requests
> within a transaction. The pipelining here means the same as for HTTP: the client can send the next execution already
> before waiting for the response of the previous request to be fully processed.

In principle this can work if you think through the error handling
carefully.  The way the FE/BE protocol is intended to handle the case
is that all the pipelined statements are part of one transaction so
that they all succeed or fail together.  Otherwise the user has to work
through and predict the outcome of all the combinations of "Q1 succeeded,
Q2 failed, Q3 succeeded" which is mighty error-prone, especially if the
commands are interdependent in any way at all.  Having legislated that
(and forbidden any transaction-boundary commands in a pipelined group),
what you do is not issue a Sync until after the last command of the set.
Then, if any command fails, the backend just automatically discards
remaining messages until it gets the Sync, and so you can safely issue
new commands before knowing the results of the previous ones.
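In message terms the intended flow is roughly the following (a sketch only; the "wire" object and its methods are
invented here to show the ordering, not a real API):

// Pipelining two statements in one transaction: defer the Sync to the end.
wire.parse("", "INSERT INTO a VALUES ($1)");   // Parse
wire.bind("", "", paramsA);                    // Bind
wire.execute("");                              // Execute
wire.parse("", "INSERT INTO b VALUES ($1)");   // next statement queued without
wire.bind("", "", paramsB);                    //   reading any replies first
wire.execute("");
wire.sync();  // the only Sync; if any command failed, the backend has been
              // discarding messages and resumes normal processing here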

I mention this because, although I don't know the innards of the JDBC
driver at all, it sure doesn't look like that's the way you've done it.
If I'm right that you've left the error recovery strategy to the user,
I think this is gonna be very hard to use safely.

            regards, tom lane


Re: [HACKERS] Pipelining executions to postgresql server

From
Andres Freund
Date:
On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
> I created a proof of concept patch for the postgresql JDBC driver that
> allows the caller to do pipelining of requests within a
> transaction. The pipelining here means the same as for HTTP: the client
> can send the next execution already before waiting for the response of
> the previous request to be fully processed.

Slightly confused here. To my knowledge the jdbc driver already employs
some pipelining? There's some conditions where it's disabled (IIRC
RETURNING for DML is one of them), but otherwise it's available.

I'm very far from a pgjdbc expert, but that's what I gathered from the
code when investigating issues a while back and from my colleague Craig.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Mikko Tiihonen
Date:
> From: Andres Freund <andres@2ndquadrant.com>
> On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
> > I created a proof of concept patch for the postgresql JDBC driver that
> > allows the caller to do pipelining of requests within a
> > transaction. The pipelining here means the same as for HTTP: the client
> > can send the next execution already before waiting for the response of
> > the previous request to be fully processed.
>
> Slightly confused here. To my knowledge the jdbc driver already employs
> some pipelining? There's some conditions where it's disabled (IIRC
> RETURNING for DML is one of them), but otherwise it's available.
>
> I'm very far from a pgjdbc expert, but that's what I gathered from the
> code when investigating issues a while back and from my colleague Craig.

Most DB interfaces make the server operations look synchronous.
For JDBC the only standard interface is similar to libpq's:
  PGresult *PQexec(PGconn *conn, const char *command);

I should have searched earlier for a better reference to libpq. I am planning on adding support for something similar to
http://www.postgresql.org/docs/9.3/static/libpq-async.html
More specifically operations like:
  int PQsendQuery(PGconn *conn, const char *command);
  PGresult *PQgetResult(PGconn *conn);
The Java API will of course be custom to the postgresql jdbc driver, since there is no official java api for async db
operations.
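In Java the call pattern might end up looking roughly like this (purely illustrative; neither method exists in the
driver today):

// Hypothetical PQsendQuery/PQgetResult analogue for the JDBC driver.
PGStatement st = (PGStatement) conn.createStatement();
st.sendQuery("SELECT price FROM items WHERE id = 1"); // like PQsendQuery: returns at once
// ... do other work, or send further queries here ...
ResultSet rs = st.getResult();                        // like PQgetResult: blocks until ready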

The above I can do purely on the jdbc driver side. But my question was about pipelining.
In libpq terms: is it safe to do multiple PQsendQuery operations before invoking PQgetResult as many times?
It should be, if the server starts to process (read in) the next query only after it has sent the previous response
out.

And do any pg connection poolers support having multiple executions on-the-fly at the same time for the same
connection?

-Mikko

Re: [HACKERS] Pipelining executions to postgresql server

From
Andres Freund
Date:
On 2014-11-02 12:11:49 +0000, Mikko Tiihonen wrote:
> > From: Andres Freund <andres@2ndquadrant.com>
> > On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
> > > I created a proof of concept patch for the postgresql JDBC driver that
> > > allows the caller to do pipelining of requests within a
> > > transaction. The pipelining here means the same as for HTTP: the client
> > > can send the next execution already before waiting for the response of
> > > the previous request to be fully processed.
> >
> > Slightly confused here. To my knowledge the jdbc driver already employs
> > some pipelining? There's some conditions where it's disabled (IIRC
> > RETURNING for DML is one of them), but otherwise it's available.
> >
> > I'm very far from a pgjdbc expert, but that's what I gathered from the
> > code when investigating issues a while back and from my colleague Craig.
>
> Most DB interfaces make the server operations look synchronous.

You IIRC can use jdbc's batch interface.

> I should have searched earlier for a better reference to libpq. I am planning on adding support for something similar to
> http://www.postgresql.org/docs/9.3/static/libpq-async.html
> More specifically operations like:
>   int PQsendQuery(PGconn *conn, const char *command);
>   PGresult *PQgetResult(PGconn *conn);

That doesn't really support pipelining though - you can still only send
one query. It's nonblocking, but that's a different thing.

> In libpq terms: is it safe to do multiple PQsendQuery operations
> before invoking PQgetResult as many times?

The network protocol allows for pipelining, yes. But libpq unfortunately
doesn't.

You should read the protocol definition.

> It should be, if the server starts to process (read in) the next query
> only after it has sent the previous response out.

There's complexities around error handling and such making it slightly
more complicated.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Mikko Tiihonen
Date:
> > > From: Andres Freund <andres@2ndquadrant.com>
> > > On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
> > > > I created a proof of concept patch for the postgresql JDBC driver that
> > > > allows the caller to do pipelining of requests within a
> > > > transaction. The pipelining here means the same as for HTTP: the client
> > > > can send the next execution already before waiting for the response of
> > > > the previous request to be fully processed.
> > >
> > > Slightly confused here. To my knowledge the jdbc driver already employs
> > > some pipelining? There's some conditions where it's disabled (IIRC
> > > RETURNING for DML is one of them), but otherwise it's available.
> > >
> > > I'm very far from a pgjdbc expert, but that's what I gathered from the
> > > code when investigating issues a while back and from my colleague Craig.
> >
> > Most DB interfaces make the server operations look synchronous.
>
> You IIRC can use jdbc's batch interface.

Yes, there is a limited batch interface for inserts and updates. But for example when using prepared statements you can
only do batches of the same statement (with different parameters, of course).
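That is, a prepared-statement batch reuses one SQL string for every entry; a quick illustration (invented table):

// One PreparedStatement, many parameter sets -- but only this one statement.
PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO shopping_cart_rows (cart_id, item) VALUES (?, ?)");
for (String item : new String[] { "a", "b", "c" }) {
    ps.setInt(1, 1);
    ps.setString(2, item);
    ps.addBatch();     // queues another parameter set for the same SQL
}
ps.executeBatch();     // no way to interleave a different statement here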

> > I should have searched earlier for a better reference to libpq. I am planning on adding support for something similar to
> > http://www.postgresql.org/docs/9.3/static/libpq-async.html
> > More specifically operations like:
> >   int PQsendQuery(PGconn *conn, const char *command);
> >   PGresult *PQgetResult(PGconn *conn);
>
> The network protocol allows for pipelining, yes. But libpq unfortunately
> doesn't.
> You should read the protocol definition.

I have studied the protocol; that is why I concluded that it would be possible to add pipelining support for
clients.

> > It should be, if the server starts to process (read in) the next query
> > only after it has sent the previous response out.
>
> There's complexities around error handling and such making it slightly
> more complicated.

Are you referring to complexities on the server side related to error handling, or on the client side?

Is the following summary correct:
- the network protocol supports pipelining
- the server handles operations in order, starting the processing of the next operation only after fully processing the
previous one - thus pipelining is invisible to the server
- libpq driver does not support pipelining, but that is due to internal limitations
- if proper error handling is done by the client then there is no reason why pipelining could not be supported by any pg
client

-Mikko

Re: [HACKERS] Pipelining executions to postgresql server

From
Scott Harrington
Date:
>>>> On 2014-11-01 14:04:05 +0000, Mikko Tiihonen wrote:
>>>>> I created a proof of concept patch for the postgresql JDBC driver that
>>>>> allows the caller to do pipelining of requests within a
>>>>> transaction. The pipelining here means the same as for HTTP: the client
>>>>> can send the next execution already before waiting for the response of
>>>>> the previous request to be fully processed.
>>>>
>>>> Slightly confused here. To my knowledge the jdbc driver already employs
>>>> some pipelining? There's some conditions where it's disabled (IIRC
>>>> RETURNING for DML is one of them), but otherwise it's available.
>>>>
>>>> I'm very far from a pgjdbc expert, but that's what I gathered from the
>>>> code when investigating issues a while back and from my colleague Craig.
>>>
>>> Most DB interfaces make the server operations look synchronous.
>>
>> You IIRC can use jdbc's batch interface.
>
> Yes, there is a limited batch interface for inserts and updates. But for
> example when using prepared statements you can only do batches of the same
> statement (with different parameters, of course).

Hi Mikko,

I am very interested in this.

Indeed JDBC makes you wait for N round-trip delays. If my transaction
needs to INSERT rows into tables A and B and UPDATE rows in tables C and
D (and then COMMIT), then I'm looking at 4-5 round trips, even with
executeBatch which as you mentioned is made to "look synchronous". If DB
is localhost it's OK, but gets painful if DB is across a LAN hop, and
unusable across a WAN.

As you've observed, there is no corresponding limitation in the FE/BE
protocol, if we can delay sending Sync until after we've sent all the
overlapping Binds and Executes.

I looked over your patch. Your list of ResultHandlerHolders seems to be
the right direction, but as Tom Lane mentioned there may need to be some
way to ensure the statements are all in the same transaction.

Off the top of my head I've sketched out an interface below (using
PGStatement & PGConnection so we don't have to downcast quite as far).
After the "sync" call you could call the original Statement.executeQuery
but it would return immediately. If you attempted to re-use a Statement
(or make additional addBatch calls) between the async() and sync() calls
then you would get an IllegalStateException. This avoids the need for a
Future, and avoids the client having to loop/sleep until done.

/**
 * This interface defines the public PostgreSQL extensions to
 * java.sql.Statement. All Statements constructed by the PostgreSQL
 * driver implement PGStatement.
 */
public interface PGStatement
{
     // ...

     /** Like {@link PreparedStatement#executeQuery()} but no results until {@link PGConnection#sync}. */
     void asyncExecuteQuery();

     /** Like {@link PreparedStatement#executeUpdate()} but no results until {@link PGConnection#sync}. */
     void asyncExecuteUpdate();

     /** Like {@link Statement#executeBatch()} but no results until {@link PGConnection#sync}. */
     void asyncExecuteBatch();
}

/**
 * This interface defines the public PostgreSQL extensions to
 * java.sql.Connection. All Connections returned by the PostgreSQL driver
 * implement PGConnection.
 */
public interface PGConnection
{
     // ...

     /** Wait for all the asynchronous statements to complete. */
     void sync();

     /** Commit the current transaction, and wait for all the asynchronous statements to complete. */
     void commitAndSync();
}
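A usage sketch under these proposed interfaces (hypothetical, since none of it exists yet; per the description above,
the ordinary execute calls return immediately once the sync has happened):

// Queue an insert and a query, then resolve both with one round trip.
PreparedStatement ins = conn.prepareStatement("INSERT INTO a (x) VALUES (?)");
ins.setInt(1, 42);
((PGStatement) ins).asyncExecuteUpdate();  // queued; returns immediately

PreparedStatement sel = conn.prepareStatement("SELECT x FROM a");
((PGStatement) sel).asyncExecuteQuery();   // also queued

((PGConnection) conn).commitAndSync();     // commit and wait for all results

int inserted = ins.executeUpdate();        // returns immediately, results buffered
ResultSet rs = sel.executeQuery();         // likewise immediate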


Re: [HACKERS] Pipelining executions to postgresql server

From
Craig Ringer
Date:
On 11/01/2014 10:04 PM, Mikko Tiihonen wrote:
> Hi,
>
> I created a proof of concept patch for the postgresql JDBC driver that allows the caller to do pipelining of requests
> within a transaction. The pipelining here means the same as for HTTP: the client can send the next execution already
> before waiting for the response of the previous request to be fully processed.

... but ... it already does that.

Use batches, and that's exactly what it'll do.

Statement.addBatch(String)
or
PreparedStatement.addBatch()

> What kind of problems could pipelining cause (assuming we limit it to rather simple use cases only)?

Client/server pipeline deadlocks due to how PgJDBC manages it.

See the details:

https://github.com/pgjdbc/pgjdbc/issues/195

and

https://github.com/pgjdbc/pgjdbc/issues/194

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Craig Ringer
Date:
On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> Is the following summary correct:
> - the network protocol supports pipelining

Yes.

All you have to do is *not* send a Sync message and be aware that the
server will discard all input until the next Sync, so pipelining +
autocommit doesn't make a ton of sense for error handling reasons.

> - the server handles operations in order, starting the processing of the next operation only after fully processing the
> previous one - thus pipelining is invisible to the server

As far as I know, yes. The server just doesn't care.

> - libpq driver does not support pipelining, but that is due to internal limitations

Yep.

> - if proper error handling is done by the client then there is no reason why pipelining could not be supported by any
> pg client

Indeed, and most should support it. Sending batches of related queries
would make things a LOT faster.

PgJDBC's batch support is currently write-oriented. There is no
fundamental reason it can't be expanded for reads. I've already written
a patch to do just that for the case of returning generated keys.

https://github.com/ringerc/pgjdbc/tree/batch-returning-support

and just need to rebase it so I can send a pull for upstream PgJDBC.
It's already linked in the issues documenting the limitations in batch
support.


If you want to have more general support for batches that return rowsets
there's no fundamental technical reason why it can't be added. It just
requires some tedious refactoring of the driver to either:

- Sync and wait before it fills its *send* buffer, rather than trying
  to manage its receive buffer (the server send buffer), so it can
  reliably avoid deadlocks; or

- Do async I/O in a select()-like loop over a protocol state machine,
  so it can simultaneously read and write on the wire (sketched below).
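The second option in skeletal form (a generic java.nio select loop, with the actual protocol state machine elided;
host and the read/write buffers are assumed):

import java.net.InetSocketAddress;
import java.nio.channels.*;
import java.util.Iterator;

Selector selector = Selector.open();
SocketChannel ch = SocketChannel.open();
ch.configureBlocking(false);
ch.connect(new InetSocketAddress(host, 5432));
ch.register(selector, SelectionKey.OP_CONNECT);

while (selector.select() > 0) {
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        SocketChannel c = (SocketChannel) key.channel();
        if (key.isConnectable() && c.finishConnect())
            key.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
        if (key.isReadable())  { /* c.read(buf): feed bytes to the protocol state machine */ }
        if (key.isWritable())  { /* c.write(buf): drain the queued protocol messages */ }
    }
}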

I might need to do some of that myself soon, but it's a big (and
therefore error-prone) job I've so far avoided by making smaller, more
targeted changes.

Doing async I/O using Java nio channels is by far the better approach,
but also the more invasive one. The driver currently sends data on the
wire where it generates it and blocks to receive expected data.
Switching to send-side buffer management doesn't have the full
performance gains that doing bidirectional I/O via channels does,
though, and may be a significant performance _loss_ if you're sending
big queries but getting small replies.

For JDBC the JDBC batch interface is the right place to do this, and you
should not IMO attempt to add pipelining outside that interface.
(Multiple open resultsets from portals, yes, but not pipelining of queries).


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Mikko Tiihonen
Date:
> Craig Ringer wrote:
> On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> > Is the following summary correct:
> > - the network protocol supports pipelinings
> Yes.
>
> All you have to do is *not* send a Sync message and be aware that the
> server will discard all input until the next Sync, so pipelining +
> autocommit doesn't make a ton of sense for error handling reasons.

I do not quite grasp why not sending Sync is so important. My proof of concept setup was for queries with autocommit
enabled.
When looking with wireshark I could observe that the client sent 3-10 P/B/D/E/S messages to the server before the server
started sending the corresponding 1/2/T/D/C/Z replies for each request. Every 10 requests the test application waited
for all the replies to come, to not overflow the network buffers (which is known to cause deadlocks with the current pg
jdbc driver).

If I want separate error handling for each execution then shouldn't I be sending a separate Sync for each pipelined
operation?

> > - the server handles operations in order, starting the processing of the next operation only after fully processing
> > the previous one
> >   - thus pipelining is invisible to the server
>
> As far as I know, yes. The server just doesn't care.
>
> > - libpq driver does not support pipelining, but that is due to internal limitations
>
> Yep.
>
> > - if proper error handling is done by the client then there is no reason why pipelining could not be supported by
> > any pg client
>
> Indeed, and most should support it. Sending batches of related queries
> would make things a LOT faster.
>
> PgJDBC's batch support is currently write-oriented. There is no
> fundamental reason it can't be expanded for reads. I've already written
> a patch to do just that for the case of returning generated keys.
>
> https://github.com/ringerc/pgjdbc/tree/batch-returning-support
>
> and just need to rebase it so I can send a pull for upstream PgJDBC.
> It's already linked in the issues documenting the limitations in batch
> support.

Your code looked good. Returning inserts are an important thing to optimize.

> If you want to have more general support for batches that return rowsets
> there's no fundamental technical reason why it can't be added. It just
> requires some tedious refactoring of the driver to either:
>
> - Sync and wait before it fills its *send* buffer, rather than trying
>   to manage its receive buffer (the server send buffer), so it can
>   reliably avoid deadlocks; or
>
> - Do async I/O in a select()-like loop over a protocol state machine,
>   so it can simultaneously read and write on the wire.

I also think the async I/O is the way to go. Luckily that has already been done
in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of the
Netty java NIO library. It has quite good feature parity with the original
pgjdbc driver. I'll try next if I can enable pipelining with it, now that
I have tried the proof of concept with the original pgjdbc driver.

> I might need to do some of that myself soon, but it's a big (and
> therefore error-prone) job I've so far avoided by making smaller, more
> targeted changes.
>
> For JDBC the JDBC batch interface is the right place to do this, and you
> should not IMO attempt to add pipelining outside that interface.
> (Multiple open resultsets from portals, yes, but not pipelining of queries).

I do not think the JDBC batch interface even allows doing updates to multiple
tables when using prepared statements?



Re: [HACKERS] Pipelining executions to postgresql server

From
Tom Lane
Date:
Mikko Tiihonen <Mikko.Tiihonen@nitorcreations.com> writes:
> I do not quite grasp why not sending Sync is so important. My proof of concept setup was for queries with autocommit
> enabled.

The point is that that will be very, very much harder to use than doing
it the other way.  It's fairly easy to reason about the results of
single-transaction pipelined queries: they're all or nothing.  If you
fire off queries that are going to autocommit independently, then you
have to worry about all combinations of success/failure, and you won't
have the opportunity to adjust on the fly.  It'll be very much like
sending a fixed (predetermined) SQL script to the server using a scripting
language that lacks any conditionals.  People certainly do use fixed
scripts sometimes, but they usually find out they want them wrapped into
single transactions, because otherwise they're left with a horrible mess
if the script goes off the rails at all.

            regards, tom lane


Re: [HACKERS] Pipelining executions to postgresql server

From
Craig Ringer
Date:
On 11/04/2014 09:10 AM, Tom Lane wrote:
> Mikko Tiihonen <Mikko.Tiihonen@nitorcreations.com> writes:
>> I do not quite grasp why not sending Sync is so important. My proof of concept setup was for queries with autocommit
>> enabled.
>
> [snip] It'll be very much like
> sending a fixed (predetermined) SQL script to the server using a scripting
> language that lacks any conditionals.

... which is part of why I think the batch interface for JDBC is the
appropriate UI, and the existing support in PgJDBC just needs to be
extended to support returning result sets.

The JDBC driver supports multiple result sets, and a batch execution
looks just like a procedure that returned multiple result sets (or
rather, a mix of result sets, affected rowcounts, and no-info success
messages).

See

http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#addBatch(java.lang.String)

http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeBatch()



The docs are admittedly mostly oriented toward DML, but nothing stops us
returning SUCCESS_NO_INFO and a resultset. Or if we choose, returning a
rowcount and resultset.
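For reference, the standard JDBC traversal for a statement that returns a mix of result sets and update counts looks
like this; letting executeBatch() feed the same loop is the proposed extension, not current behaviour (sql assumed):

boolean hasResultSet = stmt.execute(sql);   // today: only via execute()
while (true) {
    if (hasResultSet) {
        try (ResultSet rs = stmt.getResultSet()) {
            while (rs.next()) { /* consume one result set */ }
        }
    } else {
        int count = stmt.getUpdateCount();
        if (count == -1) break;             // no more results of any kind
        /* else: an update count for a DML statement */
    }
    hasResultSet = stmt.getMoreResults();   // advance to the next result
}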

It'd be worth testing what other drivers do before doing that, but
nothing in the spec:

https://jcp.org/aboutJava/communityprocess/mrel/jsr221/index.html

seems to stop us doing it.

The batch interface doesn't offer any way to set scrollable/updatable
resultset options, but we don't want to allow that for batches anyway.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Craig Ringer
Date:
On 11/03/2014 07:05 AM, Scott Harrington wrote:

> I looked over your patch. Your list of ResultHandlerHolders seems to be
> the right direction, but as Tom Lane mentioned there may need to be some
> way to ensure the statements are all in the same transaction.

Why go down this track, when we already have batching?

Just use the standard JDBC API batching facilities and add the missing
support for batching queries that return result sets in PgJDBC.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Craig Ringer
Date:
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I do not quite grasp why not sending Sync is so important.

Well, at the moment the PgJDBC driver relies on the following flow to
manage its buffers and avoid a logjam where both server and client are
waiting for the other to consume input:

* Send some stuff
* Sync
* Consume input until Sync response received

... since at this point it knows the server's send buffer should be
empty, or near enough.

There's no fundamental reason why you *can't* send a Sync after each
query, AFAIK. You just have to change how the driver handles syncs so it
knows that there might be more work pipelined. Or fix the driver's
buffer management, as described in the github issues I linked.

It doesn't make much sense to do so unless you're running in autocommit mode
though, and I'm not convinced pipelined work in autocommit mode makes
a lot of sense. The biggest problem is exception handling - you can't
throw an exception at some statement execution in the past. So you need
a new interface for pipelining... or to just use the existing batch
interface.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Craig Ringer
Date:
On 11/03/2014 07:05 AM, Scott Harrington wrote:
> This avoids the need for a Future, and avoids the client having to
> loop/sleep until done.

A Future is the logical way to represent an asynchronous operation in
Java. Why implement something else that doesn't fit into existing
libraries and tools when there's already a standard interface?

If you're breaking outside the stock JDBC model and thinking
asynchronously, then using the existing Java APIs for asynchrony makes
sense to me.

In terms of looping until done ... what we really need is a reliably
non-blocking PGConnection.consumeInput() so apps can call that in their
main loops, or via a helper thread. Then we'd be able to add async
notification callbacks too. To do that we need to make PGStream use a
java.nio.Channel instead of a java.net.Socket.
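For example (consumeInput() here is the hypothetical non-blocking call; getNotifications() already exists on
PGConnection, while running, handleNotify and doOtherWork stand in for application code):

// Hypothetical main-loop integration: poll without blocking, then dispatch.
while (running) {
    pgConnection.consumeInput();            // hypothetical: guaranteed not to block
    PGNotification[] notes = pgConnection.getNotifications();
    if (notes != null) {
        for (PGNotification n : notes) {
            handleNotify(n);                // application-defined callback
        }
    }
    doOtherWork();                          // the rest of the app's main loop
}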

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Craig Ringer
Date:
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I also think the async I/O is the way to go. Luckily that has already been done
> in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of the
> Netty java NIO library. It has quite good feature parity with the original
> pgjdbc driver.

Huh, it does seem to now. The big omission, unless I'm blind, is support
for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7
and JDBC 4.1, but that's reasonable enough.)

It now has LISTEN/NOTIFY by the looks, and of course it's built around
binary protocol support.

I freely admit I haven't looked at it too closely. I'm a tad frustrated
by the parallel development of a different driver when the "official"
driver has so much in the way of low hanging fruit to improve.

I have to say I like some aspects of how pgjdbc-ng is being done - in
particular use of Maven and being built around non-blocking I/O.

OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
driver, and since it hard-depends on JDK 7 I don't understand why Netty
was used instead of the async / non-blocking features of the core JVM.
That may simply be my inexperience with writing non-blocking socket I/O
code on Java though.

I'm also concerned about how it'll handle new JDBC versions, as it seems
to lack the JDBC interface abstraction of the core driver.

> I do not think the JDBC batch interface even allows doing updates to multiple
> tables when using prepared statements?

Ah, I missed this before.

That's correct. You get prepared statements _or_ multiple different
statements.

That's a more understandable reason to concoct a new API, and explains
why you're not just solving the issues with batches. Though I still
think you're going to have to fix the buffer management code before you
do anything like this.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Pipelining executions to postgresql server

From
Kevin Wooten
Date:
> On Nov 4, 2014, at 12:55 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
>> I also think the async I/O is the way to go. Luckily that has already been done
>> in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of the
>> Netty java NIO library. It has quite good feature parity with the original
>> pgjdbc driver.
>
> Huh, it does seem to now. The big omission, unless I'm blind, is support
> for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7
> and JDBC 4.1, but that's reasonable enough.)
>

I focused on 4.1 compliance instead of legacy support. I am proud to say I believe pgjdbc-ng is 100% feature compliant
for JDBC 4.1.  Legacy support is still not, and likely will never be, planned.

It does lack COPY support as I have never used it myself from the driver, only from tools outside the JVM.

> It now has LISTEN/NOTIFY by the looks, and of course it's built around
> binary protocol support.
>
> I freely admit I haven't looked at it too closely. I'm a tad frustrated
> by the parallel development of a different driver when the "official"
> driver has so much in the way of low hanging fruit to improve.
>

I’ll only nibble this bait…. I outlined my reasons when I started the project.  They were all valid then and still are
now.  My apologies for causing any frustration with this new path.  It’s a lot cleaner and simpler, and provides more
JDBC features than the original driver because of it.  Although I must say, without the original driver and its
exhaustive battery of unit tests, building a new driver would seem impossible.

> I have to say I like some aspects of how pgjdbc-ng is being done - in
> particular use of Maven and being built around non-blocking I/O.
>
> OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
> driver, and since it hard-depends on JDK 7 I don't understand why Netty
> was used instead of the async / non-blocking features of the core JVM.
> That may simply be my inexperience with writing non-blocking socket I/O
> code on Java though.
>

It confuses me as to why you consider using stable, well implemented, well tested and well cared for libraries as
inexplicable.  Just because we are building a “driver” means we have to write every line of code ourselves?  No thanks.
You can imagine our differences on this philosophy are one of the reasons why I consider pgjdbc-ng’s parallel
development to be a godsend rather than hacking on the original code.

Concerning Guava…  A great library with an amazing number of classes that make everyday Java easier.  The requirement
for JDK 7 was chosen before Guava was considered, not because of it.  Using it seemed obvious after that decision.
Also, we have internalized the classes we use out of Guava to remove it as a dependency.  This is more work to maintain
on our part but makes it worth it when deploying a single JAR.

Concerning Netty…  All options were entertained at the beginning.  The original version actually used a basic NIO
socket.  After I realized I had to basically write my own framework to work with this socket correctly, I searched for
an alternative and found Netty.  The deciding factor was that implementing SSL on top of the NIO API was considered
next to impossible to get right, according to all prevailing wisdom at the time.  Whereas with Netty, SSL support is
basically a single-line change.

> I'm also concerned about how it'll handle new JDBC versions, as it seems
> to lack the JDBC interface abstraction of the core driver.
>

My plan is to handle adding support for 4.2 and beyond by using a Maven based conditional preprocessor.  If that fails,
or seems just too ugly, I’ll probably have to look at an abstract class based method like that of the original driver.

>> I do not think the JDBC batch interface even allows doing updates to multiple
>> tables when using prepared statements?
>
> Ah, I missed this before.
>
> That's correct. You get prepared statements _or_ multiple different
> statements.
>
> That's a more understandable reason to concoct a new API, and explains
> why you're not just solving the issues with batches. Though I still
> think you're going to have to fix the buffer management code before you
> do anything like this.
>
> --
> Craig Ringer                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Pipelining executions to postgresql server

From
Mikko Tiihonen
Date:
Kevin Wooten <kdubb@me.com> wrote:
> > On Nov 4, 2014, at 12:55 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >
> > I have to say I like some aspects of how pgjdbc-ng is being done - in
> > particular use of Maven and being built around non-blocking I/O.
> >
> > OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
> > driver, and since it hard-depends on JDK 7 I don't understand why Netty
> > was used instead of the async / non-blocking features of the core JVM.
> > That may simply be my inexperience with writing non-blocking socket I/O
> > code on Java though.
> >

Java 6 has been EOL since 2011 and Java 7 is EOL in less than 6 months. I think
that depending on Java 7, a version that soon should not even be used in
production (without paying for extra support), can hardly be too hard a requirement.

> It confuses me as to why you consider using stable, well implemented, well tested and well cared for libraries as
> inexplicable.  Just because we are building a “driver” means we have to write every line of code ourselves?  No
> thanks.

Embedding parts of other projects into the code base during the build, with renamed packages, is nowadays common
practice in java projects: Spring does it, Elasticsearch embeds the whole of Netty and more, even the JRE embeds for
example Xerces and ASM.
It might not be the optimal solution, but still definitely better than writing everything from scratch or copy-pasting
code from other projects.

If pgjdbc-ng provides both a thin maven version (with external versioned dependencies) and a fat jar with the external
versions repackaged inside, users can choose either the old way (just drop a jdbc jar into the project) or the new way,
with their chosen build tool automatically managing the dependencies.

-Mikko