Thread: FYI: porting Copy API to 8.x

FYI: porting Copy API to 8.x

From

"Kalle Hallivuori"

Date:

30 May 2007, 09:29:57

Hi!

I'm porting the Copy API to 8.x series of pgjdbc. You may expect a
patch at the commit list this week.

I've copied Chris Jurka's patch for the 7.x series from here:
http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php

I'm just modifying that old patch to fit current implementation.
Suggestions for enhancements or alternative approaches are welcome so
long as they are sufficiently trivial to implement :)

Should this patched patch get accepted for inclusion it might make it
into official distribution for 8.4. Otherwise (and meanwhile) people
will have to keep building their own drivers in order to be able to
use COPY TO/FROM STDIN.

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/

Re: FYI: porting Copy API to 8.x

From

Dave Cramer

Date:

30 May 2007, 11:47:19

Kalle,

The reason that the patch has never been officially accepted is
because it creates two protocol paths.

To do this correctly you need to add the copy functionality into the
current protocol handler as opposed to having a separate protocol
handler for copy.

Dave
On 30-May-07, at 8:29 AM, Kalle Hallivuori wrote:

> Hi!
>
> I'm porting the Copy API to 8.x series of pgjdbc. You may expect a
> patch at the commit list this week.
>
> I've copied Chris Jurka's patch for the 7.x series from here:
> http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php
>
> I'm just modifying that old patch to fit current implementation.
> Suggestions for enhancements or alternative approaches are welcome so
> long as they are sufficiently trivial to implement :)
>
> Should this patched patch get accepted for inclusion it might make it
> into official distribution for 8.4. Otherwise (and meanwhile) people
> will have to keep building their own drivers in order to be able to
> use COPY TO/FROM STDIN.
>
> --
> Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq

Re: FYI: porting Copy API to 8.x

From

"Kalle Hallivuori"

Date:

30 May 2007, 16:46:54

Hi Dave!

2007/5/30, Dave Cramer <pg@fastcrypt.com>:
> The reason that the patch has never been officially accepted is
> because it creates two protocol paths.
>
> To do this correctly you need to add the copy functionality into the
> current protocol handler as opposed to having a separate protocol
> handler for copy.

Thanks, now I understand why it's not there. I can take a look at
alternative approaches tomorrow, and only go with the ugly one if I
don't come up with an (moderately easily implementable) acceptable
design.

Is it more of an open design problem that many people have looked
closely into, or rather just a matter of someone getting to it? Ie. is
it likely there is no clean way to add these special cases inside the
protocol handler?

As for redesign, which would be the least abhorred way of providing
the functionality for use:

1. Separate Copy API available via a getCopyAPI() method in PGConnection API?
2. Separate Copy API available with some cleaner approach, which?
3. Special Copy methods in PGConnection API?
4. Special cases to batch processing (FROM STDIN) and resultset
collecting (TO STDOUT)?
5. Special cases to some other part of the standard JDBC API, which?

Cheers,

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/

Re: FYI: porting Copy API to 8.x

From

Dave Cramer

Date:

31 May 2007, 15:57:02

Hi Kalle,


On 30-May-07, at 3:45 PM, Kalle Hallivuori wrote:

> Hi Dave!
>
> 2007/5/30, Dave Cramer <pg@fastcrypt.com>:
>> The reason that the patch has never been officially accepted is
>> because it creates two protocol paths.
>>
>> To do this correctly you need to add the copy functionality into the
>> current protocol handler as opposed to having a separate protocol
>> handler for copy.
>
> Thanks, now I understand why it's not there. I can take a look at
> alternative approaches tomorrow, and only go with the ugly one if I
> don't come up with an (moderately easily implementable) acceptable
> design.
>
> Is it more of an open design problem that many people have looked
> closely into, or rather just a matter of someone getting to it? Ie. is
> it likely there is no clean way to add these special cases inside the
> protocol handler?
>
> As for redesign, which would be the least abhorred way of providing
> the functionality for use:
>
> 1. Separate Copy API available via a getCopyAPI() method in
> PGConnection API?
> 2. Separate Copy API available with some cleaner approach, which?
> 3. Special Copy methods in PGConnection API?
> 4. Special cases to batch processing (FROM STDIN) and resultset
> collecting (TO STDOUT)?
> 5. Special cases to some other part of the standard JDBC API, which?
>
I think you have to make it an extension somehow. I'd leave
getCopyAPI in place, I think the harder part is getting the protocol
to work,
> Cheers,
>
> --
> Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq

Re: FYI: porting Copy API to 8.x

From

"Kalle Hallivuori"

Date:

01 June 2007, 09:53:41

Hi again.

2007/5/31, Dave Cramer <pg@fastcrypt.com>:
> I think you have to make it an extension somehow. I'd leave
> getCopyAPI in place, I think the harder part is getting the protocol
> to work,

I found these hooks for protocol level implementation at end of
org.postgresql.core.v3.QueryExecutorImpl.processResults():

            case 'G':  // CopyInResponse
            case 'H':  // CopyOutResponse
            case 'c':  // CopyDone
            case 'd':  // CopyData
                {
                    // COPY FROM STDIN / COPY TO STDOUT, neither of
which are currently
                    // supported.
...
                    handler.handleError(new PSQLException(GT.tr("The
driver currently does not support COPY operations."),
PSQLState.NOT_IMPLEMENTED));
                }
                break;

AFAIK (counting out breaking current design) the only way to implement
this at protocol level without touching core.v3 would be to clone the
whole package with every class inherited from core.v3, and copy
contents of QueryExecutorImpl there. I'd rather not.

Problem is that class has its stuff (pgStream) declared as private, so
I can't just inherit a class to override that single method.
Furthermore, I'd end up duplicating the protocol support implemented
in original version of that method. Thus, I'll edit it in place. Even
if it ends up in a rejected patch at least the patch gets it right.

The protocol spec is clear enough by itself.

I'll still have to look how to best pass the CopyData rows between BE
and application. They could probably be encapsulated as tuples, but
I'll have to read a bit more to find out the best way.

I won't handle field types (text/binary in the protocol level) because
I'm looking forward to a solution where I could pass whole or possibly
even multiple rows as raw(ish) byte[] at once. If this becomes a
barrier for acceptance I can reconsider.

Thanks for the clear source. It is what makes enhancements such as
this possible on short notice.

So no commits this week, but there's still hoping I'll get something out.

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/

Re: FYI: porting Copy API to 8.x

From

"Kalle Hallivuori"

Date:

08 June 2007, 08:05:55

Hi again.

I have an implementation of COPY subprotocol support that is
implemented inside org.postgresql.core.v3.QueryExecutorImpl and used
via a clean API available from PGConnection.getCopyAPI(). I'm trying
to test it today and commit the patch next week.

To ensure inclusion in the official source, I tried to follow the way
Fastpath was implemented.

In this implementation a COPY SQL statement is executed normally,
after which the response from server initiating COPY subprotocol is
handled in the usual processResults(). After that, specific methods
must be used to exchange copydata with the server and return to normal
protocol. All responses from server during that time are handled in a
separate private processCopyResults() method.

Alternatively processCopyResults() could be merged into
processResults() or separate everything COPY-related from there,
keeping it as it was. Opinions?

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/

Re: FYI: porting Copy API to 8.x

From

Heikki Linnakangas

Date:

08 June 2007, 08:32:47

Kalle Hallivuori wrote:
> To ensure inclusion in the official source, I tried to follow the way
> Fastpath was implemented.

I can't comment on whether that approach is good or not, but the fast
path has been pretty much obsoleted since the introduction of
server-side prepared statements, in 7.4 IIRC. That piece of code is not
necessarily a good example to follow.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: FYI: porting Copy API to 8.x

From

"Kalle Hallivuori"

Date:

11 June 2007, 16:55:49

Hello again.

Thanks to Chris Jurka for notifying me that there's a limit on size of
postings on this list.

Attached is a patch against postgresql-jdbc-8.2-505.src containing
Copy support inside v3/QueryExecutorImpl usable through a separate API
together with minimal unit tests and documentation.

I'd like to hear whether I should still fix something before making a
version for the 8.3 branch (there are plenty of possible approaches
for both client-backend and API, after all). Also I wonder if it would
be possible to include this in the official driver?

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/

Attachment

pgjdbc82-copy.diff.gz

Re: FYI: porting Copy API to 8.x

From

Kris Jurka

Date:

13 June 2007, 05:16:47

On Mon, 11 Jun 2007, Kalle Hallivuori wrote:

> Attached is a patch against postgresql-jdbc-8.2-505.src containing
> Copy support inside v3/QueryExecutorImpl usable through a separate API
> together with minimal unit tests and documentation.
>

This patch doesn't apply cleanly because there's a mix of unix and windows
EOL characters and it seems to have some other application problems.

The API is not thread safe.  If two threads are using the same connection
the synchronization in QueryExecutorImpl prevents them from stepping on
each others toes and writing garbage to the backend.  So the call to
QueryExecutor must be an atomic operation and implies that the
QueryExecutorImpl will be the controller and demand/push data from some
kind of copy client instead of the other way around.

Also I'm not especially fond of the row based API that you've come up
with.  It seems like you should either go to an element or stream based
API.  Who has a premade row?  They've either got single fields or a bulk
stream.  Consider someone with a CSV file on their client machine that
they want to copy to the server, they'd have to parse it to break it into
rows which can be tricky with embedded newlines.  They'd like an easy way
to just dump it to the server and let it deal with it.

Kris Jurka

Re: FYI: porting Copy API to 8.x

From

"Kalle Hallivuori"

Date:

13 June 2007, 10:26:46

Hi Kris!

Happy to receive feedback.

2007/6/13, Kris Jurka <books@ejurka.com>:
> This patch doesn't apply cleanly because there's a mix of unix and windows
> EOL characters and it seems to have some other application problems.

Sorry for that. I'll pay closer attention to the format in future.

> The API is not thread safe.  If two threads are using the same connection
> the synchronization in QueryExecutorImpl prevents them from stepping on
> each others toes and writing garbage to the backend.  So the call to
> QueryExecutor must be an atomic operation and implies that the
> QueryExecutorImpl will be the controller and demand/push data from some
> kind of copy client instead of the other way around.

Yes. (I thought that was alright since I spotted a statement of no
support for synchronization somewhere, but that turned out to be at
lower level, pgStream.)

I'll look into current synchronization and hopefully I'll be able (as
in have the time) to do that.

> Also I'm not especially fond of the row based API that you've come up
> with.  It seems like you should either go to an element or stream based
> API.  Who has a premade row? ...

It doesn't matter what kinds of chunks you feed it (I think that was
ingenious of the COPY specification writers). With a CSV file you can
read it as a whole into a single byte[] and write that at once. Or you
can have static size byte buffer and pass it repeatedly to write().

In my application I now succesfully collect a bunch of field values as
separate byte[]'s mixed with references to static instances of
delimiter byte[]'s into a large, static size byte[][], write when it
fills up, repeat until out of data. Cut time spent in import by half.
(The other half of import time is spent post-processing the data :))

However, if that isn't intuitive it has to be explained clearly in all
documentation or a more intuitive API offered.

I think you're right on all accounts. I'll try to find some time to
make it synchronized and self-evident. Shouldn't be too much of an
effort now.

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/