Thread: COPY support in JDBC driver?

COPY support in JDBC driver?

From
"Addleman, Mark J"
Date:

Hi, folks –

 

Is there an ETA for support of the COPY command in the JDBC driver?  I am aware of the patch (http://kato.iki.fi/sw/db/postgresql/jdbc/copy/) but notice that the provided jdbc drivers are from the previous generation.  Although I could patch the current driver myself, I’d really rather have a blessed version from the Postgres committers.

 

Mark

Re: COPY support in JDBC driver?

From
Maciek Sakrejda
Date:
Hi all,

Regarding postgresql JDBC COPY support:

http://archives.postgresql.org/pgsql-jdbc/2008-09/msg00053.php

Truviso has been using Kalle's COPY patch for a while, and we'd like to
help fold it into the core jdbc driver. We've fixed a couple of encoding
issues, but otherwise, we've been using it without any problems. I'd
like to merge against the current trunk and re-submit the patch in the
next couple of weeks.

Thanks,
--
Maciek Sakrejda
Truviso, Inc.
http://www.truviso.com



Re: COPY support in JDBC driver?

From
"Albe Laurenz"
Date:
Maciek Sakrejda wrote:
> Truviso has been using Kalle's COPY patch for a while, and we'd like to
> help fold it into the core jdbc driver. We've fixed a couple of encoding
> issues, but otherwise, we've been using it without any problems. I'd
> like to merge against the current trunk and re-submit the patch in the
> next couple of weeks.

I think it would be wonderful to have support for COPY!

+1

Yours,
Laurenz Albe

Re: COPY support in JDBC driver?

From
"Michael Nacos"
Date:
Hi Mark,

if you want COPY support from the JDBC driver just for performance reasons, you should probably also consider using the driver's batch mode. Please have a look at the following links:

http://archives.postgresql.org/pgsql-jdbc/2007-04/msg00076.php
http://freeyourtech.wordpress.com/2008/07/15/using-postgresql-jdbc-for-bulk-updates-batch-size-vs-performance/

Encoding is a serious consideration. While testing pgBee, I ran into problems inserting codepage 1252 files into a SQL_ASCII database, because of character mismatches. In your Java application, you may specify a particular charset for reading the input files. I haven't used the patch, but if support for the COPY operation is to be included in the driver, please make sure the driver handles such things with care.

It might also be that the driver shouldn't have to deal with things like that. It's a driver for connecting to a DBMS, not a backup-restore solution. But I cannot speak on behalf of anyone or anything, I'm pretty much an occasional pgsql-jdbc reader.

Michael

Re: COPY support in JDBC driver?

From
Daniel Migowski
Date:
Michael Nacos schrieb:
> Hi Mark,
>
> if you want COPY support from the JDBC driver just for performance
> reasons, you should probably also consider using the driver's batch
> mode. Please have a look at the following links:
>
> http://archives.postgresql.org/pgsql-jdbc/2007-04/msg00076.php
> http://freeyourtech.wordpress.com/2008/07/15/using-postgresql-jdbc-for-bulk-updates-batch-size-vs-performance/
>
> Encoding is a serious consideration. While testing pgBee, I ran into
> problems inserting codepage 1252 files into a SQL_ASCII database,
> because of character mismatches. In your Java application, you may
> specify a particular charset for reading the input files. I haven't
> used the patch, but if support for the COPY operation is to be
> included in the driver, please make sure the driver handles such
> things with care.
AFAIK is UTF-8 the only encoding which the driver supports, anyway. And
the native Java encoding, too. In my opinion the API should either
support Writers and Readers (instead of Output- and InputStream), so the
application has to take care for the encoding itself, or the API should
encapsulate setting an arbitrary encoding on the server side before the
copy command, and return to the default encoding directly afterwards.

Please note that this might create conflicts with error mesages and
notices thrown during the copy command.

With best regards,
Daniel Migowski

Re: COPY support in JDBC driver?

From
Kris Jurka
Date:

On Wed, 24 Sep 2008, Michael Nacos wrote:

> if you want COPY support from the JDBC driver just for performance reasons,
> you should probably also consider using the driver's batch mode.

Copy is quite a bit faster than batch execution.  See:

http://archives.postgresql.org/pgsql-jdbc/2008-02/msg00018.php

> Encoding is a serious consideration. While testing pgBee, I ran into
> problems inserting codepage 1252 files into a SQL_ASCII database,
> because of character mismatches.

You should not use a SQL_ASCII database if you care about encoding your
data properly.  Once you put data in, there is no way of telling what
encoding it was inserted with, so there is no way of retrieving it
correctly.

Kris Jurka


Re: COPY support in JDBC driver?

From
Kris Jurka
Date:

On Wed, 24 Sep 2008, Daniel Migowski wrote:

> AFAIK is UTF-8 the only encoding which the driver supports, anyway. And
> the native Java encoding, too. In my opinion the API should either
> support Writers and Readers (instead of Output- and InputStream), so the
> application has to take care for the encoding itself, or the API should
> encapsulate setting an arbitrary encoding on the server side before the
> copy command, and return to the default encoding directly afterwards.
>

Yes, the current copy patches only support *Stream which does leave the
user exposed to encoding issues.  Providing a Reader/Writer API doesn't
support COPY ... BINARY, but I don't know how many people would actually
use such a thing.  Parallel interfaces are a possibility, but I'd guess
people would end up using the Stream versions for non-binary data anyway.

Does anyone have the need to do COPY BINARY?

I also wonder what the encoding conversion hit is if no conversion needs
to be done.  Perhaps we should measure that before abandonding the Stream
API?

Kris Jurka

Re: COPY support in JDBC driver?

From
"Albe Laurenz"
Date:
Kris Jurka wrote:

> Yes, the current copy patches only support *Stream which does leave the
> user exposed to encoding issues.  Providing a Reader/Writer API doesn't
> support COPY ... BINARY, but I don't know how many people would actually
> use such a thing.  Parallel interfaces are a possibility, but I'd guess
> people would end up using the Stream versions for non-binary data anyway.
>
> Does anyone have the need to do COPY BINARY?

I have never felt the temptation.

As far as I understand, that would mean that your code is dependent
on the endianness of the server machine and compile time flags like
--enable-integer-datetime.

> I also wonder what the encoding conversion hit is if no conversion needs
> to be done.  Perhaps we should measure that before abandonding the Stream
> API?

That would certainly be interesting, although I don't think it's a killer
argument.

I don't want to go on about Java, but I guess that if you need to load
huge amounts of data into a database as fast as possible, you would probably
not do this via JDBC from a remote machine, but use psql locally on the server.
So performance is of course important, but not the most important thing for me
when I code Java.

For me the nice thing about COPY support in JDBC would be that you have
an easy way to load CSV files into the database or dump them out from a
database, which is a requirement I encounter frequently.

Yours,
Laurenz Albe

Re: COPY support in JDBC driver?

From
Craig Ringer
Date:
Albe Laurenz wrote:

> For me the nice thing about COPY support in JDBC would be that you have
> an easy way to load CSV files into the database or dump them out from a
> database, which is a requirement I encounter frequently.

It also permits you to do bulk load/dump operations within the context
of an ongoing JDBC transaction. Somebody on this list (or -general; I
don't remember) was trying to do that only a few days ago.

In addition to fast import/export of user data, I can also see uses for
it when trying to avoid having database setup scripts outside the
primary application. For apps you're distributing to end users, having
the Java app able to create and populate the initial database quickly
and efficiently rather than having  to shell out to psql, get the user
to run scripts, etc is highly appealing.

Text-based COPY in the JDBC driver would be a significant plus to me. I
can't say I personally care about binary COPY especially with a language
that tries to be platform independent.

--
Craig Ringer

Re: COPY support in JDBC driver?

From
"Ido M. Tamir"
Date:
On Thursday 25 September 2008 09:05:39 Craig Ringer wrote:
> Albe Laurenz wrote:
> > For me the nice thing about COPY support in JDBC would be that you have
> > an easy way to load CSV files into the database or dump them out from a
> > database, which is a requirement I encounter frequently.

> It also permits you to do bulk load/dump operations within the context
> of an ongoing JDBC transaction. Somebody on this list (or -general; I
> don't remember) was trying to do that only a few days ago.

This was exactly something that I had to do once.
Thanks to java, I could even put a FilterInputStream to change the
input data before streaming it into the copyManager,

> Text-based COPY in the JDBC driver would be a significant plus to me. I
> can't say I personally care about binary COPY especially with a language
> that tries to be platform independent.
>

Yes please, it would be a great if this makes it into the
official driver and then onto the maven repositories.

thank you very much,
ido

Re: COPY support in JDBC driver?

From
dmp
Date:
> On Wed, 24 Sep 2008, Daniel Migowski wrote:
>
> AFAIK is UTF-8 the only encoding which the driver supports, anyway.
> And the native Java encoding, too. In my opinion the API should either
> support Writers and Readers (instead of Output- and InputStream), so
> the application has to take care for the encoding itself, or the API
> should encapsulate setting an arbitrary encoding on the server side
> before the copy command, and return to the default encoding directly
> afterwards.
>
>
> Yes, the current copy patches only support *Stream which does leave
> the user exposed to encoding issues. Providing a Reader/Writer API
> doesn't support COPY ... BINARY, but I don't know how many people
> would actually use such a thing. Parallel interfaces are a
> possibility, but I'd guess people would end up using the Stream
> versions for non-binary data anyway.
>
> Does anyone have the need to do COPY BINARY?
>
> I also wonder what the encoding conversion hit is if no conversion
> needs to be done. Perhaps we should measure that before abandonding
> the Stream API?
>
> Kris Jurka



Somehow in the midst of this thread the discussion of abandoning
the Stream API seems to have come up. The Java API clearly
indicates through the java.sql package that the ResultSet Interface
has stream methods. Please continue to support the Java API JDBC.

Dana Proctor.

Re: COPY support in JDBC driver?

From
Kris Jurka
Date:

On Thu, 25 Sep 2008, dmp wrote:

> Somehow in the midst of this thread the discussion of abandoning
> the Stream API seems to have come up. The Java API clearly
> indicates through the java.sql package that the ResultSet Interface
> has stream methods. Please continue to support the Java API JDBC.

The discussion is only referring to a postgresql specific interface for
doing COPY, not anything in java.sql.*.

Kris Jurka


Re: COPY support in JDBC driver?

From
mark_addleman@bigfoot.com
Date:
On Sep 24, 7:06 am, m.na...@gmail.com ("Michael Nacos") wrote:
> Hi Mark,
>
> if you want COPY support from the JDBC driver just for performance reasons,
> you should probably also consider using the driver's batch mode. Please have
> a look at the following links:
>
>
http://archives.postgresql.org/pgsql-jdbc/2007-04/msg00076.phphttp://freeyourtech.wordpress.com/2008/07/15/using-postgresql-jdbc-fo...
>
> Encoding is a serious consideration. While testing pgBee, I ran into
> problems inserting codepage 1252 files into a SQL_ASCII database, because of
> character mismatches. In your Java application, you may specify a particular
> charset for reading the input files. I haven't used the patch, but if
> support for the COPY operation is to be included in the driver, please make
> sure the driver handles such things with care.
>
> It might also be that the driver shouldn't have to deal with things like
> that. It's a driver for connecting to a DBMS, not a backup-restore solution.
> But I cannot speak on behalf of anyone or anything, I'm pretty much an
> occasional pgsql-jdbc reader.
>
> Michael

Hi, Michael -

Thanks for the batch pointer.  We are currently using batches with
varying sizes but still aren't getting the performance we'd like.

As for including the COPY facility in the JDBC driver versus doing
bulk loads from a command line utility:  Our online application
receives an immense amount of data periodically and must store that
data into Postgres before the next bulk arrives.  It makes sense to
process and store this data entirely within the confines of a JVM and
within a transaction thus COPY support in the JDBC driver is very
helpful for us.


Re: COPY support in JDBC driver?

From
"Michael Nacos"
Date:
Hi Mark,

Batching statements in JDBC will probably get you up to 1000 operations/sec (reading from disk, slow database) or 10000 operations/sec (reading from memory and/or fast database). I have not yet used the JDBC COPY patch, but it might be what you want, if you need higher speeds. I had always associated the COPY command with properly formatted input files, but I have read all the patch needs is an InputStream, so it might be exactly what you need.

I am planning to package pgBee into a library, so that it may be used from other java programs or become part of a workflow, but it's always going to be file based. Then again, when you say 'immense amount of data' and 'periodically' you make me worry about JVM coping with your memory requirements.

cheers,

Michael

On Thu, Oct 16, 2008 at 12:14 AM, <mark_addleman@bigfoot.com> wrote:

Hi, Michael -

Thanks for the batch pointer.  We are currently using batches with
varying sizes but still aren't getting the performance we'd like.

As for including the COPY facility in the JDBC driver versus doing
bulk loads from a command line utility:  Our online application
receives an immense amount of data periodically and must store that
data into Postgres before the next bulk arrives.  It makes sense to
process and store this data entirely within the confines of a JVM and
within a transaction thus COPY support in the JDBC driver is very
helpful for us.