Thread: COPY support in JDBC driver?
Hi, folks –
Is there an ETA for support of the COPY command in the JDBC driver? I am aware of the patch (http://kato.iki.fi/sw/db/postgresql/jdbc/copy/) but notice that the provided jdbc drivers are from the previous generation. Although I could patch the current driver myself, I’d really rather have a blessed version from the Postgres committers.
Mark
Hi all, Regarding postgresql JDBC COPY support: http://archives.postgresql.org/pgsql-jdbc/2008-09/msg00053.php Truviso has been using Kalle's COPY patch for a while, and we'd like to help fold it into the core jdbc driver. We've fixed a couple of encoding issues, but otherwise, we've been using it without any problems. I'd like to merge against the current trunk and re-submit the patch in the next couple of weeks. Thanks, -- Maciek Sakrejda Truviso, Inc. http://www.truviso.com
Maciek Sakrejda wrote: > Truviso has been using Kalle's COPY patch for a while, and we'd like to > help fold it into the core jdbc driver. We've fixed a couple of encoding > issues, but otherwise, we've been using it without any problems. I'd > like to merge against the current trunk and re-submit the patch in the > next couple of weeks. I think it would be wonderful to have support for COPY! +1 Yours, Laurenz Albe
Hi Mark,
if you want COPY support from the JDBC driver just for performance reasons, you should probably also consider using the driver's batch mode. Please have a look at the following links:
http://archives.postgresql.org/pgsql-jdbc/2007-04/msg00076.php
http://freeyourtech.wordpress.com/2008/07/15/using-postgresql-jdbc-for-bulk-updates-batch-size-vs-performance/
Encoding is a serious consideration. While testing pgBee, I ran into problems inserting codepage 1252 files into a SQL_ASCII database, because of character mismatches. In your Java application, you may specify a particular charset for reading the input files. I haven't used the patch, but if support for the COPY operation is to be included in the driver, please make sure the driver handles such things with care.
It might also be that the driver shouldn't have to deal with things like that. It's a driver for connecting to a DBMS, not a backup-restore solution. But I cannot speak on behalf of anyone or anything, I'm pretty much an occasional pgsql-jdbc reader.
Michael
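As a rough illustration of the batch approach above (and of reading the input with an explicit charset, as Michael suggests), here is a minimal sketch. The table items(id integer, name text), the file name, and the connection details are all placeholders, not anything from the thread:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchLoad {
        public static void main(String[] args) throws SQLException, IOException {
            // Decode the file with an explicit charset so the application, not the
            // driver, decides how bytes become characters.
            Charset cs = Charset.forName("windows-1252");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost/mydb", "user", "secret");
                 BufferedReader in = Files.newBufferedReader(Paths.get("items.csv"), cs);
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO items (id, name) VALUES (?, ?)")) {
                conn.setAutoCommit(false);
                int count = 0;
                String line;
                while ((line = in.readLine()) != null) {
                    String[] cols = line.split(",", 2);
                    ps.setInt(1, Integer.parseInt(cols[0]));
                    ps.setString(2, cols[1]);
                    ps.addBatch();
                    if (++count % 1000 == 0) {
                        ps.executeBatch();    // send the accumulated rows in one round trip
                    }
                }
                ps.executeBatch();            // flush the remainder
                conn.commit();
            }
        }
    }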
Michael Nacos wrote: > Hi Mark, > > if you want COPY support from the JDBC driver just for performance > reasons, you should probably also consider using the driver's batch > mode. Please have a look at the following links: > > http://archives.postgresql.org/pgsql-jdbc/2007-04/msg00076.php > http://freeyourtech.wordpress.com/2008/07/15/using-postgresql-jdbc-for-bulk-updates-batch-size-vs-performance/ > > Encoding is a serious consideration. While testing pgBee, I ran into > problems inserting codepage 1252 files into a SQL_ASCII database, > because of character mismatches. In your Java application, you may > specify a particular charset for reading the input files. I haven't > used the patch, but if support for the COPY operation is to be > included in the driver, please make sure the driver handles such > things with care. AFAIK UTF-8 is the only encoding the driver supports anyway, and it is the native Java encoding, too. In my opinion the API should either support Writers and Readers (instead of Output- and InputStreams), so the application has to take care of the encoding itself, or the API should encapsulate setting an arbitrary encoding on the server side before the COPY command and return to the default encoding directly afterwards. Please note that this might create conflicts with error messages and notices thrown during the COPY command. With best regards, Daniel Migowski
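A sketch of the Reader/Writer option Daniel describes, written against the CopyManager interface (org.postgresql.copy.CopyManager) as it eventually shipped in the driver; whether the 2008 patch exposed the same names is an assumption, and the table name is a placeholder. The point is that the application decodes the file with whatever charset it likes, and the driver only ever sees Java characters:

    import java.io.Reader;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyWithReader {
        // Returns the number of rows copied.
        static long loadFile(Connection conn, String path, String charsetName) throws Exception {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            try (Reader in = Files.newBufferedReader(Paths.get(path),
                                                     Charset.forName(charsetName))) {
                // The file is decoded here, in the application; encoding issues
                // never reach the wire protocol.
                return copy.copyIn("COPY items FROM STDIN", in);
            }
        }
    }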
On Wed, 24 Sep 2008, Michael Nacos wrote: > if you want COPY support from the JDBC driver just for performance reasons, > you should probably also consider using the driver's batch mode. Copy is quite a bit faster than batch execution. See: http://archives.postgresql.org/pgsql-jdbc/2008-02/msg00018.php > Encoding is a serious consideration. While testing pgBee, I ran into > problems inserting codepage 1252 files into a SQL_ASCII database, > because of character mismatches. You should not use a SQL_ASCII database if you care about encoding your data properly. Once you put data in, there is no way of telling what encoding it was inserted with, so there is no way of retrieving it correctly. Kris Jurka
On Wed, 24 Sep 2008, Daniel Migowski wrote: > AFAIK UTF-8 is the only encoding the driver supports anyway, and it is the > native Java encoding, too. In my opinion the API should either > support Writers and Readers (instead of Output- and InputStreams), so the > application has to take care of the encoding itself, or the API should > encapsulate setting an arbitrary encoding on the server side before the > COPY command and return to the default encoding directly afterwards. > Yes, the current copy patches only support *Stream, which does leave the user exposed to encoding issues. Providing a Reader/Writer API doesn't support COPY ... BINARY, but I don't know how many people would actually use such a thing. Parallel interfaces are a possibility, but I'd guess people would end up using the Stream versions for non-binary data anyway. Does anyone have the need to do COPY BINARY? I also wonder what the encoding conversion hit is if no conversion needs to be done. Perhaps we should measure that before abandoning the Stream API? Kris Jurka
Kris Jurka wrote: > Yes, the current copy patches only support *Stream, which does leave the > user exposed to encoding issues. Providing a Reader/Writer API doesn't > support COPY ... BINARY, but I don't know how many people would actually > use such a thing. Parallel interfaces are a possibility, but I'd guess > people would end up using the Stream versions for non-binary data anyway. > > Does anyone have the need to do COPY BINARY? I have never felt the temptation. As far as I understand, that would mean that your code is dependent on the endianness of the server machine and compile-time flags like --enable-integer-datetimes. > I also wonder what the encoding conversion hit is if no conversion needs > to be done. Perhaps we should measure that before abandoning the Stream > API? That would certainly be interesting, although I don't think it's a killer argument. I don't want to go on about Java, but I guess that if you need to load huge amounts of data into a database as fast as possible, you would probably not do this via JDBC from a remote machine, but use psql locally on the server. So performance is of course important, but not the most important thing for me when I code Java. For me the nice thing about COPY support in JDBC would be that you have an easy way to load CSV files into the database or dump them out from a database, which is a requirement I encounter frequently. Yours, Laurenz Albe
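For the CSV use case Laurenz mentions, a load and a dump each come down to one call. A minimal sketch, again assuming the CopyManager-style API and placeholder table and file names:

    import java.io.Reader;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CsvCopy {
        static void loadAndDump(Connection conn) throws Exception {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();

            // Load a CSV file into the table.
            try (Reader in = Files.newBufferedReader(Paths.get("input.csv"), StandardCharsets.UTF_8)) {
                copy.copyIn("COPY items FROM STDIN WITH CSV HEADER", in);
            }

            // Dump the table back out as CSV.
            try (Writer out = Files.newBufferedWriter(Paths.get("dump.csv"), StandardCharsets.UTF_8)) {
                copy.copyOut("COPY items TO STDOUT WITH CSV HEADER", out);
            }
        }
    }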
Albe Laurenz wrote: > For me the nice thing about COPY support in JDBC would be that you have > an easy way to load CSV files into the database or dump them out from a > database, which is a requirement I encounter frequently. It also permits you to do bulk load/dump operations within the context of an ongoing JDBC transaction. Somebody on this list (or -general; I don't remember) was trying to do that only a few days ago. In addition to fast import/export of user data, I can also see uses for it when trying to avoid having database setup scripts outside the primary application. For apps you're distributing to end users, having the Java app able to create and populate the initial database quickly and efficiently rather than having to shell out to psql, get the user to run scripts, etc is highly appealing. Text-based COPY in the JDBC driver would be a significant plus to me. I can't say I personally care about binary COPY especially with a language that tries to be platform independent. -- Craig Ringer
On Thursday 25 September 2008 09:05:39 Craig Ringer wrote: > Albe Laurenz wrote: > > For me the nice thing about COPY support in JDBC would be that you have > > an easy way to load CSV files into the database or dump them out from a > > database, which is a requirement I encounter frequently. > It also permits you to do bulk load/dump operations within the context > of an ongoing JDBC transaction. Somebody on this list (or -general; I > don't remember) was trying to do that only a few days ago. This was exactly something that I had to do once. Thanks to Java, I could even insert a FilterInputStream to change the input data before streaming it into the copyManager. > Text-based COPY in the JDBC driver would be a significant plus to me. I > can't say I personally care about binary COPY especially with a language > that tries to be platform independent. Yes please, it would be great if this made it into the official driver and then onto the Maven repositories. Thank you very much, ido
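The FilterInputStream trick ido mentions can be quite small. The transformation below (normalizing CRLF line endings to plain LF) is only an illustration of the pattern, since the actual change ido applied isn't described:

    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Drops '\r' bytes so CRLF input arrives as plain LF line endings.
    public class StripCarriageReturns extends FilterInputStream {
        public StripCarriageReturns(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b;
            do {
                b = super.read();
            } while (b == '\r');
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            while (true) {
                int n = super.read(buf, off, len);
                if (n <= 0) {
                    return n;             // end of stream (or len == 0)
                }
                int out = off;
                for (int i = off; i < off + n; i++) {
                    if (buf[i] != '\r') {
                        buf[out++] = buf[i];
                    }
                }
                if (out > off) {
                    return out - off;     // at least one byte survived the filter
                }
                // every byte read was '\r'; read again instead of returning 0
            }
        }
    }

You would then wrap whatever stream you already have, e.g. new StripCarriageReturns(new FileInputStream("data.txt")), and hand that to the copy call.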
> On Wed, 24 Sep 2008, Daniel Migowski wrote: > > AFAIK UTF-8 is the only encoding the driver supports anyway, and it is > the native Java encoding, too. In my opinion the API should either > support Writers and Readers (instead of Output- and InputStreams), so > the application has to take care of the encoding itself, or the API > should encapsulate setting an arbitrary encoding on the server side > before the COPY command and return to the default encoding directly > afterwards. > > > Yes, the current copy patches only support *Stream, which does leave > the user exposed to encoding issues. Providing a Reader/Writer API > doesn't support COPY ... BINARY, but I don't know how many people > would actually use such a thing. Parallel interfaces are a > possibility, but I'd guess people would end up using the Stream > versions for non-binary data anyway. > > Does anyone have the need to do COPY BINARY? > > I also wonder what the encoding conversion hit is if no conversion > needs to be done. Perhaps we should measure that before abandoning > the Stream API? > > Kris Jurka Somehow in the midst of this thread the discussion of abandoning the Stream API seems to have come up. The Java API clearly indicates through the java.sql package that the ResultSet interface has stream methods. Please continue to support the Java JDBC API. Dana Proctor.
On Thu, 25 Sep 2008, dmp wrote: > Somehow in the midst of this thread the discussion of abandoning > the Stream API seems to have come up. The Java API clearly > indicates through the java.sql package that the ResultSet Interface > has stream methods. Please continue to support the Java API JDBC. The discussion is only referring to a postgresql specific interface for doing COPY, not anything in java.sql.*. Kris Jurka
On Sep 24, 7:06 am, m.na...@gmail.com ("Michael Nacos") wrote: > Hi Mark, > > if you want COPY support from the JDBC driver just for performance reasons, > you should probably also consider using the driver's batch mode. Please have > a look at the following links: > > http://archives.postgresql.org/pgsql-jdbc/2007-04/msg00076.php > http://freeyourtech.wordpress.com/2008/07/15/using-postgresql-jdbc-fo... > > Encoding is a serious consideration. While testing pgBee, I ran into > problems inserting codepage 1252 files into a SQL_ASCII database, because of > character mismatches. In your Java application, you may specify a particular > charset for reading the input files. I haven't used the patch, but if > support for the COPY operation is to be included in the driver, please make > sure the driver handles such things with care. > > It might also be that the driver shouldn't have to deal with things like > that. It's a driver for connecting to a DBMS, not a backup-restore solution. > But I cannot speak on behalf of anyone or anything, I'm pretty much an > occasional pgsql-jdbc reader. > > Michael Hi, Michael - Thanks for the batch pointer. We are currently using batches of varying sizes but still aren't getting the performance we'd like. As for including the COPY facility in the JDBC driver versus doing bulk loads from a command-line utility: our online application receives an immense amount of data periodically and must store that data in Postgres before the next bulk arrives. It makes sense to process and store this data entirely within the confines of a JVM and within a transaction, so COPY support in the JDBC driver is very helpful for us.
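For that kind of workload, the appeal is that COPY can sit inside an ordinary JDBC transaction next to other statements, so the whole batch either lands or rolls back together. A sketch, again assuming the CopyManager-style API; the staging_items and items tables are placeholders:

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.sql.Statement;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyInTransaction {
        // Either the whole batch lands, or none of it does.
        static void loadBatch(Connection conn, String copyData) throws Exception {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.execute("TRUNCATE staging_items");
                copy.copyIn("COPY staging_items FROM STDIN",
                            new ByteArrayInputStream(copyData.getBytes(StandardCharsets.UTF_8)));
                st.execute("INSERT INTO items SELECT * FROM staging_items");
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }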
Hi Mark,
Batching statements in JDBC will probably get you up to 1000 operations/sec (reading from disk, slow database) or 10000 operations/sec (reading from memory and/or a fast database). I have not yet used the JDBC COPY patch, but it might be what you want if you need higher speeds. I had always associated the COPY command with properly formatted input files, but I have read that all the patch needs is an InputStream, so it might be exactly what you need.
I am planning to package pgBee into a library, so that it may be used from other Java programs or become part of a workflow, but it's always going to be file-based. Then again, when you say 'immense amount of data' and 'periodically', you make me worry about the JVM coping with your memory requirements.
cheers,
Michael
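On the memory point: the data never has to sit in the heap all at once. Rows can be generated on the fly and piped into the copy call from a producer thread. A sketch assuming the CopyManager-style API and a placeholder table; the iterator is expected to yield preformatted COPY lines:

    import java.io.IOException;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.util.Iterator;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class StreamingCopy {
        static long copyRows(Connection conn, Iterator<String> rows) throws Exception {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            PipedOutputStream sink = new PipedOutputStream();
            PipedInputStream source = new PipedInputStream(sink, 64 * 1024);

            // Producer thread writes one preformatted COPY line at a time into the pipe.
            Thread producer = new Thread(() -> {
                try {
                    while (rows.hasNext()) {
                        sink.write((rows.next() + "\n").getBytes(StandardCharsets.UTF_8));
                    }
                } catch (IOException e) {
                    // pipe closed by the consumer; just stop producing
                } finally {
                    try { sink.close(); } catch (IOException ignore) { }
                }
            });
            producer.start();
            try {
                return copy.copyIn("COPY items FROM STDIN", source);
            } finally {
                source.close();      // unblocks the producer if COPY failed mid-stream
                producer.join();
            }
        }
    }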