Thread: COPY support implemented

COPY support implemented

From

Kris Jurka

Date:

30 December 2003, 04:25:31

With the introduction of the V3 protocol in 7.4 it is now possible to
support the COPY protocol in the JDBC driver.  Before it was impossible to
recover from errors and the Connection had to be abandoned.  This patch
implements support for COPY based on the API proposed by Michael
Adler's original patch of about a year ago.

For example

Connection conn = DriverManager.getConnection(...);
CopyManager copy = ((org.postgresql.PGConnection)conn).getCopyAPI();

// copy data from the table to the given output stream
OutputStream output = new ByteArrayOutputStream();
copy.copyOut("tablename",output);

// copy data from the given input stream to the table
InputStream input = new ByteArrayInputStream(output.toByteArray());
copy.copyIn("tablename",input);

Kris Jurka

Attachment

copy.patch

Re: COPY support implemented

From

Per-Olof Noren

Date:

12 January 2004, 14:35:34

Hi Kris,

On Tue, 2003-12-30 at 09:25, Kris Jurka wrote:
> With the introduction of the V3 protocol in 7.4 it is now possible to
> support the COPY protocol in the JDBC driver.  Before it was impossible to
> recover from errors and the Connection had to be abandoned.  This patch
> implements support for COPY based on the API proposed by Michael
> Adler's original patch of about a year ago.

This is great news, as I'm currently working on a project that could
benefit from being able to run copy on a client machine.
I have patched the driver from the 7.4.1 src distro with the patch and
it compiled successfully.

After giving it a spin (which actually failed due to another reason,
more on that below) I have a little suggestion.

Consider this simple example (which happen to be my very code) :

String filename = "myfile.csv";
String sql = "COPY <<tablename>> FROM STDIN";
PGConnection pgConnection =
(org.postgresql.PGConnection) connection;

CopyManager copy = pgConnection.getCopyAPI();
FileInputStream fis = new FileInputStream(filename);
copy.copyInQuery(sql, fis);

The reason for using copyInQuery is that I need to be able to specify
the delimiter. I'm reasoning that i'm probably not alone on this.

My suggestion is one of the following:

Add CopyManager.copyIn(String tName, InputStream is, String delimiter);

Or add Javadoc that clearly states that the query
*MUST* be using "FROM STDIN" or the server will start looking locally
for the file, giving an FileNotException.

Ok, now to my litte trouble.
The database is encoded in LATIN1 running on pgsql 7.4.1.
I'm running the above code on a different machine running linux using
locale "sv_SE". The file I'm trying to import is encoded in ISO-8859-1.
Then running the above code I get the following error:
org.postgresql.util.PSQLException: ERROR: could not convert UTF-8
character 0x00f6 to ISO8859-1

This must be the JVM's encoding being set to UTF-8, right?
Or do I need to specify the encoding in the jdbc url or something?

Thanx in advance

Per-Olof Norén
Curalia AB

Re: COPY support implemented

From

ritchie

Date:

12 January 2004, 14:51:49

Great I've been monitoring the list for exactly this news! Thanks to the
developers.

Ritchie

Re: COPY support implemented

From

Kris Jurka

Date:

12 January 2004, 14:52:27

On 12 Jan 2004, Per-Olof Noren wrote:

> Hi Kris,
>
> On Tue, 2003-12-30 at 09:25, Kris Jurka wrote:
> > With the introduction of the V3 protocol in 7.4 it is now possible to
> > support the COPY protocol in the JDBC driver.  Before it was impossible to
> > recover from errors and the Connection had to be abandoned.  This patch
> > implements support for COPY based on the API proposed by Michael
> > Adler's original patch of about a year ago.
>
> This is great news, as I'm currently working on a project that could
> benefit from being able to run copy on a client machine.
> I have patched the driver from the 7.4.1 src distro with the patch and
> it compiled successfully.
>
> After giving it a spin (which actually failed due to another reason,
> more on that below) I have a little suggestion.
>
> Add CopyManager.copyIn(String tName, InputStream is, String delimiter);
>
> Or add Javadoc that clearly states that the query
> *MUST* be using "FROM STDIN" or the server will start looking locally
> for the file, giving an FileNotException.

I'm working on a new API that is more flexible in this and a number of
other regards.  Something like this will probably get folded in.

> Ok, now to my litte trouble.
> The database is encoded in LATIN1 running on pgsql 7.4.1.
> I'm running the above code on a different machine running linux using
> locale "sv_SE". The file I'm trying to import is encoded in ISO-8859-1.
> Then running the above code I get the following error:
> org.postgresql.util.PSQLException: ERROR: could not convert UTF-8
> character 0x00f6 to ISO8859-1
>
> This must be the JVM's encoding being set to UTF-8, right?
> Or do I need to specify the encoding in the jdbc url or something?
>

As it stands now the JVM's encoding must be UTF-8 or you must do the
conversion to UTF-8 on your own.  This is not good, but for the first pass
I wanted to focus on the copy protocol itself and left a number of user
interface features out.

The next version of this will allow you to specify some kind of recoding
to be done.  Additionally it will provide support for a much friendlier
interface than an I/O Stream.  Based on a suggestion from Oliver Jowett
I'm going to add an interface based on SQLInput/SQLOutput so an object can
easily read/write itself directly from/to a copy command so there isn't a
need to persist things to a Stream if your application doesn't want to.

Kris Jurka

Re: COPY support implemented

From

Csaba Nagy

Date:

13 January 2004, 05:59:05

Hi,

Some thoughts on converting files between different encodings (my search
for this was long enough to share the results).
If you're on Linux, you might use the "iconv" tool to convert the file
encoding (for details: man iconv).
On all platforms you could use the "native2ascii" tool (it's part of the
JDK), in a 2 step process: first convert from iso-8859-1 to the
ASCII/unicode encoded format, then convert that to UTF-8 using the
"-reverse" option. For all the options of native2ascii, consult the docs
for your platform, possibly:
http://java.sun.com/j2se/1.3/docs/tooldocs/tools.html#intl

HTH,
Csaba.


On Mon, 2004-01-12 at 19:34, Per-Olof Noren wrote:
> Hi Kris,
>
> On Tue, 2003-12-30 at 09:25, Kris Jurka wrote:
> > With the introduction of the V3 protocol in 7.4 it is now possible to
> > support the COPY protocol in the JDBC driver.  Before it was impossible to
> > recover from errors and the Connection had to be abandoned.  This patch
> > implements support for COPY based on the API proposed by Michael
> > Adler's original patch of about a year ago.
>
> This is great news, as I'm currently working on a project that could
> benefit from being able to run copy on a client machine.
> I have patched the driver from the 7.4.1 src distro with the patch and
> it compiled successfully.
>
> After giving it a spin (which actually failed due to another reason,
> more on that below) I have a little suggestion.
>
> Consider this simple example (which happen to be my very code) :
>
> String filename = "myfile.csv";
> String sql = "COPY <<tablename>> FROM STDIN";
> PGConnection pgConnection =
> (org.postgresql.PGConnection) connection;
>
> CopyManager copy = pgConnection.getCopyAPI();
> FileInputStream fis = new FileInputStream(filename);
> copy.copyInQuery(sql, fis);
>
> The reason for using copyInQuery is that I need to be able to specify
> the delimiter. I'm reasoning that i'm probably not alone on this.
>
> My suggestion is one of the following:
>
> Add CopyManager.copyIn(String tName, InputStream is, String delimiter);
>
> Or add Javadoc that clearly states that the query
> *MUST* be using "FROM STDIN" or the server will start looking locally
> for the file, giving an FileNotException.
>
>
> Ok, now to my litte trouble.
> The database is encoded in LATIN1 running on pgsql 7.4.1.
> I'm running the above code on a different machine running linux using
> locale "sv_SE". The file I'm trying to import is encoded in ISO-8859-1.
> Then running the above code I get the following error:
> org.postgresql.util.PSQLException: ERROR: could not convert UTF-8
> character 0x00f6 to ISO8859-1
>
> This must be the JVM's encoding being set to UTF-8, right?
> Or do I need to specify the encoding in the jdbc url or something?
>
> Thanx in advance
>
> Per-Olof Norén
> Curalia AB
>
>
>
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

Re: COPY support implemented

From

Kris Jurka

Date:

13 January 2004, 06:39:26

On 13 Jan 2004, Csaba Nagy wrote:

> Hi,
>
> Some thoughts on converting files between different encodings (my search
> for this was long enough to share the results).
> If you're on Linux, you might use the "iconv" tool to convert the file
> encoding (for details: man iconv).
> On all platforms you could use the "native2ascii" tool (it's part of the
> JDK), in a 2 step process: first convert from iso-8859-1 to the
> ASCII/unicode encoded format, then convert that to UTF-8 using the
> "-reverse" option. For all the options of native2ascii, consult the docs
> for your platform, possibly:
> http://java.sun.com/j2se/1.3/docs/tooldocs/tools.html#intl

These are applications, so they aren't terribly useful in the case at
hand.  We need a programmatic means of doing this conversion.  Java
provides the Reader and Writer classes for doing reads and writes
in specific encodings.  Further the JDBC driver internally has a Encoding
class which is used to convert from the JVM's character set to the
databases.  So using these two I believe I've got it handling the encoding
issue.

Right now I've got an API roughly like so for moving data into the server:
There is an abstract CopyDataProvider which is implemented by three
concete providers.  One pulls data from an InputStream.  This is the
fastest method as it just pushes the data as fast as it can read it.  The
next pulls from a Reader and will do encoding translation, but the
interface is still a rather opaque stream.  The final provider gives you
control to write the column level entries in a friendly way.  This
provider takes a CopySource object to draw the actual data from.  When
streaming the data to the server it will do something like:

CopySource source;
while (source.next()) {
    source.writeRow(SQLOutput);
    // Send data for that row to the server;
}

So CopySource could for example wrap a Vector of objects representing a
table's rows.  So when source.writeRow is called it picks the next row and
calls writeRow on that object.  So now this row equivalent object can
persist itself to the SQLOutput object using methods like so

writeRow(SQLOutput out) {
    out.writeInt(this.userid);
    out.writeString(this.username);
    out.writeTimestamp(this.lastlogin);
}

As you can imagine there are three equivalent implementations of
CopyDataReceiver for extracting data from the server.

Kris Jurka

Re: COPY support implemented

From

Csaba Nagy

Date:

13 January 2004, 06:45:02

OK, I wasn't after solving the problem, but suggesting a possible
workaround for the moment. I'm using RedHat/Fedora with UTF-8 encoding
in an environment where a lot of files are encoded with any of the
ISO-8859-x encodings. I've had a lot of annoyance out of this, and just
concluded that sometimes is easier to have the files concerted to the
right encoding...

Cheers,
Csaba.

[snip]
> These are applications, so they aren't terribly useful in the case at
> hand.  We need a programmatic means of doing this conversion.
[snip]