Thread: COPY support implemented
With the introduction of the V3 protocol in 7.4 it is now possible to support the COPY protocol in the JDBC driver. Before it was impossible to recover from errors and the Connection had to be abandoned. This patch implements support for COPY based on the API proposed by Michael Adler's original patch of about a year ago. For example Connection conn = DriverManager.getConnection(...); CopyManager copy = ((org.postgresql.PGConnection)conn).getCopyAPI(); // copy data from the table to the given output stream OutputStream output = new ByteArrayOutputStream(); copy.copyOut("tablename",output); // copy data from the given input stream to the table InputStream input = new ByteArrayInputStream(output.toByteArray()); copy.copyIn("tablename",input); Kris Jurka
Attachment
Hi Kris, On Tue, 2003-12-30 at 09:25, Kris Jurka wrote: > With the introduction of the V3 protocol in 7.4 it is now possible to > support the COPY protocol in the JDBC driver. Before it was impossible to > recover from errors and the Connection had to be abandoned. This patch > implements support for COPY based on the API proposed by Michael > Adler's original patch of about a year ago. This is great news, as I'm currently working on a project that could benefit from being able to run copy on a client machine. I have patched the driver from the 7.4.1 src distro with the patch and it compiled successfully. After giving it a spin (which actually failed due to another reason, more on that below) I have a little suggestion. Consider this simple example (which happen to be my very code) : String filename = "myfile.csv"; String sql = "COPY <<tablename>> FROM STDIN"; PGConnection pgConnection = (org.postgresql.PGConnection) connection; CopyManager copy = pgConnection.getCopyAPI(); FileInputStream fis = new FileInputStream(filename); copy.copyInQuery(sql, fis); The reason for using copyInQuery is that I need to be able to specify the delimiter. I'm reasoning that i'm probably not alone on this. My suggestion is one of the following: Add CopyManager.copyIn(String tName, InputStream is, String delimiter); Or add Javadoc that clearly states that the query *MUST* be using "FROM STDIN" or the server will start looking locally for the file, giving an FileNotException. Ok, now to my litte trouble. The database is encoded in LATIN1 running on pgsql 7.4.1. I'm running the above code on a different machine running linux using locale "sv_SE". The file I'm trying to import is encoded in ISO-8859-1. Then running the above code I get the following error: org.postgresql.util.PSQLException: ERROR: could not convert UTF-8 character 0x00f6 to ISO8859-1 This must be the JVM's encoding being set to UTF-8, right? Or do I need to specify the encoding in the jdbc url or something? Thanx in advance Per-Olof Norén Curalia AB
Great I've been monitoring the list for exactly this news! Thanks to the developers. Ritchie
On 12 Jan 2004, Per-Olof Noren wrote: > Hi Kris, > > On Tue, 2003-12-30 at 09:25, Kris Jurka wrote: > > With the introduction of the V3 protocol in 7.4 it is now possible to > > support the COPY protocol in the JDBC driver. Before it was impossible to > > recover from errors and the Connection had to be abandoned. This patch > > implements support for COPY based on the API proposed by Michael > > Adler's original patch of about a year ago. > > This is great news, as I'm currently working on a project that could > benefit from being able to run copy on a client machine. > I have patched the driver from the 7.4.1 src distro with the patch and > it compiled successfully. > > After giving it a spin (which actually failed due to another reason, > more on that below) I have a little suggestion. > > Add CopyManager.copyIn(String tName, InputStream is, String delimiter); > > Or add Javadoc that clearly states that the query > *MUST* be using "FROM STDIN" or the server will start looking locally > for the file, giving an FileNotException. I'm working on a new API that is more flexible in this and a number of other regards. Something like this will probably get folded in. > Ok, now to my litte trouble. > The database is encoded in LATIN1 running on pgsql 7.4.1. > I'm running the above code on a different machine running linux using > locale "sv_SE". The file I'm trying to import is encoded in ISO-8859-1. > Then running the above code I get the following error: > org.postgresql.util.PSQLException: ERROR: could not convert UTF-8 > character 0x00f6 to ISO8859-1 > > This must be the JVM's encoding being set to UTF-8, right? > Or do I need to specify the encoding in the jdbc url or something? > As it stands now the JVM's encoding must be UTF-8 or you must do the conversion to UTF-8 on your own. This is not good, but for the first pass I wanted to focus on the copy protocol itself and left a number of user interface features out. The next version of this will allow you to specify some kind of recoding to be done. Additionally it will provide support for a much friendlier interface than an I/O Stream. Based on a suggestion from Oliver Jowett I'm going to add an interface based on SQLInput/SQLOutput so an object can easily read/write itself directly from/to a copy command so there isn't a need to persist things to a Stream if your application doesn't want to. Kris Jurka
Hi, Some thoughts on converting files between different encodings (my search for this was long enough to share the results). If you're on Linux, you might use the "iconv" tool to convert the file encoding (for details: man iconv). On all platforms you could use the "native2ascii" tool (it's part of the JDK), in a 2 step process: first convert from iso-8859-1 to the ASCII/unicode encoded format, then convert that to UTF-8 using the "-reverse" option. For all the options of native2ascii, consult the docs for your platform, possibly: http://java.sun.com/j2se/1.3/docs/tooldocs/tools.html#intl HTH, Csaba. On Mon, 2004-01-12 at 19:34, Per-Olof Noren wrote: > Hi Kris, > > On Tue, 2003-12-30 at 09:25, Kris Jurka wrote: > > With the introduction of the V3 protocol in 7.4 it is now possible to > > support the COPY protocol in the JDBC driver. Before it was impossible to > > recover from errors and the Connection had to be abandoned. This patch > > implements support for COPY based on the API proposed by Michael > > Adler's original patch of about a year ago. > > This is great news, as I'm currently working on a project that could > benefit from being able to run copy on a client machine. > I have patched the driver from the 7.4.1 src distro with the patch and > it compiled successfully. > > After giving it a spin (which actually failed due to another reason, > more on that below) I have a little suggestion. > > Consider this simple example (which happen to be my very code) : > > String filename = "myfile.csv"; > String sql = "COPY <<tablename>> FROM STDIN"; > PGConnection pgConnection = > (org.postgresql.PGConnection) connection; > > CopyManager copy = pgConnection.getCopyAPI(); > FileInputStream fis = new FileInputStream(filename); > copy.copyInQuery(sql, fis); > > The reason for using copyInQuery is that I need to be able to specify > the delimiter. I'm reasoning that i'm probably not alone on this. > > My suggestion is one of the following: > > Add CopyManager.copyIn(String tName, InputStream is, String delimiter); > > Or add Javadoc that clearly states that the query > *MUST* be using "FROM STDIN" or the server will start looking locally > for the file, giving an FileNotException. > > > Ok, now to my litte trouble. > The database is encoded in LATIN1 running on pgsql 7.4.1. > I'm running the above code on a different machine running linux using > locale "sv_SE". The file I'm trying to import is encoded in ISO-8859-1. > Then running the above code I get the following error: > org.postgresql.util.PSQLException: ERROR: could not convert UTF-8 > character 0x00f6 to ISO8859-1 > > This must be the JVM's encoding being set to UTF-8, right? > Or do I need to specify the encoding in the jdbc url or something? > > Thanx in advance > > Per-Olof Norén > Curalia AB > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster
On 13 Jan 2004, Csaba Nagy wrote: > Hi, > > Some thoughts on converting files between different encodings (my search > for this was long enough to share the results). > If you're on Linux, you might use the "iconv" tool to convert the file > encoding (for details: man iconv). > On all platforms you could use the "native2ascii" tool (it's part of the > JDK), in a 2 step process: first convert from iso-8859-1 to the > ASCII/unicode encoded format, then convert that to UTF-8 using the > "-reverse" option. For all the options of native2ascii, consult the docs > for your platform, possibly: > http://java.sun.com/j2se/1.3/docs/tooldocs/tools.html#intl These are applications, so they aren't terribly useful in the case at hand. We need a programmatic means of doing this conversion. Java provides the Reader and Writer classes for doing reads and writes in specific encodings. Further the JDBC driver internally has a Encoding class which is used to convert from the JVM's character set to the databases. So using these two I believe I've got it handling the encoding issue. Right now I've got an API roughly like so for moving data into the server: There is an abstract CopyDataProvider which is implemented by three concete providers. One pulls data from an InputStream. This is the fastest method as it just pushes the data as fast as it can read it. The next pulls from a Reader and will do encoding translation, but the interface is still a rather opaque stream. The final provider gives you control to write the column level entries in a friendly way. This provider takes a CopySource object to draw the actual data from. When streaming the data to the server it will do something like: CopySource source; while (source.next()) { source.writeRow(SQLOutput); // Send data for that row to the server; } So CopySource could for example wrap a Vector of objects representing a table's rows. So when source.writeRow is called it picks the next row and calls writeRow on that object. So now this row equivalent object can persist itself to the SQLOutput object using methods like so writeRow(SQLOutput out) { out.writeInt(this.userid); out.writeString(this.username); out.writeTimestamp(this.lastlogin); } As you can imagine there are three equivalent implementations of CopyDataReceiver for extracting data from the server. Kris Jurka
OK, I wasn't after solving the problem, but suggesting a possible workaround for the moment. I'm using RedHat/Fedora with UTF-8 encoding in an environment where a lot of files are encoded with any of the ISO-8859-x encodings. I've had a lot of annoyance out of this, and just concluded that sometimes is easier to have the files concerted to the right encoding... Cheers, Csaba. [snip] > These are applications, so they aren't terribly useful in the case at > hand. We need a programmatic means of doing this conversion. [snip]