Thread: FYI: porting Copy API to 8.x
Hi! I'm porting the Copy API to 8.x series of pgjdbc. You may expect a patch at the commit list this week. I've copied Chris Jurka's patch for the 7.x series from here: http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php I'm just modifying that old patch to fit current implementation. Suggestions for enhancements or alternative approaches are welcome so long as they are sufficiently trivial to implement :) Should this patched patch get accepted for inclusion it might make it into official distribution for 8.4. Otherwise (and meanwhile) people will have to keep building their own drivers in order to be able to use COPY TO/FROM STDIN. -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
Kalle, The reason that the patch has never been officially accepted is because it creates two protocol paths. To do this correctly you need to add the copy functionality into the current protocol handler as opposed to having a separate protocol handler for copy. Dave On 30-May-07, at 8:29 AM, Kalle Hallivuori wrote: > Hi! > > I'm porting the Copy API to 8.x series of pgjdbc. You may expect a > patch at the commit list this week. > > I've copied Chris Jurka's patch for the 7.x series from here: > http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php > > I'm just modifying that old patch to fit current implementation. > Suggestions for enhancements or alternative approaches are welcome so > long as they are sufficiently trivial to implement :) > > Should this patched patch get accepted for inclusion it might make it > into official distribution for 8.4. Otherwise (and meanwhile) people > will have to keep building their own drivers in order to be able to > use COPY TO/FROM STDIN. > > -- > Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/ > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq
Hi Dave! 2007/5/30, Dave Cramer <pg@fastcrypt.com>: > The reason that the patch has never been officially accepted is > because it creates two protocol paths. > > To do this correctly you need to add the copy functionality into the > current protocol handler as opposed to having a separate protocol > handler for copy. Thanks, now I understand why it's not there. I can take a look at alternative approaches tomorrow, and only go with the ugly one if I don't come up with an (moderately easily implementable) acceptable design. Is it more of an open design problem that many people have looked closely into, or rather just a matter of someone getting to it? Ie. is it likely there is no clean way to add these special cases inside the protocol handler? As for redesign, which would be the least abhorred way of providing the functionality for use: 1. Separate Copy API available via a getCopyAPI() method in PGConnection API? 2. Separate Copy API available with some cleaner approach, which? 3. Special Copy methods in PGConnection API? 4. Special cases to batch processing (FROM STDIN) and resultset collecting (TO STDOUT)? 5. Special cases to some other part of the standard JDBC API, which? Cheers, -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
Hi Kalle, On 30-May-07, at 3:45 PM, Kalle Hallivuori wrote: > Hi Dave! > > 2007/5/30, Dave Cramer <pg@fastcrypt.com>: >> The reason that the patch has never been officially accepted is >> because it creates two protocol paths. >> >> To do this correctly you need to add the copy functionality into the >> current protocol handler as opposed to having a separate protocol >> handler for copy. > > Thanks, now I understand why it's not there. I can take a look at > alternative approaches tomorrow, and only go with the ugly one if I > don't come up with an (moderately easily implementable) acceptable > design. > > Is it more of an open design problem that many people have looked > closely into, or rather just a matter of someone getting to it? Ie. is > it likely there is no clean way to add these special cases inside the > protocol handler? > > As for redesign, which would be the least abhorred way of providing > the functionality for use: > > 1. Separate Copy API available via a getCopyAPI() method in > PGConnection API? > 2. Separate Copy API available with some cleaner approach, which? > 3. Special Copy methods in PGConnection API? > 4. Special cases to batch processing (FROM STDIN) and resultset > collecting (TO STDOUT)? > 5. Special cases to some other part of the standard JDBC API, which? > I think you have to make it an extension somehow. I'd leave getCopyAPI in place, I think the harder part is getting the protocol to work, > Cheers, > > -- > Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/ > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq
Hi again. 2007/5/31, Dave Cramer <pg@fastcrypt.com>: > I think you have to make it an extension somehow. I'd leave > getCopyAPI in place, I think the harder part is getting the protocol > to work, I found these hooks for protocol level implementation at end of org.postgresql.core.v3.QueryExecutorImpl.processResults(): case 'G': // CopyInResponse case 'H': // CopyOutResponse case 'c': // CopyDone case 'd': // CopyData { // COPY FROM STDIN / COPY TO STDOUT, neither of which are currently // supported. ... handler.handleError(new PSQLException(GT.tr("The driver currently does not support COPY operations."), PSQLState.NOT_IMPLEMENTED)); } break; AFAIK (counting out breaking current design) the only way to implement this at protocol level without touching core.v3 would be to clone the whole package with every class inherited from core.v3, and copy contents of QueryExecutorImpl there. I'd rather not. Problem is that class has its stuff (pgStream) declared as private, so I can't just inherit a class to override that single method. Furthermore, I'd end up duplicating the protocol support implemented in original version of that method. Thus, I'll edit it in place. Even if it ends up in a rejected patch at least the patch gets it right. The protocol spec is clear enough by itself. I'll still have to look how to best pass the CopyData rows between BE and application. They could probably be encapsulated as tuples, but I'll have to read a bit more to find out the best way. I won't handle field types (text/binary in the protocol level) because I'm looking forward to a solution where I could pass whole or possibly even multiple rows as raw(ish) byte[] at once. If this becomes a barrier for acceptance I can reconsider. Thanks for the clear source. It is what makes enhancements such as this possible on short notice. So no commits this week, but there's still hoping I'll get something out. -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
Hi again. I have an implementation of COPY subprotocol support that is implemented inside org.postgresql.core.v3.QueryExecutorImpl and used via a clean API available from PGConnection.getCopyAPI(). I'm trying to test it today and commit the patch next week. To ensure inclusion in the official source, I tried to follow the way Fastpath was implemented. In this implementation a COPY SQL statement is executed normally, after which the response from server initiating COPY subprotocol is handled in the usual processResults(). After that, specific methods must be used to exchange copydata with the server and return to normal protocol. All responses from server during that time are handled in a separate private processCopyResults() method. Alternatively processCopyResults() could be merged into processResults() or separate everything COPY-related from there, keeping it as it was. Opinions? -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
Kalle Hallivuori wrote: > To ensure inclusion in the official source, I tried to follow the way > Fastpath was implemented. I can't comment on whether that approach is good or not, but the fast path has been pretty much obsoleted since the introduction of server-side prepared statements, in 7.4 IIRC. That piece of code is not necessarily a good example to follow. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hello again. Thanks to Chris Jurka for notifying me that there's a limit on size of postings on this list. Attached is a patch against postgresql-jdbc-8.2-505.src containing Copy support inside v3/QueryExecutorImpl usable through a separate API together with minimal unit tests and documentation. I'd like to hear whether I should still fix something before making a version for the 8.3 branch (there are plenty of possible approaches for both client-backend and API, after all). Also I wonder if it would be possible to include this in the official driver? -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
Attachment
On Mon, 11 Jun 2007, Kalle Hallivuori wrote: > Attached is a patch against postgresql-jdbc-8.2-505.src containing > Copy support inside v3/QueryExecutorImpl usable through a separate API > together with minimal unit tests and documentation. > This patch doesn't apply cleanly because there's a mix of unix and windows EOL characters and it seems to have some other application problems. The API is not thread safe. If two threads are using the same connection the synchronization in QueryExecutorImpl prevents them from stepping on each others toes and writing garbage to the backend. So the call to QueryExecutor must be an atomic operation and implies that the QueryExecutorImpl will be the controller and demand/push data from some kind of copy client instead of the other way around. Also I'm not especially fond of the row based API that you've come up with. It seems like you should either go to an element or stream based API. Who has a premade row? They've either got single fields or a bulk stream. Consider someone with a CSV file on their client machine that they want to copy to the server, they'd have to parse it to break it into rows which can be tricky with embedded newlines. They'd like an easy way to just dump it to the server and let it deal with it. Kris Jurka
Hi Kris! Happy to receive feedback. 2007/6/13, Kris Jurka <books@ejurka.com>: > This patch doesn't apply cleanly because there's a mix of unix and windows > EOL characters and it seems to have some other application problems. Sorry for that. I'll pay closer attention to the format in future. > The API is not thread safe. If two threads are using the same connection > the synchronization in QueryExecutorImpl prevents them from stepping on > each others toes and writing garbage to the backend. So the call to > QueryExecutor must be an atomic operation and implies that the > QueryExecutorImpl will be the controller and demand/push data from some > kind of copy client instead of the other way around. Yes. (I thought that was alright since I spotted a statement of no support for synchronization somewhere, but that turned out to be at lower level, pgStream.) I'll look into current synchronization and hopefully I'll be able (as in have the time) to do that. > Also I'm not especially fond of the row based API that you've come up > with. It seems like you should either go to an element or stream based > API. Who has a premade row? ... It doesn't matter what kinds of chunks you feed it (I think that was ingenious of the COPY specification writers). With a CSV file you can read it as a whole into a single byte[] and write that at once. Or you can have static size byte buffer and pass it repeatedly to write(). In my application I now succesfully collect a bunch of field values as separate byte[]'s mixed with references to static instances of delimiter byte[]'s into a large, static size byte[][], write when it fills up, repeat until out of data. Cut time spent in import by half. (The other half of import time is spent post-processing the data :)) However, if that isn't intuitive it has to be explained clearly in all documentation or a more intuitive API offered. I think you're right on all accounts. I'll try to find some time to make it synchronized and self-evident. Shouldn't be too much of an effort now. -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/