Thread: Loading binary data into the database
Hi, I'm having serious problems trying to load large amounts of data into the database. I have this data in binary, database compatible, form, but there seems no way left to upload this data except to turn it all back into strings (since 7.0.2 binary copy has been disabled, making quite a few libpq functions superfluous). This is quite a serious deficiency in my view. So, I want to know what my options are. For this type of thing I would have thought that Karel's stored queries would be useful, combined with some way of uploading binary data to the database. Something along the lines of prepare insert into my_table (col1, col2) values (?,?); execute <handle to the query> 3, 4; To upload binary data there would have to be a libpq call that uploads the data in the execute statement as binary (there is a specification for this already in the current libpq), and executes the prepared plan. For any application that generates a lot of data (especially floating point data), this would be a huge win. An added advantage would be that this type of schema would allow a serial value on the table to be incremented as in any normal insert, which has always been annoying when using copy. I have no idea how hard this is and whether I'm the only person in the world that will find this useful. I seem to be the only one who moaned about the binary copy vanishing, and breaking code, so perhaps nobody else sees this as a problem? Adriaan
Adriaan Joubert <a.joubert@albourne.com> writes: > since 7.0.2 binary copy has been disabled, making > quite a few libpq functions superfluous Such as? IIRC, the reason we disabled it was precisely that there was no support on the client side. (What's worse, there's no support in the FE/BE protocol either. I don't see how you could have made this work...) Cross-machine binary copy is a dangerous thing anyway, since it opens you up to all sorts of compatibility problems. If your app is running on the same machine as the server, you can write data to a file and then send a command to do a binary copy from that file. regards, tom lane
Tom Lane wrote: > Adriaan Joubert <a.joubert@albourne.com> writes: > > since 7.0.2 binary copy has been disabled, making > > quite a few libpq functions superfluous > > Such as? IIRC, the reason we disabled it was precisely that there was > no support on the client side. (What's worse, there's no support in > the FE/BE protocol either. I don't see how you could have made this > work...) I issued a 'copy binary <table> from stdin;' and then sent the data with PQputnbytes (this is now obsolete, isn't it?) and as this was from a CORBA server running on the same machine as the database this worked fine and was very fast (not being able to update a serial was a pain, and I ended up doing it by hand in the server). As some of the data I have to write goes into bytea fields, i now have to convert all non-printable characters to octet codes, which is a total pain in the neck. > Cross-machine binary copy is a dangerous thing anyway, since it opens > you up to all sorts of compatibility problems. If your app is running > on the same machine as the server, you can write data to a file and > then send a command to do a binary copy from that file. Yes sure, if you write from a machine with a different architecture it is going to cause trouble. Reading and writing binary files on the host machine seems kind-of a slow solution to the problem and leads to yet another load of permission problems (Ok, they can be resolved, but it is yet another place where things can go wrong). Perhaps libpq is not the answer. I've even been thinking about writing a SPI function that acts as a CORBA server -- but decided that that is just too ugly to contemplate. So what is the solution? Adriaan
Adriaan Joubert <a.joubert@albourne.com> writes: >> Such as? IIRC, the reason we disabled it was precisely that there was >> no support on the client side. (What's worse, there's no support in >> the FE/BE protocol either. I don't see how you could have made this >> work...) > I issued a 'copy binary <table> from stdin;' and then sent the data with > PQputnbytes How did you get out of COPY state? In binary mode CopyFrom will only recognize EOF as end of data, and there's no provision in the FE/BE protocol for making it see an EOF. You'd have had to break the connection to get out of that --- and I'd have expected the loss of connection to cause a transaction abort, preventing your data from getting committed. (If it didn't abort, that's a bug that needs to be fixed... if the line drops halfway through a copy, you don't want it to commit do you?) The real bottom line here is that the FE/BE protocol would need to be changed to support binary copy properly, and no one's excited about putting more work into the existing protocol, nor about the ensuing compatibility problems. > Perhaps libpq is not the answer. I've even been thinking about writing a > SPI function that acts as a CORBA server -- but decided that that is just > too ugly to contemplate. So what is the solution? A CORBA-based replacement protocol has been discussed seriously, though I haven't noticed any work getting done on it lately. Feel free to pitch in if you think it's a good idea. regards, tom lane
Tom Lane wrote: > Adriaan Joubert <a.joubert@albourne.com> writes: > >> Such as? IIRC, the reason we disabled it was precisely that there was > >> no support on the client side. (What's worse, there's no support in > >> the FE/BE protocol either. I don't see how you could have made this > >> work...) > > > I issued a 'copy binary <table> from stdin;' and then sent the data with > > PQputnbytes > > How did you get out of COPY state? In binary mode CopyFrom will only > recognize EOF as end of data, and there's no provision in the FE/BE > protocol for making it see an EOF. You'd have had to break the > connection to get out of that --- and I'd have expected the loss of > connection to cause a transaction abort, preventing your data from > getting committed. (If it didn't abort, that's a bug that needs to be > fixed... if the line drops halfway through a copy, you don't want it > to commit do you?) Don't know. I first sent the length of the binary buffer, then the buffer (I just stored the whole thing in an STL vector) and PQendcopy to terminate it. But, as you said, libpq is probably not the right way to go about it. Also, the docs for the binary structure were not quite correct, but it took only a little bit of fiddling to get the structure right. I could not find the description of the binary structure back in the current docs on postgresql.org, so I guess this really has been ripped out. > > A CORBA-based replacement protocol has been discussed seriously, though > I haven't noticed any work getting done on it lately. Feel free to > pitch in if you think it's a good idea. Yes, I've been looking through the mailing list. Problem is to settle on a CORBA system that runs everywhere. And it is much more natural to program CORBA in C++, but if I see the problems people have had just compiling the C++ interface to postgres, this looks like a no-go. I'll look around at the various bits and pieces floating around the net. If anybody is working on a CORBA interface to postgres, please let me know! Adriaan
> Adriaan Joubert <a.joubert@albourne.com> writes: > > since 7.0.2 binary copy has been disabled, making > > quite a few libpq functions superfluous > > Such as? IIRC, the reason we disabled it was precisely that there was > no support on the client side. (What's worse, there's no support in > the FE/BE protocol either. I don't see how you could have made this > work...) > > Cross-machine binary copy is a dangerous thing anyway, since it opens > you up to all sorts of compatibility problems. If your app is running > on the same machine as the server, you can write data to a file and > then send a command to do a binary copy from that file. We disabled binary copy? Using \copy or COPY? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > We disabled binary copy? Using \copy or COPY? COPY BINARY to stdout or from stdin is disallowed now. It never really worked anyway. AFAIK, psql's \copy has never had a binary option. regards, tom lane