Thread: Loading binary data into the database

Loading binary data into the database

From
Adriaan Joubert
Date:
Hi,
   I'm having serious problems trying to load large amounts of data
into the database. I have this data in binary, database compatible,
form, but there seems no way left to upload this data except to turn it
all back into strings (since 7.0.2 binary copy has been disabled, making
quite a few libpq functions superfluous). This is quite a serious
deficiency in my view.

So, I want to know what my options are. For this type of thing I would
have thought that Karel's stored queries would be useful, combined with
some way of uploading binary data to the database. Something along the
lines of

prepare insert into my_table (col1, col2) values (?,?);

execute <handle to the query> 3, 4;

To upload binary data there would have to be a libpq call that uploads
the data in the execute statement as binary (there is a specification
for this already in the current libpq), and executes the prepared plan.
For any application that generates a lot of data (especially floating
point data), this would be a huge win. An added advantage would be that
this type of schema would allow a serial value on the table to be
incremented as in any normal insert, which has always been annoying when
using copy.

I have no idea how hard this is and whether I'm the only person in the
world that will find this useful. I seem to be the only one who moaned
about the binary copy vanishing, and breaking code, so perhaps nobody
else sees this as a problem?

Adriaan





Re: Loading binary data into the database

From
Tom Lane
Date:
Adriaan Joubert <a.joubert@albourne.com> writes:
> since 7.0.2 binary copy has been disabled, making
> quite a few libpq functions superfluous

Such as?  IIRC, the reason we disabled it was precisely that there was
no support on the client side.  (What's worse, there's no support in
the FE/BE protocol either.  I don't see how you could have made this
work...)

Cross-machine binary copy is a dangerous thing anyway, since it opens
you up to all sorts of compatibility problems.  If your app is running
on the same machine as the server, you can write data to a file and
then send a command to do a binary copy from that file.
        regards, tom lane


Re: Loading binary data into the database

From
Adriaan Joubert
Date:
Tom Lane wrote:

> Adriaan Joubert <a.joubert@albourne.com> writes:
> > since 7.0.2 binary copy has been disabled, making
> > quite a few libpq functions superfluous
>
> Such as?  IIRC, the reason we disabled it was precisely that there was
> no support on the client side.  (What's worse, there's no support in
> the FE/BE protocol either.  I don't see how you could have made this
> work...)

I issued a 'copy binary <table> from stdin;' and then sent the data with
PQputnbytes (this is now obsolete, isn't it?) and as this was from a
CORBA server running on the same machine as the database this worked fine
and was very fast (not being able to update a serial was a pain, and I
ended up doing it by hand in the server). As some of the data I have to
write goes into bytea fields, i now have to convert all non-printable
characters to octet codes, which is a total pain in the neck.

> Cross-machine binary copy is a dangerous thing anyway, since it opens
> you up to all sorts of compatibility problems.  If your app is running
> on the same machine as the server, you can write data to a file and
> then send a command to do a binary copy from that file.

Yes sure, if you write from a machine with a different architecture it is
going to cause trouble. Reading and writing binary files on the host
machine seems kind-of a slow solution to the problem and leads to yet
another load of permission problems (Ok, they can be resolved, but it is
yet another place where things can go wrong).

Perhaps libpq is not the answer. I've even been thinking about writing a
SPI function that acts as a CORBA server -- but decided that that is just
too ugly to contemplate. So what is the solution?

Adriaan



Re: Loading binary data into the database

From
Tom Lane
Date:
Adriaan Joubert <a.joubert@albourne.com> writes:
>> Such as?  IIRC, the reason we disabled it was precisely that there was
>> no support on the client side.  (What's worse, there's no support in
>> the FE/BE protocol either.  I don't see how you could have made this
>> work...)

> I issued a 'copy binary <table> from stdin;' and then sent the data with
> PQputnbytes

How did you get out of COPY state?  In binary mode CopyFrom will only
recognize EOF as end of data, and there's no provision in the FE/BE
protocol for making it see an EOF.  You'd have had to break the
connection to get out of that --- and I'd have expected the loss of
connection to cause a transaction abort, preventing your data from
getting committed.  (If it didn't abort, that's a bug that needs to be
fixed... if the line drops halfway through a copy, you don't want it
to commit do you?)

The real bottom line here is that the FE/BE protocol would need to be
changed to support binary copy properly, and no one's excited about
putting more work into the existing protocol, nor about the ensuing
compatibility problems.


> Perhaps libpq is not the answer. I've even been thinking about writing a
> SPI function that acts as a CORBA server -- but decided that that is just
> too ugly to contemplate. So what is the solution?

A CORBA-based replacement protocol has been discussed seriously, though
I haven't noticed any work getting done on it lately.  Feel free to
pitch in if you think it's a good idea.
        regards, tom lane


Re: Loading binary data into the database

From
Adriaan Joubert
Date:
Tom Lane wrote:

> Adriaan Joubert <a.joubert@albourne.com> writes:
> >> Such as?  IIRC, the reason we disabled it was precisely that there was
> >> no support on the client side.  (What's worse, there's no support in
> >> the FE/BE protocol either.  I don't see how you could have made this
> >> work...)
>
> > I issued a 'copy binary <table> from stdin;' and then sent the data with
> > PQputnbytes
>
> How did you get out of COPY state?  In binary mode CopyFrom will only
> recognize EOF as end of data, and there's no provision in the FE/BE
> protocol for making it see an EOF.  You'd have had to break the
> connection to get out of that --- and I'd have expected the loss of
> connection to cause a transaction abort, preventing your data from
> getting committed.  (If it didn't abort, that's a bug that needs to be
> fixed... if the line drops halfway through a copy, you don't want it
> to commit do you?)

Don't know. I first sent the length of the binary buffer, then the buffer (I
just stored the whole thing in an STL vector) and PQendcopy to terminate it.
But, as you said, libpq is probably not the right way to go about it. Also,
the docs for the binary structure were not quite correct, but it took only a
little bit of fiddling to get the structure right. I could not find the
description of the binary structure back in the current docs on
postgresql.org, so I guess this really has been ripped out.

>
> A CORBA-based replacement protocol has been discussed seriously, though
> I haven't noticed any work getting done on it lately.  Feel free to
> pitch in if you think it's a good idea.

Yes, I've been looking through the mailing list. Problem is to settle on a
CORBA system that runs everywhere. And it is much more natural to program
CORBA in C++, but if I see the problems people have had just compiling the C++
interface to postgres, this looks like a no-go. I'll look around at the
various bits and pieces floating around the net.

If anybody is working on a CORBA interface to postgres, please let me know!

Adriaan



Re: Loading binary data into the database

From
Bruce Momjian
Date:
> Adriaan Joubert <a.joubert@albourne.com> writes:
> > since 7.0.2 binary copy has been disabled, making
> > quite a few libpq functions superfluous
> 
> Such as?  IIRC, the reason we disabled it was precisely that there was
> no support on the client side.  (What's worse, there's no support in
> the FE/BE protocol either.  I don't see how you could have made this
> work...)
> 
> Cross-machine binary copy is a dangerous thing anyway, since it opens
> you up to all sorts of compatibility problems.  If your app is running
> on the same machine as the server, you can write data to a file and
> then send a command to do a binary copy from that file.

We disabled binary copy?  Using \copy or COPY?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Loading binary data into the database

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> We disabled binary copy?  Using \copy or COPY?

COPY BINARY to stdout or from stdin is disallowed now.  It never
really worked anyway.  AFAIK, psql's \copy has never had a binary
option.
        regards, tom lane