Re: 7.4 COPY BINARY Format Change - Mailing list pgsql-hackers

From Tom Lane
Subject Re: 7.4 COPY BINARY Format Change
Date
Msg-id 26887.1059925553@sss.pgh.pa.us
In response to Re: 7.4 COPY BINARY Format Change  (Lee Kindness <lkindness@csl.co.uk>)
Responses Re: 7.4 COPY BINARY Format Change
List pgsql-hackers
Lee Kindness <lkindness@csl.co.uk> writes:
>>> The real change that occurred here is that the individual data fields
>>> go through per-datatype send/receive routines, which in addition to
>>> implementing a mostly machine-independent binary format also provide
>>> defenses against bad input data.

> Well in that case the docs need attention. They describe the
> "envelope" surrounding the tuples, but no mention is made of the
> format they are in. It is reasonable to assume that this format was
> the native binary format, as in earlier releases.

Yeah, there should be some mention of that in the COPY ref page I guess
--- it's mentioned in the frontend protocol chapter, but not under COPY.
In my defense I'd point out that the contents of individual fields have
never been documented under COPY.

> What do I need to do to make this
> code work with 7.4? Are there any docs describing the "binary" format
> for each of the datatypes or do I need to reverse-engineer a dump file
> or look in the source?

ATM, I'd recommend looking in the sources to see what the datatype
send/receive routines do.
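
As a rough illustration of what reading those routines tends to reveal, here is a minimal Python sketch. It assumes that int4 values travel as 4-byte big-endian (network order) signed integers and float8 values as 8-byte big-endian IEEE 754 doubles; that assumption should be checked against the backend's send/receive functions (for example int4send and float8send) before relying on it. The helper names below are hypothetical and not part of any PostgreSQL API.

```python
import struct


def encode_int4(value: int) -> bytes:
    """Pack an int4 as 4 bytes, big-endian, signed (assumed wire form)."""
    return struct.pack("!i", value)


def decode_int4(data: bytes) -> int:
    """Unpack an int4, rejecting fields of the wrong length."""
    if len(data) != 4:
        raise ValueError("int4 field must be exactly 4 bytes")
    return struct.unpack("!i", data)[0]


def encode_float8(value: float) -> bytes:
    """Pack a float8 as an 8-byte big-endian IEEE 754 double (assumed)."""
    return struct.pack("!d", value)


def decode_float8(data: bytes) -> float:
    """Unpack a float8, rejecting fields of the wrong length."""
    if len(data) != 8:
        raise ValueError("float8 field must be exactly 8 bytes")
    return struct.unpack("!d", data)[0]
```

Because the byte order is fixed by the "!" format prefix, the same bytes come out on little-endian and big-endian hosts. The length checks mirror, in a very small way, the idea that the receive side validates its input instead of trusting it.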

I have been thinking about documenting the binary formats during beta,
but am unsure where to put the info.  We never documented the internal
formats before either, so there's no obvious place.

> Are the routines in libpq/pqformat.c intended
> to be used by client applications to read/write the binary COPY files?

They are not designed to be used outside the backend environment,
although possibly some enterprising person could adapt them.  I am not
sure there's any value in it though.  Copying the backend code helps
only if what you want to get out of the transmission is the same as the
backend's internal format, which for anything more complex than
int/float/text seems a bit dubious.

>>> We are not going back to the pre-7.4 format.  Sorry.

> Well as pointed out in my earlier message nothing has changed which
> requires the format to change - there is no real reason it's now
> "PGCOPY" and the integer layout field has disappeared.

Given that the interpretation of the field contents has changed
drastically, I thought it better to make an obvious incompatible
change.  We could perhaps have kept the skeleton the same, but to
what end?  An app trying to read or write the file as if it were
pre-7.4 data would fail miserably anyway.
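
For readers who need to produce the new skeleton, the following Python sketch writes the envelope as the 7.4 COPY reference page describes it: an 11-byte "PGCOPY" signature, a 32-bit flags word, a 32-bit header extension length, then per tuple a 16-bit field count and a 32-bit length word (-1 for NULL) ahead of each field, closed by a 16-bit -1 trailer. The function name and the choice to pass fields as pre-encoded bytes are assumptions made for illustration; the layout should be checked against the documentation for the server version in use.

```python
import struct

# 11-byte signature: "PGCOPY", newline, 0xFF, CR, LF, NUL.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def write_copy_binary(rows):
    """Build a COPY BINARY byte string from rows of already-encoded fields.

    Each row is a sequence whose items are either bytes (the field in its
    per-datatype binary form) or None (SQL NULL).
    """
    # Flags word = 0 (no OIDs), header extension length = 0.
    out = [SIGNATURE, struct.pack("!ii", 0, 0)]
    for row in rows:
        out.append(struct.pack("!h", len(row)))        # field count
        for field in row:
            if field is None:
                out.append(struct.pack("!i", -1))      # NULL: length -1, no data
            else:
                out.append(struct.pack("!i", len(field)))
                out.append(field)
    out.append(struct.pack("!h", -1))                  # file trailer
    return b"".join(out)
```

A single row holding one int4 and one NULL could be built with write_copy_binary([[struct.pack("!i", 7), None]]). A reader written for the pre-7.4 layout would stop at the signature, which is the obvious failure the changed header is meant to produce.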

> I am still willing to make a patch which does this (to aid those
> writing COPY format files) and to fully support the reading of the old
> format tuples. However, I'm not going to waste both our time if this
> patch is not going to be positively considered...

My vote will be to reject it because of the security problem.

> I can't think of much use of byte swapping when 99% of the
> use of COPY BINARY FROM is to improve performance over using
> INSERT. Both the reader and writer will be using the same binary
> integer/float/etc formats!

You must think that the universe consists exclusively of Intel hardware.
In my view, standardizing on a machine-independent binary format will
greatly *expand* the usefulness of COPY BINARY, since the files will not
be tied to a single architecture.
        regards, tom lane

