
Re: 7.4 COPY BINARY Format Change - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: 7.4 COPY BINARY Format Change
Date	August 3, 2003 12:46:29
Msg-id	26887.1059925553@sss.pgh.pa.us
In response to	Re: 7.4 COPY BINARY Format Change (Lee Kindness <lkindness@csl.co.uk>)
Responses	Re: 7.4 COPY BINARY Format Change
List	pgsql-hackers


Lee Kindness <lkindness@csl.co.uk> writes:
>>> The real change that occurred here is that the individual data fields
>>> go through per-datatype send/receive routines, which in addition to
>>> implementing a mostly machine-independent binary format also provide
>>> defenses against bad input data.

> Well in that case the docs need attention. They describe the
> "envelope" surrounding the tuples, but no mention is made of the
> format they are in. It is reasonable to assume that this format was
> the native binary format, as in earlier releases.

Yeah, there should be some mention of that in the COPY ref page I guess
--- it's mentioned in the frontend protocol chapter, but not under COPY.
In my defense I'd point out that the contents of individual fields have
never been documented under COPY.

> What do I need to do to make this
> code work with 7.4? Are there any docs describing the "binary" format
> for each of the datatypes or do I need to reverse-engineer a dump file
> or look in the source?

ATM, I'd recommend looking in the sources to see what the datatype
send/receive routines do.
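
As a rough illustration of what those routines produce for the simplest types, here is a minimal Python sketch. It assumes the send format for int4 is a 4-byte big-endian integer, for float8 an 8-byte big-endian IEEE 754 double, and for text the raw string bytes in the client encoding. The function names are hypothetical and are not part of any PostgreSQL API; check the backend sources before relying on this for other types.

```python
import struct

# Hypothetical helper names; these mirror (but are not) the backend routines.

def encode_int4(value):
    # Assumed int4 send format: 4 bytes, network (big-endian) byte order.
    return struct.pack("!i", value)

def decode_int4(data):
    (value,) = struct.unpack("!i", data)
    return value

def encode_float8(value):
    # Assumed float8 send format: IEEE 754 double, big-endian.
    return struct.pack("!d", value)

def encode_text(value, encoding="utf-8"):
    # Assumed text send format: the string bytes, with no terminator.
    # The encoding parameter stands in for the client encoding.
    return value.encode(encoding)
```

The explicit "!" prefix is what makes the result independent of the host's native byte order.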

I have been thinking about documenting the binary formats during beta,
but am unsure where to put the info.  We never documented the internal
formats before either, so there's no obvious place.

> Are the routines in libpq/pqformat.c intended
> to be used by client applications to read/write the binary COPY files?

They are not designed to be used outside the backend environment,
although possibly some enterprising person could adapt them.  I am not
sure there's any value in it though.  Copying the backend code helps
only if what you want to get out of the transmission is the same as the
backend's internal format, which for anything more complex than
int/float/text seems a bit dubious.

>>> We are not going back to the pre-7.4 format.  Sorry.

> Well as pointed out in my earlier message nothing has changed which
> requires the format to change - there is no real reason it's now
> "PGCOPY" and the integer layout field has disappeared.

Given that the interpretation of the field contents has changed
drastically, I thought it better to make an obvious incompatible
change.  We could perhaps have kept the skeleton the same, but to
what end?  An app trying to read or write the file as if it were
pre-7.4 data would fail miserably anyway.
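
For anyone adapting a reader, the new skeleton can be sketched in Python as follows. This assumes the layout given on the COPY reference page for 7.4: an 11-byte signature beginning with "PGCOPY", a 32-bit flags word, a 32-bit header extension length, then tuples made of a 16-bit field count followed by length-prefixed fields (length -1 meaning NULL), and a trailer consisting of a field count of -1, all in network byte order. The names read_exact and read_copy_binary are hypothetical, and the sketch simply rejects files with any of the upper flag bits set (including the OID flag) rather than handling them.

```python
import io
import struct

# Assumed 7.4 signature: "PGCOPY", newline, 0xFF, CR, LF, NUL (11 bytes).
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"

def read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of data")
    return data

def read_copy_binary(stream):
    """Return a list of rows; each field is raw bytes, or None for NULL."""
    if read_exact(stream, len(SIGNATURE)) != SIGNATURE:
        raise ValueError("not a 7.4-style COPY BINARY file")
    flags, ext_len = struct.unpack("!II", read_exact(stream, 8))
    if flags >> 16:
        # Upper bits (critical flags, OIDs) are not handled in this sketch.
        raise ValueError("unsupported flags: %#x" % flags)
    read_exact(stream, ext_len)  # skip any header extension
    rows = []
    while True:
        (nfields,) = struct.unpack("!h", read_exact(stream, 2))
        if nfields == -1:  # file trailer
            return rows
        row = []
        for _ in range(nfields):
            (length,) = struct.unpack("!i", read_exact(stream, 4))
            row.append(None if length == -1 else read_exact(stream, length))
        rows.append(row)

# Usage: one row holding an int4 field (42) and a NULL field.
sample = (SIGNATURE + struct.pack("!II", 0, 0)
          + struct.pack("!h", 2)
          + struct.pack("!i", 4) + struct.pack("!i", 42)
          + struct.pack("!i", -1)
          + struct.pack("!h", -1))
rows = read_copy_binary(io.BytesIO(sample))
```

The field bytes are returned undecoded, since interpreting them depends on each column's datatype.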

> I am still willing to make a patch which does this (to aid those
> writing COPY format files) and to fully support the reading of the old
> format tuples. However I'm not going to waste both our time if this
> patch is not going to be positively considered...

My vote will be to reject it because of the security problem.

> I can't think of much use of byte swapping when 99% of the
> use of COPY BINARY FROM is to improve performance over using
> INSERT. Both the reader and writer will be using the same binary
> integer/float/etc formats!

You must think that the universe consists exclusively of Intel hardware.
In my view, standardizing on a machine-independent binary format will
greatly *expand* the usefulness of COPY BINARY, since the files will not
be tied to a single architecture.
        regards, tom lane
