Re: 7.4 COPY BINARY Format Change - Mailing list pgsql-hackers

From	Lee Kindness
Subject	Re: 7.4 COPY BINARY Format Change
Date	August 3, 2003 11:35:35
Msg-id	16173.7590.451491.148083@kelvin.csl.co.uk
In response to	Re: 7.4 COPY BINARY Format Change (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: 7.4 COPY BINARY Format Change
List	pgsql-hackers

Tom,

Tom Lane writes:
> Lee Kindness <lkindness@csl.co.uk> writes:
> > I've attached a patch which lets COPY read in the 7.1 format. However
> > i'm not convinced this is the right way to go - I think the format
> > which is output by 7.4 should be identical to the 7.1 format.
>
> You are greatly underestimating the changes that occurred in COPY BINARY.
> If the format difference had been as minor as you think, I would not
> have gratuitously broken compatibility.
>
> The real change that occurred here is that the individual data fields
> go through per-datatype send/receive routines, which in addition to
> implementing a mostly machine-independent binary format also provide
> defenses against bad input data.
>
> To continue to read the old COPY BINARY format, we'd have to bypass
> those routines and allow direct read of the internal data formats.
> This was a security risk before and would be a much bigger one now,
> seeing that we allow COPY BINARY FROM STDIN to unprivileged users. It
> is trivial to crash the backend by feeding it bad internal-format
> data.

Well in that case the docs need attention. They describe the
"envelope" surrounding the tuples, but no mention is made of the
format they are in. It is reasonable to assume that this format was
the native binary format, as in earlier releases.

I've got applications which create binary "bulkload" files which are
loaded into the database using COPY FROM. Currently they write the
data out using simple fwrite calls. What do I need to do to make this
code work with 7.4? Are there any docs describing the "binary" format
for each of the datatypes or do I need to reverse-engineer a dump file
or look in the source? Are the routines in libpq/pqformat.c intended
to be used by client applications to read/write the binary COPY files?
If so, they also need to be documented in the libpq docs and that
documentation linked to from the COPY docs.
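
As a rough illustration of what replacing the raw fwrite calls might involve, here is a minimal C sketch of the 7.4 binary file layout as described in the 7.4 COPY documentation: an 11-byte signature, a 32-bit flags field, a 32-bit header extension length, then for each tuple a 16-bit field count followed by a 32-bit byte length and the data for every field, and finally a 16-bit -1 trailer, with all integers in network byte order. The function names (copy_header, copy_row_int4_text, copy_trailer) and the two-column (int4, text) table are hypothetical. The caller is assumed to supply a large enough buffer, and the text value is assumed to be in the right encoding already.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 11-byte signature of a 7.4 binary COPY file; the array size includes
 * the string literal's terminating NUL, which is part of the signature. */
static const char PGCOPY_SIGNATURE[11] = "PGCOPY\n\377\r\n";

/* Store a 16-bit value in network (big-endian) byte order. */
static size_t put_int16(unsigned char *out, uint16_t v)
{
    out[0] = (unsigned char)(v >> 8);
    out[1] = (unsigned char)(v & 0xff);
    return 2;
}

/* Store a 32-bit value in network (big-endian) byte order. */
static size_t put_int32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)((v >> 16) & 0xff);
    out[2] = (unsigned char)((v >> 8) & 0xff);
    out[3] = (unsigned char)(v & 0xff);
    return 4;
}

/* File header: signature, flags (0 = no OIDs), extension length (0). */
size_t copy_header(unsigned char *out)
{
    size_t n = 0;
    memcpy(out, PGCOPY_SIGNATURE, sizeof PGCOPY_SIGNATURE);
    n += sizeof PGCOPY_SIGNATURE;
    n += put_int32(out + n, 0);
    n += put_int32(out + n, 0);
    return n;
}

/* One tuple for a hypothetical table (id int4, name text). */
size_t copy_row_int4_text(unsigned char *out, int32_t id, const char *name)
{
    size_t n = 0;
    uint32_t len = (uint32_t)strlen(name);

    n += put_int16(out + n, 2);            /* number of fields        */
    n += put_int32(out + n, 4);            /* int4 is 4 bytes long    */
    n += put_int32(out + n, (uint32_t)id); /* int4 value, big-endian  */
    n += put_int32(out + n, len);          /* text length, no NUL     */
    memcpy(out + n, name, len);            /* text bytes              */
    n += len;
    return n;
}

/* File trailer: a 16-bit word containing -1. */
size_t copy_trailer(unsigned char *out)
{
    return put_int16(out, 0xffff);
}
```

Each function returns the number of bytes it produced, so existing bulkload code could call fwrite(buf, 1, n, fp) after each step. A NULL field would be written as a length of -1 with no data bytes following it.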

> (I don't believe that the patch works anyway, given that you aren't doing
> anything to disable use of the per-datatype receive routine. It might
> work as-is for text fields, and for integers on bigendian machines, but
> not for much else.)

Yeah, I didn't spend a lot of effort in that respect - after all I
said myself I didn't see the patch being accepted...

> We are not going back to the pre-7.4 format. Sorry.

Well as pointed out in my earlier message nothing has changed which
requires the format to change - there is no real reason it's now
"PGCOPY" and the integer layout field has disappeared. The change for
the byte swapping should have been indicated by an entry in the flags
field.
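
To make the header difference concrete, the following C sketch tells the two layouts apart by signature alone. It assumes the pre-7.4 signature is the 12-byte sequence PGBCOPY\n\377\r\n\0 (followed by the integer layout field), as given in the 7.3 COPY documentation, and that the 7.4 signature is the 11-byte sequence PGCOPY\n\377\r\n\0. The enum and function names are hypothetical and are not part of the backend or libpq.

```c
#include <stddef.h>
#include <string.h>

enum copy_format { COPY_FMT_UNKNOWN, COPY_FMT_PRE_74, COPY_FMT_74 };

/* Classify a binary COPY file by the signature at the start of buf.
 * Array sizes include the literal's terminating NUL, which is the
 * final byte of each signature. */
enum copy_format detect_copy_format(const unsigned char *buf, size_t len)
{
    static const char old_sig[12] = "PGBCOPY\n\377\r\n"; /* assumed 7.1-7.3 */
    static const char new_sig[11] = "PGCOPY\n\377\r\n";  /* 7.4 */

    if (len >= sizeof new_sig && memcmp(buf, new_sig, sizeof new_sig) == 0)
        return COPY_FMT_74;
    if (len >= sizeof old_sig && memcmp(buf, old_sig, sizeof old_sig) == 0)
        return COPY_FMT_PRE_74;
    return COPY_FMT_UNKNOWN;
}
```

A reader built this way only identifies the envelope. Decoding old-format tuples would still mean reading the internal data formats directly, which is the part objected to in the quoted reply.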

I am still willing to make a patch which does this (to aid those
writing COPY format files) and to fully support the reading of the old
format tuples. However, I'm not going to waste both our time if this
patch is not going to be positively considered...

I think it's worthwhile reiterating that this change will be a real
pain for PostgreSQL users when migrating to 7.4. To be honest, I'd
probably stick with 7.3 until the subsequent major release. Have a
think what benefit this incompatibility gives users of COPY
BINARY... I can't think of much use of byte swapping when 99% of the
use of COPY BINARY FROM is to improve performance over using
INSERT. Both the reader and writer will be using the same binary
integer/float/etc formats!

So, will I look at implementing these changes? Or not?
