Re: 7.4 COPY BINARY Format Change - Mailing list pgsql-hackers
From | Lee Kindness |
---|---|
Subject | Re: 7.4 COPY BINARY Format Change |
Date | |
Msg-id | 16173.7590.451491.148083@kelvin.csl.co.uk |
In response to | Re: 7.4 COPY BINARY Format Change (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: 7.4 COPY BINARY Format Change |
List | pgsql-hackers |
Tom,

Tom Lane writes:
> Lee Kindness <lkindness@csl.co.uk> writes:
> > I've attached a patch which lets COPY read in the 7.1 format. However
> > I'm not convinced this is the right way to go - I think the format
> > which is output by 7.4 should be identical to the 7.1 format.
>
> You are greatly underestimating the changes that occurred in COPY BINARY.
> If the format difference had been as minor as you think, I would not
> have gratuitously broken compatibility.
>
> The real change that occurred here is that the individual data fields
> go through per-datatype send/receive routines, which in addition to
> implementing a mostly machine-independent binary format also provide
> defenses against bad input data.
>
> To continue to read the old COPY BINARY format, we'd have to bypass
> those routines and allow direct read of the internal data formats.
> This was a security risk before and would be a much bigger one now,
> seeing that we allow COPY BINARY FROM STDIN to unprivileged users. It
> is trivial to crash the backend by feeding it bad internal-format
> data.

Well, in that case the docs need attention. They describe the "envelope" surrounding the tuples, but no mention is made of the format the tuples themselves are in. It is reasonable to assume that this format is the native binary format, as in earlier releases.

I've got applications which create binary "bulkload" files which are loaded into the database using COPY FROM. Currently they write the data out using simple fwrite calls. What do I need to do to make this code work with 7.4? Are there any docs describing the "binary" format for each of the datatypes, or do I need to reverse-engineer a dump file or look in the source?

Are the routines in libpq/pqformat.c intended to be used by client applications to read/write the binary COPY files? If so, they also need to be documented in the libpq docs, and that documentation linked to from the COPY docs.
> (I don't believe that the patch works anyway, given that you aren't doing
> anything to disable use of the per-datatype receive routine. It might
> work as-is for text fields, and for integers on bigendian machines, but
> not for much else.)

Yeah, I didn't spend a lot of effort in that respect - after all, I said myself I didn't see the patch being accepted...

> We are not going back to the pre-7.4 format. Sorry.

Well, as pointed out in my earlier message, nothing has changed which requires the format to change - there is no real reason it's now "PGCOPY" and the integer layout field has disappeared. The change for the byte swapping should have been indicated by an entry in the flags field.

I am still willing to make a patch which does this (to aid those writing COPY format files) and to fully support the reading of the old format tuples. However, I'm not going to waste both our time if this patch is not going to be positively considered...

I think it's worthwhile reiterating that this change will be a real pain for PostgreSQL users when migrating to 7.4. To be honest, I'd probably stick with 7.3 until the subsequent major release. Have a think about what benefit this incompatibility gives users of COPY BINARY... I can't think of much use for byte swapping when 99% of the use of COPY BINARY FROM is to improve performance over using INSERT. Both the reader and writer will be using the same binary integer/float/etc formats!

So, will I look at implementing these changes? Or not?

L.
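For anyone with the same fwrite-based bulkload code, here is a rough sketch of what a 7.4-style writer might look like. It is not taken from the thread or from the PostgreSQL source. It assumes the layout given on the 7.4 COPY reference page (11-byte "PGCOPY" signature, 32-bit flags, 32-bit header extension length, a 16-bit field count per tuple, a 32-bit length before each field, and a 16-bit -1 trailer), and it assumes the int4 send format is simply four bytes in network byte order. The function names (put_u16, put_u32, copy_write_header, copy_write_int4_row, copy_write_trailer) are hypothetical, and only a table with a single int4 column is handled.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a 16-bit value in network (big-endian) order. */
static int put_u16(FILE *fp, uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)v };
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* Hypothetical helper: write a 32-bit value in network (big-endian) order. */
static int put_u32(FILE *fp, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v
    };
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* File header: signature, flags field, header extension length.
 * Assumption: no OIDs (flags = 0) and no header extension data. */
int copy_write_header(FILE *fp)
{
    static const unsigned char sig[11] = {
        'P', 'G', 'C', 'O', 'P', 'Y', '\n', 0xFF, '\r', '\n', 0
    };
    if (fwrite(sig, 1, sizeof sig, fp) != sizeof sig)
        return -1;
    if (put_u32(fp, 0) != 0)    /* flags */
        return -1;
    return put_u32(fp, 0);      /* header extension length */
}

/* One tuple for a table with a single int4 column.
 * Assumption: int4 is sent as 4 bytes, most significant byte first. */
int copy_write_int4_row(FILE *fp, int32_t value, int is_null)
{
    if (put_u16(fp, 1) != 0)                /* field count */
        return -1;
    if (is_null)
        return put_u32(fp, 0xFFFFFFFFu);    /* length -1 marks NULL, no data */
    if (put_u32(fp, 4) != 0)                /* field length in bytes */
        return -1;
    return put_u32(fp, (uint32_t)value);    /* field data */
}

/* File trailer: a 16-bit word containing -1. */
int copy_write_trailer(FILE *fp)
{
    return put_u16(fp, 0xFFFF);
}
```

A caller would open the output file in binary mode, call copy_write_header once, copy_write_int4_row for each row, and copy_write_trailer at the end. Writing each byte explicitly keeps the output identical on little-endian and big-endian hosts, which is the portability point made in the quoted reply. Other datatypes have their own send formats and are not covered by this sketch.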