Re: 7.4 COPY BINARY Format Change - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: 7.4 COPY BINARY Format Change |
Date | |
Msg-id | 26887.1059925553@sss.pgh.pa.us |
In response to | Re: 7.4 COPY BINARY Format Change (Lee Kindness <lkindness@csl.co.uk>) |
Responses | Re: 7.4 COPY BINARY Format Change |
List | pgsql-hackers |
Lee Kindness <lkindness@csl.co.uk> writes:
>>> The real change that occurred here is that the individual data fields
>>> go through per-datatype send/receive routines, which in addition to
>>> implementing a mostly machine-independent binary format also provide
>>> defenses against bad input data.

> Well in that case the docs need attention. They describe the
> "envelope" surrounding the tuples, but no mention is made of the
> format they are in. It is reasonable to assume that this format was
> the native binary format, as in earlier releases.

Yeah, there should be some mention of that in the COPY ref page I guess
--- it's mentioned in the frontend protocol chapter, but not under COPY.
In my defense I'd point out that the contents of individual fields have
never been documented under COPY.

> What do I need to do to make this
> code work with 7.4? Are there any docs describing the "binary" format
> for each of the datatypes or do I need to reverse-engineer a dump file
> or look in the source?

ATM, I'd recommend looking in the sources to see what the datatype
send/receive routines do. I have been thinking about documenting the
binary formats during beta, but am unsure where to put the info. We
never documented the internal formats before either, so there's no
obvious place.

> Are the routines in libpq/pqformat.c intended
> to be used by client applications to read/write the binary COPY files?

They are not designed to be used outside the backend environment,
although possibly some enterprising person could adapt them. I am not
sure there's any value in it though. Copying the backend code helps only
if what you want to get out of the transmission is the same as the
backend's internal format, which for anything more complex than
int/float/text seems a bit dubious.

>>> We are not going back to the pre-7.4 format. Sorry.
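As a rough illustration of "looking at what the send/receive routines do", the C sketch below decodes one field of a 7.4 binary COPY tuple that is assumed to hold an int4 column. It relies on the tuple envelope (each field is a 32-bit length in network byte order, with -1 marking NULL, followed by the field data) and on the assumption that int4's send routine emits the value as 4 big-endian bytes. The names `read_be32`, `u32_to_i32` and `decode_int4_field` are hypothetical; they are not part of libpq or the backend, and every other datatype would need its own decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit unsigned value stored most-significant byte first
 * (network order). Shifting bytes, rather than memcpy'ing into a
 * uint32_t, gives the same result on any host byte order. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Reinterpret a two's-complement bit pattern as int32_t without
 * relying on implementation-defined unsigned-to-signed conversion. */
static int32_t u32_to_i32(uint32_t u)
{
    if (u <= (uint32_t) INT32_MAX)
        return (int32_t) u;
    return (int32_t) (u - 0x80000000u) + INT32_MIN;
}

/* Hypothetical decoder for one binary COPY field that is ASSUMED to
 * belong to an int4 column. Field framing: a 32-bit length (-1 = NULL),
 * then the datatype's send() output, assumed here to be 4 big-endian
 * bytes for int4. Returns the number of bytes consumed, or -1 if the
 * field is truncated or its length is not what an int4 should have. */
long decode_int4_field(const unsigned char *buf, size_t avail,
                       int32_t *value, int *is_null)
{
    uint32_t len;

    if (avail < 4)
        return -1;
    len = read_be32(buf);
    if (len == 0xFFFFFFFFu) {        /* length -1 marks a NULL field */
        *is_null = 1;
        *value = 0;
        return 4;
    }
    if (len != 4 || avail < 8)       /* reject bad input instead of guessing */
        return -1;
    *is_null = 0;
    *value = u32_to_i32(read_be32(buf + 4));
    return 8;
}
```

Fed the bytes `00 00 00 04 00 00 01 2C`, `decode_int4_field` reports 300 and consumes 8 bytes. Checking the length before trusting the payload is a small, client-side echo of the "defenses against bad input data" that the backend's receive routines provide.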
> Well as pointed out in my earlier message nothing has changed which
> requires the format to change - there is no real reason it's now
> "PGCOPY" and the integer layout field has disappeared.

Given that the interpretation of the field contents has changed
drastically, I thought it better to make an obvious incompatible change.
We could perhaps have kept the skeleton the same, but to what end? An
app trying to read or write the file as if it were pre-7.4 data would
fail miserably anyway.

> I am still willing to make a patch which does this (to aid those
> writing COPY format files) and to fully support the reading of the old
> format tuples. However I'm not going to waste both our time if this
> patch is not going to be positively considered...

My vote will be to reject it because of the security problem.

> I can't think of much use of byte swapping when 99% of the
> use of COPY BINARY FROM is to improve performance over using
> INSERT. Both the reader and writer will be using the same binary
> integer/float/etc formats!

You must think that the universe consists exclusively of Intel hardware.
In my view, standardizing on a machine-independent binary format will
greatly *expand* the usefulness of COPY BINARY, since the files will not
be tied to a single architecture.

regards, tom lane
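To make the byte-order point concrete, here is a minimal C sketch (an illustration under stated assumptions, not backend code) that builds a complete 7.4-style binary COPY stream in memory: the 11-byte `PGCOPY` signature, a 32-bit flags field set to 0 (assumed to mean no OIDs), a zero-length header extension, one tuple with a single int4 field, and a 16-bit -1 trailer. Every integer is written with shifts in network byte order, so the output is byte-for-byte identical whether the writer runs on Intel or on a big-endian machine. `put_be16`, `put_be32` and `build_int4_copy` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 11-byte signature that opens a 7.4-style binary COPY file. */
static const unsigned char COPY_SIGNATURE[11] =
    {'P', 'G', 'C', 'O', 'P', 'Y', '\n', 0xFF, '\r', '\n', '\0'};

/* Store v most-significant byte first using shifts, so the bytes
 * written do not depend on the host's own byte order. */
static unsigned char *put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char) (v >> 24);
    buf[1] = (unsigned char) (v >> 16);
    buf[2] = (unsigned char) (v >> 8);
    buf[3] = (unsigned char) v;
    return buf + 4;
}

static unsigned char *put_be16(unsigned char *buf, uint16_t v)
{
    buf[0] = (unsigned char) (v >> 8);
    buf[1] = (unsigned char) v;
    return buf + 2;
}

/* Hypothetical writer: header + one single-column int4 row + trailer.
 * The caller must supply at least 31 bytes. Returns bytes written. */
size_t build_int4_copy(unsigned char *out, int32_t value)
{
    unsigned char *p = out;

    memcpy(p, COPY_SIGNATURE, sizeof COPY_SIGNATURE);
    p += sizeof COPY_SIGNATURE;
    p = put_be32(p, 0);                 /* flags field: assumed no OIDs */
    p = put_be32(p, 0);                 /* header extension length: none */

    p = put_be16(p, 1);                 /* tuple: number of fields */
    p = put_be32(p, 4);                 /* field length in bytes */
    p = put_be32(p, (uint32_t) value);  /* int4 payload, big-endian */

    p = put_be16(p, 0xFFFF);            /* trailer: 16-bit -1 */
    return (size_t) (p - out);
}
```

For example, `build_int4_copy(buf, -2)` fills 31 bytes, ending with the field `00 00 00 04 FF FF FF FE` and the trailer `FF FF`. A writer that instead copied the host's native int representation would produce different bytes on little- and big-endian machines, which is exactly the architecture dependence the machine-independent format avoids.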