Re: 7.4 COPY BINARY Format Change - Mailing list pgsql-hackers

From Lee Kindness
Subject Re: 7.4 COPY BINARY Format Change
Date
Msg-id 16175.45361.842160.289484@kelvin.csl.co.uk
In response to Re: 7.4 COPY BINARY Format Change  (Lee Kindness <lkindness@csl.co.uk>)
List pgsql-hackers
I've just sent off patches to pgsql-patches to:

1. Slight clarification to the COPY BINARY format docs

2. A contrib/binarycopy module which wraps up the detail of creating a
file which can be used as input to COPY BINARY. Users can create either
7.1 or 7.4 format files using the same API, without needing to know
the file format or the individual binary format of each field, and
without needing to explicitly byte-swap.

#2 will be used extensively within Concept Systems code which
interfaces to PostgreSQL. It really simplifies the creation of the
binary files.
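
To give a flavour of the detail being wrapped up, here is a minimal
sketch of writing a 7.4 format file holding rows of a single int4
column. It is not the code from the patch: the names put_int16,
put_int32 and copy74_* are hypothetical, and it assumes the layout
given in the 7.4 COPY reference page (11-byte signature, 32-bit flags,
32-bit header extension length, then tuples, then a 16-bit -1 trailer,
with all integers in network byte order).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers for illustration; not the contrib/binarycopy API. */

/* Write a 16-bit integer in network (big-endian) byte order. */
static int put_int16(FILE *f, int16_t v)
{
    uint16_t u = (uint16_t) v;
    unsigned char b[2];

    b[0] = (unsigned char) (u >> 8);
    b[1] = (unsigned char) (u & 0xFF);
    return fwrite(b, 1, 2, f) == 2 ? 0 : -1;
}

/* Write a 32-bit integer in network (big-endian) byte order. */
static int put_int32(FILE *f, int32_t v)
{
    uint32_t u = (uint32_t) v;
    unsigned char b[4];

    b[0] = (unsigned char) (u >> 24);
    b[1] = (unsigned char) ((u >> 16) & 0xFF);
    b[2] = (unsigned char) ((u >> 8) & 0xFF);
    b[3] = (unsigned char) (u & 0xFF);
    return fwrite(b, 1, 4, f) == 4 ? 0 : -1;
}

/* File header: signature, flags field, header extension length. */
static int copy74_write_header(FILE *f)
{
    /* The array includes the literal's terminating NUL, giving 11 bytes. */
    static const char sig[] = "PGCOPY\n\377\r\n";

    assert(sizeof sig == 11);
    if (fwrite(sig, 1, sizeof sig, f) != sizeof sig)
        return -1;
    if (put_int32(f, 0) != 0)       /* flags: no OIDs */
        return -1;
    return put_int32(f, 0);         /* no header extension data */
}

/* One tuple with a single non-NULL int4 field. */
static int copy74_write_int4_row(FILE *f, int32_t value)
{
    if (put_int16(f, 1) != 0)       /* number of fields in the tuple */
        return -1;
    if (put_int32(f, 4) != 0)       /* field length in bytes */
        return -1;
    return put_int32(f, value);     /* field data, big-endian */
}

/* File trailer: a 16-bit word containing -1. */
static int copy74_write_trailer(FILE *f)
{
    return put_int16(f, -1);
}
```

A caller would write the header once, one row per tuple, and then the
trailer. Because every integer is written a byte at a time, the sketch
needs no explicit byte-swapping and yields the same file on any
architecture.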

Thanks, Lee.

Lee Kindness writes:
> Tom Lane writes:
>  > Lee Kindness <lkindness@csl.co.uk> writes:
>  > > Well in that case the docs need attention. They describe the
>  > > "envelope" surrounding the tuples, but no mention is made of the
>  > > format they are in. It is reasonable to assume that this format was
>  > > the native binary format, as in earlier releases.
>  > Yeah, there should be some mention of that in the COPY ref page I guess
>  > --- it's mentioned in the frontend protocol chapter, but not under COPY.
>  > In my defense I'd point out that the contents of individual fields have
>  > never been documented under COPY.
>
> True, the docs have always skipped the specifics for the
> tuples. But now that the format has evolved beyond a simple dump of
> the bytes the tuple format does need discussing.
>
>  > > What do I need to do to make this
>  > > code work with 7.4? Is there any docs describing the "binary" format
>  > > for each of the datatypes or do I need to reverse-engineer a dump file
>  > > or look in the source?
>  > ATM, I'd recommend looking in the sources to see what the datatype
>  > send/receive routines do.
>  >
>  > I have been thinking about documenting the binary formats during beta,
>  > but am unsure where to put the info.  We never documented the internal
>  > formats before either, so there's no obvious place.
>
> Perhaps the documentation of the binary format should be taken out of
> the COPY docs and moved into the client interfaces documentation? the
> COPY docs would of course reference the new location. Just now the
> tuples could be "documented" simply by referring the reader to the
> relevant functions in the relevant source files. After all the source
> is the best documentation for this sort of thing.
>
>  > > Are the routines in libpq/pqformat.c intended
>  > > to be used by client applications to read/write the binary COPY files?
>  > They are not designed to be used outside the backend environment,
>  > although possibly some enterprising person could adapt them.  I am not
>  > sure there's any value in it though.  Copying the backend code helps
>  > only if what you want to get out of the transmission is the same as the
>  > backend's internal format, which for anything more complex than
>  > int/float/text seems a bit dubious.
>
> I think there is a lot of use for a binary COPY file API within libpq
> - routines to open a file, write/read a header and write/read common
> datatypes. This would remove the need for most people using the binary
> version of COPY to even know the file format. This would also isolate
> people who use this API from any future changes.
>
> Would libpq or contrib be the best place for this? Would you agree
> this is a good idea for 7.4? I've already got something along these
> lines:
>
>  extern FILE *lofsdb_Bulk_Open(char **filename);
>  extern void  lofsdb_Bulk_Close(FILE *f, char *filename);
>  extern void  lofsdb_Bulk_Write_NCols(FILE *f, short ncols);
>  extern void  lofsdb_Bulk_Write(FILE *f, void *data, size_t sz, size_t count, short ind);
>  extern void  lofsdb_Bulk_WriteText(FILE *f, char *data, short ind);
>  extern void  lofsdb_Bulk_WriteBytea(FILE *f, char *data, size_t len, short ind);
>  extern void  lofsdb_Bulk_WriteTime(FILE *f, double t, short ind);
>  extern void  lofsdb_Bulk_WriteTimeNow(FILE *f);
>
> which could form the basis of a contrib module to handle writing out
> 7.1 through to 7.4 format files. Naturally lofsdb_Bulk_Write needs to
> go and be replaced by specific functions.
>
>  > > Well as pointed out in my earlier message nothing has changed which
>  > > requires the format to change - there is no real reason it's now
>  > > "PGCOPY" and the integer layout field has disappeared.
>  > Given that the interpretation of the field contents has changed
>  > drastically, I thought it better to make an obvious incompatible
>  > change.  We could perhaps have kept the skeleton the same, but to
>  > what end?  An app trying to read or write the file as if it were
>  > pre-7.4 data would fail miserably anyway.
>
> Yeah, but someone (actually you!) went to the effort of making the 7.1
> format extensible and documenting it as such... It could have handled
> the changes.
>
>  > > I am still willing to make a patch which does this (to aid those
>  > > writing COPY format files) and to fully support the reading of the old
>  > > format tuples. However i'm not going to waste both our time if this
>  > > patch is not going to be positively considered...
>  > My vote will be to reject it because of the security problem.
>
> In which case I think my time would be better spent looking at the API
> described above.
>
>  > > I can't think of much use of byte swapping when 99% of the
>  > > use of COPY BINARY FROM is to improve performance over using
>  > > INSERT. Both the reader and writer will be using the same binary
>  > > integer/float/etc formats!
>  >
>  > You must think that the universe consists exclusively of Intel hardware.
>  > In my view, standardizing on a machine-independent binary format will
>  > greatly *expand* the usefulness of COPY BINARY, since the files will not
>  > be tied to a single architecture.
>
> Well my testing (or lack of) of the earlier patch would seem to
> indicate it was done on non-Intel box (Solaris)! I've got access here
> to Solaris (2.5 through to 9), AIX (4.1 to 4.3.3), HPUX (9, 10, 11)
> and of course Linux flavours - our apps run on these UNIX versions. So
> i'm well aware of binary format issues (for fun look into the SEG-D
> and SEG-Y formats used within the seismic industry).
>
> However, is COPY BINARY meant/designed to be used as transfer or
> backup mechanism? I have trouble coming up with many uses where a
> binary file generated on one server would be loaded into another
> server running on a different architecture.
>
> Regards, Lee.
 

