Re: 7.4 COPY BINARY Format Change - Mailing list pgsql-hackers

From Lee Kindness
Subject Re: 7.4 COPY BINARY Format Change
Date
Msg-id 16174.13472.307636.442127@kelvin.csl.co.uk
In response to Re: 7.4 COPY BINARY Format Change  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 7.4 COPY BINARY Format Change
Re: 7.4 COPY BINARY Format Change
List pgsql-hackers
Tom Lane writes:
> Lee Kindness <lkindness@csl.co.uk> writes:
> > Well in that case the docs need attention. They describe the
> > "envelope" surrounding the tuples, but no mention is made of the
> > format they are in. It is reasonable to assume that this format was
> > the native binary format, as in earlier releases.
> Yeah, there should be some mention of that in the COPY ref page I guess
> --- it's mentioned in the frontend protocol chapter, but not under COPY.
> In my defense I'd point out that the contents of individual fields have
> never been documented under COPY.
 

True, the docs have always skipped the specifics for the
tuples. But now that the format has evolved beyond a simple dump of
the bytes, the tuple format does need discussing.
> > What do I need to do to make this
> > code work with 7.4? Is there any docs describing the "binary" format
> > for each of the datatypes or do I need to reverse-engineer a dump file
> > or look in the source?
> ATM, I'd recommend looking in the sources to see what the datatype
> send/receive routines do.
>
> I have been thinking about documenting the binary formats during beta,
> but am unsure where to put the info.  We never documented the internal
> formats before either, so there's no obvious place.
 

Perhaps the documentation of the binary format should be taken out of
the COPY docs and moved into the client interfaces documentation? The
COPY docs would of course reference the new location. Just now the
tuples could be "documented" simply by referring the reader to the
relevant functions in the relevant source files. After all the source
is the best documentation for this sort of thing.
> > Are the routines in libpq/pqformat.c intended
> > to be used by client applications to read/write the binary COPY
> > files?
> They are not designed to be used outside the backend environment,
> although possibly some enterprising person could adapt them.  I am not
> sure there's any value in it though.  Copying the backend code helps
> only if what you want to get out of the transmission is the same as the
> backend's internal format, which for anything more complex than
> int/float/text seems a bit dubious.
 

I think there is a lot of use for a binary COPY file API within libpq
- routines to open a file, write/read a header and write/read common
datatypes. This would remove the need for most people using the binary
version of COPY to even know the file format. This would also isolate
people who use this API from any future changes.

Would libpq or contrib be the best place for this? Would you agree
this is a good idea for 7.4? I've already got something along these
lines:
extern FILE *lofsdb_Bulk_Open(char **filename);
extern void  lofsdb_Bulk_Close(FILE *f, char *filename);
extern void  lofsdb_Bulk_Write_NCols(FILE *f, short ncols);
extern void  lofsdb_Bulk_Write(FILE *f, void *data, size_t sz, size_t count,
                               short ind);
extern void  lofsdb_Bulk_WriteText(FILE *f, char *data, short ind);
extern void  lofsdb_Bulk_WriteBytea(FILE *f, char *data, size_t len, short ind);
extern void  lofsdb_Bulk_WriteTime(FILE *f, double t, short ind);
extern void  lofsdb_Bulk_WriteTimeNow(FILE *f);
 

which could form the basis of a contrib module to handle writing out
7.1 through to 7.4 format files. Naturally lofsdb_Bulk_Write needs to
go and be replaced by specific functions.
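
To make the idea of type-specific writer functions concrete, here is a minimal sketch of what such a module might look like for the 7.4 format only. It assumes the layout given in the 7.4 COPY reference page: an 11-byte "PGCOPY" signature, a 32-bit flags word, a 32-bit header extension length, then per tuple a 16-bit field count and for each field a 32-bit length (-1 for NULL) followed by the data, with a 16-bit -1 as the file trailer and all integers in network byte order. The copybin_* names are hypothetical and are neither the lofsdb_Bulk_* functions above nor anything in libpq. Text is written as raw bytes with no encoding conversion, which is also an assumption.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* All copybin_* names are hypothetical; they are not libpq functions. */

/* Write a 16-bit value most-significant byte first (network order). */
int copybin_put_int16(FILE *f, int16_t v)
{
    uint16_t u = (uint16_t) v;
    unsigned char b[2];

    b[0] = (unsigned char) (u >> 8);
    b[1] = (unsigned char) (u & 0xFF);
    return fwrite(b, 1, sizeof b, f) == sizeof b ? 0 : -1;
}

/* Write a 32-bit value most-significant byte first (network order). */
int copybin_put_int32(FILE *f, int32_t v)
{
    uint32_t u = (uint32_t) v;
    unsigned char b[4];

    b[0] = (unsigned char) (u >> 24);
    b[1] = (unsigned char) ((u >> 16) & 0xFF);
    b[2] = (unsigned char) ((u >> 8) & 0xFF);
    b[3] = (unsigned char) (u & 0xFF);
    return fwrite(b, 1, sizeof b, f) == sizeof b ? 0 : -1;
}

/* File header: signature, flags (0 = no OIDs), extension length (0). */
int copybin_write_header(FILE *f)
{
    static const unsigned char sig[11] =
        { 'P', 'G', 'C', 'O', 'P', 'Y', '\n', 0xFF, '\r', '\n', 0 };

    if (fwrite(sig, 1, sizeof sig, f) != sizeof sig)
        return -1;
    if (copybin_put_int32(f, 0) != 0)
        return -1;
    return copybin_put_int32(f, 0);
}

/* Start a tuple by writing its field count. */
int copybin_begin_tuple(FILE *f, int16_t ncols)
{
    return copybin_put_int16(f, ncols);
}

/* int4 field: length word 4, then the value in network order. */
int copybin_field_int4(FILE *f, int32_t v)
{
    if (copybin_put_int32(f, 4) != 0)
        return -1;
    return copybin_put_int32(f, v);
}

/* text field: length word, then the bytes without a terminating NUL. */
int copybin_field_text(FILE *f, const char *s)
{
    size_t len = strlen(s);

    if (len > INT32_MAX || copybin_put_int32(f, (int32_t) len) != 0)
        return -1;
    return fwrite(s, 1, len, f) == len ? 0 : -1;
}

/* NULL field: length word -1 and no data bytes. */
int copybin_field_null(FILE *f)
{
    return copybin_put_int32(f, -1);
}

/* File trailer: a 16-bit -1 in place of a field count. */
int copybin_write_trailer(FILE *f)
{
    return copybin_put_int16(f, -1);
}
```

A caller would write the header once, then for each row call copybin_begin_tuple followed by one field function per column, and finish with copybin_write_trailer. Each function returns 0 on success and -1 on a write error, so the caller never has to deal with the byte layout directly.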
> > Well as pointed out in my earlier message nothing has changed which
> > requires the format to change - there is no real reason it's now
> > "PGCOPY" and the integer layout field has disappeared.
> Given that the interpretation of the field contents has changed
> drastically, I thought it better to make an obvious incompatible
> change.  We could perhaps have kept the skeleton the same, but to
> what end?  An app trying to read or write the file as if it were
> pre-7.4 data would fail miserably anyway.
 

Yeah, but someone (actually you!) went to the effort of making the 7.1
format extensible and documenting it as such... It could have handled
the changes.
> > I am still willing to make a patch which does this (to aid those
> > writing COPY format files) and to fully support the reading of the old
> > format tuples. However i'm not going to waste both our time if this
> > patch is not going to be positively considered...
> My vote will be to reject it because of the security problem.
 

In which case I think my time would be better spent looking at the API
described above.
> > I can't think of much use of byte swapping when 99% of the
> > use of COPY BINARY FROM is to improve performance over using
> > INSERT. Both the reader and writer will be using the same binary
> > integer/float/etc formats!
> You must think that the universe consists exclusively of Intel hardware.
> In my view, standardizing on a machine-independent binary format will
> greatly *expand* the usefulness of COPY BINARY, since the files will not
> be tied to a single architecture.

Well, my testing (or lack of it) of the earlier patch would seem to
indicate it was done on a non-Intel box (Solaris)! I've got access here
to Solaris (2.5 through to 9), AIX (4.1 to 4.3.3), HPUX (9, 10, 11)
and of course Linux flavours - our apps run on these UNIX versions. So
I'm well aware of binary format issues (for fun look into the SEG-D
and SEG-Y formats used within the seismic industry).

However, is COPY BINARY meant/designed to be used as a transfer or
backup mechanism? I have trouble coming up with many uses where a
binary file generated on one server would be loaded into another
server running on a different architecture.
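
For reference, the byte-order question can be illustrated with a small sketch of a machine-independent float encoding. It writes a double as its 64-bit IEEE 754 bit pattern, most-significant byte first, using shifts so the result does not depend on the host's integer byte order. The function names are hypothetical, and the sketch assumes the host's double is a 64-bit IEEE 754 value stored with the same byte order as its 64-bit integers, which holds on the common platforms listed above but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers, not taken from the PostgreSQL sources.
 * Assumption: double is a 64-bit IEEE 754 value on this host.
 */

/* Encode a double as 8 bytes, most-significant byte first. */
void encode_float8_be(double d, unsigned char out[8])
{
    uint64_t bits;
    int i;

    memcpy(&bits, &d, sizeof bits);     /* reinterpret without aliasing issues */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char) (bits >> (56 - 8 * i));
}

/* Decode 8 big-endian bytes back into a double. */
double decode_float8_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    double d;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Because the bytes are produced by shifting rather than by copying memory directly, a file written this way on a big-endian Solaris box and one written on a little-endian Intel box contain identical bytes for the same value. When reader and writer share an architecture, the cost is one pass of shifts per value.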

Regards, Lee.

