COPY BINARY file format proposal - Mailing list pgsql-hackers

From Tom Lane
Subject COPY BINARY file format proposal
Msg-id 12674.976134397@sss.pgh.pa.us
Responses Re: COPY BINARY file format proposal  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Well, no one seemed very unhappy at the idea of changing the file format
for binary COPY, so here is a proposal.

The objectives of this change are:

1. Get rid of the tuple count at the front of the file.  This requires
an extra pass over the relation, which is a lot more trouble than the
count is worth.  Use an explicit EOF marker instead.
2. Send fields of a tuple individually, instead of dumping out raw tuples
(complete with alignment padding and so forth) as is currently done.
This is mainly to simplify TOAST-related processing.
3. Make the format somewhat self-identifying, so that the reader has at
least some chance of detecting it when the data doesn't match the table
it's supposed to be loaded into.

The proposed format consists of a file header, zero or more tuples, and a
file trailer.

The file header will just be a 32-bit magic number; it's present so that a
reader can reject non-COPY-binary input data, as well as detect problems
like incompatible endianness.  (We could also use changes in the magic
number as a flag for future format changes.)
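
As a rough illustration of the header check, here is a minimal Python sketch. The magic value and the function names are made up for the example (the proposal does not fix a value), and the word is assumed to be written in the writer's native byte order.

```python
import struct

# Hypothetical magic value; the proposal does not specify one.
MAGIC = 0x50474342


def write_header(out):
    """Write the 32-bit magic number in the machine's native byte order."""
    out.write(struct.pack("=I", MAGIC))


def check_header(raw):
    """Classify the first four bytes of a candidate input file."""
    if len(raw) != 4:
        return "not COPY binary"
    (value,) = struct.unpack("=I", raw)
    if value == MAGIC:
        return "ok"
    # The same four bytes in reverse order suggest a file written on a
    # machine of the opposite endianness.
    (swapped,) = struct.unpack("=I", raw[::-1])
    if swapped == MAGIC:
        return "wrong endianness"
    return "not COPY binary"
```

This only works if the chosen magic value is not the same when its bytes are reversed.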

Each tuple begins with an int16 count of the number of fields in the
tuple.  (Presently, all tuples in a table will have the same count, but
that might not always be true.)  Then, repeated for each field in the
tuple, there is an int16 typlen word possibly followed by field data.
The typlen field is interpreted thus:
    Zero    Field is NULL.  No data follows.
    > 0     Field is a fixed-length datatype.  Exactly N
            bytes of data follow the typlen word.
    -1      Field is a varlena datatype.  The next four
            bytes are the varlena header, which contains
            the total value length including itself.
    < -1    Reserved for future use.

For non-NULL fields, the reader can check that the typlen matches the
expected typlen for the destination column.  This provides a simple
but very useful check that the data is as expected.
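
To make the typlen rules and the check concrete, a reader for a single field might look roughly like the Python sketch below. The helper names (`read_exact`, `read_field`) are invented for the example, native byte order is assumed, and the 4-byte varlena header is treated as a plain int32 length.

```python
import io
import struct


def read_exact(f, n):
    """Read exactly n bytes or fail."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of data")
    return data


def read_field(f, expected_typlen):
    """Read one field; return its raw bytes, or None for NULL."""
    (typlen,) = struct.unpack("=h", read_exact(f, 2))
    if typlen == 0:
        return None                      # NULL: no data follows
    if typlen < -1:
        raise ValueError("reserved typlen %d" % typlen)
    if typlen != expected_typlen:
        # The check described above: data does not match the column.
        raise ValueError("typlen %d, expected %d" % (typlen, expected_typlen))
    if typlen > 0:
        return read_exact(f, typlen)     # fixed-length: exactly N bytes
    # typlen == -1: varlena; the header holds the total length, itself included
    (total,) = struct.unpack("=i", read_exact(f, 4))
    if total < 4:
        raise ValueError("bad varlena length %d" % total)
    return read_exact(f, total - 4)
```

Note that a NULL field carries no typlen information, so the check can only fire on non-NULL fields.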

There is no alignment padding or any other extra data between fields.
Note also that the format does not distinguish whether a datatype is
pass-by-reference or pass-by-value.  Both of these provisions are
deliberate: they might help improve portability of the files (although
of course endianness and floating-point-format issues can still keep
you from moving a binary file across machines).
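
A small Python illustration of what "no alignment padding" means for a typlen word followed by a 4-byte value. This is only a sketch: the size of the native layout depends on the platform, so it is computed rather than stated.

```python
import struct

# An int16 typlen word followed by a 4-byte integer value.
packed_size = struct.calcsize("=hi")   # standard sizes, no padding between items
native_size = struct.calcsize("@hi")   # host C-struct layout, which may pad

# What would go into the file: typlen = 4, then the value, back to back.
field = struct.pack("=hi", 4, 12345)
```

The packed form is always 6 bytes; a native C-struct layout is usually larger, because the 4-byte integer is typically aligned to a 4-byte boundary.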

The file trailer consists of an int16 word containing -1.  This is
easily distinguished from a tuple's field-count word, which is never
negative.
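
Putting the pieces together, a writer can stream the file in a single pass, since it never needs to know the tuple count up front. The following Python sketch assumes a made-up magic value, native byte order, and values supplied as raw bytes (or None for NULL); the function names are hypothetical.

```python
import io
import struct

MAGIC = 0x50474342  # hypothetical value; the proposal does not fix one


def encode_field(typlen, value):
    """Encode one field as a typlen word plus optional data."""
    if value is None:
        return struct.pack("=h", 0)                  # NULL
    if typlen > 0:
        if len(value) != typlen:
            raise ValueError("fixed-length value has wrong size")
        return struct.pack("=h", typlen) + value
    if typlen == -1:
        # varlena: 4-byte header holds the total length, header included
        return struct.pack("=hi", -1, len(value) + 4) + value
    raise ValueError("unsupported typlen %d" % typlen)


def write_copy_binary(out, rows, typlens):
    """Write header, tuples, and trailer; rows is consumed exactly once."""
    out.write(struct.pack("=I", MAGIC))
    for row in rows:
        if len(row) != len(typlens):
            raise ValueError("row has wrong number of fields")
        out.write(struct.pack("=h", len(typlens)))   # field count
        for typlen, value in zip(typlens, row):
            out.write(encode_field(typlen, value))
    out.write(struct.pack("=h", -1))                 # explicit EOF marker
```

Because `rows` can be any iterator, the writer never has to scan the relation twice just to produce a count.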

A reader should report an error if a field-count word is neither -1
nor the expected number of columns.  This provides a pretty strong
check against somehow getting out of sync with the data.
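
The reader side of that check could be sketched in Python as follows. It assumes the header has already been consumed, native byte order, and a caller-supplied list of expected typlens for the destination columns; `read_tuples` is a name invented for the example.

```python
import io
import struct


def _read(f, n):
    """Read exactly n bytes or fail."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of data")
    return data


def read_tuples(f, expected_typlens):
    """Yield rows (lists of bytes or None) until the -1 trailer is seen."""
    ncols = len(expected_typlens)
    while True:
        (count,) = struct.unpack("=h", _read(f, 2))
        if count == -1:
            return                                   # file trailer
        if count != ncols:
            raise ValueError("field count %d, expected %d" % (count, ncols))
        row = []
        for expected in expected_typlens:
            (typlen,) = struct.unpack("=h", _read(f, 2))
            if typlen == 0:
                row.append(None)                     # NULL
            elif typlen != expected:
                raise ValueError("typlen %d, expected %d" % (typlen, expected))
            elif typlen > 0:
                row.append(_read(f, typlen))         # fixed-length data
            else:
                # varlena: total length includes the 4-byte header
                (total,) = struct.unpack("=i", _read(f, 4))
                if total < 4:
                    raise ValueError("bad varlena length %d" % total)
                row.append(_read(f, total - 4))
        yield row
```

If the reader ever loses sync, the next "field count" it reads is very unlikely to be either -1 or the expected column count, so the error shows up quickly.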

Comments?
        regards, tom lane

