Re: Re: COPY BINARY file format proposal - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Re: COPY BINARY file format proposal
Msg-id 9435.976323300@sss.pgh.pa.us
In response to Re: Re: COPY BINARY file format proposal  (Philip Warner <pjw@rhyme.com.au>)
Responses Re: Re: COPY BINARY file format proposal  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Philip Warner <pjw@rhyme.com.au> writes:
> How about a CRC? ;-P

I take it from the smiley that you're not serious, but actually it seems
like it might not be a bad idea.  I could see appending a CRC to each
tuple record.  Comments anyone?

You seemed to like the PNG philosophy of using feature flags rather than
a version number.  Accordingly, I propose dropping the version number
field in favor of a flags word.  (Which was needed anyway, because I had
*again* forgotten about COPY WITH OIDS :-(.)

Attached is the current state of the proposal.  I haven't added a CRC
field but am willing to do so if that's the consensus.
        regards, tom lane


COPY BINARY file format proposal

The objectives of this change are:

1. Get rid of the tuple count at the front of the file.  Computing the
count requires an extra pass over the relation, which is a lot more
trouble than the count is worth.  Use an explicit EOF marker instead.
2. Send fields of a tuple individually, instead of dumping out raw tuples
(complete with alignment padding and so forth) as is currently done.
This is mainly to simplify TOAST-related processing.
3. Make the format somewhat self-identifying, so that the reader has at
least some chance of detecting it when the data doesn't match the table
it's supposed to be loaded into.

The proposed format consists of a file header, zero or more tuples, and a
file trailer.


File Header
-----------

The proposed file header consists of 24 bytes of fixed fields, followed
by a variable-length header extension area.

Signature: 12-byte sequence "PGBCOPY\n\377\r\n\0" --- note that the null
is a required part of the signature.  (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
transfer.  The proposed signature will be changed by newline-translation
filters, dropped nulls, dropped high bits, or parity changes.)
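
As a rough illustration of why the signature is built this way, the sketch below checks the 12 bytes and simulates three kinds of transfer damage. The function name has_valid_signature and the damage simulations are assumptions made for this example, not part of the proposal.

```python
# The 12-byte signature from the proposal; the trailing NUL is required.
SIGNATURE = b"PGBCOPY\n\377\r\n\0"


def has_valid_signature(data: bytes) -> bool:
    """Return True if data starts with the intact 12-byte signature."""
    return data[:len(SIGNATURE)] == SIGNATURE


# Simulated damage from transfers that are not 8-bit clean (illustrative).
crlf_translated = SIGNATURE.replace(b"\n", b"\r\n")      # newline filter
high_bit_stripped = bytes(b & 0x7F for b in SIGNATURE)   # 7-bit channel
nul_dropped = SIGNATURE.replace(b"\0", b"")              # NULs removed

for name, damaged in [
    ("newline translation", crlf_translated),
    ("dropped high bits", high_bit_stripped),
    ("dropped nulls", nul_dropped),
]:
    print(name, "detected:", not has_valid_signature(damaged))
```

Each simulated filter changes at least one byte of the signature, so a reader that compares all 12 bytes would reject the file before touching any tuple data.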

Integer layout field: int32 constant 0x01020304 in source's byte order.
Potentially, a reader could engage in byte-flipping of subsequent fields
if the wrong byte order is detected here.
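
One way a reader might use the layout field is sketched below: it tries both common byte orders and returns the matching prefix for Python's struct module. The name detect_byte_order is hypothetical, and rejecting anything other than plain little-endian or big-endian layouts is an assumption of this sketch.

```python
import struct

LAYOUT_CONSTANT = 0x01020304  # value the writer stores in its own byte order


def detect_byte_order(layout_field: bytes) -> str:
    """Return the struct prefix ('<' or '>') matching the writer's byte order."""
    if len(layout_field) != 4:
        raise ValueError("integer layout field must be exactly 4 bytes")
    for prefix in ("<", ">"):
        if struct.unpack(prefix + "i", layout_field)[0] == LAYOUT_CONSTANT:
            return prefix
    # Anything else (for example a mixed-endian layout) is not handled here.
    raise ValueError("unrecognized integer layout field")
```

The returned prefix can then be used to decode every subsequent integer field, which amounts to the byte-flipping mentioned above.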

Flags field: a 4-byte bit mask to denote important aspects of the file
format.  Bits are numbered from 0 (LSB) to 31 (MSB) --- note that this
field is stored with source's endianness, as are all subsequent integer
fields.  Bits 16-31 are reserved to denote critical file format issues;
a reader should abort if it finds an unexpected bit set in this range.
Bits 0-15 are reserved to signal backwards-compatible format issues;
a reader should simply ignore any unexpected bits set in this range.
Currently only one flag bit is defined, and the rest must be zero:
    Bit 16:    if 1, OIDs are included in the dump; if 0, not
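
The critical versus ignorable split could be enforced roughly as follows. This is a minimal sketch that assumes the flags word has already been decoded as an unsigned 32-bit integer; the constant and function names are made up for the example.

```python
FLAG_HAS_OIDS = 1 << 16          # the only bit defined so far
CRITICAL_MASK = 0xFFFF0000       # bits 16-31: reader must understand these
KNOWN_CRITICAL = FLAG_HAS_OIDS   # critical bits this reader understands


def check_flags(flags: int) -> bool:
    """Return True if OIDs are included; raise on unknown critical bits."""
    unknown_critical = flags & CRITICAL_MASK & ~KNOWN_CRITICAL
    if unknown_critical:
        raise ValueError(
            f"unsupported critical flag bits: {unknown_critical:#010x}")
    # Bits 0-15 are backwards-compatible, so unknown ones are ignored.
    return bool(flags & FLAG_HAS_OIDS)
```

A reader written this way keeps working when a later writer sets a new low-order bit, and stops cleanly when a new high-order bit appears.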

Next 4 bytes: length of remainder of header, not including self.  In
the initial version this will be zero, and the first tuple follows
immediately.  Future changes to the format might allow additional data
to be present in the header.  A reader should silently ignore any header
extension data it does not know what to do with.
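
Putting the four fixed fields together, a reader might consume the header as in the sketch below. It assumes the flags and extension-length words are unsigned, omits flag validation for brevity, and uses a hypothetical function name read_header; none of that is specified by the proposal.

```python
import io
import struct

SIGNATURE = b"PGBCOPY\n\377\r\n\0"
LAYOUT_CONSTANT = 0x01020304


def read_header(stream):
    """Read the 24-byte fixed header and skip the extension area.

    Returns (byte_order_prefix, flags).
    """
    fixed = stream.read(24)
    if len(fixed) != 24 or fixed[:12] != SIGNATURE:
        raise ValueError("not a COPY BINARY file, or the file was damaged")
    layout = fixed[12:16]
    if layout == struct.pack("<I", LAYOUT_CONSTANT):
        order = "<"
    elif layout == struct.pack(">I", LAYOUT_CONSTANT):
        order = ">"
    else:
        raise ValueError("unrecognized integer layout field")
    # Assumption: both words are treated as unsigned 32-bit integers.
    flags, ext_len = struct.unpack(order + "II", fixed[16:24])
    # Silently skip extension data this reader does not understand.
    extension = stream.read(ext_len)
    if len(extension) != ext_len:
        raise ValueError("truncated header extension")
    return order, flags
```

After read_header returns, the stream is positioned at the first tuple regardless of how much extension data a newer writer added.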

Note that I envision the content of the header extension area as being a
sequence of self-identifying chunks (but the specific design of same is
postponed until we need 'em).  The flags field is not intended to tell
readers what is in the extension area.

This design allows for both backwards-compatible header additions (add
header extension chunks, or set low-order flag bits) and non-backwards-
compatible changes (set high-order flag bits to signal such changes,
and add supporting data to the extension area if needed).


Tuples
------

Each tuple begins with an int16 count of the number of fields in the
tuple.  (Presently, all tuples in a table will have the same count, but
that might not always be true.)  Then, repeated for each field in the
tuple, there is an int16 typlen word possibly followed by field data.
The typlen field is interpreted thus:
    Zero      Field is NULL.  No data follows.
    > 0       Field is a fixed-length datatype.  Exactly N
              bytes of data follow the typlen word.
    -1        Field is a varlena datatype.  The next four
              bytes are the varlena header, which contains
              the total value length including itself.
    < -1      Reserved for future use.

For non-NULL fields, the reader can check that the typlen matches the
expected typlen for the destination column.  This provides a simple
but very useful check that the data is as expected.
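
The typlen rules and the typlen check could be combined in a field reader along these lines. This is a sketch under stated assumptions: the varlena header is read as a plain signed 32-bit length, the expected typlen is supplied by the caller, and the names read_exact and read_field are invented for the example.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes or raise."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of data")
    return data


def read_field(stream, order, expected_typlen):
    """Read one field; return None for NULL, else the value bytes."""
    (typlen,) = struct.unpack(order + "h", read_exact(stream, 2))
    if typlen == 0:
        return None                      # NULL: no data follows
    if typlen < -1:
        raise ValueError("reserved typlen value")
    if typlen != expected_typlen:
        raise ValueError(
            f"typlen {typlen} does not match expected {expected_typlen}")
    if typlen > 0:
        return read_exact(stream, typlen)  # fixed-length datatype
    # typlen == -1: varlena. Assumption: the 4-byte header is a plain
    # length word that counts itself.
    (total,) = struct.unpack(order + "i", read_exact(stream, 4))
    if total < 4:
        raise ValueError("invalid varlena length")
    return read_exact(stream, total - 4)
```

Here the varlena value is returned without its 4-byte header; a loader that wants the raw on-disk form would keep the header instead.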

There is no alignment padding or any other extra data between fields.
Note also that the format does not distinguish whether a datatype is
pass-by-reference or pass-by-value.  Both of these provisions are
deliberate: they might help improve portability of the files (although
of course endianness and floating-point-format issues can still keep
you from moving a binary file across machines).

If OIDs are included in the dump, the OID field immediately follows the
field-count word.  It is a normal field except that it's not included
in the field-count.  In particular it has a typlen --- this will allow
handling of 4-byte vs 8-byte OIDs without too much pain, and will allow
OIDs to be shown as NULL if we someday allow OIDs to be optional.
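
On the writing side, the placement of the OID field might look like the sketch below. It assumes a 4-byte OID and a caller-chosen byte order, and the function name encode_tuple and its (typlen, data) field representation are hypothetical.

```python
import struct


def encode_tuple(fields, oid=None, order="<"):
    """Encode one tuple record.

    fields: list of (typlen, data); data is None for NULL, else bytes.
    oid:    optional 4-byte OID, written right after the field count and
            not included in that count. The header's bit 16 must be set
            when OIDs are written (not handled here).
    """
    out = [struct.pack(order + "h", len(fields))]
    if oid is not None:
        # The OID is a normal field with its own typlen (assumed 4 bytes).
        out.append(struct.pack(order + "hI", 4, oid))
    for typlen, data in fields:
        if data is None:
            out.append(struct.pack(order + "h", 0))
        elif typlen > 0:
            if len(data) != typlen:
                raise ValueError("data length does not match typlen")
            out.append(struct.pack(order + "h", typlen) + data)
        elif typlen == -1:
            # Varlena: the 4-byte length word counts itself.
            out.append(struct.pack(order + "hi", -1, len(data) + 4) + data)
        else:
            raise ValueError("unsupported typlen")
    return b"".join(out)
```

Because the OID carries its own typlen, switching to 8-byte OIDs would change only the one pack call, and a NULL OID would be written as a typlen of zero.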


File Trailer
------------

The file trailer consists of an int16 word containing -1.  This is
easily distinguished from a tuple's field-count word.

A reader should report an error if a field-count word is neither -1
nor the expected number of columns.  This provides a pretty strong
check against somehow getting out of sync with the data.
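
The trailer and field-count checks suggest a read loop roughly like this sketch. The per-tuple field decoding is delegated to a caller-supplied function, read_fields, which is a hypothetical callback; iter_tuples is likewise a name made up for the example.

```python
import io
import struct


def iter_tuples(stream, order, expected_columns, read_fields):
    """Yield decoded tuples until the -1 trailer word is seen.

    read_fields(stream, count) is a caller-supplied (hypothetical)
    function that decodes the fields of one tuple.
    """
    while True:
        word = stream.read(2)
        if len(word) != 2:
            raise ValueError("unexpected end of file: trailer missing")
        (count,) = struct.unpack(order + "h", word)
        if count == -1:
            return                       # file trailer reached
        if count != expected_columns:
            raise ValueError(
                f"field count {count} is neither -1 nor {expected_columns}")
        yield read_fields(stream, count)
```

A truncated file is reported as an error here rather than treated as a normal end of data, which is the point of having an explicit trailer instead of a leading tuple count.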

