Re: Re: COPY BINARY file format proposal - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Re: COPY BINARY file format proposal
Date	December 12, 2000 11:13:16
Msg-id	10336.976334295@sss.pgh.pa.us
In response to	Re: Re: COPY BINARY file format proposal (Philip Warner <pjw@rhyme.com.au>)
List	pgsql-hackers

Philip Warner <pjw@rhyme.com.au> writes:
> More a matter of not thinking it was important enough to worry about, and
> not really wanting to drag the MD5/MD4/CRC64/etc debate into this one.

I'd just as soon not drag that debate in here either ;-) ... but once we
settle on an appropriate CRC method for WAL it's easy enough to call the
same routine for this code.

> Sounds good to me. I'm not sure you need it on a per-tuple basis - but it
> can't hurt, assuming it's cheap to generate. Does the backend send tuples
> or blocks of tuples? If the latter, and if CRC is expensive, then maybe 1
> CRC for each group of tuples.

Extending the CRC over multiple tuples would just complicate life,
I think.  The per-byte cost is the biggest factor, so you don't really
save all that much.
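
A rough illustration of the per-byte point, as a Python sketch. It uses zlib.crc32 purely as a stand-in (the CRC method for WAL is not settled in this thread), and the tuple payloads are made up. Either way every byte passes through the CRC exactly once; a running CRC over a group only saves the few stored checksum bytes per tuple, and the reader then has to track group boundaries.

```python
import zlib

# Hypothetical tuple payloads; real ones would be binary field data.
tuples = [b"alpha", b"bravo-bravo", b"c"]

# Scheme 1: one independent CRC per tuple.
per_tuple_crcs = [zlib.crc32(t) for t in tuples]
bytes_hashed_per_tuple = sum(len(t) for t in tuples)

# Scheme 2: one running CRC carried across the whole group.
group_crc = 0
bytes_hashed_grouped = 0
for t in tuples:
    group_crc = zlib.crc32(t, group_crc)  # continue from previous value
    bytes_hashed_grouped += len(t)

# Both schemes feed the same number of bytes through the CRC.
print(bytes_hashed_per_tuple, bytes_hashed_grouped)
```

With per-tuple CRCs a damaged tuple can be pinpointed directly; with the grouped CRC only the group as a whole can be flagged.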

>> Next 4 bytes: length of remainder of header, not including self.  In
>> the initial version this will be zero, and the first tuple follows
>> immediately.  Future changes to the format might allow additional data
>> to be present in the header.  A reader should silently ignore any header
>> extension data it does not know what to do with.

> Don't you need to at least define how to specify non-essential chunks,
> since the flags are not to be used to describe the header extensions. Or
> are we going to make the initial version barf when it encounters any header
> extension?

No, the initial version will just silently skip the whole header
extension; it's defined so that that's a legal behavior (everything
in the header extension is inessential).  We can come back and define
a format for the entries in the header extension area when we need some.
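
A minimal sketch of that reader behavior in Python, under stated assumptions: the function name skip_header_extension is hypothetical, the length word is read as an unsigned 32-bit integer in network byte order (the byte order is not specified in this message), and the stream is already positioned at the length word.

```python
import io
import struct

def skip_header_extension(stream):
    """Read the 4-byte extension length, then skip that many bytes.

    Returns the number of extension bytes skipped. The length does not
    include the 4-byte length word itself.
    """
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError("truncated header: missing extension length")
    # Assumption: unsigned, network byte order.
    (ext_len,) = struct.unpack("!I", raw)
    skipped = stream.read(ext_len)
    if len(skipped) != ext_len:
        raise ValueError("truncated header extension")
    return ext_len

# Initial version: length is zero, first tuple follows immediately.
v1 = io.BytesIO(struct.pack("!I", 0) + b"TUPLES")
skip_header_extension(v1)
rest_v1 = v1.read()

# A later version with 3 bytes of extension data this reader ignores.
v2 = io.BytesIO(struct.pack("!I", 3) + b"xyz" + b"TUPLES")
skip_header_extension(v2)
rest_v2 = v2.read()
```

Both streams end up positioned at the first tuple, so an old reader keeps working when a newer writer adds extension data.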

> Another option would be to:
> - dump the field sizes in the header somewhere (they will all be the same), 
> - for each row output a bitmap of non-null fields, followed by the data.
> - varlena would have a -1 length in the header, and an int32 length in the row.

That would work if you are willing to assume that all the tuples indeed
always have the same set of fields --- you're not, for example, doing an
inheritance-tree-walk "COPY FROM foo*".  But Chris Bitmead still has a
gleam in his eye about that sort of thing, so we might want it someday.
I think it's worth a small amount of extra space to avoid that
assumption, especially since it simplifies the code too.
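
To make the trade-off concrete, here is a Python sketch of self-describing tuples. The layout is hypothetical and chosen only for illustration (an int16 field count, then per field an int32 length with -1 meaning NULL, followed by the data, all in network byte order); it is not the format under discussion. The point is that no per-file field table is needed, so tuples with different field sets can share one stream.

```python
import io
import struct

def write_tuple(fields):
    """Serialize one tuple; each field is bytes, or None for NULL."""
    out = [struct.pack("!h", len(fields))]       # per-tuple field count
    for f in fields:
        if f is None:
            out.append(struct.pack("!i", -1))    # NULL marker, no data
        else:
            out.append(struct.pack("!i", len(f)))
            out.append(f)
    return b"".join(out)

def read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated tuple data")
    return data

def read_tuple(stream):
    """Read one tuple without any knowledge of the table's columns."""
    (nfields,) = struct.unpack("!h", read_exact(stream, 2))
    fields = []
    for _ in range(nfields):
        (length,) = struct.unpack("!i", read_exact(stream, 4))
        fields.append(None if length < 0 else read_exact(stream, length))
    return fields

# A parent-table row (2 fields) followed by a child-table row (3 fields),
# as an inheritance-tree walk might produce.
stream = io.BytesIO(write_tuple([b"1", b"foo"]) +
                    write_tuple([b"2", None, b"extra"]))
parent_row = read_tuple(stream)
child_row = read_tuple(stream)
```

The cost is a couple of bytes of field count per tuple plus a length word per field; the reader loop stays the same regardless of which table a row came from.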
        regards, tom lane