Re: Fixed length data types issue - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Fixed length data types issue |
Date | |
Msg-id | 1157992103.2692.392.camel@holly |
In response to | Re: Fixed length data types issue (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Fixed length data types issue |
List | pgsql-hackers |
On Sun, 2006-09-10 at 21:16 -0400, Tom Lane wrote:
> After further thought I have an alternate proposal
(snip)
> * If high order bit of datum's first byte is 0, then it's an
>   uncompressed datum in what's essentially the same as our current
>   in-memory format except that the 4-byte length word must be big-endian
>   (to ensure that the leading bit can be kept zero). In particular this
>   format will be aligned on 4- or 8-byte boundary as called for by the
>   datatype definition.
>
> * If high order bit of first byte is 1, then it's some compressed
>   variant. I'd propose divvying up the code space like this:
>
>   * 0xxxxxxx  uncompressed 4-byte length word as stated above
>   * 10xxxxxx  1-byte length word, up to 62 bytes of data
>   * 110xxxxx  2-byte length word, uncompressed inline data
>   * 1110xxxx  2-byte length word, compressed inline data
>   * 1111xxxx  1-byte length word, out-of-line TOAST pointer
>
> This limits us to 8K uncompressed or 4K compressed inline data without
> toasting, which is slightly annoying but probably still an insignificant
> limitation. It also means more distinct cases for the heap_deform_tuple
> inner loop to think about, which might be a problem.
>
> Since the compressed forms would not be aligned to any boundary,
> there's an important special case here: how can heap_deform_tuple tell
> whether the next field is compressed or not? The answer is that we'll
> have to require pad bytes between fields to be zero. (They already are
> zeroed by heap_form_tuple, but now it'd be a requirement.) So the
> algorithm for decoding a non-null field is:
>
> * if looking at a byte with high bit 0, then we are either
>   on the start of an uncompressed field, or on a pad byte before
>   such a field. Advance to the declared alignment boundary for
>   the datatype, read a 4-byte length word, and proceed.
>
> * if looking at a byte with high bit 1, then we are at the
>   start of a compressed field (which will never have any preceding
>   pad bytes). Decode length as per rules above.
>
> The good thing about this approach is that it requires zero changes to
> fundamental system structure. The pack/unpack rules in heap_form_tuple
> and heap_deform_tuple change a bit, and the mechanics of
> PG_DETOAST_DATUM change, but a Datum is still just a pointer and you
> can always tell what you've got by examining the pointed-to data.

Seems like a great approach to this pain point. More fun than lots of new
datatypes also.

Is this an 8.2 thing? If not, is Numeric508 applied?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
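For readers following the quoted proposal, the first-byte code space and the two decoding rules might be sketched in C roughly as below. This is an illustration only, not code from the thread or from the PostgreSQL source: VarlenaKind, VarlenaHeader, decode_header and field_start are hypothetical names. The sketch assumes the length field is simply the bits left over after the tag (the proposal does not say whether that count includes the header bytes) and that the declared alignment is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the five cases in the quoted code space. */
typedef enum {
    VL_PLAIN_4B,             /* 0xxxxxxx: uncompressed, 4-byte big-endian length */
    VL_SHORT_1B,             /* 10xxxxxx: 1-byte length word */
    VL_INLINE_2B,            /* 110xxxxx: 2-byte length word, uncompressed inline */
    VL_INLINE_COMPRESSED_2B, /* 1110xxxx: 2-byte length word, compressed inline */
    VL_TOAST_POINTER_1B      /* 1111xxxx: 1-byte length word, out-of-line pointer */
} VarlenaKind;

typedef struct {
    VarlenaKind kind;
    size_t      header_bytes;  /* size of the length word itself */
    uint32_t    length_field;  /* raw bits after the tag (assumed semantics) */
} VarlenaHeader;

/*
 * Rule for locating a non-null field: a byte with high bit 0 is either a
 * zero pad byte or the start of an aligned 4-byte header, so round the
 * offset up to the declared alignment. A byte with high bit 1 starts a
 * compressed-header field, which is never preceded by padding.
 * Assumes align is a power of two.
 */
static size_t field_start(const uint8_t *tuple, size_t off, size_t align)
{
    if ((tuple[off] & 0x80) == 0)
        off = (off + align - 1) & ~(align - 1);
    return off;
}

/* Classify the header at p and extract the bits that follow the tag. */
static VarlenaHeader decode_header(const uint8_t *p)
{
    VarlenaHeader h;
    uint8_t b = p[0];

    if ((b & 0x80) == 0) {
        /* Big-endian keeps the leading bit of the first byte zero. */
        h.kind = VL_PLAIN_4B;
        h.header_bytes = 4;
        h.length_field = ((uint32_t) b << 24) | ((uint32_t) p[1] << 16) |
                         ((uint32_t) p[2] << 8) | (uint32_t) p[3];
    } else if ((b & 0xC0) == 0x80) {
        h.kind = VL_SHORT_1B;
        h.header_bytes = 1;
        h.length_field = b & 0x3F;                              /* 6 bits */
    } else if ((b & 0xE0) == 0xC0) {
        h.kind = VL_INLINE_2B;
        h.header_bytes = 2;
        h.length_field = ((uint32_t) (b & 0x1F) << 8) | p[1];   /* 13 bits */
    } else if ((b & 0xF0) == 0xE0) {
        h.kind = VL_INLINE_COMPRESSED_2B;
        h.header_bytes = 2;
        h.length_field = ((uint32_t) (b & 0x0F) << 8) | p[1];   /* 12 bits */
    } else {
        h.kind = VL_TOAST_POINTER_1B;
        h.header_bytes = 1;
        h.length_field = b & 0x0F;                              /* 4 bits */
    }
    return h;
}
```

Under these assumptions the 13-bit and 12-bit fields line up with the 8K uncompressed and 4K compressed inline limits mentioned in the quoted text. A caller would first use field_start to skip any zero pad bytes and then call decode_header on the resulting offset.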