Re: Fixed length data types issue - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Fixed length data types issue
Date
Msg-id 1157992103.2692.392.camel@holly
In response to Re: Fixed length data types issue  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fixed length data types issue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, 2006-09-10 at 21:16 -0400, Tom Lane wrote:

> After further thought I have an alternate proposal 
(snip)

> * If high order bit of datum's first byte is 0, then it's an
> uncompressed datum in what's essentially the same as our current
> in-memory format except that the 4-byte length word must be big-endian
> (to ensure that the leading bit can be kept zero).  In particular this
> format will be aligned on 4- or 8-byte boundary as called for by the
> datatype definition.
> 
> * If high order bit of first byte is 1, then it's some compressed
> variant.  I'd propose divvying up the code space like this:
> 
>     * 0xxxxxxx  uncompressed 4-byte length word as stated above
>     * 10xxxxxx  1-byte length word, up to 62 bytes of data
>     * 110xxxxx  2-byte length word, uncompressed inline data
>     * 1110xxxx  2-byte length word, compressed inline data
>     * 1111xxxx  1-byte length word, out-of-line TOAST pointer
> 
> This limits us to 8K uncompressed or 4K compressed inline data without
> toasting, which is slightly annoying but probably still an insignificant
> limitation.  It also means more distinct cases for the heap_deform_tuple
> inner loop to think about, which might be a problem.
> 
> Since the compressed forms would not be aligned to any boundary,
> there's an important special case here: how can heap_deform_tuple tell
> whether the next field is compressed or not?  The answer is that we'll
> have to require pad bytes between fields to be zero.  (They already are
> zeroed by heap_form_tuple, but now it'd be a requirement.)  So the
> algorithm for decoding a non-null field is:
> 
>     * if looking at a byte with high bit 0, then we are either
>     on the start of an uncompressed field, or on a pad byte before
>     such a field.  Advance to the declared alignment boundary for
>     the datatype, read a 4-byte length word, and proceed.
> 
>     * if looking at a byte with high bit 1, then we are at the
>     start of a compressed field (which will never have any preceding
>     pad bytes).  Decode length as per rules above.
> 
> The good thing about this approach is that it requires zero changes to
> fundamental system structure.  The pack/unpack rules in heap_form_tuple
> and heap_deform_tuple change a bit, and the mechanics of
> PG_DETOAST_DATUM change, but a Datum is still just a pointer and you
> can always tell what you've got by examining the pointed-to data.
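
The decoding rules quoted above can be sketched in C roughly as follows. This is only an illustration of the proposal as written, not code from any PostgreSQL release: every name here (HeaderKind, classify_header, field_start, header_length) is hypothetical, the tuple data buffer is assumed to start on a maximally aligned address, alignments are assumed to be powers of two, and whether the stored length counts the header bytes is left open because the proposal does not say.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these exist in PostgreSQL. */
typedef enum
{
    HDR_PLAIN_4B,       /* 0xxxxxxx: aligned 4-byte big-endian length word */
    HDR_SHORT_1B,       /* 10xxxxxx: 1-byte length word */
    HDR_INLINE_2B,      /* 110xxxxx: 2-byte length word, uncompressed */
    HDR_COMPRESSED_2B,  /* 1110xxxx: 2-byte length word, compressed */
    HDR_TOAST_PTR       /* 1111xxxx: 1-byte length word, TOAST pointer */
} HeaderKind;

/* Classify a datum by the leading bits of its first byte. */
static HeaderKind
classify_header(uint8_t b)
{
    if ((b & 0x80) == 0x00) return HDR_PLAIN_4B;
    if ((b & 0xC0) == 0x80) return HDR_SHORT_1B;
    if ((b & 0xE0) == 0xC0) return HDR_INLINE_2B;
    if ((b & 0xF0) == 0xE0) return HDR_COMPRESSED_2B;
    return HDR_TOAST_PTR;
}

/*
 * Find where the next non-null field starts.  A byte with the high bit
 * set begins a compressed field with no padding before it.  Otherwise
 * the byte is either a zero pad byte or the first byte of an aligned
 * length word, so rounding up to the alignment boundary is safe.
 */
static size_t
field_start(const uint8_t *data, size_t off, size_t align)
{
    assert(data != NULL && align != 0 && (align & (align - 1)) == 0);
    if (data[off] & 0x80)
        return off;
    return (off + align - 1) & ~(align - 1);
}

/* Return the raw length bits of the header at p and report its kind. */
static uint32_t
header_length(const uint8_t *p, HeaderKind *kind)
{
    *kind = classify_header(p[0]);
    switch (*kind)
    {
        case HDR_PLAIN_4B:      /* 31 usable bits, big-endian */
            return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                   ((uint32_t) p[2] << 8) | (uint32_t) p[3];
        case HDR_SHORT_1B:      /* 6 length bits */
            return p[0] & 0x3F;
        case HDR_INLINE_2B:     /* 13 length bits, hence the 8K limit */
            return ((uint32_t) (p[0] & 0x1F) << 8) | p[1];
        case HDR_COMPRESSED_2B: /* 12 length bits, hence the 4K limit */
            return ((uint32_t) (p[0] & 0x0F) << 8) | p[1];
        case HDR_TOAST_PTR:     /* 4 length bits */
            return p[0] & 0x0F;
    }
    return 0;                   /* not reached */
}
```

The big-endian requirement shows up in the HDR_PLAIN_4B case: the most significant byte of the length word comes first in memory, so a length below 2^31 always leaves the leading bit clear and the datum cannot be mistaken for a compressed one.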

Seems like a great approach to this pain point.

Also more fun than adding lots of new datatypes.

Is this an 8.2 thing? If not, is Numeric508 applied?

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com