Re: Fixed length data types issue - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Fixed length data types issue
Msg-id 87fyeyyb3c.fsf@enterprisedb.com
In response to Re: Fixed length data types issue  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fixed length data types issue  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Fixed length data types issue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <gsstark@mit.edu> writes:
>> I'm a bit confused by this and how it would be handled in your sketch. I
>> assumed we needed a bit pattern dedicated to 4-byte length headers because
>> even though it would never occur on disk it would be necessary for the
>> uncompressed and/or detoasted data.
>
>> In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted
>> to larger than 16k?
>
> I'm imagining that it would give you the same old uncompressed in-memory
> representation as it does now, ie, 4-byte length word and uncompressed
> data.

Sure, but how would you know? Sometimes you would get a pointer to a varlena
starting with a byte with a leading 00 indicating a 1-byte varlena header, and
sometimes you would get a pointer to a varlena in the old uncompressed
representation, whose 4-byte length header may well also start with a 00.
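
To make the ambiguity concrete, here is a minimal sketch. It assumes the
4-byte length word is stored most-significant byte first and that a 1-byte
header keeps its length in the low six bits under a 00 tag. All function names
are hypothetical and are not existing PostgreSQL macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the two high-order bits of the byte are 00. */
static bool starts_with_00(uint8_t first_byte)
{
    return (first_byte & 0xC0) == 0x00;
}

/*
 * First byte of a 4-byte length word, assuming it is laid out
 * most-significant byte first (an assumption for this sketch).
 */
static uint8_t first_byte_of_4byte_length(uint32_t length)
{
    return (uint8_t) (length >> 24);
}

/*
 * First (and only) byte of a hypothetical 1-byte header:
 * tag bits 00, length in the low six bits.
 */
static uint8_t one_byte_header(uint8_t length)
{
    return (uint8_t) (length & 0x3F);
}
```

Under these assumptions, one_byte_header(5) and
first_byte_of_4byte_length(100) both pass starts_with_00(), so the first byte
alone cannot tell the two representations apart.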

> * If high order bit of first byte is 1, then it's some compressed
> variant.  I'd propose divvying up the code space like this:
>
>     * 0xxxxxxx  uncompressed 4-byte length word as stated above
>     * 10xxxxxx  1-byte length word, up to 62 bytes of data
>     * 110xxxxx  2-byte length word, uncompressed inline data
>     * 1110xxxx  2-byte length word, compressed inline data
>     * 1111xxxx  1-byte length word, out-of-line TOAST pointer

I'm unclear how you're using the remaining bits. Are you saying you would have
a 4-byte length word following this bit-flag byte? Or are you saying we would
use 31 bits for the 4-byte length word, 13 bits for the 2-byte uncompressed
length word and 12 bits for the compressed length word?
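
The second reading can be sketched as follows. This assumes the tag bits live
inside the length word itself, so each variant loses its tag bits from the
length. The enum and function names are hypothetical, chosen only to mirror
the list quoted above.

```c
#include <stdint.h>

/* Hypothetical names; not PostgreSQL's actual varlena macros. */
typedef enum
{
    HDR_4BYTE_UNCOMPRESSED,     /* 0xxxxxxx */
    HDR_1BYTE_SHORT,            /* 10xxxxxx */
    HDR_2BYTE_UNCOMPRESSED,     /* 110xxxxx */
    HDR_2BYTE_COMPRESSED,       /* 1110xxxx */
    HDR_1BYTE_TOAST_POINTER     /* 1111xxxx */
} HeaderKind;

/* Classify a datum by the high-order bits of its first byte. */
static HeaderKind classify_header(uint8_t first)
{
    if ((first & 0x80) == 0x00)
        return HDR_4BYTE_UNCOMPRESSED;
    if ((first & 0xC0) == 0x80)
        return HDR_1BYTE_SHORT;
    if ((first & 0xE0) == 0xC0)
        return HDR_2BYTE_UNCOMPRESSED;
    if ((first & 0xF0) == 0xE0)
        return HDR_2BYTE_COMPRESSED;
    return HDR_1BYTE_TOAST_POINTER;
}

/* Bits left for the length when the tag bits share the length word. */
static int length_bits(HeaderKind kind)
{
    switch (kind)
    {
        case HDR_4BYTE_UNCOMPRESSED:
            return 32 - 1;
        case HDR_1BYTE_SHORT:
            return 8 - 2;
        case HDR_2BYTE_UNCOMPRESSED:
            return 16 - 3;
        case HDR_2BYTE_COMPRESSED:
            return 16 - 4;
        case HDR_1BYTE_TOAST_POINTER:
            return 8 - 4;
    }
    return 0;
}
```

With this reading, length_bits() gives the 31, 13 and 12 bits mentioned above
for the 4-byte, 2-byte uncompressed and 2-byte compressed cases.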

Also Heikki points out here that it would be nice to allow for the case of a
0-byte header. So for example if the leading bit is 0 then the remaining 7
bits are available for the datum itself. This would actually vacate much of my
argument for a fixed length char(n) data type. The most frequent use case is
for things like CHAR(1) fields containing 'Y' or 'N'.

In any case it seems a bit backwards to me. Wouldn't it be better to preserve
bits in the case of short length words where they're precious rather than long
ones? If we make 0xxxxxxx the 1-byte case it means limiting our maximum datum
size to something like .5G but if you're working with .5G data wouldn't you be
using an API that lets you access it in chunks anyway?
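
The arithmetic behind that trade-off, as a rough sketch: the maximum length is
whatever fits in the header bits left after the tag. The function name is
hypothetical, and the choice of three tag bits for the 4-byte case is only an
assumption that happens to match the ".5G" figure.

```c
#include <stdint.h>

/*
 * Largest length representable by a header of header_bytes bytes
 * when tag_bits of it are spent on the type tag.
 * Hypothetical helper for illustration only.
 */
static uint64_t max_length(unsigned header_bytes, unsigned tag_bits)
{
    unsigned    length_bits = 8 * header_bytes - tag_bits;

    return (UINT64_C(1) << length_bits) - 1;
}
```

Here max_length(1, 1) is 127, so a 0xxxxxxx 1-byte header would cover data up
to 127 bytes, while max_length(4, 3) is 536870911, roughly 0.5G. By comparison
max_length(4, 1) is 2147483647, the 31-bit limit of the layout quoted earlier.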


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

