Martijn van Oosterhout wrote:
> On Fri, Sep 15, 2006 at 10:01:19AM +0100, Heikki Linnakangas wrote:
>> Actually, you can determine the length of a UTF-8 encoded character by
>> looking at the most significant bits of the first byte. So we could
>> store a UTF-8 encoded CHAR(1) field without any additional length header.
>
> Except in postgres the length of a datum is currently only determined
> from the type, or from a standard varlena header. Going down the road
> of having to call type specific length functions for the values in
> columns 1 to n-1 just to read column n seems like a really bad idea.
>
> We want to make access to later columns *faster* not slower, which
> means keeping to the simplest (code-wise) scheme possible.

We really have two goals: we want to reduce on-disk storage size to save
I/O, and we want to keep processing simple to save CPU. Some ideas help
one goal but hurt the other, so we have to strike a balance between the two.

My gut feeling is that calling a type-specific length function wouldn't be
that bad compared to what we have now or the newly proposed varlena scheme,
but until someone actually tries it and shows some numbers, this is just
hand-waving.
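
To make the idea concrete, here is a rough sketch in C of what reading a
headerless UTF-8 CHAR(1) field could look like. This is not code from the
backend: the function names utf8_len_from_first_byte() and
char1_column_offset() are made up for illustration, and the row layout (a
run of CHAR(1) fields with no length headers, alignment padding or null
bitmap) is an assumption. The loop in the second function is exactly the
per-column cost Martijn is worried about.

```c
#include <stddef.h>

/*
 * Length in bytes of a UTF-8 sequence, judged from its first byte alone.
 * Returns 0 if the byte cannot start a sequence (a continuation byte or
 * an invalid lead byte).
 */
static int
utf8_len_from_first_byte(unsigned char b)
{
    if ((b & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((b & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((b & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;
}

/*
 * Hypothetical helper: byte offset of column n (0-based) in a row that
 * consists only of headerless CHAR(1) fields. Every earlier column has to
 * be inspected to find where column n starts. Returns (size_t) -1 if the
 * row is too short or contains an invalid lead byte.
 */
static size_t
char1_column_offset(const unsigned char *row, size_t row_len, int n)
{
    size_t      off = 0;
    int         i;

    for (i = 0; i < n; i++)
    {
        int         len;

        if (off >= row_len)
            return (size_t) -1;
        len = utf8_len_from_first_byte(row[off]);
        if (len == 0 || off + (size_t) len > row_len)
            return (size_t) -1;
        off += (size_t) len;
    }
    return off;
}
```

The length check is a handful of mask-and-compare operations per column,
which is why I suspect the cost is modest, but only a benchmark against
the fixed-length and varlena paths would settle it.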
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com