Re: Reducing data type space usage - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Reducing data type space usage
Date
Msg-id 1158579943.3147.5.camel@localhost.localdomain
In response to Re: Reducing data type space usage  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Reducing data type space usage
List pgsql-hackers
One fine day, Fri, 2006-09-15 at 19:34, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Oh, OK, I had high byte meaning no header, but clear is better, so
> > 00000001 is 0x01, and 00000000 is "".  But I see now that bytea does
> > store nulls, so yea, we would be better using 10000001, and it is the
> > same size as 00000000.
> 
> I'm liking this idea more the more I think about it, because it'd
> actually be far less painful to put into the system structure than the
> other idea of fooling with varlena headers.  To review: Bruce is
> proposing a var-length type structure with the properties
> 
>     first byte 0xxxxxxx  ---- field length 1 byte, exactly that value
>     first byte 1xxxxxxx  ---- xxxxxxx data bytes follow
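For reference, the two-form layout quoted above could be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code; the function names `encode_short` and `decode_short` are made up for the example, and it assumes a lone byte with the high bit set is stored behind a one-byte length header.

```python
from typing import Tuple

MAX_SHORT_LEN = 127  # largest length that fits in the 7 low bits


def encode_short(data: bytes) -> bytes:
    """Encode a value using the two-form header (hypothetical helper)."""
    if len(data) > MAX_SHORT_LEN:
        raise ValueError("value too long for the short format")
    # Form 0xxxxxxx: a single byte below 0x80 is stored as itself, no header.
    if len(data) == 1 and data[0] < 0x80:
        return data
    # Form 1xxxxxxx: high bit set, low 7 bits hold the count of data bytes.
    return bytes([0x80 | len(data)]) + data


def decode_short(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Return (value, position of the next field) for the field at pos."""
    first = buf[pos]
    if first < 0x80:
        return bytes([first]), pos + 1
    length = first & 0x7F
    start = pos + 1
    return buf[start:start + length], start + length
```

Under these assumptions the empty string is the single byte 0x80, and a one-byte value such as 0x00 or "A" takes exactly one byte on disk.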

would adding this -

    first byte 0xxxxxxx  ---- field length 1 byte, exactly that value
    first byte 10xxxxxx  ---- 0xxxxxx data bytes follow
    first byte 110xxxxx  ---- xxxxx xxxxxxxx data bytes follow
    first byte 111xxxxx  ---- xxxxx xxxxxxxx xxxxxxxx xxxxxxxx data bytes follow

be too expensive?

it seems that for strings up to 63 bytes it would cost the same one-byte
header as your proposal, but that it would scale nicely up to just under
536870912 (2^29) bytes.

this would be especially good for datasets where most values are below 63
(or 127) bytes and only a small percentage are longer
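A rough sketch of how the four-form header could be written and read, to make the cost argument concrete. It is plain Python rather than backend code; the names `header_for`, `encode_field` and `decode_field` are invented for the example, and it assumes the extra length bytes are stored most significant byte first, which the layout suggests but does not state.

```python
from typing import Tuple


def header_for(length: int) -> bytes:
    """Build the length header for a value of the given size (hypothetical)."""
    if length < 0:
        raise ValueError("negative length")
    if length <= 0x3F:            # 10xxxxxx: 6-bit length, 1 header byte
        return bytes([0x80 | length])
    if length <= 0x1FFF:          # 110xxxxx + 1 byte: 13-bit length
        return bytes([0xC0 | (length >> 8), length & 0xFF])
    if length <= 0x1FFFFFFF:      # 111xxxxx + 3 bytes: 29-bit length
        return bytes([0xE0 | (length >> 24), (length >> 16) & 0xFF,
                      (length >> 8) & 0xFF, length & 0xFF])
    raise ValueError("length does not fit in 29 bits")


def encode_field(data: bytes) -> bytes:
    # 0xxxxxxx: a single byte below 0x80 is stored as itself.
    if len(data) == 1 and data[0] < 0x80:
        return data
    return header_for(len(data)) + data


def decode_field(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Return (value, position of the next field) for the field at pos."""
    first = buf[pos]
    if first < 0x80:                       # 0xxxxxxx
        return bytes([first]), pos + 1
    if first < 0xC0:                       # 10xxxxxx
        length, hdr = first & 0x3F, 1
    elif first < 0xE0:                     # 110xxxxx
        length, hdr = ((first & 0x1F) << 8) | buf[pos + 1], 2
    else:                                  # 111xxxxx
        length = ((first & 0x1F) << 24) | (buf[pos + 1] << 16) \
                 | (buf[pos + 2] << 8) | buf[pos + 3]
        hdr = 4
    start = pos + hdr
    return buf[start:start + length], start + length
```

In this sketch the decoder needs at most two extra comparisons on the first byte, values up to 63 bytes keep a one-byte header, values up to 8191 bytes pay two bytes, and only longer values pay the full four.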


> This can support *any* stored value from zero to 127 bytes long.
> We can imagine creating new datatypes "short varchar" and "short char",
> and then having the parser silently substitute these types for varchar(N)
> or char(N) whenever N <= 127 / max_encoding_length.  Add some
> appropriate implicit casts to convert these to the normal varlena types
> for computation, and away you go.  No breakage of any existing
> datatype-specific code, just a few additions in places like
> heap_form_tuple.
> 
>             regards, tom lane
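The substitution rule in the quoted text (N <= 127 / max_encoding_length) works out as below. This is an illustrative Python sketch only; `max_short_typmod` and `pick_type` are hypothetical names, and the per-character byte counts passed in are assumed example values rather than figures taken from any particular server encoding.

```python
SHORT_MAX_BYTES = 127  # largest payload the one-byte header can describe


def max_short_typmod(max_encoding_length: int) -> int:
    """Largest declared N for which every possible value fits in 127 bytes."""
    if max_encoding_length < 1:
        raise ValueError("max_encoding_length must be at least 1")
    # Integer division: N characters may need N * max_encoding_length bytes.
    return SHORT_MAX_BYTES // max_encoding_length


def pick_type(declared_n: int, max_encoding_length: int) -> str:
    """Choose which type the parser would substitute for varchar(N)."""
    if declared_n <= max_short_typmod(max_encoding_length):
        return "short varchar"
    return "varchar"
```

With one byte per character every varchar(N) up to N = 127 would qualify; if a character can take up to 4 bytes, the cutoff drops to N = 31.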
-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia



