Re: Reducing data type space usage - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Reducing data type space usage
Date	September 16, 2006 22:19:01
Msg-id	450C784C.8040001@enterprisedb.com
In response to	Re: Reducing data type space usage (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> The user would have to decide that he'll never need a value over 127
>> bytes long ever in order to get the benefit.
>
> Weren't you the one that's been going on at great length about how
> wastefully we store CHAR(1) ? Sure, this has a somewhat restricted
> use case, but it's about as efficient as we could possibly get within
> that use case.

I like the idea of having variable length headers much more than a new 
short character type. It solves a more general problem, and it 
compresses VARCHAR(>255) and TEXT fields nicely when the actual data in the 
field is small.

I'd like to propose one more encoding scheme, based on Tom's earlier 
proposals. The use cases I care about are:

* support uncompressed data up to 1G, like we do now
* 1 byte length word for short data.
* store typical CHAR(1) values in just 1 byte.

Tom wrote:
> * 0xxxxxxx uncompressed 4-byte length word as stated above
> * 10xxxxxx 1-byte length word, up to 62 bytes of data
> * 110xxxxx 2-byte length word, uncompressed inline data
> * 1110xxxx 2-byte length word, compressed inline data
> * 1111xxxx 1-byte length word, out-of-line TOAST pointer

My proposal is:

00xxxxxx uncompressed, aligned 4-byte length word
010xxxxx 1-byte length word, uncompressed inline data (up to 32 bytes)
011xxxxx 2-byte length word, uncompressed inline data (up to 8k)
1xxxxxxx 1 byte data in range 0x20-0x7E
1000xxxx 2-byte length word, compressed inline data (up to 4k)
11111111 TOAST pointer

The decoding algorithm is similar to Tom's proposal, and relies on using 
0x00 for padding.
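
To make the bit patterns above concrete, here is a rough sketch of how the first byte of a value might be classified under this proposal. It is only an illustration, not actual PostgreSQL code: the function name classify_header and the returned labels are made up, and it assumes that a single-byte value is stored with the high bit set, so the original character is recovered by masking with 0x7F. It also assumes that skipping 0x00 padding bytes happens before this step, and it treats 1001xxxx as unassigned because the table does not mention it.

```python
def classify_header(first_byte: int):
    """Classify the first byte of a datum under the proposed encoding.

    Returns (kind, payload). The kind labels are hypothetical names chosen
    for this sketch. For length-word kinds, payload is the value of the
    'x' bits in the first byte; for single_char it is the decoded character.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in one byte")

    # 11111111: TOAST pointer. Checked first because it also matches 1xxxxxxx.
    if first_byte == 0xFF:
        return ("toast_pointer", None)
    # 00xxxxxx: uncompressed, aligned 4-byte length word.
    if (first_byte & 0xC0) == 0x00:
        return ("len4_uncompressed", first_byte & 0x3F)
    # 010xxxxx: 1-byte length word, uncompressed inline data.
    if (first_byte & 0xE0) == 0x40:
        return ("len1_uncompressed", first_byte & 0x1F)
    # 011xxxxx: 2-byte length word, uncompressed inline data.
    if (first_byte & 0xE0) == 0x60:
        return ("len2_uncompressed", first_byte & 0x1F)
    # 1000xxxx: 2-byte length word, compressed inline data.
    if (first_byte & 0xF0) == 0x80:
        return ("len2_compressed", first_byte & 0x0F)
    # 1xxxxxxx with data in 0x20-0x7E: assumed stored as 0xA0-0xFE.
    if 0xA0 <= first_byte <= 0xFE:
        return ("single_char", chr(first_byte & 0x7F))
    # 1001xxxx is not assigned by the table above.
    return ("unassigned", None)
```

For example, under these assumptions classify_header(0xC1) yields ("single_char", "A"), since 0xC1 with the high bit cleared is 0x41. The ranges do not collide: printable characters map to 0xA0-0xFE, which leaves 0x80-0x8F free for the compressed case and 0xFF free for the TOAST pointer.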

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
