Re: Reducing data type space usage - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Reducing data type space usage
Msg-id 871wqby49w.fsf@enterprisedb.com
In response to Re: Reducing data type space usage  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Reducing data type space usage  (Bruce Momjian <bruce@momjian.us>)
Re: Reducing data type space usage  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:

> Tom Lane wrote:
>> Gregory Stark <stark@enterprisedb.com> writes:
>> > The user would have to decide that he'll never need a value over 127 bytes
>> > long ever in order to get the benefit.
>> 
>> Weren't you the one that's been going on at great length about how
>> wastefully we store CHAR(1) ?  Sure, this has a somewhat restricted
>> use case, but it's about as efficient as we could possibly get within
>> that use case.

Sure, this helps with CHAR(1) but there were plen

>
> To summarize what we are now considering:
>
> Originally, there was the idea of doing 1,2, and 4-byte headers.  The
> 2-byte case is probably not worth the extra complexity (saving 2 bytes
> on a 128-byte length isn't very useful).

Well, don't forget we virtually *never* use more than 2 bytes out of the 4-byte
headers for on-disk data. The only way we ever store a datum larger than 16k
is if you compile with 32k blocks *and* you explicitly disable toasting on the
column.
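
As a rough sketch of that arithmetic: the limits below are illustrative
assumptions, not PostgreSQL's actual on-disk format. It assumes a 1-byte
header carries 7 length bits (the 127-byte limit discussed in this thread), a
hypothetical 2-byte header carries 14 length bits (16 bits minus 2 flag bits,
hence the 16k figure), and everything longer falls back to 4 bytes.
header_bytes_needed() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits for illustration only. */
#define SHORT_HDR_MAX_LEN 127u        /* 7 length bits in a 1-byte header  */
#define MID_HDR_MAX_LEN   16383u      /* 14 length bits in a 2-byte header */
#define LONG_HDR_MAX_LEN  0x3FFFFFFFu /* 30 length bits in a 4-byte header */

/* Smallest header (in bytes) able to record a datum of the given length. */
static unsigned
header_bytes_needed(uint32_t len)
{
    assert(len <= LONG_HDR_MAX_LEN);

    if (len <= SHORT_HDR_MAX_LEN)
        return 1;
    if (len <= MID_HDR_MAX_LEN)
        return 2;
    return 4;
}
```

Under these assumptions only a datum of 16384 bytes or more ever needs the
full 4-byte header.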

Worse, if we don't do anything about fields like text, it's not true that the
wasted header space only occurs on 128-byte columns and larger. It occurs on
any column that *could* contain 128 bytes or more, i.e., any column declared as
varchar(128), even if it contains only "Bruce", or any column declared as text
or numeric.
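
To make the point concrete, here is a minimal sketch of a header choice driven
by the declared type rather than the stored value. header_overhead() and its
parameters are hypothetical: declared_max_bytes stands in for whatever limit
the typmod implies (negative meaning no limit, as for text), and the sketch
ignores multibyte encodings.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_MAX_BYTES 127   /* limit of a hypothetical 1-byte header */

/*
 * Header bytes paid for a value when the header size is fixed by the
 * column's declared maximum, not by the value actually stored.
 */
static size_t
header_overhead(int declared_max_bytes, size_t actual_len)
{
    /* A value can never exceed a declared limit. */
    assert(declared_max_bytes < 0 ||
           actual_len <= (size_t) declared_max_bytes);

    if (declared_max_bytes >= 0 && declared_max_bytes <= SMALL_MAX_BYTES)
        return 1;   /* the column can never hold more than 127 bytes */
    return 4;       /* it *could*, so every value pays the full header */
}
```

With this rule "Bruce" (5 bytes) costs 1 header byte in a varchar(127) column
but 4 in a varchar(128) or text column.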

I'm not sure myself whether the smallfoo data types are a bad idea in
themselves though. I just think they probably don't replace trying to shorten
the largefoo varlena headers as well.

Part of the reason I think the smallfoo data types may be a bright idea in
their own right is that the datatypes might be able to do clever things about
their internal storage. For instance, smallnumeric could use base 100 where
largenumeric uses base 10000.
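
A back-of-the-envelope sketch of why the base matters, assuming a base-100
digit fits in one byte and holds 2 decimal digits, while a base-10000 digit
takes two bytes and holds 4. The function names are made up, and the sketch
ignores the header, weight, sign, and how digit groups line up with the
decimal point.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of digit storage for ndec decimal digits in a given packing. */
static size_t
digit_bytes(unsigned ndec, unsigned dec_per_digit, size_t bytes_per_digit)
{
    assert(dec_per_digit > 0);

    /* Round up to a whole number of base-N digits. */
    unsigned ndigits = (ndec + dec_per_digit - 1) / dec_per_digit;

    return (size_t) ndigits * bytes_per_digit;
}

/* Base 100: 2 decimal digits per 1-byte digit. */
static size_t
base100_bytes(unsigned ndec)
{
    return digit_bytes(ndec, 2, 1);
}

/* Base 10000: 4 decimal digits per 2-byte digit. */
static size_t
base10000_bytes(unsigned ndec)
{
    return digit_bytes(ndec, 4, 2);
}
```

Under these assumptions base 100 never takes more digit bytes than base 10000,
and it saves a byte whenever the decimal digit count leaves a base-10000 digit
at most half used (1, 2, 5, 6, ... decimal digits).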

> I am slightly worried about having short version of many of our types. 
> Not only char, varchar, and text, but also numeric.  I see these varlena
> types in the system:

I think only the following ones make sense for smallfoo types:

>      bpchar
>      varchar
>      bit
>      varbit
>      numeric

These don't currently take typmods, so we'll never know when they could use a
smallfoo representation. It might be useful if they did, though:

>      bytea
>      text
>      path
>      polygon


Why are these varlena? Just for ipv6 addresses? Is the network mask length not
stored if it's not present? This gives us a strange corner case in that ipv4
addresses will *always* fit in the smallfoo data type and ipv6 addresses will
*never* fit. I.e., we'll essentially end up with an ipv4inet and an ipv6inet.
Sad in a way.

>      inet
>      cidr

I have to read up on what this is.

>      refcursor


> Are these shorter headers going to have the same alignment requirements
> as the 4-byte headers?  I am thinking not, meaning we will not have as
> much padding overhead we have now.

Well, a 1-byte length header doesn't need any alignment, so they would have only
the alignment that the data type itself declares. I'm not sure how that
interacts with heap_deform_tuple, but it's probably simpler than finding out
what alignment you need only once you parse the length header.
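
For illustration, a small sketch of the padding difference. It assumes the
alignment applies to the start of the field (header included) and uses
made-up helper names; it is not how heap_deform_tuple actually walks a tuple.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t
align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Offset just past a field that starts at or after 'offset', given its
 * header size, required alignment, and payload length.
 */
static size_t
field_end(size_t offset, size_t hdr_bytes, size_t align, size_t data_len)
{
    size_t start = align_up(offset, align);   /* padding is inserted here */

    return start + hdr_bytes + data_len;
}
```

For a 1-byte value following 5 bytes of earlier columns, a 4-byte header with
4-byte alignment ends at offset 13 (3 padding + 4 header + 1 data), while an
unaligned 1-byte header ends at offset 7.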

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

