Re: Fixed length data types issue - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: Fixed length data types issue
Msg-id 20060908140011.GG5479@svana.org
In response to Re: Fixed length data types issue  (mark@mark.mielke.cc)
Responses Re: Fixed length data types issue  (Gregory Stark <stark@enterprisedb.com>)
Re: Fixed length data types issue  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Fri, Sep 08, 2006 at 09:28:21AM -0400, mark@mark.mielke.cc wrote:
> > But that won't help in the example you posted upthread, because char(N)
> > is not fixed-length.
>
> It can be fixed-length, or at least, have an upper bound. If marked
> up to contain only ascii characters, it doesn't, at least in theory,
> and even if it is unicode, it's not going to need more than 4 bytes
> per character. char(2) through char(16) only require 4 bits to
> store the length header, leaving 4 bits for encoding information.
> bytea(2) through bytea(16), at least in theory, should require none.

If you're talking about an upper bound, then it's not fixed length
anymore, and you need to spend bytes storing the length. ASCII characters
take only one byte in most encodings, including UTF8.

Doodling this morning, I remembered why the simple approach didn't work.
If you look at the varlena header, 2 bits are reserved. Say you take
one more bit to indicate "short header". A one-byte header then has
8 - 2 - 1 = 5 bits left for the length, so lengths 0-31 bytes can be
represented with a one-byte header, yay!

However, the regular 4-byte header now has only 29 bits left over for
the length, so we've just cut the maximum datum size from 1GB to 512MB.
Is that a fair trade? Probably not, so you'd need a more sophisticated
scheme.
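The bit arithmetic above can be checked with a small C sketch. It assumes a header whose top two bits are reserved flags, as described, plus one proposed "short header" flag bit; the macro and function names are hypothetical and are not PostgreSQL's actual varlena macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical names): 2 reserved flag bits in every header,
 * plus 1 proposed flag bit marking a 1-byte "short" header. */
#define VARLENA_RESERVED_BITS 2u
#define VARLENA_SHORT_FLAG    1u

/* Bits left for the length once the flag bits are taken from the header. */
static unsigned length_bits(unsigned header_bytes, unsigned flag_bits)
{
    assert(flag_bits < header_bytes * 8u);
    return header_bytes * 8u - flag_bits;
}

/* Largest length that fits in the given number of bits: 2^bits - 1. */
static uint64_t max_length(unsigned bits)
{
    return (UINT64_C(1) << bits) - 1u;
}

/* Current scheme: 4-byte header, 2 flag bits -> 30 length bits (~1GB). */
static uint64_t max_datum_today(void)
{
    return max_length(length_bits(4, VARLENA_RESERVED_BITS));
}

/* Short-header scheme, 1-byte form: 3 flag bits -> 5 length bits (0..31). */
static uint64_t max_short_datum(void)
{
    return max_length(length_bits(1, VARLENA_RESERVED_BITS + VARLENA_SHORT_FLAG));
}

/* Short-header scheme, 4-byte form: 3 flag bits -> 29 length bits (~512MB). */
static uint64_t max_datum_with_short_flag(void)
{
    return max_length(length_bits(4, VARLENA_RESERVED_BITS + VARLENA_SHORT_FLAG));
}
```

Under these assumptions the 1-byte form covers lengths 0 to 31, while the 4-byte form drops from 30 to 29 length bits, so its maximum is halved.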

> For my own uses, I would like for bytea(16) to have no length header.
> The length is constant. UUID or MD5SUM. Store the length at the head
> of the table, or look up the information from the schema.

I'm still missing the argument for why you can't just make a 16-byte
type. Around half the datatypes in PostgreSQL are fixed-length and have
no header. I'm completely confused about why people are hung up on
bytea(16) not being fixed length when it's trivial to create a type
that is.
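To illustrate the point, here is a plain C sketch (not PostgreSQL extension code) of a 16-byte value whose size is a property of the type, compared with a hypothetical varlena-style encoding that carries a 4-byte length word per value. In PostgreSQL itself such a type would be declared with `CREATE TYPE ... INTERNALLENGTH = 16`; the C names below are illustrative only, and alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-length 16-byte value (e.g. a UUID or MD5 digest).
 * The size is known from the type, so no per-value length is stored. */
typedef struct {
    unsigned char bytes[16];
} fixed16;

_Static_assert(sizeof(fixed16) == 16, "fixed16 must be exactly 16 bytes");

/* Hypothetical varlena-style alternative: 4-byte length word + payload. */
#define VARLENA_HEADER_SIZE 4u

/* Bytes needed to store n values of each representation. */
static size_t fixed16_storage(size_t n_values)
{
    return n_values * sizeof(fixed16);
}

static size_t varlena16_storage(size_t n_values)
{
    return n_values * (VARLENA_HEADER_SIZE + 16u);
}

/* Build a value from a 16-byte buffer; the caller guarantees the length. */
static fixed16 fixed16_from_bytes(const unsigned char src[16])
{
    fixed16 v;
    assert(src != NULL);
    memcpy(v.bytes, src, sizeof v.bytes);
    return v;
}

/* Byte-wise comparison, as an equality or ordering operator might use. */
static int fixed16_cmp(const fixed16 *a, const fixed16 *b)
{
    return memcmp(a->bytes, b->bytes, sizeof a->bytes);
}
```

For 1000 values the fixed-length form needs 16000 bytes against 20000 for the header-carrying form in this simplified model.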

> I see the complexity argument. Existing code is too heavy to change
> completely. People are talking about compromises such as allowing the
> on-disk layout to be different from the in-memory layout.

The biggest cost of having differing memory and disk layouts is that
you have to "unpack" each disk page as it's read in. This means an
automatic doubling of memory usage for the buffer cache. If you're RAM
limited, that's the last thing you want.

Currently, the executor will use the contents of the actual disk page
when possible, saving a lot of copying.
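The trade-off can be sketched in C under simplifying assumptions: an 8192-byte page (PostgreSQL's default block size), a hypothetical `unpack_page` step standing in for any disk-to-memory layout conversion, and zero-copy access meaning a pointer directly into the cached page. All names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192u /* PostgreSQL's default block size */

/* Zero-copy access: return a pointer straight into the cached page.
 * No memory beyond the buffer cache itself is needed. */
static const unsigned char *field_in_page(const unsigned char *page, size_t offset)
{
    assert(offset < SKETCH_PAGE_SIZE);
    return page + offset;
}

/* Hypothetical "unpack" for a format whose in-memory layout differs from
 * the on-disk one. Here the conversion is just a copy, but any real
 * conversion still needs a second page-sized buffer. Caller frees it. */
static unsigned char *unpack_page(const unsigned char *page)
{
    unsigned char *unpacked = malloc(SKETCH_PAGE_SIZE);
    if (unpacked != NULL)
        memcpy(unpacked, page, SKETCH_PAGE_SIZE);
    return unpacked;
}

/* Memory needed to cache n pages under each approach. */
static size_t cache_bytes_zero_copy(size_t n_pages)
{
    return n_pages * SKETCH_PAGE_SIZE;
}

static size_t cache_bytes_unpacked(size_t n_pages)
{
    /* raw page in the buffer cache plus its unpacked copy */
    return n_pages * SKETCH_PAGE_SIZE * 2u;
}
```

In this model, keeping an unpacked copy alongside each raw page doubles the cache footprint, which is the cost described above.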

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
