Re: Fixed length data types issue - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Fixed length data types issue
Msg-id 5604.1157932798@sss.pgh.pa.us
In response to Re: Fixed length data types issue  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Fixed length data types issue  (Gregory Stark <gsstark@mit.edu>)
List pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Either way, I think it would be interesting to consider
>> 
>> (a) length word either one or two bytes, not four.  You can't need more
>> than 2 bytes for a datum that fits in a disk page ...

> That is an interesting observation, though could compressed inline
> values exceed two bytes?

After expansion, perhaps, but it's the on-disk footprint that concerns
us here.

I thought a bit more about this and came up with a zeroth-order sketch:

The "length word" for an on-disk datum could be either 1 or 2 bytes;
in the 2-byte case we'd need to be prepared to fetch the bytes
separately to avoid alignment issues.  The high bits of the first byte
say what's up:

* First two bits 00: 2-byte length word, uncompressed inline data
follows.  This allows a maximum on-disk size of 16K for an uncompressed
datum, so we lose nothing at all for standard-size disk pages and not
much for 32K pages (remember the toaster will try to compress any tuple
exceeding 1/4 page anyway ... this just makes it mandatory).

* First two bits 01: 2-byte length word, compressed inline data
follows.  Again, hard limit of 16K, so if your data exceeds that you
have to push it out to the toast table.  Again, this policy costs zero
for standard size disk pages and not much for 32K pages.

* First two bits 10: 1-byte length word, zero to 62 bytes of
uncompressed inline data follows.  This is the case that wins for short
values.

* First two bits 11: 1-byte length word, pointer to out-of-line toast
data follows.  We may as well let the low 6 bits of the length word be
the size of the toast pointer, same as it works now.  Since the toast
pointer is not guaranteed aligned anymore, we'd have to memcpy it
somewhere before using it ... but compared to the other costs of
fetching a toast value, that's surely down in the noise.  The
distinction between compressed and uncompressed toast data would need to
be indicated in the body of the toast pointer, not in the length word as
today, but nobody outside of tuptoaster.c would care.
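A minimal C sketch of the four cases above. It relies on assumptions the message leaves open: the 2-byte length is stored big-endian with the tag in the top two bits of the first byte, and the length value counts the header byte(s) themselves. That second choice is what makes 62 the 1-byte limit, since 6 bits give at most 63, minus one header byte. All names (hdr_kind, decode_header, encode_header, toast_ptr, fetch_toast_ptr) and the toast pointer layout are hypothetical illustrations, not PostgreSQL code.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tags for the top two bits of the first header byte. */
enum hdr_kind {
    HDR_2B_PLAIN = 0,      /* 00: 2-byte length, uncompressed inline data */
    HDR_2B_COMPRESSED = 1, /* 01: 2-byte length, compressed inline data */
    HDR_1B_PLAIN = 2,      /* 10: 1-byte length, short uncompressed data */
    HDR_1B_TOAST = 3       /* 11: 1-byte length, toast pointer follows */
};

typedef struct {
    enum hdr_kind kind;
    size_t header_len;     /* 1 or 2 */
    size_t total_len;      /* assumed: on-disk bytes including the header */
} hdr_info;

static hdr_info decode_header(const unsigned char *p)
{
    hdr_info h;
    unsigned first = p[0];

    h.kind = (enum hdr_kind) (first >> 6);
    if (first & 0x80) {            /* 1-byte length word: low 6 bits */
        h.header_len = 1;
        h.total_len = first & 0x3F;
    } else {                       /* 2-byte length word: 14 bits */
        /* Fetch the bytes one at a time, so no alignment is assumed. */
        h.header_len = 2;
        h.total_len = ((size_t) (first & 0x3F) << 8) | p[1];
    }
    return h;
}

/* Writes a header for data_len payload bytes; returns 1 or 2, or 0 if too big. */
static size_t encode_header(unsigned char *p, enum hdr_kind kind, size_t data_len)
{
    if (kind == HDR_1B_PLAIN || kind == HDR_1B_TOAST) {
        size_t total = data_len + 1;
        if (total > 0x3F)
            return 0;              /* more than 62 data bytes: needs 2-byte form */
        p[0] = (unsigned char) ((kind << 6) | total);
        return 1;
    }
    size_t total = data_len + 2;
    if (total > 0x3FFF)
        return 0;                  /* over the 16K hard limit: push to toast */
    p[0] = (unsigned char) ((kind << 6) | (total >> 8));
    p[1] = (unsigned char) (total & 0xFF);
    return 2;
}

/* Hypothetical toast pointer body, for illustration only. */
typedef struct { uint32_t valueid; uint32_t rawsize; } toast_ptr;

/* Copies an unaligned toast pointer out of the tuple; returns 1 on success. */
static int fetch_toast_ptr(const unsigned char *p, toast_ptr *out)
{
    hdr_info h = decode_header(p);
    if (h.kind != HDR_1B_TOAST || h.total_len != h.header_len + sizeof *out)
        return 0;
    memcpy(out, p + h.header_len, sizeof *out);  /* payload may be unaligned */
    return 1;
}
```

With these assumptions, a 62-byte short value gets the single header byte 0xBF (tag 10, length 63), while 63 bytes no longer fit and would need the 2-byte form.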

Notice that heap_deform_tuple only sees 2 cases here: high bit 0 means
2-byte length word, high bit 1 means 1-byte.  It doesn't care whether
the data is compressed or toasted, same as today.
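To illustrate why only the high bit matters when stepping over attributes, here is a hypothetical walker in the spirit of heap_deform_tuple. It assumes the same encoding as above (big-endian 2-byte length, length counting its own header) and, additionally, that packed datums sit back to back with no alignment padding. The function names are made up for this sketch.

```c
#include <stddef.h>

/* On-disk size of one packed datum; bit 6 (compressed/toasted) is ignored. */
static size_t packed_datum_size(const unsigned char *p)
{
    if (p[0] & 0x80)                               /* high bit 1: 1-byte length */
        return p[0] & 0x3F;
    return ((size_t) (p[0] & 0x3F) << 8) | p[1];   /* high bit 0: 2-byte length */
}

/* Records the start offset of each datum in buf; returns how many were found. */
static size_t walk_datums(const unsigned char *buf, size_t buflen,
                          size_t *offsets, size_t max)
{
    size_t off = 0, n = 0;

    while (off < buflen && n < max) {
        size_t hdr = (buf[off] & 0x80) ? 1 : 2;
        if (off + hdr > buflen)
            break;                         /* truncated header */
        size_t sz = packed_datum_size(buf + off);
        if (sz < hdr || off + sz > buflen)
            break;                         /* corrupt or truncated datum */
        offsets[n++] = off;
        off += sz;
    }
    return n;
}
```

For example, the bytes `0x83 'a' 'b' 0x00 0x04 'x' 'y' 0x81` decode as three datums starting at offsets 0, 3 and 7: a 2-byte short value, a 2-byte value behind a 2-byte length word, and an empty short value. The walker never inspects the second tag bit, so compressed versus plain and inline versus toasted make no difference to it.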

There are other ways we could divvy up the bit assignments of course.
The main issue is keeping track of whether any given Datum is in this
compressed-for-disk format or in the uncompressed 4-byte-length-word
format.
        regards, tom lane