Re: Variable length varlena headers redux - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Variable length varlena headers redux
Date
Msg-id 878xf7i3fc.fsf@stark.xeocode.com
In response to Re: Variable length varlena headers redux  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Variable length varlena headers redux
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Bruce Momjian <bruce@momjian.us> writes:
> >> I know it is kind of odd to have a data type that is only used on disk,
> >> and not in memory, but I see this as a baby varlena type, used only to
> >> store and get varlena values using less disk space.
> 
> > I was leaning toward generating the short varlena headers primarily in
> > heap_form*tuple and just having the datatype specific code generate 4-byte
> > headers much as you describe.
> 
> I thought we had a solution for all this, namely to make the short-form
> headers be essentially a TOAST-compressed representation.  The format
> with 4-byte headers is still legal but just not compressed.  Anyone who
> fails to detoast an input argument is already broken, so there's no code
> compatibility hit taken.

Uh. So I don't see how to make this work on a little-endian machine. If the
leading bits are 0 we don't know if they're toast flags or bits of the least
significant byte of a longer length.
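
To make the ambiguity concrete, here is a minimal sketch. It assumes the flag
bits would live in the top two bits of the first byte in memory; that
placement and the helper names (store_le32, store_be32, leading_bits) are
invented for illustration and are not from the backend.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 32-bit length the way a little-endian machine stores it. */
static void store_le32(uint8_t *buf, uint32_t len)
{
    buf[0] = (uint8_t) (len & 0xFF);            /* least significant byte first */
    buf[1] = (uint8_t) ((len >> 8) & 0xFF);
    buf[2] = (uint8_t) ((len >> 16) & 0xFF);
    buf[3] = (uint8_t) ((len >> 24) & 0xFF);
}

/* Lay out the same length in network (big-endian) byte order. */
static void store_be32(uint8_t *buf, uint32_t len)
{
    /* Assumed limit: lengths below 2^30 leave the top two bits free. */
    assert(len < (1u << 30));
    buf[0] = (uint8_t) ((len >> 24) & 0xFF);    /* most significant byte first */
    buf[1] = (uint8_t) ((len >> 16) & 0xFF);
    buf[2] = (uint8_t) ((len >> 8) & 0xFF);
    buf[3] = (uint8_t) (len & 0xFF);
}

/* Hypothetical flag test: the top two bits of the first byte in memory. */
static int leading_bits(const uint8_t *buf)
{
    return buf[0] >> 6;
}
```

With the little-endian layout, a length of 192 puts 0xC0 in the first byte, so
the "flag" bits read as 11, while a length of 256 puts 0x00 there and they read
as 00. The leading bits depend on the low byte of the length, so they cannot
double as flags. With the big-endian layout they are 00 for every length below
2^30.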

If we store all lengths in network byte order that problem goes away but then
user code that does "VARATT_SIZEP(datum) = len" is incorrect.
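
A rough sketch of why the plain assignment breaks, using a simplified stand-in
struct and macro (demo_varlena and DEMO_VARATT_SIZEP are hypothetical, not the
real definitions). The assignment writes the length in host byte order, so a
reader that expects network byte order only sees the right value on a
big-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a varlena with a 4-byte header (not the real struct). */
typedef struct
{
    int32_t va_header;
    char    va_data[4];
} demo_varlena;

/* Stand-in for the macro as user code uses it: a plain int32 lvalue. */
#define DEMO_VARATT_SIZEP(ptr) (((demo_varlena *) (ptr))->va_header)

/* Decode four bytes the way a network-byte-order reader would. */
static uint32_t read_be32(const void *p)
{
    const uint8_t *b = (const uint8_t *) p;

    return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16) |
           ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}

/* What a setter would have to do instead of a bare assignment. */
static void set_size_network_order(demo_varlena *v, uint32_t len)
{
    uint8_t b[4];

    assert(len < (1u << 30));       /* assumed limit, as in a flagged header */
    b[0] = (uint8_t) (len >> 24);
    b[1] = (uint8_t) (len >> 16);
    b[2] = (uint8_t) (len >> 8);
    b[3] = (uint8_t) len;
    memcpy(&v->va_header, b, sizeof b);
}

/* Returns 1 when the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
    const uint32_t one = 1;
    uint8_t first;

    memcpy(&first, &one, 1);
    return first == 0;
}
```

After DEMO_VARATT_SIZEP(&v) = 10, read_be32(&v.va_header) returns 10 only when
host_is_big_endian() is true. After set_size_network_order(&v, 10) it returns
10 on any host, but every existing caller of the macro would need to change.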

If we declare in-memory format to be host byte order and on-disk format to be
network byte order then every single varlena datum needs to be copied when
heap_deform*tuple runs.
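
As an illustration of that cost, here is a sketch of the per-datum work,
assuming the stored length counts the 4-byte header and using plain malloc in
place of backend memory allocation. The function copy_to_host_order is
hypothetical; nothing like it is claimed to exist in the tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HDRSZ 4

/* Decode a network-byte-order length from the on-disk image. */
static uint32_t read_be32(const uint8_t *b)
{
    return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16) |
           ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}

/*
 * Build an in-memory datum with a host-order header from an on-disk datum
 * with a network-order header. The on-disk image cannot be fixed up in
 * place, so header and payload are both copied.
 * Returns a malloc'd buffer the caller must free, or NULL on failure.
 */
static void *copy_to_host_order(const uint8_t *on_disk)
{
    uint32_t total = read_be32(on_disk);    /* total size, header included */
    uint8_t *copy;

    assert(total >= DEMO_HDRSZ);
    copy = malloc(total);
    if (copy == NULL)
        return NULL;

    memcpy(copy, &total, DEMO_HDRSZ);       /* header in host byte order */
    memcpy(copy + DEMO_HDRSZ, on_disk + DEMO_HDRSZ, total - DEMO_HDRSZ);
    return copy;
}
```

The allocation and the payload memcpy are what would be paid for every varlena
column of every tuple that gets deformed, where today a pointer into the
buffer suffices.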

If we only do this for a new kind of varlena then only text/varchar/
char/numeric datums would need to be copied but that's still a lot.

-- 
greg


