Re: Variable length varlena headers redux - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Variable length varlena headers redux
Msg-id 45D1CA5C.3010506@enterprisedb.com
In response to Re: Variable length varlena headers redux  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Variable length varlena headers redux  (Gregory Stark <stark@enterprisedb.com>)
Re: Variable length varlena headers redux  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> For example it'd be easy to implement the previously-discussed design
>> involving storing uncompressed length words in network byte order:
>> SET_VARLENA_LEN does htonl() and VARSIZE does ntohl() and nothing else in
>> the per-datatype functions needs to change. Another idea that we were
>> kicking around is to make an explicit distinction between little-endian and
>> big-endian hardware: on big-endian hardware, store the two TOAST flag bits
>> in the MSBs as now, but on little-endian, store them in the LSBs, shifting
>> the length value up two bits. This would probably be marginally faster than
>> htonl/ntohl depending on hardware and compiler intelligence, but either way
>> you get to guarantee that the flag bits are in the physically first byte,
>> which is the critical thing needed to be able to tell the difference between
>> compressed and uncompressed length values.
> 
> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> to really work here. Both will require 4-byte alignment. Instead I think we
> have to access the length byte by byte as a (char*) and do arithmetic. Since
> it's the pointer being passed to VARSIZE that isn't too hard, but it might
> perform poorly.
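
Gregory's alternative, reading the length through a (char *) and doing the arithmetic by hand, can be sketched for both of the layouts Tom describes. This is a minimal sketch under assumptions: the helper names are hypothetical, and the bit layouts (two flag bits at the top of a big-endian word, or at the bottom of a little-endian word with the length shifted up two bits) are taken from Tom's description, not from committed code. Only single-byte loads are used, so the pointer may have any alignment. The price is four loads plus shifts instead of one word load, which is the performance concern Gregory raises.

```c
#include <stdint.h>

/* Hypothetical helpers; names and layouts are illustrative only.
 * Each reads the 4-byte length word one byte at a time, so it works at
 * any alignment and on any host byte order. */

/* Option 1: length word in network (big-endian) byte order, with the two
 * flag bits in the most significant bits of the physically first byte. */
static uint32_t
sketch_len_be(const void *ptr)
{
    const unsigned char *p = ptr;
    uint32_t word = ((uint32_t) p[0] << 24) |
                    ((uint32_t) p[1] << 16) |
                    ((uint32_t) p[2] << 8) |
                    (uint32_t) p[3];

    return word & 0x3FFFFFFFu;      /* drop the two flag bits */
}

static unsigned
sketch_flags_be(const void *ptr)
{
    return ((const unsigned char *) ptr)[0] >> 6;   /* top two bits of byte 0 */
}

/* Option 2, little-endian storage: flags in the two least significant
 * bits, length shifted up by two. Byte 0 still holds the flags. */
static uint32_t
sketch_len_le_shifted(const void *ptr)
{
    const unsigned char *p = ptr;
    uint32_t word = (uint32_t) p[0] |
                    ((uint32_t) p[1] << 8) |
                    ((uint32_t) p[2] << 16) |
                    ((uint32_t) p[3] << 24);

    return word >> 2;               /* length sits above the flag bits */
}

static unsigned
sketch_flags_le_shifted(const void *ptr)
{
    return ((const unsigned char *) ptr)[0] & 0x3u; /* low two bits of byte 0 */
}
```

In both layouts the flag bits can be tested with a single load of byte 0, which is the property Tom calls critical.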

We would still require all datums with a 4-byte header to be 4-byte 
aligned, right? When reading, you would first check whether it's a 
compressed (1-byte) or an uncompressed (4-byte) header. If compressed, 
read the 1-byte header; if uncompressed, read the 4-byte header and 
apply ntohl() or the bit shift. There's no need to do ntohl() or bit 
shifting on unaligned datums.
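
The read path described above can be sketched like this, assuming a hypothetical layout in which the high bit of the first byte marks a 1-byte header, and 4-byte headers are stored in network byte order with a compression flag in the next bit. The layout and the helper names are illustrative assumptions, not the format of any PostgreSQL release. What it shows is that the one-byte test comes first, so the word-wide ntohl() read only ever touches datums that are 4-byte aligned.

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, for illustration only:
 *   1xxxxxxx               1-byte header; low 7 bits = total size (<= 127)
 *   0Cxxxxxx + 3 bytes     4-byte header in network byte order; C = compressed
 *                          flag, low 30 bits = total size. Datums with this
 *                          header are assumed to be 4-byte aligned. */

static int
sketch_has_short_header(const void *datum)
{
    /* The distinguishing bit is in the physically first byte,
     * so a single byte load decides which header we have. */
    return (((const unsigned char *) datum)[0] & 0x80) != 0;
}

static uint32_t
sketch_varsize(const void *datum)
{
    uint32_t word;

    if (sketch_has_short_header(datum))
        return ((const unsigned char *) datum)[0] & 0x7Fu;

    /* Only the 4-byte form is guaranteed aligned, so only it gets a
     * word-wide read followed by the byte-order conversion. */
    assert((uintptr_t) datum % 4 == 0);
    memcpy(&word, datum, sizeof(word));   /* typically compiles to one load */
    return ntohl(word) & 0x3FFFFFFFu;
}

static int
sketch_is_compressed(const void *datum)
{
    return !sketch_has_short_header(datum) &&
           (((const unsigned char *) datum)[0] & 0x40) != 0;
}
```

A 1-byte-header datum can then sit at any offset, while the alignment assertion documents the invariant that 4-byte-header datums keep.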

>> The important point here is that VARSIZE() still works, so only code that
>> creates a new varlena value need be affected, not code that examines one.
> 
> So what would VARSIZE() return, the size of the payload plus VARHDRSZ
> regardless of what actual size the header was? That seems like it would break
> the least existing code though removing all the VARHDRSZ offsets seems like it
> would be cleaner.

My vote would be to change every caller. Though there are a lot of 
callers, it's a very simple change.

To make it possible to compile an external module against both 8.2 and 
8.3, you could have a simple #ifdef block that maps the new macro to the 
old behavior. Or we could backport the macro definitions, as Magnus 
suggested.
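
The #ifdef approach might look like the following minimal sketch. The stand-in struct and macros imitate an older server's headers and are simplified assumptions, not the real postgres.h. The new-style names SET_VARSIZE and VARSIZE_ANY_EXHDR are used for illustration (they match macros PostgreSQL later added, but their semantics here are assumed). Module code uses only the new names; against headers that already define them, the fallback block is skipped.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for an older server's headers (simplified; not the real
 * postgres.h). Every varlena has a 4-byte length word that counts itself. */
typedef struct
{
    int32_t vl_len;     /* total size, including this header */
    char    vl_dat[];   /* payload */
} sketch_varlena;

#define VARHDRSZ        ((int32_t) sizeof(int32_t))
#define VARSIZE(ptr)    (((const sketch_varlena *) (ptr))->vl_len)
#define VARDATA(ptr)    (((sketch_varlena *) (ptr))->vl_dat)

/* Compatibility block: if the headers lack the newer macros, map them onto
 * the old always-4-byte-header behaviour. With newer headers the real
 * definitions win and this block compiles to nothing. */
#ifndef SET_VARSIZE
#define SET_VARSIZE(ptr, len)   (((sketch_varlena *) (ptr))->vl_len = (len))
#endif
#ifndef VARSIZE_ANY_EXHDR
#define VARSIZE_ANY_EXHDR(ptr)  (VARSIZE(ptr) - VARHDRSZ)
#endif

/* Module code written only in terms of the new macro names. */
static sketch_varlena *
sketch_make_text(const char *s)
{
    size_t          len = strlen(s);
    sketch_varlena *v = malloc((size_t) VARHDRSZ + len);

    if (v == NULL)
        return NULL;
    SET_VARSIZE(v, (int32_t) (VARHDRSZ + len));
    memcpy(VARDATA(v), s, len);
    return v;
}
```

The same module source then builds unchanged against either set of headers, which is the goal of the ifdef block; backporting the macros would remove the need for even this much.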

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

