Re: Fixed length data types issue - Mailing list pgsql-hackers

From mark@mark.mielke.cc
Subject Re: Fixed length data types issue
Date
Msg-id 20060908132821.GA24823@mark.mielke.cc
Whole thread Raw
In response to Re: Fixed length data types issue  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: Fixed length data types issue  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
On Fri, Sep 08, 2006 at 08:57:12AM +0200, Peter Eisentraut wrote:
> Gregory Stark wrote:
> > I think we have to find a way to remove the varlena length header
> > entirely for fixed length data types since it's going to be the same
> > for every single record in the table.
> But that won't help in the example you posted upthread, because char(N) 
> is not fixed-length.

It can be fixed-length, or at least, have an upper bound. If marked
up to contain only ascii characters, it doesn't, at least in theory,
and even if it is unicode, it's not going to need more than 4 bytes
per character. char(2) through char(16) only require 4 bits to
store the length header, leaving 4 bits for encoding information.
bytea(2) through bytea(16), at least in theory, should require none.

For my own uses, I would like for bytea(16) to have no length header.
The length is constant. UUID or MD5SUM. Store the length at the head
of the table, or look up the information from the schema.

I see the complexity argument. Existing code is too heavy to change
completely. People talking about compromises such as allowing the
on disk layout to be different from the in memory layout. I wonder
whether the change could be small enough to not significantly
increase CPU, while still having significant effect. I find myself
doubting the CPU bound numbers. If even 20% data is saved, this
means 20% more RAM for caching, 20% less pages touched when
scanning, and 20% less RAM read. When people say CPU-bound, are we
sure they do not mean RAM speed bound? How do they tell the
difference between the two? RAM lookups count as CPU on most
performance counters I've ever used. RAM speed is also slower than
CPU speed, allowing for calculations between accesses assuming
that the loop allows for prefetching to be possible and accurate.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



pgsql-hackers by date:

Previous
From: Praveen Kumar N
Date:
Subject: postgresql shared buffers
Next
From: Heikki Linnakangas
Date:
Subject: Re: postgresql shared buffers