Re: Column storage positions - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Column storage positions
Date
Msg-id 871wkjnk5b.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: Column storage positions  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
"Andrew Dunstan" <andrew@dunslane.net> writes:

> Gregory Stark wrote:
>> "Andrew Dunstan" <andrew@dunslane.net> writes:
>>
>>   
>>> I would want to see this very carefully instrumented. Assuming we are putting
>>> all fixed size objects at the front, which seems like the best arrangement,
>>> then the position of every fixed field and the fixed portion of the position of
>>> every varlena field can be precalculated (and in the case of the leftmost
>>> varlena field that's it's complete position).     
>>
>> I'm not sure what you mean by "the fixed portion of the position of every
>> varlena field". Fields are just stuck one after the other (plus alignment)
>> skipping nulls. So any field after a null or varlena field can't have its
>> position cached at all.
>
> I'd forgotten about nulls :-( . Nevertheless, it's hard to think of a case
> where the penalty for shifting fixed size fields to the front is going to be
> very big. If we really wanted to optimise for speed for some varlena case, we'd
> probably need to keep stats on usage patterns, but that seems like massive
> overkill.

Oh, certainly, especially since only one varlena could ever be cached and soon
even that one won't be unless it's the very first column in the table. So
really, not worth thinking about.

Well the statistics we have do include the percentage of nulls in each column,
so we can sort columns by "fixed width not null" first, then "fixed width
nullable" by decreasing probability of being null, then varlenas.

But there's a tradeoff here. The more we try to optimize for cacheable offsets
the more difficult it will be to pack away the alignments.

Consider something like:

int        not null
boolean        not null
int        null
text        null

If we want we can pack this as int,int,boolean,text and (as long as the text
gets a 1-byte header) have them packed with no alignment.

But then the boolean can't use the cache whenever the int column is null. (the
offset will still be cached but it won't be used unless the int column is
non-null).

Alternatively we can pack this as int,boolean,int,text in which case the
boolean will *always* use the cache but it will be preceded by three wasted
padding bytes.

I tend to think the padding is more important than the caching because in
large systems the i/o speed dominates. But that doesn't mean the cpu cost is
negligible either. Especially on very wide tables.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Column storage positions
Next
From: "Henry B. Hotz"
Date:
Subject: Re: log ssl mode with connections?