Re: Column storage positions - Mailing list pgsql-hackers

From Florian G. Pflug
Subject Re: Column storage positions
Date
Msg-id 45DCC441.5000709@phlo.org
In response to Re: Column storage positions  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
Responses Re: Column storage positions  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Stephan Szabo wrote:
> On Wed, 21 Feb 2007, Alvaro Herrera wrote:
>> Did I miss something in what you were trying to say?  I assume you must
>> already know this.
> 
> I think so. What I was mentioning was that I was pretty sure that there
> was a message with someone saying that they actually tried something that
> did this and that they found left-most varchar access was slightly slower
> after the reordering although general access was faster. I believe the
> table case was alternating smallint and varchar columns, but I don't know
> what was tested for the retrieval. If that turns out to be able to be
> supported by other tests, then for some access patterns, the rearranged
> version might be slower.

Here is the original quote:
----------------------------------------------------------------
The results were encouraging: on a table
with 20 columns of alternating smallint and varchar(10) datatypes,
selecting the max() of one of the rightmost int columns across 1
million rows ran around 3 times faster.  The same query on the
leftmost varchar column (which should suffer the most from this
change) predictably got a little slower (about 10%);
----------------------------------------------------------------

What the OP doesn't mention is what the exact layouts looked like before
and after the reordering - maybe a nullable fixed-length field
got moved before the varchar column in question, which would disable
offset caching for that column, I guess.

Let's say the reordering algorithm is changed to only move non-nullable
fixed-width columns to the left - can anyone see an access pattern that
would run slower after the reordering? I certainly can't - because the set
of columns for which offset caching works after the reordering would
be a superset of the one for which it works before the reordering.
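
To make the superset argument concrete, here is a rough sketch under a
deliberately simplified model: a column's offset counts as cacheable if every
column stored before it is fixed-width and NOT NULL. The names Column,
is_movable, cacheable and reorder are made up for illustration and are not
backend code, and declaring the smallint columns NOT NULL is my assumption,
since the original test did not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    fixed_width: bool
    nullable: bool


def is_movable(col):
    """Only non-nullable fixed-width columns get moved to the left."""
    return col.fixed_width and not col.nullable


def cacheable(layout):
    """Names of columns whose offset is constant in this simplified model.

    A column qualifies if everything stored before it is fixed-width and
    NOT NULL, so the scan stops after the first column that is not.
    """
    result = set()
    for col in layout:
        result.add(col.name)
        if not is_movable(col):
            break
    return result


def reorder(layout):
    """Stable partition: movable columns first, the rest in original order."""
    moved = [c for c in layout if is_movable(c)]
    rest = [c for c in layout if not is_movable(c)]
    return moved + rest


# 20 columns alternating smallint and varchar(10), as in the quoted test.
# Assumption: the smallints are NOT NULL and the varchars are nullable.
layout = [
    Column(f"c{i}", fixed_width=(i % 2 == 1), nullable=(i % 2 == 0))
    for i in range(1, 21)
]

before = cacheable(layout)
after = cacheable(reorder(layout))
print(sorted(before), len(after), before <= after)
```

In this model the first column that is not moved keeps its place at the head
of the unmoved group, and everything now in front of it is fixed-width and
NOT NULL, so nothing that was cacheable before stops being cacheable.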

BTW, this is a good case for why the storage order should - directly or
indirectly - be tweakable. You can either optimize for space, and _then_
for speed - which is what the OP did, I think - or first for speed, and then
for space. If the DBA cannot choose the strategy, there will always be
workloads where the engine does it the wrong way around.
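
As a rough illustration of the two strategies, the sketch below expresses them
as the same two sort keys applied in opposite priority. Everything here is
hypothetical: the Col tuple, the alignment values, and the heuristic that
placing larger alignments first tends to reduce padding are assumptions for
the example, not what the OP's patch actually does.

```python
from typing import NamedTuple


class Col(NamedTuple):
    name: str
    align: int          # required alignment in bytes (assumed values)
    fixed_width: bool
    nullable: bool


def cache_unfriendly(col):
    """False for non-nullable fixed-width columns, so they sort first."""
    return not (col.fixed_width and not col.nullable)


def space_first(cols):
    """Primary key: larger alignment first; tie-break: cache-friendly first."""
    return sorted(cols, key=lambda c: (-c.align, cache_unfriendly(c)))


def speed_first(cols):
    """Primary key: cache-friendly columns first; tie-break: larger alignment."""
    return sorted(cols, key=lambda c: (cache_unfriendly(c), -c.align))


cols = [
    Col("a", 2, True, True),    # smallint, nullable
    Col("b", 8, True, True),    # bigint, nullable
    Col("c", 2, True, False),   # smallint, NOT NULL
    Col("d", 4, True, False),   # integer, NOT NULL
]

print([c.name for c in space_first(cols)])
print([c.name for c in speed_first(cols)])
```

The two orderings differ for this table, which is the point: neither one is
right for every workload, so it helps if the choice is exposed somehow.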

greetings, Florian Pflug

