On Feb 21, 2006, at 3:45, Simon Riggs wrote:
> On Sun, 2006-02-19 at 21:40 -0500, Tom Lane wrote:
>> After applying Simon's recent sort patch, I was doing some profiling and
>> noticed that sorting spends an unreasonably large fraction of its time
>> extracting datums from tuples (heap_getattr or index_getattr). The
>> attached patch does something about this by pulling out the leading sort
>> column of a tuple when it is received by the sort code or re-read from a
>> "tape".
<snip />
>> The choice to pull out just the leading column, rather than all columns,
>> is driven by concerns of (a) code complexity and (b) memory space.
>> Having the extra columns pre-extracted wouldn't buy anything anyway
>> in the common case where the leading key determines the result of
>> a comparison.
<snip />
> I agree that as long as we are swamped by the cost of heap_getattr, then
> it does seem likely that first-key extraction (and keeping it with the
> tuple itself) will be a win in most cases over full-key extraction.
Most of this is way above my head, but I'm trying to follow along:
when you say "first key" and "full key", do these refer to relation
keys (e.g., a primary key) or to the attributes used in sorting
(regardless of whether they're a key or not)? I notice Tom used the
term "leading [sort] column", which I read to mean the first
attribute used to sort the relation (for whichever purpose, e.g.,
mergejoins, order-by clauses). I'll see if I can't find the Nyberg
paper as well to learn a bit more. (I haven't been sleeping well
recently.)
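
To check my own reading, here is a rough Python sketch of how I picture
first-key extraction, assuming "key" means sort column. This is not the
actual sort code: get_attr and sort_rows are names I made up, and
get_attr is only a stand-in for the cost of heap_getattr.

```python
from functools import cmp_to_key


def get_attr(row, col):
    # Hypothetical stand-in for the per-column extraction cost
    # (heap_getattr / index_getattr in the real code).
    return row[col]


def sort_rows(rows, sort_cols):
    """Sort rows on sort_cols, pre-extracting only the leading sort column."""
    lead = sort_cols[0]
    # Pull out the leading sort column once, when the row is "received",
    # and keep it next to the row itself.
    entries = [(get_attr(row, lead), row) for row in rows]

    def compare(a, b):
        # Common case: the cached leading values already decide the order.
        if a[0] != b[0]:
            return -1 if a[0] < b[0] else 1
        # Tie on the leading column: extract the other sort columns on demand.
        for col in sort_cols[1:]:
            x, y = get_attr(a[1], col), get_attr(b[1], col)
            if x != y:
                return -1 if x < y else 1
        return 0

    return [row for _, row in sorted(entries, key=cmp_to_key(compare))]
```

If that is roughly right, "full-key extraction" would instead cache every
column in sort_cols up front, which costs more memory per row and only
helps when the leading column ties.
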
Michael Glaesemann
grzm myrealbox com