Re: WIP: further sorting speedup - Mailing list pgsql-patches

From Tom Lane
Subject Re: WIP: further sorting speedup
Date
Msg-id 4229.1140499499@sss.pgh.pa.us
Whole thread Raw
In response to Re: WIP: further sorting speedup  (Michael Glaesemann <grzm@myrealbox.com>)
Responses Re: WIP: further sorting speedup  (Michael Glaesemann <grzm@myrealbox.com>)
List pgsql-patches
Michael Glaesemann <grzm@myrealbox.com> writes:
> Most of this is way above my head, but I'm trying to follow along:
> when you say first key and full key, are these related to relation
> keys (e.g., primary key) or attributes that are used in sorting
> (regardless of whether they're a key or not)? I notice Tom used the
> term "leading [sort] column", which I read to mean the first
> attribute used to sort the relation (for whichever purpose, e.g.,
> mergejoins, order-by clauses).

Right, it's whatever is the sort key for this particular sort.
You could have "SELECT ... ORDER BY foo,bar,baz", or you could
have construction of a multi-column btree index.  In either case,
the sort module is given a set of tuples and told to sort by
certain specified column(s) of those tuples.  What I saw in
profiling was that a large fraction of the CPU time was going
into heap_getattr (or index_getattr, for the index-tuple case)
calls to extract Datums for the sort columns.  The Datums are
then passed to the data-type-specific comparison functions,
such as btint4cmp.  In the original code we did this every time
we compared two tuples.  But a tuple is normally compared multiple
times during a sort (about logN times, in fact), so it makes sense
to do the Datum extraction just once and save the value to use
in comparisons.  The question at hand here is whether to pre-extract
Datums for each column when there are multiple sort columns, or
just extract the first column (which is often all you need for
a comparison anyway).  I think the one-column approach wins
because it keeps the sort data associated with a tuple fixed-size.
Extracting all columns would require a more complex data structure
... plus it would take more memory, and memory space is at a premium
here.

            regards, tom lane

pgsql-patches by date:

Previous
From: Michael Glaesemann
Date:
Subject: Re: WIP: further sorting speedup
Next
From: Michael Glaesemann
Date:
Subject: Re: WIP: further sorting speedup