Re: Tuning current tuplesort external sort code for 8.2 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Tuning current tuplesort external sort code for 8.2
Date
Msg-id 3752.1128399688@sss.pgh.pa.us
Whole thread Raw
In response to Tuning current tuplesort external sort code for 8.2  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Tuning current tuplesort external sort code for 8.2
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:
> AFAICS the following opportunities exist, without changing any of the
> theoretical algorithms or the flexibility of definable datatypes:

> 1. tuplesort_heap_siftup and tuplesort_heap_insert make no attempt to
> cache the values of keys that have been obtained from *_getattr macros.
> The two routines navigate a tournament sort heap, so that on average 50%
> of comparisons use at least one immediately preceeding tuple and key
> values from that could be cached ready for the next call.

Hmm ... this seems interesting, but you also have to look at the
potential downside: what is the cost of doing the caching?

> All of the remaining ideas relate to NULL handling.

I can't get excited about this.  Most sort scenarios involve few if any
nulls.

One thought that comes to mind is that the current system structure
encourages sorting on keys that are at the end of their tuples.
For instance, "SELECT foo, bar FROM ... ORDER BY baz" will sort by
the third column of the generated tuples, which is certainly the least
efficient to access.  It'd be interesting to look into generating the
working tuples with baz as the first column.  I fear this is nontrivial,
because there are a lot of places that expect resjunk columns to come
last, but we should study it.  (Note: this will do nada for Josh's
original complaint about index build time, since btree index sorts will
by definition use all the tuple's columns, but it seems like a
significant issue for ordinary sorts.)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: effective SELECT from child tables
Next
From: Rod Taylor
Date:
Subject: Vacuum and Transactions