Re: Tuning current tuplesort external sort code for 8.2 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Tuning current tuplesort external sort code for 8.2
Date
Msg-id 1128417400.8603.239.camel@localhost.localdomain
Whole thread Raw
In response to Re: Tuning current tuplesort external sort code for 8.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 2005-10-04 at 00:21 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > AFAICS the following opportunities exist, without changing any of the
> > theoretical algorithms or the flexibility of definable datatypes:
> 
> > 1. tuplesort_heap_siftup and tuplesort_heap_insert make no attempt to
> > cache the values of keys that have been obtained from *_getattr macros.
> > The two routines navigate a tournament sort heap, so that on average 50%
> > of comparisons use at least one immediately preceeding tuple and key
> > values from that could be cached ready for the next call.
> 
> Hmm ... this seems interesting, but you also have to look at the
> potential downside: what is the cost of doing the caching?

Actually, nothing for the cache-1-tuple case. We just re-initialise the
fcinfo based around whether we are caching 0,1 or 2 fields. We keep two
fcinfo structures, so that when we cache=0 fields we do not pollute the
cache for the cache=1 case. 

tuplesort_heap_insert allows us to cache one value each time round the
inner loop, always using fcinfo1. (cache=1). tuplesort_heap_siftup has
two comparisons per iteration; it allows us to cache 0 values on the
first call, so we use fcinfo0. We can always cache one value for the
second call, exactly same as _heap_insert case, so we use fcinfo1. Maybe
we can also reuse the winner from the first call as the second input to
the second call. ehem...It's easier to see when you look at the code.

We need to work a bit harder to get the cache=2 case but its possible,
though I take your point that it may not be worth it. I'll do the
cache=1 case before attempting the cache=2 case, so we can measure the
gain.

We run _heap_insert and _heap_siftup once for each incoming tuple, so
the saving should be good.

> > All of the remaining ideas relate to NULL handling.
> 
> I can't get excited about this. Most sort scenarios involve few if any
> nulls.

The idea would be much less important anyway if you follow up on this
next idea:

> One thought that comes to mind is that the current system structure
> encourages sorting on keys that are at the end of their tuples.
> For instance, "SELECT foo, bar FROM ... ORDER BY baz" will sort by
> the third column of the generated tuples, which is certainly the least
> efficient to access.  

Not sure what goes on in there. Are you saying heap sort cols are always
at the end, or only in the cases you mention? What about where we're
doing a GROUP BY, so all the sort columns are always part of the select
list and frequently (by coding convention) at the front of the row?

If the cols aren't always in resjunk we can come up with some
performance test cases to check whether or not this is a problem, how
big and where it starts to show itself.

> It'd be interesting to look into generating the
> working tuples with baz as the first column.  I fear this is nontrivial,
> because there are a lot of places that expect resjunk columns to come
> last, but we should study it.  

It could be worth it to arrange the main select's attr list so that the
sort keys always come first. I'd settle just for that if the second part
is too much work.

It sounds like hard work to optimise the case where the ORDER BY is on
columns not mentioned in the select. The latter seems worth it for when
we do a sort-merge based on join columns not mentioned in the SELECT,
but not for optimizing sort-aggregate or one-table-select cases.

> (Note: this will do nada for Josh's
> original complaint about index build time, since btree index sorts will
> by definition use all the tuple's columns, but it seems like a
> significant issue for ordinary sorts.)

Understood, but we do care about heap sorts too! Heap and index sorts
have many dissimilarities, but both are important, so perhaps we should
tune for each case separately.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Dennis Bjorklund
Date:
Subject: Re: PG Killed by OOM Condition
Next
From: Martijn van Oosterhout
Date:
Subject: Re: PG Killed by OOM Condition