Sort Refinement - Mailing list pgsql-hackers

From Simon Riggs
Subject Sort Refinement
Date
Msg-id 1206033442.4285.586.camel@ebony.site
Whole thread Raw
Responses Re: Sort Refinement
Re: Sort Refinement
Re: Sort Refinement
List pgsql-hackers
Currently, our sort algorithm assumes that its input is unsorted. So if
your data is sorted on (a) and you would like it to be sorted on (a,b)
then we need to perform the full sort of (a,b).

For small sorts this doesn't matter much. For larger sorts the heap sort
algorithm will typically result in just a single run being written to
disk which must then be read back in. Number of I/Os required is twice
the total volume of data to be sorted.

If we assume we use heap sort, then if we *know* that the data is
presorted on (a) then we should be able to emit tuples directly that the
value of (a) changes and keep emitting them until the heap is empty,
since they will exit the heap in (a,b) order.

The normal pattern of a sort node is to read all of its input and then
begin emitting rows, so the sort is completed before the first row is
returned (OK, lets gloss over the final merge bit for now). So what I'm
suggesting is a sort node that will switch back and forth between
consuming and emitting rows. In some cases it *may* still need to scroll
to disk, but we already have that code.

In this way we can completely avoid writing data to disk for some types
of large sort, even when we have a data set much larger than available
memory. Of course it only applies when we are refining an existing
well-known sort order.

This is a similar concept to the way we handled dynamic buffering of
data for the merge join in 8.3, so I expect it to have an equally warm
reception ;-)

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 
 PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk



pgsql-hackers by date:

Previous
From: Sam Mason
Date:
Subject: writing a MIN(RECORD) aggregate
Next
From: Simon Riggs
Date:
Subject: Re: Unique Constraints using Non-Unique Indexes