Home > mailing lists

Sort Refinement - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Sort Refinement
Date	March 20, 2008 18:20:13
Msg-id	1206033442.4285.586.camel@ebony.site Whole thread Raw
Responses	Re: Sort Refinement Re: Sort Refinement Re: Sort Refinement
List	pgsql-hackers

Tree view

Currently, our sort algorithm assumes that its input is unsorted. So if
your data is sorted on (a) and you would like it to be sorted on (a,b)
then we need to perform the full sort of (a,b).

For small sorts this doesn't matter much. For larger sorts the heap sort
algorithm will typically result in just a single run being written to
disk which must then be read back in. Number of I/Os required is twice
the total volume of data to be sorted.

If we assume we use heap sort, then if we *know* that the data is
presorted on (a) then we should be able to emit tuples directly that the
value of (a) changes and keep emitting them until the heap is empty,
since they will exit the heap in (a,b) order.

The normal pattern of a sort node is to read all of its input and then
begin emitting rows, so the sort is completed before the first row is
returned (OK, lets gloss over the final merge bit for now). So what I'm
suggesting is a sort node that will switch back and forth between
consuming and emitting rows. In some cases it *may* still need to scroll
to disk, but we already have that code.

In this way we can completely avoid writing data to disk for some types
of large sort, even when we have a data set much larger than available
memory. Of course it only applies when we are refining an existing
well-known sort order.

This is a similar concept to the way we handled dynamic buffering of
data for the merge join in 8.3, so I expect it to have an equally warm
reception ;-)

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 
 PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

pgsql-hackers by date:

From: Sam Mason
Date: 20 March 2008, 18:19:50
Subject: writing a MIN(RECORD) aggregate

From: Simon Riggs
Date: 20 March 2008, 18:24:19
Subject: Re: Unique Constraints using Non-Unique Indexes

Sort Refinement - Mailing list pgsql-hackers

Previous

Next