Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: Sorting Improvements for 8.4
Date
Msg-id D425483C2C5C9F49B5B7A41F8944154701000B48@postal.corporate.connx.com
Whole thread Raw
In response to Re: Sorting Improvements for 8.4  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Sorting Improvements for 8.4  ("Dann Corbit" <DCorbit@connx.com>)
Re: Sorting Improvements for 8.4  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
> -----Original Message-----
> From: Jeff Davis [mailto:pgsql@j-davis.com]
> Sent: Wednesday, December 19, 2007 3:10 PM
> To: Dann Corbit
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Sorting Improvements for 8.4
>
> On Wed, 2007-12-19 at 14:41 -0800, Dann Corbit wrote:
> > As long as sorting improvements are being considered, may I suggest
an
> > experiment that uses a very simple model?
> >
> > Assuming that you have K subfiles created by the initial sorting
pass,
> > insert the top record of each file into a priority queue.
> >
> > Then, emit records from the queue until the priority queue is empty.
> >
>
> What is the principle difference between that idea and our existing
sort
> algorithm?
>
> There's a good explanation in the comment at the top of tuplesort.c.


According to the comments, PostgreSQL uses replacement selection.
Replacement selection is a wonderful thing because it creates runs that
are twice as long as normal due to the snowplow effect.  See (for
instance):
http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/69/27216/01209012.
pdf

Then, the merge routine will have half as many runs to merge the files
together.

So (for instance) without replacement selection, if you create 1024
subfiles, then replacement selection will create 512.  That saves one
merge pass.

The algorithm that I am suggesting will take exactly one pass to merge
all of the files.

It works like this...

Imagine an array of pointers to the subfiles:
[*subfile][*subfile]...[*subfile]

Step 0:
We sort the array by a comparison operator that examines the top element
of each subfile.  So now the array is ordered such that the record with
the smallest key is in array slot 0.

Step 1:
We remove the first record from the subfile in array slot 0.  Now, the
priority of the first element *may* have changed.  So if it is no longer
smaller than the subfile immediately to the right, we do a binary
insertion to put this subfile in its new location, moving the contents
of array slot[1] to array slot 0 if it is needed.

Step 2:
Is the entire list of subfiles empty?  If yes, then terminate, if no
then go to Step 1.

Like I said, it is ultra-simple and it sorts the entire contents of all
subfiles to the output with a single pass.

Consider the way that current replacement selection works.  The actual
O(f(N)) behavior of replacement selection is just terrible O(n^2).  But
because we save one full merge pass, it is usually worth it anyway,
since memory access is much faster than disk.  And if we only have a few
subfiles, the savings will be large.  In the case of a priority queue
merge, we only have one single merge pass no matter how many subfiles
there are.




pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Sorting Improvements for 8.4
Next
From: "Dann Corbit"
Date:
Subject: Re: Sorting Improvements for 8.4