Re: No merge sort? - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: No merge sort?
Date
Msg-id D90A5A6C612A39408103E6ECDD77B829408ABD@voyager.corporate.connx.com
In response to No merge sort?  (Taral <taral@taral.net>)
Responses Re: No merge sort?  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
> -----Original Message-----
> From: Greg Stark [mailto:gsstark@mit.edu]
> Sent: Monday, April 07, 2003 12:36 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
> "Ron Peacetree" <rjpeace@earthlink.net> writes:
>
> > AFAIK, there are only 3 general purpose internal sorting techniques
> > that have O(n) behavior:
>
> Strictly speaking there are no sorting algorithms that have
> worst-case time behaviour better than O(nlog(n)). Period.

Provably true if the sort uses comparison.  Definitely false for
distribution sorts or distribution sort hybrids.
> The typical kind-of O(n) sorts involve either special-case
> inputs (almost sorted), or only work if you ignore some
> aspect of the input size (radix sort).

There is no need to ignore any aspect of the input with radix sort.
Radix sort can also be generalized in the same way that quicksort can:
a callback function returns the "bucket" formed by 'radix' significant
bits of the key.  So, with a radix of 256 (8 bits), the callback for
unsigned char data simply returns the kth character when the kth bucket
is requested.  For an example of dealing with other sorts of data, see
chapter 13 of "C Unleashed":
http://users.powernet.co.uk/eton/unleashed/
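Here is a minimal sketch of such a callback-driven radix sort (LSD
order, radix 256, fixed-width keys).  This is not the book's code; the
names are made up for illustration:

/* Sketch of a generalized LSD radix sort.  The caller supplies a
 * callback that returns digit number 'k' (the bucket) of an element;
 * the radix here is 256, so each digit is one byte. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RADIX 256

/* Return digit number 'k' (0 = least significant) of element 'e'. */
typedef unsigned (*digit_fn)(const void *e, size_t k);

static void radix_sort(void *base, size_t n, size_t size,
                       size_t ndigits, digit_fn digit)
{
    unsigned char *a = base;
    unsigned char *tmp = malloc(n * size);
    size_t count[RADIX];
    size_t k, i;

    if (tmp == NULL)
        return;

    for (k = 0; k < ndigits; k++) {
        size_t pos[RADIX];
        size_t running = 0;

        memset(count, 0, sizeof count);
        for (i = 0; i < n; i++)               /* count each bucket */
            count[digit(a + i * size, k)]++;
        for (i = 0; i < RADIX; i++) {         /* prefix sums -> offsets */
            pos[i] = running;
            running += count[i];
        }
        for (i = 0; i < n; i++) {             /* stable scatter pass */
            unsigned char *e = a + i * size;
            memcpy(tmp + pos[digit(e, k)]++ * size, e, size);
        }
        memcpy(a, tmp, n * size);
    }
    free(tmp);
}

/* Example digit callback for unsigned keys; only the low 32 bits of
 * each key are examined here. */
static unsigned u32_digit(const void *e, size_t k)
{
    unsigned long v = *(const unsigned long *)e;
    return (unsigned)((v >> (8 * k)) & 0xFF);
}

int main(void)
{
    unsigned long a[] = {90210UL, 7UL, 65536UL, 42UL, 300UL};
    size_t i, n = sizeof a / sizeof a[0];

    radix_sort(a, n, sizeof a[0], 4, u32_digit);
    for (i = 0; i < n; i++)
        printf("%lu\n", a[i]);
    return 0;
}
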
> So, for example, for radix sort the log(n) factor comes
> precisely from having to have log(n) passes. If you use octal
> that might go to log(n)/8 but it's still O(log(n)).

You do not have to make log(n) passes at all.  In fact, with LSD radix
sort, the buckets for every pass can be counted in a single scan of the
data.  A demonstration (with source code) is in the book named above.
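For instance, the counting step can be hoisted so that one scan of the
data fills the histograms for every digit position (again only a
sketch, reusing digit_fn and the radix of 256 from the fragment above;
the caller zeroes 'count' first):

/* One scan of the data collects the bucket counts for every digit
 * position; the per-digit distribution passes then reuse these counts
 * instead of re-counting. */
static void count_all_digits(const void *base, size_t n, size_t size,
                             size_t ndigits, digit_fn digit,
                             size_t count[][256])
{
    const unsigned char *a = base;
    size_t i, k;

    for (i = 0; i < n; i++)
        for (k = 0; k < ndigits; k++)
            count[k][digit(a + i * size, k)]++;
}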

> If you assume your input fields are limited to a fixed size
> then O() notation loses meaning. You can always sort in
> linear time by just storing bit flags in a big vector and
> then scanning your (fixed-size) vector to read out the values.
>
> However databases cannot assume fixed size inputs. Regardless
> of whether it's on a 32 or 64 bit system postgres still has
> to sort much larger data structures. floats are typically 64
> - 96 bytes, bigints can be arbitrarily large.

Floats are generally 8 bytes.  If you use hfloat on a VAX, it is 16
bytes.  That is the largest hardware floating point in common use.
Bigints can be arbitrarily large, but PostgreSQL does not handle
anything larger than 16-byte integers, so there is no need to worry
about larger sizes.

A numeric type can be arbitrarily large (though a thousand digits is
pure foolishness).  However, these numbers can be sorted by standard
means.  A typical scenario would be an MSD radix sort until the subsets
are down to about 200K rows, then Singleton's sort (a median-of-three
quicksort) until about 50 rows, then insertion sort or perhaps
Shellsort.  I write sorting routines for a database company, and
millions of rows can be sorted in just a few seconds.
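As a rough illustration of that kind of cascade (not production code,
and with the quicksort stage omitted for brevity), here is an MSD radix
sort on 32-bit keys that drops to insertion sort below a small cutoff:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Cutoff kept tiny so the small example below exercises both stages;
 * a real cutoff would be much larger. */
#define CUTOFF 4

static void insertion_sort(unsigned long *a, size_t n)
{
    size_t i, j;
    for (i = 1; i < n; i++) {
        unsigned long v = a[i];
        for (j = i; j > 0 && a[j - 1] > v; j--)
            a[j] = a[j - 1];
        a[j] = v;
    }
}

/* Sort a[0..n-1] on the byte selected by 'shift' (24 = most significant
 * byte of a 32-bit key), recursing into each bucket. */
static void msd_radix(unsigned long *a, size_t n, int shift)
{
    size_t count[256] = {0}, start[257];
    unsigned long *tmp;
    size_t i;

    if (n < CUTOFF || shift < 0) {
        insertion_sort(a, n);
        return;
    }
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return;
    for (i = 0; i < n; i++)                  /* count the buckets */
        count[(a[i] >> shift) & 0xFF]++;
    start[0] = 0;
    for (i = 0; i < 256; i++)                /* bucket boundaries */
        start[i + 1] = start[i] + count[i];
    {
        size_t pos[256];
        memcpy(pos, start, sizeof pos);
        for (i = 0; i < n; i++)              /* scatter into buckets */
            tmp[pos[(a[i] >> shift) & 0xFF]++] = a[i];
    }
    memcpy(a, tmp, n * sizeof *a);
    free(tmp);
    for (i = 0; i < 256; i++)                /* recurse into each bucket */
        msd_radix(a + start[i], start[i + 1] - start[i], shift - 8);
}

int main(void)
{
    unsigned long a[] = {123456789UL, 42UL, 7UL, 99999UL, 3141592UL, 65536UL};
    size_t i, n = sizeof a / sizeof a[0];

    msd_radix(a, n, 24);                     /* keys assumed to fit in 32 bits */
    for (i = 0; i < n; i++)
        printf("%lu\n", a[i]);
    return 0;
}
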
> In fact posgres can sort user-defined datatypes as long as
> they have < and = operators. (Actually I'm not sure on the
> precise constraint.)
>
> Oh, and just to add one last fly in the ointment, postgres
> has to be able to sort large datasets that don't even fit in
> memory. That means storing temporary data on disk and
> minimizing the times data has to move from disk to memory and
> back. Some algorithms are better at that than others.

A very important advance in this area is a priority-queue-based
external sort.  It allows the use of the fastest possible internal sort
and needs only a single pass to read, a single pass to merge, and a
single pass to write all of the data.  In this regard, it can be
enormously faster than even replacement selection.  With replacement
selection, the advantage is that the runs are twice as long as normal,
allowing log(n)/2 passes instead of log(n) passes.  The priority queue
method, however, has only one read pass, one write pass, and one merge
pass.

It works like this:
1.  Read blocks of data into memory and sort them.
2.  Write the sorted blocks (runs) to disk.
3.  In the merge phase, create a priority queue of file pointers, one
per run, and read the first item from each.  The queue is ordered by
each file's current item.  Repeatedly extract the item at the head of
the queue and write it to the output; then read the next item from that
file and, if it differs, adjust that file's position in the queue.
Continue until all the files are empty.

The same book cited previously has a model implementation that
demonstrates the technique.  Since disk I/O totally dominates external
sorts, this technique is a titanic win over conventional sorting means.
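As a rough sketch of step 3 (not the book's model implementation), here
is a k-way merge driven by a binary min-heap of run cursors, assuming
each run is a text file with one number per line; the file names are
illustrative:

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    FILE *fp;
    long  item;       /* current (smallest unread) item in this run */
} cursor;

static void sift_down(cursor *h, size_t n, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && h[l].item < h[m].item) m = l;
        if (r < n && h[r].item < h[m].item) m = r;
        if (m == i) return;
        { cursor t = h[i]; h[i] = h[m]; h[m] = t; }
        i = m;
    }
}

/* Merge the sorted runs named in files[0..nruns-1] into out. */
static int merge_runs(const char **files, size_t nruns, FILE *out)
{
    cursor *heap = malloc(nruns * sizeof *heap);
    size_t n = 0, i;

    if (heap == NULL)
        return -1;
    for (i = 0; i < nruns; i++) {            /* load first item of each run */
        FILE *fp = fopen(files[i], "r");
        if (fp != NULL && fscanf(fp, "%ld", &heap[n].item) == 1) {
            heap[n].fp = fp;
            n++;
        } else if (fp != NULL) {
            fclose(fp);
        }
    }
    for (i = n; i-- > 0; )                   /* heapify */
        sift_down(heap, n, i);
    while (n > 0) {
        fprintf(out, "%ld\n", heap[0].item); /* emit the global minimum */
        if (fscanf(heap[0].fp, "%ld", &heap[0].item) == 1) {
            sift_down(heap, n, 0);           /* item changed: re-prioritize */
        } else {
            fclose(heap[0].fp);              /* run exhausted: drop it */
            heap[0] = heap[--n];
            sift_down(heap, n, 0);
        }
    }
    free(heap);
    return 0;
}

int main(void)
{
    const char *runs[] = {"run0.txt", "run1.txt", "run2.txt"};
    return merge_runs(runs, 3, stdout);
}

The heap keeps the work per output item at about log2(number of runs)
comparisons, which is why a single merge pass over the runs suffices.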

Another simple solution is to memory map the file, but do it in large
multi-page blocks instead of mapping the whole thing.
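For the memory-mapped variant, a POSIX sketch (illustrative names,
error handling kept minimal) might map a 64 MB window at a time:

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define WINDOW (64UL * 1024 * 1024)   /* 64 MB per mapping; a multiple
                                         of any common page size, so the
                                         offsets stay page-aligned */

static int process_file(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    off_t off;

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    for (off = 0; off < st.st_size; off += WINDOW) {
        size_t len = (size_t)((st.st_size - off < (off_t)WINDOW)
                              ? st.st_size - off : (off_t)WINDOW);
        void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED)
            break;
        /* ... sort or merge the records found in p[0..len-1] ... */
        munmap(p, len);
    }
    close(fd);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc > 1)
        return process_file(argv[1]);
    return 0;
}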

> --
> greg

dann


