Re: No merge sort? - Mailing list pgsql-hackers

From Ron Peacetree
Subject Re: No merge sort?
Date
Msg-id uqbma.22946$4P1.2023021@newsread2.prod.itd.earthlink.net
In response to Re: No merge sort?  ("Dann Corbit" <DCorbit@connx.com>)
List pgsql-hackers
""Dann Corbit"" <DCorbit@connx.com> wrote in message
news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co
m...
> I would be interested to see your algorithm that performs a counting
> or radix sort on 320 bit bins and that works in the general case
> without using extra space.
>
Since I can't be sure of my audience, I'm writing for the widest
reasonably possible audience, and I apologize in advance to those who
know most of this.

First, let's use some examples for the data to be sorted.  I'll
consider two cases, 1= each 320b entity is a row, and 2= the original
example where there are 3 fields per row; the first being 320b, with
each of the others being 160b.
For case 1, a 2GB file has 2^34/320= 53,687,091 records.  Let's call
it 50x10^6 to keep things simple.  Case 2 is 26,843,545 records.
Let's call that 25x10^6.
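
(Checking the arithmetic: 2GB= 2^31 Bytes= 2^34 bits, so case 1 is
2^34/320 records, and case 2, at 320+160+160= 640b per row, is
2^34/640 records.)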

Let's examine the non space issues first, then explore space
requirements.

The most space and time efficient comparison based sorting algorithm
(median-of-three partitioning Qsort with Isort used when almost
sorted) will on average execute in ~(32/3)*n*ln(n) tyme units
(~9.455x10^9 for case 1; ~4.543x10^9 for case 2).  The use of "tyme"
here is not a typo.  It's a necessary abstraction, because real time
units are very HW and SW context dependent.
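
If anyone wants to check those figures, here's a quick throwaway C
snippet (the names are mine; the constant is the 32/3 quoted above,
and C's log() is ln()):

#include <stdio.h>
#include <math.h>

/* Estimated average cost of median-of-three Qsort with an Isort
 * finish, in abstract "tyme units": ~(32/3)*n*ln(n). */
static double qsort_tyme(double n)
{
    return (32.0 / 3.0) * n * log(n);
}

int main(void)
{
    printf("case 1: %.4g tyme units\n", qsort_tyme(50e6)); /* ~9.455x10^9 */
    printf("case 2: %.4g tyme units\n", qsort_tyme(25e6)); /* ~4.543x10^9 */
    return 0;
}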

Now let's consider a radix address calculation sort based approach.
The best implementation does an Rsort on the MSBs of the data to the
point of guaranteeing that the data is "almost sorted" then uses Isort
as a "finishing touch".  The Rsort part of the algorithm will execute
on average in 11*p*n-n+O(p*m) tyme units, where p is the number of
passes used and m is the number of unique states in the radix (2^16
for a 16b radix in which we can't prune any possibilities).
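
To make the cost model concrete, here's a minimal, untested sketch of
one such counting pass on a 16b digit.  It's my own illustration, and
it's the plain LSD variant (all 20 digits, least significant first)
rather than the MSB-first-plus-Isort hybrid described above, but the
per-pass histogram/prefix-sum/scatter work is the same:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define DIGITS 20           /* 320b key = 20 16b digits */
#define RADIX  65536        /* 2^16 states per digit    */

typedef struct {
    uint16_t key[DIGITS];   /* digit 0 = most significant */
} Rec;

/* One stable counting pass on digit d: histogram, prefix sum,
 * scatter.  Costs O(n + RADIX); p such passes give the
 * 11*p*n-n+O(p*m) shape quoted above (constants vary with the
 * implementation). */
static void radix_pass(const Rec *src, Rec *dst, size_t n, int d)
{
    static size_t count[RADIX];
    memset(count, 0, sizeof count);

    for (size_t i = 0; i < n; i++)      /* histogram */
        count[src[i].key[d]]++;

    size_t pos = 0;                     /* exclusive prefix sum */
    for (size_t v = 0; v < RADIX; v++) {
        size_t c = count[v];
        count[v] = pos;
        pos += c;
    }

    for (size_t i = 0; i < n; i++)      /* stable scatter */
        dst[count[src[i].key[d]]++] = src[i];
}

/* LSD driver: least significant digit first, so stability makes the
 * order correct after DIGITS passes.  DIGITS is even, so the sorted
 * data ends up back in the caller's original array. */
static void rsort(Rec *a, Rec *tmp, size_t n)
{
    for (int d = DIGITS - 1; d >= 0; d--) {
        radix_pass(a, tmp, n, d);
        Rec *t = a; a = tmp; tmp = t;   /* ping-pong buffers */
    }
}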

For a 320b data value that we know nothing about, we must consider all
possible values for any length radix we use.  The function
11*p*n-n+O(p*m) has a minimum when we use a 16b radix:
11*20*50x10^6-50x10^6+O(20*2^16)= ~10.951x10^9 tyme units.  If we use
enough Rsort passes to get the data "almost ordered" and then use
Isort to finish things, we can do a bit better.  But since Qsort
needs an average of only ~9.455x10^9 tyme units to sort the same
data, it's not clear that the effort to code Rsort is worth it for
this case.

However, look what happens if we can assume that the number of
distinct key values is small enough that we can sort the data in
relatively few passes: if the 320b key has <= 2^16 distinct
values, we can sort them using one pass in ~500x10^6+O(2^16) tyme
units.  Even noting that O(2^16) has just become much larger since we
are now manipulating key values rather than bits, this is stupendously
better than Qsort and there should be no question that Rsort is worth
considering in such cases.  FTR, this shows that my naive (AKA "I
didn't do the math") instinct was wrong.  This technique is FAR better
than I originally was estimating under these circumstances.
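
For the curious, here's a sketch of that one-pass sort.  I'm assuming
the map of <= 2^16 distinct keys is already in hand as a sorted array;
the binary search below costs an extra log factor, and a hash from key
to rank is what the ~500x10^6 figure really wants, but this keeps the
sketch short:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define KEYLEN      40       /* 320b key = 40 Bytes            */
#define MAXDISTINCT 65536    /* assumed bound on distinct keys */

typedef struct {
    uint8_t key[KEYLEN];     /* MSB first, so memcmp() orders it */
    /* ... the two 160b fields would live here ... */
} Rec;

/* Rank of k in the sorted array of distinct keys. */
static size_t key_rank(const uint8_t *k,
                       const uint8_t (*map)[KEYLEN], size_t m)
{
    size_t lo = 0, hi = m;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (memcmp(map[mid], k, KEYLEN) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;               /* assumes k is present in map */
}

/* One counting-sort pass: count per distinct key, prefix sum,
 * stable scatter. */
static void counting_sort(const Rec *src, Rec *dst, size_t n,
                          const uint8_t (*map)[KEYLEN], size_t m)
{
    static size_t count[MAXDISTINCT + 1];
    memset(count, 0, (m + 1) * sizeof(size_t));

    for (size_t i = 0; i < n; i++)
        count[key_rank(src[i].key, map, m) + 1]++;

    for (size_t v = 1; v <= m; v++)     /* prefix sum */
        count[v] += count[v - 1];

    for (size_t i = 0; i < n; i++)      /* stable scatter */
        dst[count[key_rank(src[i].key, map, m)]++] = src[i];
}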

As long as we can store a "map" of unique keys in a relatively small
amount of space and use relatively few radix passes, this technique is
far better than comparison based sorting methods.  Said mapping is
easily derived from table indexes, and some DB systems (DB2 and
Oracle) will generate and save this data for you as part of their set
of optimization tweaks.  Even if none of that is true, an extra full
scan of a memory-resident data set to derive this information is cheap
if the gamble pays off and you find out that the number of distinct
key values is a small subset of their potential universe.
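
That gamble scan is also easy to write.  Another rough sketch of
mine, using a fixed open-addressing hash set that gives up the moment
the distinct count passes 2^16:

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdbool.h>

#define KEYLEN 40
#define LIMIT  65536            /* give up past 2^16 distinct keys  */
#define SLOTS  (LIMIT * 2)      /* power of two, <= 50% load factor */

/* FNV-1a hash of a 320b key; any decent hash would do. */
static size_t hash_key(const uint8_t *k)
{
    uint64_t h = 14695981039346656037ULL;
    for (int i = 0; i < KEYLEN; i++)
        h = (h ^ k[i]) * 1099511628211ULL;
    return (size_t)(h & (SLOTS - 1));
}

/* One full scan: true (plus the count) if the number of distinct
 * keys is <= LIMIT, false as soon as the gamble fails. */
static bool few_distinct(const uint8_t (*keys)[KEYLEN], size_t n,
                         size_t *distinct)
{
    static uint8_t table[SLOTS][KEYLEN];
    static bool    used[SLOTS];
    memset(used, 0, sizeof used);

    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        size_t s = hash_key(keys[i]);
        while (used[s] && memcmp(table[s], keys[i], KEYLEN) != 0)
            s = (s + 1) & (SLOTS - 1);  /* linear probing */
        if (!used[s]) {
            if (++m > LIMIT)
                return false;           /* fall back to Qsort */
            used[s] = true;
            memcpy(table[s], keys[i], KEYLEN);
        }
    }
    *distinct = m;
    return true;
}

If the scan succeeds, the table's contents, sorted once, are exactly
the map the one-pass sort above needs.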

Now let's consider that "extra space".  At the least, Qsort is going
to need extra space proportional to lg(n) for its call stack
(explicit, or system level to support recursion).  If we are using a
16b radix, the extra space in question is 64KB.  In either case, it's
in the noise.
If we are using a mapping of <= 2^16 keys, the extra space in
question is ~2.5MB.  Considering we are sorting an ~2GB file and the
incredible improvement in running time, this seems to be a very
worthwhile investment.
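
(For the record, the map itself is 2^16 keys x 40 Bytes/key=
2,621,440 Bytes= exactly 2.5MB.)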

This would be worth adding support for in PostgreSQL.


