Re: No merge sort? - Mailing list pgsql-hackers
From | Ron Peacetree |
---|---|
Subject | Re: No merge sort? |
Date | |
Msg-id | uqbma.22946$4P1.2023021@newsread2.prod.itd.earthlink.net |
In response to | Re: No merge sort? ("Dann Corbit" <DCorbit@connx.com>) |
List | pgsql-hackers |
""Dann Corbit"" <DCorbit@connx.com> wrote in message news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co m... > I would be interested to see your algorithm that performs a counting > or radix sort on 320 bit bins and that works in the general case > without using extra space. > Since I can't be sure of my audience, I'm writing for the widest reasonably possible audience, and I apologize in advance to those who know most of this. First, let's use some examples for the data to be sorted. I'll consider two cases, 1= each 320b entity is a row, and 2= the original example where there are 3 fields per row; the first being 320b, with each of the others being 160b. For case 1, a 2GB file has 2^34/320= 53,687,091 records. Let's call it 50x10^6 to keep things simple. Case 2 is 26,843,545 records. Let's call that 25x10^6. Let's examine the non space issues first, then explore space requirements. The most space and time efficient comparison based sorting algorithm (median-of-three partitioning Qsort with Isort used when almost sorted) will on average execute in ~(32/3)*n*ln(n) tyme units (~9.455x10^9 for case 1; ~4.543x10^9 for case 2). The use of "tyme" here is not a typo or a spice. It's a necessary abstraction because real time units are very HW and SW context dependant. Now let's consider a radix address calculation sort based approach. The best implementation does a Rsort on the MSBs of the data to the point of guaranteeing that the data is "almost sorted" then uses Isort as a "finishing touch". The Rsort part of the algorithm will execute on average in 11*p*n-n+O(p*m) tyme units, where p is the number of passes used and m is the number of unique states in the radix (2^16 for a 16b radix in which we can't prune any possibilities). For a 320b data value that we know nothing about, we must consider all possible values for any length radix we use. 
The function 11*p*n-n+O(p*m) has a minimum when we use a 16b radix: 11*20*50x10^6-50x10^6+O(20*2^16)= ~10.951x10^9 tyme units. If we use enough Rsort passes to get the data "almost ordered" and then use Isort to finish things, we can do a bit better. Still, since Qsort needs an average of only ~9.455x10^9 tyme units to sort the same data, it's not clear that the effort to code Rsort is worth it for this case.

However, look what happens if we can assume that the number of distinct key values is small enough that we can sort the data in relatively few passes: if the 320b key has <= 2^16 distinct values, we can sort them using one pass in ~500x10^6+O(2^16) tyme units. Even noting that the O(2^16) term has just become much larger, since we are now manipulating key values rather than bits, this is stupendously better than Qsort, and there should be no question that Rsort is worth considering in such cases. FTR, this shows that my naive (AKA "I didn't do the math") instinct was wrong. This technique is FAR better than I originally estimated under these circumstances.

As long as we can store a "map" of unique keys in a relatively small amount of space and use relatively few radix passes, this technique is far better than comparison-based sorting methods. Said mapping is easily derived from table indexes, and some DB systems (DB2 and Oracle) will generate and save this data for you as part of their set of optimization tweaks. Even if none of that is true, an extra full scan of a memory-resident data set to derive this information is cheap if the gamble pays off and you find out that the number of distinct key values is a small subset of the potential universe of such values.

Now let's consider that "extra space". At the least, Qsort is going to need extra space proportional to lg(n) for its call tree (explicit, or system-level to support recursion). If we are using a 16b radix, the extra space in question is 64KB. In either case, it's in the noise.
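The one-pass, small-distinct-set case can be sketched as a counting sort over a map of distinct keys. This is illustration only: 64b keys stand in for the 320b values, and `MAX_DISTINCT`, `distinct_map_sort` are invented names. As noted above, a real implementation would get the map from an index scan rather than building it on the fly as this sketch does:

```c
#include <stdint.h>
#include <string.h>

#define MAX_DISTINCT 65536           /* the 2^16 bound discussed above */

static uint64_t map[MAX_DISTINCT];   /* sorted table of distinct keys */
static size_t   count[MAX_DISTINCT]; /* occurrence count per distinct key */

/* Binary search: index of the first map entry >= key, in [0, m]. */
static size_t map_lower_bound(uint64_t key, size_t m)
{
    size_t lo = 0, hi = m;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Sort n keys via a counting pass over their distinct values.
 * Returns 0 on success, -1 if more than MAX_DISTINCT values appear. */
int distinct_map_sort(uint64_t *keys, size_t n)
{
    size_t m = 0;

    /* Pass 1: build the sorted map of distinct keys. */
    for (size_t i = 0; i < n; i++) {
        size_t pos = map_lower_bound(keys[i], m);
        if (pos == m || map[pos] != keys[i]) {
            if (m == MAX_DISTINCT) return -1;
            memmove(&map[pos + 1], &map[pos], (m - pos) * sizeof(map[0]));
            map[pos] = keys[i];
            m++;
        }
    }

    /* Pass 2: count occurrences of each distinct key. */
    memset(count, 0, m * sizeof(count[0]));
    for (size_t i = 0; i < n; i++)
        count[map_lower_bound(keys[i], m)]++;

    /* Pass 3: rewrite the array in sorted order; no comparisons between
     * data elements are needed, only lookups into the small map. */
    size_t out = 0;
    for (size_t k = 0; k < m; k++)
        for (size_t c = 0; c < count[k]; c++)
            keys[out++] = map[k];
    return 0;
}
```

With full-width 320b (40-byte) keys, a complete 2^16-entry map is the ~2.5MB of extra space worked out in the text; the 64b stand-ins here shrink that to 512KB without changing the structure of the algorithm.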
If we are using a mapping of <= 2^16 keys, the extra space in question is ~2.5MB. Considering that we are sorting a ~2GB file, and given the incredible improvement in running time, this seems a very worthwhile investment. It would be worth adding support for this technique to PostgreSQL.