Re: qsort, once again - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: qsort, once again |
Date | |
Msg-id | 18732.1142967137@sss.pgh.pa.us Whole thread Raw |
In response to | Re: qsort, once again ("Dann Corbit" <DCorbit@connx.com>) |
Responses |
Re: qsort, once again
|
List | pgsql-hackers |
"Dann Corbit" <DCorbit@connx.com> writes: > Well, my point was that it is a snap to implement and test. Well, having done this, I have to eat my words: it does seem to be a pretty good idea. The following test numbers are using Bentley & McIlroy's test framework, but modified to test only the case N=10000 rather than the four smaller N values they originally used. I did that because it exposes quadratic behavior more obviously, and the variance in N made it harder to compare comparison ratios for different cases. I also added a "NEARSORT" test method, which sorts the input distribution and then exchanges two elements chosen at random. I did that because I was concerned that nearly sorted input would be the worst case for the presorted-input check, as it would waste the most cycles before failing on such input. With our existing qsort code, the results look like distribution SAWTOOTH: max cratio 94.17, min 0.08, average 1.56 over 105 tests distribution RAND: max cratio 1.06, min 0.08, average 0.51 over 105 tests distribution STAGGER: max cratio 6.08, min 0.23, average 1.01 over 105 tests distribution PLATEAU: max cratio 94.17, min 0.08, average 2.12 over 105 tests distribution SHUFFLE: max cratio 94.17, min 0.23, average 1.92 over 105 tests method COPY: max cratio 6.08, min 0.08, average 0.72 over 75 tests method REVERSE: max cratio 5.34, min 0.08, average 0.69 over 75 tests method FREVERSE: max cratio 94.17, min 0.08, average 5.71 over 75 tests method BREVERSE: max cratio 3.86, min 0.08, average 1.41 over 75 tests method SORT: max cratio 0.82, min 0.08, average 0.31 over 75 tests method NEARSORT: max cratio 0.82, min 0.08, average 0.36 over 75 tests method DITHER: max cratio 5.52, min 0.18, average 0.77 over 75 tests Overall: average cratio 1.42 over 525 tests ("cratio" is the ratio of the actual number of comparison function calls to the theoretical expectation, N log2(N)) That's pretty awful: there are several test cases that make it use nearly 100 times the expected number of comparisons. Removing the swap_cnt test to bring it close to B&M's original recommendations, we get distribution SAWTOOTH: max cratio 3.85, min 0.08, average 0.70 over 105 tests distribution RAND: max cratio 1.06, min 0.08, average 0.52 over 105 tests distribution STAGGER: max cratio 6.08, min 0.58, average 1.12 over 105 tests distribution PLATEAU: max cratio 3.70, min 0.08, average 0.34 over 105 tests distribution SHUFFLE: max cratio 3.86, min 0.86, average 1.24 over 105 tests method COPY: max cratio 6.08, min 0.08, average 0.76 over 75 tests method REVERSE: max cratio 5.34, min 0.08, average 0.75 over 75 tests method FREVERSE: max cratio 4.56, min 0.08, average 0.73 over 75 tests method BREVERSE: max cratio 3.86, min 0.08, average 1.41 over 75 tests method SORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests method NEARSORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests method DITHER: max cratio 3.73, min 0.18, average 0.72 over 75 tests Overall: average cratio 0.78 over 525 tests which is a whole lot better as to both average and worst cases. I then added some code to check for presorted input (just after the n<7 insertion sort code): #ifdef CHECK_SORTED presorted = 1; for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es) { if (cmp(pm - es, pm) > 0) { presorted = 0; break; } } if (presorted) return; #endif This gives distribution SAWTOOTH: max cratio 3.88, min 0.08, average 0.62 over 105 tests distribution RAND: max cratio 1.06, min 0.08, average 0.46 over 105 tests distribution STAGGER: max cratio 6.15, min 0.08, average 0.98 over 105 tests distribution PLATEAU: max cratio 3.79, min 0.08, average 0.31 over 105 tests distribution SHUFFLE: max cratio 3.91, min 0.08, average 1.09 over 105 tests method COPY: max cratio 6.15, min 0.08, average 0.72 over 75 tests method REVERSE: max cratio 5.34, min 0.08, average 0.76 over 75 tests method FREVERSE: max cratio 4.58, min 0.08, average 0.73 over 75 tests method BREVERSE: max cratio 3.91, min 0.08, average 1.44 over 75 tests method SORT: max cratio 0.08, min 0.08, average 0.08 over 75 tests method NEARSORT: max cratio 0.89, min 0.08, average 0.39 over 75 tests method DITHER: max cratio 3.73, min 0.18, average 0.72 over 75 tests Overall: average cratio 0.69 over 525 tests So the worst case seems only very marginally worse, and there is a definite improvement in the average case, even for inputs that aren't entirely sorted. Importantly, the "near sorted" case that I thought might send it into quadratic behavior doesn't seem to do that. So, unless anyone wants to do further testing, I'll go ahead and commit these changes. regards, tom lane PS: Just as a comparison point, here are the results when testing HPUX's library qsort: distribution SAWTOOTH: max cratio 7.00, min 0.08, average 0.76 over 105 tests distribution RAND: max cratio 1.11, min 0.08, average 0.53 over 105 tests distribution STAGGER: max cratio 7.05, min 0.58, average 1.24 over 105 tests distribution PLATEAU: max cratio 7.00, min 0.08, average 0.43 over 105 tests distribution SHUFFLE: max cratio 7.00, min 0.86, average 1.54 over 105 tests method COPY: max cratio 6.70, min 0.08, average 0.79 over 75 tests method REVERSE: max cratio 7.05, min 0.08, average 0.78 over 75 tests method FREVERSE: max cratio 7.00, min 0.08, average 0.77 over 75 tests method BREVERSE: max cratio 7.00, min 0.08, average 2.11 over 75 tests method SORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests method NEARSORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests method DITHER: max cratio 4.06, min 0.16, average 0.74 over 75 tests Overall: average cratio 0.90 over 525 tests and here are the results using glibc's qsort, which of course isn't quicksort at all but some kind of merge sort: distribution SAWTOOTH: max cratio 0.90, min 0.49, average 0.65 over 105 tests distribution RAND: max cratio 0.91, min 0.49, average 0.76 over 105 tests distribution STAGGER: max cratio 0.92, min 0.49, average 0.70 over 105 tests distribution PLATEAU: max cratio 0.84, min 0.49, average 0.54 over 105 tests distribution SHUFFLE: max cratio 0.64, min 0.49, average 0.52 over 105 tests method COPY: max cratio 0.92, min 0.49, average 0.66 over 75 tests method REVERSE: max cratio 0.92, min 0.49, average 0.68 over 75 tests method FREVERSE: max cratio 0.92, min 0.49, average 0.67 over 75 tests method BREVERSE: max cratio 0.92, min 0.49, average 0.68 over 75 tests method SORT: max cratio 0.49, min 0.49, average 0.49 over 75 tests method NEARSORT: max cratio 0.55, min 0.49, average 0.51 over 75 tests method DITHER: max cratio 0.92, min 0.50, average 0.74 over 75 tests Overall: average cratio 0.63 over 525 tests PPS: final version of test framework attached for the archives.
Attachment
pgsql-hackers by date: