Re: Why do we still perform a check for pre-sorted input within qsort variants? - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: Why do we still perform a check for pre-sorted input within qsort variants?
Date
Msg-id 87F42982BF2B434F831FCEF4C45FC33E5BD365B6@EXCHANGE.corporate.connx.com
Whole thread Raw
In response to Re: Why do we still perform a check for pre-sorted input within qsort variants?  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Why do we still perform a check for pre-sorted input within qsort variants?  ('Bruce Momjian' <bruce@momjian.us>)
Re: Why do we still perform a check for pre-sorted input within qsort variants?  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
>-----Original Message-----
>From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian
>Sent: Friday, March 08, 2013 11:22 AM
>To: Peter Geoghegan
>Cc: Robert Haas; Tom Lane; PG Hackers
>Subject: Re: [HACKERS] Why do we still perform a check for pre-sorted input within qsort variants?
>
>On Mon, Feb 25, 2013 at 02:31:21PM +0000, Peter Geoghegan wrote:
>> On 25 February 2013 11:49, Robert Haas <robertmhaas@gmail.com> wrote:
>> > I did attempt to do some tinkering with this while I was playing
>> > with it, but I didn't come up with anything really compelling.  You
>> > can reduce the number of comparisons on particular workloads by
>> > tinkering with the algorithm, but then somebody else ends up doing
>> > more comparisons, so it's hard to say whether you've really made
>> > things better.  Or at least I found it so.
>>
>> Right.
>>
>> To be honest, the real reason that it bothers me is that everything
>> else that our qsort routine does that differs from classic quicksort
>> (mostly quadratic insurance, like the median-of-medians pivot
>> selection, but also the fallback to insertion sort when n < 7) is very
>> well supported by peer reviewed research. Like Tom, I find it
>> implausible that Sedgewick and others missed a trick, where we did
>> not, particularly with something so simple.
>
>Perhaps we are more likely to be fed sorted data than a typical qsort usecase.
>
Checking for pre-sorted input will not make the routine faster on average.
However, it prevents a common perverse case where the runtime goes quadratic, so sorting 10^6 elements will take
K*10^12thoperations when the bad condition does occur. 
Checking for pre-sorted data can be thought of as an insurance policy.  It's kind of a pain to pay the premiums, but
yousure are glad you have it when an accident occurs. 
Because the check is linear, and the algorithm is O(n*log(n)), the cost is not terribly significant.

Switching to insertion sort for small partitions is almost always a good idea.  This idea was invented by Richard
Singletonas ACM algorithm 347. 

A different sort of safeguard, as compared to checking for pre-ordered data, is to watch recursion depth.  If we find
thatwe have already gone past a depth of log2(n) levels and we are not ready to shift to insertion sort, then there is
somesort of perverse data set.  A traditional fix is to switch to heap sort at this point.  Since heap sort is also
O(n*log(n)){though the constant multiplier is larger than for quicksort on average} you never get O(n*n) behavior.
Thismethod is called introspective sort. 

There are, of course, other clever things that can be tried.  The relaxed weak heapsort of Edelkamp and Steigler, for
instance,or using tries as in "Cache-Efficient String Sorting Using Copying" by Ranjan Sinha.  There is also possibly
significantbenefit to using radix sort for fixed width data that can be easily binned. 

I seem to recall that a year or two back some study was done on quicksort methodology as used in PostgreSQL.  As I
recall,the algorithm used in PostgreSQL fared well in the tests. 





pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants?
Next
From: 'Bruce Momjian'
Date:
Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants?