Re: Why do we still perform a check for pre-sorted input within qsort variants? - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: Why do we still perform a check for pre-sorted input within qsort variants?
Date
Msg-id 87F42982BF2B434F831FCEF4C45FC33E5BD36A4A@EXCHANGE.corporate.connx.com
Whole thread Raw
In response to Re: Why do we still perform a check for pre-sorted input within qsort variants?  (Greg Stark <stark@mit.edu>)
Responses Re: Why do we still perform a check for pre-sorted input within qsort variants?  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
Yes, you are right.  I knew of a median of medians technique for pivot selection and I mistook that for the median of
mediansmedian selection algorithm (which it definitely isn't). 
I was not aware of a true linear time selection of the median algorithm {which is what median of medians accomplishes).
The fastest median selection algorithm that I was aware of was quickselect, which is only linear on average. 
I think that you analysis is correct, at any rate.

I also think I will enjoy learning and experimenting with the median of medians algorithm.
I found a link about it here:
http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

-----Original Message-----
From: gsstark@gmail.com [mailto:gsstark@gmail.com] On Behalf Of Greg Stark
Sent: Saturday, March 09, 2013 1:21 PM
To: Dann Corbit
Cc: Bruce Momjian; Peter Geoghegan; Robert Haas; Tom Lane; PG Hackers
Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants?

On Sat, Mar 9, 2013 at 8:52 PM, Dann Corbit <DCorbit@connx.com> wrote:
> Median of medians selection of the pivot gives you O(n*log(n)).
>
> No.  It does make O(n*n)  far less probable, but it does not eliminate it.  If it were possible, then introspective
sortwould be totally without purpose. 

No really, quicksort with median of medians pivot selection is most definitely O(n*log(n)) worst case. This is textbook
stuff.In fact even the introspective sort paper mentions it as one of the options to fail over to if the partition size
isn'tdecreasing rapidly enough. 

The problem is that median of medians is O(n) rather than O(1). That doesn't change the O() growth rate since there
willbe log(n) iterations. But it means it contributes to the constant factor and the end result ends up being a
constantfactor larger than heap sort or merge sort. That also explains how your reference on the quicksort adversary
doesn'tapply. It works by ordering elements you haven't compared yet and assumes that all but O(1) elements will still
beeligible for reordering. 

In any case I think you're basically right. What we have is basically a broken introspective sort that does more work
thannecessary and then handles fewer cases than it should. Implementing a better introspection that detects all
perversecases and does so with a lower overhead than the current check is a fine idea. 


--
greg



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants?
Next
From: Greg Stark
Date:
Subject: Re: Why do we still perform a check for pre-sorted input within qsort variants?