Re: Randomisation for ensuring nlogn complexity in quicksort - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Randomisation for ensuring nlogn complexity in quicksort
Date
Msg-id CA+Tgmoa+4jZf28DQvzqVMWNA5+xeWgcqMcw+5LcDkXwdANWJ=g@mail.gmail.com
Whole thread Raw
In response to Re: Randomisation for ensuring nlogn complexity in quicksort  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Randomisation for ensuring nlogn complexity in quicksort
List pgsql-hackers
On Mon, Jul 1, 2013 at 3:54 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Jul 1, 2013 at 4:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> This shouldn't be too complex, and should give us a fixed nlogn complexity even for wild data sets, without
affectingexisting normal data sets that are present in every day transactions. I even believe that those data sets will
alsobenefit from the above optimisation. 
>>
>> The only method of selecting a pivot for quicksort that obtain O(n lg
>> n) run time with 100% certainty is have a magical oracle inside the
>> computer that tells you in fixed time and with perfect accuracy which
>> pivot you should select.
>
> Doesn't a linear median algorithm (say median of medians) get you O(n lg n)?
>
> Granted, with a huge constant (I think 4)... but it should still be O(n lg n).

No.   Thinking about this a little more, I believe the way it works
out is that any algorithm for picking the median that guarantees that
a certain *percentage* of the tuples will be in the smaller partition
will have O(n lg n) complexity, but any algorithm that only guarantees
that a fixed *number* of tuples in the smaller partition is still
quadratic in complexity.  In the case of a median algorithm, you're
only guaranteed to have 2 elements in the smaller partition, which is
a constant.  If you take a median of medians, you're guaranteed to
have 8 elements in the smaller partition, which is bigger, but still a
constant.

The reason why this doesn't matter much in practice is because the
data distribution that causes quadratic behavior for median-of-medians
is not one which is likely to occur in the real world and will
probably only come up if chosen by an adversary who is attempting to
make your life miserable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Documentation/help for materialized and recursive views
Next
From: Peter Eisentraut
Date:
Subject: Re: "pg_ctl promote" exit status