Re: No merge sort? - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: No merge sort?
Date
Msg-id D90A5A6C612A39408103E6ECDD77B829408ABF@voyager.corporate.connx.com
Whole thread Raw
In response to No merge sort?  (Taral <taral@taral.net>)
Responses Re: No merge sort?  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
> -----Original Message-----
> From: Greg Stark [mailto:gsstark@mit.edu]
> Sent: Monday, April 07, 2003 1:50 PM
> To: Dann Corbit
> Cc: Greg Stark; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
>
> > floats are typically 64 - 96 bytes, bigints can be
> arbitrarily large.
> >
> > Floats are generally 8 bytes.
>
> Er, I meant "bits" there. oops.
>
> I'm still reading the other stuff.
>
> Most of this usually comes down to differences between theory
> and practice. The constants often matter a whole lot, and the
> cache behaviour and memory usage can matter a whole lot.
> quicksort after all is actually O(n^2) worst case...

By the use of a randomized sample of min(n,max(3,log(n))) elements, the
probability of choosing the worst case median is so close to zero as to
never happen in practice.  In real life, a well written quicksort will
beat the pants off of heapsort and mergesort.

In database work, the disk I/O is the most important issue.  Hence,
replacement selection or a priority-queue based approach should be used.

When sorting data, if all the information fits into memory any good
O(n*log(n)) technique is probably good enough. For one million records,
n*log2(n) is just 20 million.  20 million comparison and exchange
operations are an eye-blink on fast, modern hardware when performed in
memory.  The thing we should worry about is what happens when disk I/O
rears its ugly head.  You have a better solution to that situation, and
you have a better sorting routine for a database.



pgsql-hackers by date:

Previous
From: Steve Wampler
Date:
Subject: Re: No merge sort?
Next
From: "Dann Corbit"
Date:
Subject: Re: No merge sort?