Yep. Also, bear in mind that the lg(n!)= ~ nlgn - n lower bound on
the number of comparisions:
a= says nothing about the amount of data movement used.
b= only holds for generic comparison based sorting algorithms.
As Knuth says (vol 3, p180), Distribution Counting sorts without
ever comparing elements to each other at all, and so does Radix
Sort. Similar comments can be found in many algorithms texts.
Any time we know that the range of the data to be sorted is substantially
restricted compared to the number of items to be sorted, we can sort in
less than O(lg(n!)) time. DB fields tend to take on few values and are
therefore "substantially restricted".
Given the proper resources and algorithms, O(n) sorts are very plausible
when sorting DB records.
All of the fastest external sorts of the last decade or so take advantage of
this. Check out that URL I posted.
Ron
-----Original Message-----
From: Mark Lewis <mark.lewis@mir3.com>
Sent: Sep 23, 2005 1:43 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: [PERFORM] Releasing memory during External sorting?
operations != passes. If you were clever, you could probably write a
modified bubble-sort algorithm that only made 2 passes. A pass is a
disk scan, operations are then performed (hopefully in memory) on what
you read from the disk. So there's no theoretical log N lower-bound on
the number of disk passes.
Not that I have anything else useful to add to this discussion, just a
tidbit I remembered from my CS classes back in college :)
-- Mark
On Fri, 2005-09-23 at 13:17 -0400, Tom Lane wrote:
> Ron Peacetree <rjpeace@earthlink.net> writes:
> > 2= No optimal external sorting algorithm should use more than 2 passes.
> > 3= Optimal external sorting algorithms should use 1 pass if at all possible.
>
> A comparison-based sort must use at least N log N operations, so it
> would appear to me that if you haven't got approximately log N passes
> then your algorithm doesn't work.
>
> regards, tom lane