Re: No merge sort? - Mailing list pgsql-hackers

From Greg Stark
Subject Re: No merge sort?
Date
Msg-id 878yumvu7v.fsf@stark.dyndns.tv
Whole thread Raw
In response to Re: No merge sort?  ("Dann Corbit" <DCorbit@connx.com>)
List pgsql-hackers
"Dann Corbit" <DCorbit@connx.com> writes:

> By the use of a randomized sample of min(n,max(3,log(n))) elements, the
> probability of choosing the worst case median is so close to zero as to
> never happen in practice.  

Not exactly. In fact the probability is no different than otherwise, but it
won't be repeatable and it won't be on common data sets like pre-sorted data.

> When sorting data, if all the information fits into memory any good
> O(n*log(n)) technique is probably good enough. For one million records,
> n*log2(n) is just 20 million.  20 million comparison and exchange
> operations are an eye-blink on fast, modern hardware when performed in
> memory.  

Woah, this is extremely dangerous thinking. For all you know the application
needs to execute this sort 200 times per second... 

Applications that can execute their queries entirely in memory are frequently
OLTP applications that need response within a hard upper limit. Often that
upper limit is measured in milliseconds...

> The thing we should worry about is what happens when disk I/O rears its ugly
> head. You have a better solution to that situation, and you have a better
> sorting routine for a database.

Not "better" just different. When dealing with disk I/O the database has to
use an algorithm that doesn't require too much random access reads. Disk is
quite fast at reading and writing sequentially but extremely slow and random
access.


While I/O bandwidth is often the bottleneck on databases it's also a lot
easier to deal with. You can make huge stripesets and use super fast i/o
controllers, to the point of saturating the machine's internal bus. It's much
harder to increase the amount of cpu cycles available.

Actually, in my experience cpu has been a bottleneck as often as disk i/o. But
my experience is quite heavily skewed to OLTP applications.

--
greg



pgsql-hackers by date:

Previous
From: "Dann Corbit"
Date:
Subject: Re: No merge sort?
Next
From: "Dann Corbit"
Date:
Subject: Re: No merge sort?