Re: No merge sort? - Mailing list pgsql-hackers

From Greg Stark
Subject Re: No merge sort?
Date
Msg-id 87y92mw15h.fsf@stark.dyndns.tv
Whole thread Raw
In response to Re: No merge sort?  ("Ron Peacetree" <rjpeace@earthlink.net>)
Responses Re: No merge sort?  ("Jason M. Felice" <jfelice@cronosys.com>)
List pgsql-hackers
"Ron Peacetree" <rjpeace@earthlink.net> writes:

> AFAIK, there are only 3 general purpose internal sorting techniques
> that have O(n) behavior:

Strictly speaking there are no sorting algorithms that have worst-case time
behaviour better than O(nlog(n)). Period.

The typical kind-of O(n) sorts involve either special-case inputs (almost
sorted), or only work if you ignore some aspect of the input size (radix
sort).

So, for example, for radix sort the log(n) factor comes precisely from having
to have log(n) passes. If you use octal that might go to log(n)/8 but it's
still O(log(n)).

If you assume your input fields are limited to a fixed size then O() notation
loses meaning. You can always sort in linear time by just storing bit flags in
a big vector and then scanning your (fixed-size) vector to read out the
values. 

However databases cannot assume fixed size inputs. Regardless of whether it's
on a 32 or 64 bit system postgres still has to sort much larger data
structures. floats are typically 64 - 96 bytes, bigints can be arbitrarily
large.

In fact posgres can sort user-defined datatypes as long as they have < and =
operators. (Actually I'm not sure on the precise constraint.)

Oh, and just to add one last fly in the ointment, postgres has to be able to
sort large datasets that don't even fit in memory. That means storing
temporary data on disk and minimizing the times data has to move from disk to
memory and back. Some alogorithms are better at that than others.

--
greg



pgsql-hackers by date:

Previous
From: cbbrowne@cbbrowne.com
Date:
Subject: Re: more contrib: log rotator
Next
From: Brian
Date:
Subject: pg_trigger.tgtype question