Thread: No merge sort?

No merge sort?

From
Taral
Date:
I tried pgsql-general, but got no response. Can anyone here shed some light
on the issue? Do I need to code merge sort into PostgreSQL?

----- Forwarded message from Taral <taral@taral.net> -----

From: Taral <taral@taral.net>
To: pgsql-general@postgresql.org
Date: Wed, 12 Mar 2003 17:54:35 -0600
Subject: [GENERAL] No merge sort?
Message-ID: <20030312235435.GA3007@taral.net>

I have a table "test" that looks like this:

CREATE TABLE test (
    id BIGINT,
    time INTEGER
);

There is an index:

CREATE INDEX idx ON test(id, time);

The table has been loaded with 2M rows, where time ranges sequentially
from 0 to 1999999 and id is random values from 0 to 49999.

This query:

SELECT * FROM test WHERE id IN (...) AND time > 198000 AND time < 199800
ORDER BY time DESC LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit  (cost=3635.28..3635.28 rows=20 width=12) (actual time=22.94..22.96 rows=14 loops=1)
  ->  Sort  (cost=3635.28..3635.28 rows=23 width=12) (actual time=22.93..22.93 rows=14 loops=1)
        ->  Index Scan using idx, idx, ..., idx, idx on test  (cost=0.00..3634.77 rows=23 width=12) (actual time=1.01..22.10 rows=14 loops=1)
Total runtime: 29.12 msec

This query:

SELECT * FROM test WHERE id IN (...) AND time < 199800 ORDER BY time DESC
LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit  (cost=14516.46..14516.46 rows=20 width=12) (actual time=1448.83..1448.86 rows=20 loops=1)
  ->  Sort  (cost=14516.46..14516.46 rows=2527 width=12) (actual time=1448.82..1448.83 rows=21 loops=1)
        ->  Index Scan using idx, idx, ..., idx, idx on test  (cost=0.00..14373.67 rows=2527 width=12) (actual time=0.14..1437.33 rows=2048 loops=1)
Total runtime: 1454.62 msec

Since the index will output 'time' sorted data for each 'id', why isn't
a merge sort being used here? A merge sort would reduce the execution
time back to 30 ms.
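
For illustration, here is a minimal C sketch of the kind of merge being asked
for (the Stream type and the sample data are made-up stand-ins for per-id
index cursors, not PostgreSQL executor code): each per-id range out of the
index is already sorted on time, so producing the final ordering only requires
repeatedly taking the largest head value, and a LIMIT lets the merge stop early.

/* Minimal sketch: merge K per-id index streams that are each already
 * sorted by "time" DESC, stopping after LIMIT rows. */
#include <stdio.h>

typedef struct { const int *vals; int pos, len; } Stream;   /* one sorted run per id */

static int stream_peek(const Stream *s) { return s->vals[s->pos]; }
static int stream_empty(const Stream *s) { return s->pos >= s->len; }

/* Emit up to 'limit' rows in global DESC order from k sorted streams.
 * A linear scan of the k heads is shown for brevity, since k is small in
 * the example query; a heap would make each step O(log k). */
static void merge_limit(Stream *streams, int k, int limit)
{
    while (limit-- > 0) {
        int best = -1;
        for (int i = 0; i < k; i++)
            if (!stream_empty(&streams[i]) &&
                (best < 0 || stream_peek(&streams[i]) > stream_peek(&streams[best])))
                best = i;
        if (best < 0)
            break;                          /* all streams exhausted */
        printf("%d\n", stream_peek(&streams[best]));
        streams[best].pos++;
    }
}

int main(void)
{
    int a[] = {90, 70, 10}, b[] = {80, 60, 50};   /* already DESC-sorted */
    Stream s[2] = {{a, 0, 3}, {b, 0, 3}};
    merge_limit(s, 2, 4);                          /* prints 90 80 70 60 */
    return 0;
}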

--
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me

Re: No merge sort?

From
Tom Lane
Date:
Taral <taral@taral.net> writes:
> Do I need to code merge sort into postgresql?

Seems like a waste of effort to me.  I find this example less than
compelling --- the case that could be sped up is quite narrow,
and the potential performance gain not all that large.  (A sort
is a sort however you slice it, with O(N log N) runtime...)

Also, the direction we'd likely be going in in future is to merge
multiple indexscans using bitmap techniques, so that the output
ordering of the scans couldn't be counted on anyway.
        regards, tom lane


Re: No merge sort?

From
Taral
Date:
On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
> Seems like a waste of effort to me.  I find this example less than
> compelling --- the case that could be sped up is quite narrow,
> and the potential performance gain not all that large.  (A sort
> is a sort however you slice it, with O(N log N) runtime...)

Actually, it's O(N) time. The index can produce "time" sorted data for
each "id" in linear time, and the merge sort can merge them in linear
time. Also, the existing system insists on loading _all_ candidate rows
whereas this method can benefit from the limit.

If you don't want to code it, I will. I need it for the livejournal
mysql->postgresql transition. (No, mysql doesn't do it right either.)
But a few pointers to the right places to look in the code would be
helpful.

> Also, the direction we'd likely be going in in future is to merge
> multiple indexscans using bitmap techniques, so that the output
> ordering of the scans couldn't be counted on anyway.

I don't understand this. What do these bitmap techniques do?

--
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me

Re: No merge sort?

From
Tom Lane
Date:
Taral <taral@taral.net> writes:
> On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
>> Seems like a waste of effort to me.  I find this example less than
>> compelling --- the case that could be sped up is quite narrow,
>> and the potential performance gain not all that large.  (A sort
>> is a sort however you slice it, with O(N log N) runtime...)

> Actually, it's O(N) time.

Only if you assume a fixed number of input streams.

>> Also, the direction we'd likely be going in in future is to merge
>> multiple indexscans using bitmap techniques, so that the output
>> ordering of the scans couldn't be counted on anyway.

> I don't understand this. What do these bitmap techniques do?

The idea is you look at the index to make a list of main-table tuple
positions you are interested in, which you represent compactly as a
compressed bitmap.  (There is some finagling needed because PG actually
uses block/line number rather than a pure tuple number to identify
tuples, but you can fake it with a reasonably small amount of overhead.)
Then you can combine multiple index inputs by ANDing or ORing bitmaps
(the OR case applies to your example).  Finally, you traverse the heap,
accessing the desired rows in heap-location order.  This loses in terms
of producing presorted output --- but it often saves enough in I/O costs
to more than justify doing the sort in memory.
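
A rough sketch of that idea, with an uncompressed bitmap over a toy
tuple-number space standing in for the real block/offset TIDs and the
compression Tom mentions:

/* Sketch only, not the actual PostgreSQL TID bitmap code: treat each
 * candidate heap position as a bit in a uint64 word array, OR together the
 * bitmaps produced by several index scans, then visit the surviving
 * positions in heap order. */
#include <stdint.h>
#include <stdio.h>

#define MAX_TUPLES 1024
#define WORDS (MAX_TUPLES / 64)

typedef struct { uint64_t w[WORDS]; } TidBitmap;

static void bitmap_set(TidBitmap *b, int tid) { b->w[tid / 64] |= (uint64_t)1 << (tid % 64); }

/* OR case from the example: rows matching any id in the IN-list. */
static void bitmap_or(TidBitmap *dst, const TidBitmap *src)
{
    for (int i = 0; i < WORDS; i++)
        dst->w[i] |= src->w[i];
}

/* Walk set bits in ascending heap order; presorted output is lost, but
 * heap access becomes close to sequential. */
static void bitmap_scan(const TidBitmap *b)
{
    for (int tid = 0; tid < MAX_TUPLES; tid++)
        if (b->w[tid / 64] & ((uint64_t)1 << (tid % 64)))
            printf("fetch heap tuple %d\n", tid);
}

int main(void)
{
    TidBitmap scan1 = {{0}}, scan2 = {{0}};
    bitmap_set(&scan1, 3);  bitmap_set(&scan1, 700);   /* index scan for one id */
    bitmap_set(&scan2, 41); bitmap_set(&scan2, 3);     /* index scan for another id */
    bitmap_or(&scan1, &scan2);
    bitmap_scan(&scan1);    /* fetches 3, 41, 700 in heap order */
    return 0;
}
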
        regards, tom lane


Re: No merge sort?

From
Taral
Date:
On Thu, Mar 13, 2003 at 10:30:27PM -0500, Tom Lane wrote:
> The idea is you look at the index to make a list of main-table tuple
> positions you are interested in, which you represent compactly as a
> compressed bitmap.  (There is some finagling needed because PG actually
> uses block/line number rather than a pure tuple number to identify
> tuples, but you can fake it with a reasonably small amount of overhead.)
> Then you can combine multiple index inputs by ANDing or ORing bitmaps
> (the OR case applies to your example).  Finally, you traverse the heap,
> accessing the desired rows in heap-location order.  This loses in terms
> of producing presorted output --- but it often saves enough in I/O costs
> to more than justify doing the sort in memory.

And it loses bigtime in the case of LIMIT. If the unlimited query
returns 4,000 records and I only want 20, you're retrieving 200x too
much data from disk.

--
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me

No index maximum? (was Re: No merge sort?)

From
Taral
Date:
Same setup, different query:

test=> explain select max(time) from test where id = '1';
NOTICE:  QUERY PLAN:

Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
  ->  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)

Since the index is (id, time), why isn't the index being used to
retrieve the maximum value?

On Thu, Mar 13, 2003 at 03:10:49PM -0600, Taral wrote:
> I have a table "test" that looks like this:
>
> CREATE TABLE test (
>     id BIGINT,
>     time INTEGER
> );
>
> There is an index:
>
> CREATE INDEX idx ON test(id, time);
>
> The table has been loaded with 2M rows, where time ranges sequentially
> from 0 to 1999999 and id is random values from 0 to 49999.

--
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me

Re: No merge sort?

From
Tom Lane
Date:
Taral <taral@taral.net> writes:
> On Thu, Mar 13, 2003 at 10:30:27PM -0500, Tom Lane wrote:
>> The idea is you look at the index to make a list of main-table tuple
>> positions you are interested in, which you represent compactly as a
>> compressed bitmap.  [snip]

> And it loses bigtime in the case of LIMIT. If the unlimited query
> returns 4,000 records and I only want 20, you're retrieving 200x too
> much data from disk.

Sure.  That's why we have a planner that distinguishes between startup
cost and total cost, and interpolates when a LIMIT is involved.  But
if this mergesort idea only helps for small-limit cases, that's another
restriction on its scope of usefulness...
        regards, tom lane


Re: No merge sort?

From
Taral
Date:
On Fri, Mar 14, 2003 at 10:43:30PM -0500, Tom Lane wrote:
> Sure.  That's why we have a planner that distinguishes between startup
> cost and total cost, and interpolates when a LIMIT is involved.  But
> if this mergesort idea only helps for small-limit cases, that's another
> restriction on its scope of usefulness...

I don't think so, since even in the non-limit case it avoids having to
do a full sort if the number of initial streams is finite and small (as
in the case I demonstrated), reducing time complexity to O(N).

--
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me

Re: No index maximum? (was Re: No merge sort?)

From
Bruno Wolff III
Date:
On Fri, Mar 14, 2003 at 14:19:46 -0600, Taral <taral@taral.net> wrote:
> Same setup, different query:
> 
> test=> explain select max(time) from test where id = '1';
> NOTICE:  QUERY PLAN:
> 
> Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
>   ->  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
> 
> Since the index is (id, time), why isn't the index being used to
> retrieve the maximum value?

It looks like an index scan is being done.

If the index was on (time, id) instead of (id, time), then you could get
a further speed up by rewriting the query as:
select time from test where id = '1' order by time desc limit 1;


Re: No index maximum? (was Re: No merge sort?)

From
Taral
Date:
On Sat, Mar 15, 2003 at 09:23:28AM -0600, Bruno Wolff III wrote:
> On Fri, Mar 14, 2003 at 14:19:46 -0600,
>   Taral <taral@taral.net> wrote:
> > Same setup, different query:
> >
> > test=> explain select max(time) from test where id = '1';
> > NOTICE:  QUERY PLAN:
> >
> > Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
> >   ->  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
> >
> > Since the index is (id, time), why isn't the index being used to
> > retrieve the maximum value?
>
> It looks like an index scan is being done.
>
> If the index was on (time, id) instead of (id, time), then you could get
> a further speed up by rewriting the query as:
> select time from test where id = '1' order by time desc limit 1;

Yes, that's exactly it. It's an index _scan_. It should simply be able
to read the maximum straight from the btree.

--
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me

Re: No index maximum? (was Re: No merge sort?)

From
Bruno Wolff III
Date:
On Mon, Mar 17, 2003 at 11:23:47 -0600, Taral <taral@taral.net> wrote:
> On Sat, Mar 15, 2003 at 09:23:28AM -0600, Bruno Wolff III wrote:
> > On Fri, Mar 14, 2003 at 14:19:46 -0600,
> >   Taral <taral@taral.net> wrote:
> > > Same setup, different query:
> > > 
> > > test=> explain select max(time) from test where id = '1';
> > > NOTICE:  QUERY PLAN:
> > > 
> > > Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
> > >   ->  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
> > > 
> > > Since the index is (id, time), why isn't the index being used to
> > > retrieve the maximum value?
> > 
> > It looks like an index scan is being done.
> > 
> > If the index was on (time, id) instead of (id, time), then you could get
> > a further speed up by rewriting the query as:
> > select time from test where id = '1' order by time desc limit 1;
> 
> Yes, that's exactly it. It's an index _scan_. It should simply be able
> to read the maximum straight from the btree.

max and min don't use indexes. They are generic aggregate functions and
postgres doesn't have the special knowledge to know that for those
aggregate functions an index can be used. You can get around this
by rewriting the query as I previously indicated.

For more details on why things are this way, search the archives. This
topic comes up a lot.

I was also mistaken about having to switch the index around for this case.
It should work the way you have it (if you rewrite the query).


Re: No index maximum? (was Re: No merge sort?)

From
Taral
Date:
On Mon, Mar 17, 2003 at 11:23:47AM -0600, Taral wrote:
> Yes, that's exactly it. It's an index _scan_. It should simply be able
> to read the maximum straight from the btree.

Still doesn't work, even with the rewritten query. It does a
Limit(Sort(Index Scan)), with 1333 rows being pulled from the index.

--
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me

Re: No merge sort?

From
"Ron Peacetree"
Date:
AFAIK, there are only 3 general purpose internal sorting techniques
that have O(n) behavior:
1= Insertion Sort for "almost sorted" lists.  Since "almost sorted" is
a fuzzy concept, let's make the first approximation definition that no
more than n^(1/2) of the elements can be disordered.  There are better
definitions in the literature, but I don't remember them off the top
of my head.

2= Sorts from the class of Address Calculation, Counting, or
Distribution Sort.  These need to be able to carry out something more
complex than simply a comparison in order to achieve O(n), and
therefore have high constants in their execution.  For large enough n
though, these are the performance kings.

3= Straight Radix Sort where you minimize the number of passes by
using a base much greater than two for the radix.  Usually octal
or hexadecimal.  On a 32b or 64b system, this approach will let you
sort in 2 passes.  (A minimal sketch of this approach follows.)
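
Under the simplifying assumptions of fixed 32-bit keys and byte-wide
(base 256) digits (four counting/distribution passes rather than two), the
sketch looks like this:

/* Minimal sketch of a straight LSD radix sort on 32-bit keys, one byte per
 * pass.  Illustration only; a real sort would carry whole tuples, not bare
 * keys. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void radix_sort_u32(uint32_t *a, size_t n)
{
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (!tmp)
        return;                            /* allocation failure: caller must fall back */
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};
        for (size_t i = 0; i < n; i++)     /* counting pass for this digit */
            count[(a[i] >> shift) & 0xFF]++;
        size_t pos = 0;
        for (int b = 0; b < 256; b++) {    /* turn counts into start offsets */
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }
        for (size_t i = 0; i < n; i++)     /* stable distribution pass */
            tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
}

int main(void)
{
    uint32_t a[] = {1999999, 42, 500000, 42, 7};
    radix_sort_u32(a, 5);
    for (int i = 0; i < 5; i++)
        printf("%u\n", a[i]);              /* 7 42 42 500000 1999999 */
    return 0;
}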

All of the above have potentially nasty trade-offs in comparison to
the standard heavily tuned median-of-three quicksort used by the sort
unix command.  Nonetheless, I could see some value in providing all of
these with PostgreSQL (including a decent port of the unix sort routine
for the Win folks).  I'll note in passing that quicksort's Achilles'
heel is that it's unstable (all of the rest are stable), which can be
a problem in a DB.

In the general case there are a few papers out there that state you can
sort in O(n) if you can throw O(n^2) space at the problem.  That
implies to me that for DB's, we are not going to be able to use O(n)
algorithms for most of our needs.


As for external sorts, everything I've ever read says that some sort
of merge technique is used:  balanced, multiway, or polyphase.  In all
cases, I've seen comments to the effect that reading some part of the
data into internal buffers, sorting them, and then merging them with
already sorted data is the best practice.


All of this seems to imply that instead of mergesort (which is
stable), PostgreSQL might be better off with the 4 sorts I've listed
plus a viciously efficient merge utility for combining partial results
that do fit into memory into result files on disk that don't.


Or am I crazy?

Ron Peacetree



Re: No merge sort?

From
Greg Stark
Date:
"Ron Peacetree" <rjpeace@earthlink.net> writes:

> AFAIK, there are only 3 general purpose internal sorting techniques
> that have O(n) behavior:

Strictly speaking there are no sorting algorithms that have worst-case time
behaviour better than O(nlog(n)). Period.

The typical kind-of O(n) sorts involve either special-case inputs (almost
sorted), or only work if you ignore some aspect of the input size (radix
sort).

So, for example, for radix sort the log(n) factor comes precisely from having
to have log(n) passes. If you use octal that might go to log(n)/8 but it's
still O(log(n)).

If you assume your input fields are limited to a fixed size then O() notation
loses meaning. You can always sort in linear time by just storing bit flags in
a big vector and then scanning your (fixed-size) vector to read out the
values. 
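
As a concrete (and deliberately limited) version of that bit-flag example,
assuming integer keys in a small fixed range:

/* One pass sets a flag per value, one pass over the (fixed-size) flag
 * vector reads the values back in order.  Duplicates are collapsed here;
 * a count per value instead of a flag would preserve them. */
#include <stdbool.h>
#include <stdio.h>

#define KEY_RANGE 1000          /* assumed bound on key values */

static void bitvector_sort(const int *keys, int n, int *out)
{
    static bool seen[KEY_RANGE];
    for (int i = 0; i < KEY_RANGE; i++)
        seen[i] = false;
    for (int i = 0; i < n; i++)             /* one linear marking pass */
        seen[keys[i]] = true;
    int k = 0;
    for (int v = 0; v < KEY_RANGE; v++)     /* one pass, linear in KEY_RANGE */
        if (seen[v])
            out[k++] = v;
}

int main(void)
{
    int keys[] = {812, 3, 77, 500};
    int out[4];
    bitvector_sort(keys, 4, out);
    for (int i = 0; i < 4; i++)
        printf("%d\n", out[i]);             /* 3 77 500 812 */
    return 0;
}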

However databases cannot assume fixed size inputs. Regardless of whether it's
on a 32 or 64 bit system postgres still has to sort much larger data
structures. floats are typically 64 - 96 bytes, bigints can be arbitrarily
large.

In fact postgres can sort user-defined datatypes as long as they have < and =
operators. (Actually I'm not sure on the precise constraint.)

Oh, and just to add one last fly in the ointment, postgres has to be able to
sort large datasets that don't even fit in memory. That means storing
temporary data on disk and minimizing the times data has to move from disk to
memory and back. Some algorithms are better at that than others.

--
greg



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Greg Stark [mailto:gsstark@mit.edu]
> Sent: Monday, April 07, 2003 12:36 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
> "Ron Peacetree" <rjpeace@earthlink.net> writes:
>
> > AFAIK, there are only 3 general purpose internal sorting techniques
> > that have O(n) behavior:
>
> Strictly speaking there are no sorting algorithms that have
> worst-case time behaviour better than O(nlog(n)). Period.

Provably true if the sort uses comparison.  Definitely false for
distribution sorts or distribution sort hybrids.
> The typical kind-of O(n) sorts involve either special-case
> inputs (almost sorted), or only work if you ignore some
> aspect of the input size (radix sort).

There is no need to ignore any aspect of the input with radix sort.
Radix sort can also be generalized in the same way that quicksort can.
The callback function returns the "bucket" of 'radix' significant bits.
So, for unsigned char, you just return the 'kth' character when the
'kth' bucket is requested if the radix is 256 for 8 bits.  For an
example of dealing with other sorts of data, see chapter 13 of "C
Unleashed":
http://users.powernet.co.uk/eton/unleashed/
> So, for example, for radix sort the log(n) factor comes
> precisely from having to have log(n) passes. If you use octal
> that might go to log(n)/8 but it's still O(log(n)).

You do not have to make log(n) passes at all.  In fact, with LSD radix
sort, all the buckets can be counted in a single pass.  Demonstration in
the named book (with source code).

> If you assume your input fields are limited to a fixed size
> then O() notation loses meaning. You can always sort in
> linear time by just storing bit flags in a big vector and
> then scanning your (fixed-size) vector to read out the values.
>
> However databases cannot assume fixed size inputs. Regardless
> of whether it's on a 32 or 64 bit system postgres still has
> to sort much larger data structures. floats are typically 64
> - 96 bytes, bigints can be arbitrarily large.

Floats are generally 8 bytes.  If you use hfloat on a VAX, it is 16
bytes.  That is the largest hardware floating point in common use.
Bigints can be arbitrarily large, but PostgreSQL does not handle
anything larger than 16 byte integers so there is no need to worry about
larger sizes.

A numeric type can be arbitrarily large (though a thousand digits is
pure foolishness).  However, these numbers can be sorted by standard
means.  A typical scenario would include an MSD radix sort until the
subsets were 200K rows, then Singleton's sort until 50 rows, then
insertion sort or perhaps Shellsort.  I write sorting routines for a
database company and millions of rows can be sorted in just a few
seconds.
> In fact posgres can sort user-defined datatypes as long as
> they have < and = operators. (Actually I'm not sure on the
> precise constraint.)
>
> Oh, and just to add one last fly in the ointment, postgres
> has to be able to sort large datasets that don't even fit in
> memory. That means storing temporary data on disk and
> minimizing the times data has to move from disk to memory and
> back. Some alogorithms are better at that than others.

A very important advancement in this area is a priority queue based
external sort.  It allows the use of the fastest possible internal sort
and a single pass to read, a single pass to merge, and a single pass to
write all of the data.  In this regard, it can be enormously faster than
even replacement selection.  For replacement selection, the advantage is
that the runs are twice as long as normal, allowing log(n)/2 passes
instead of log(n) passes.  However, the priority queue method only has
one read pass, one write pass, and one merge pass.

It works like this:
1.  Read blocks of data into memory and sort them.
2.  Write the sorted blocks to disk
3.  On the merge phase, we create a priority queue of file pointers, and
read the first item from each.  The priority queue sorts on the first
item in the file list.  When we extract an item from the minimal queue,
we write it to disk.  If the item changes when we read the next item
from that file, we adjust the priority.  We continue until all the files
are empty.  The same book cited previously has a model implementation to
demonstrate the technique.  Since disk I/O totally dominates external
sorts, this technique is a titanic win over conventional sorting means.
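
A compact sketch of the merge phase described in step 3, with plain
in-memory arrays standing in for the sorted run files and a small binary
heap keyed on each run's current head value:

/* Pop the global minimum, advance that run, sift it back down; every
 * element is read once and written once. */
#include <stdio.h>

typedef struct { const int *data; int pos, len; } Run;

static int head(const Run *r) { return r->data[r->pos]; }

static void sift_down(Run **heap, int n, int i)
{
    for (;;) {
        int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && head(heap[l]) < head(heap[smallest])) smallest = l;
        if (r < n && head(heap[r]) < head(heap[smallest])) smallest = r;
        if (smallest == i) return;
        Run *t = heap[i]; heap[i] = heap[smallest]; heap[smallest] = t;
        i = smallest;
    }
}

static void merge_runs(Run **heap, int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)          /* heapify on the run heads */
        sift_down(heap, n, i);
    while (n > 0) {
        printf("%d\n", head(heap[0]));            /* "write" the global minimum */
        if (++heap[0]->pos == heap[0]->len)       /* run exhausted: shrink heap */
            heap[0] = heap[--n];
        sift_down(heap, n, 0);                    /* restore heap order */
    }
}

int main(void)
{
    int a[] = {1, 4, 9}, b[] = {2, 3, 10}, c[] = {5, 6, 7};
    Run runs[3] = {{a, 0, 3}, {b, 0, 3}, {c, 0, 3}};
    Run *heap[3] = {&runs[0], &runs[1], &runs[2]};
    merge_runs(heap, 3);                          /* prints 1..7, 9, 10 */
    return 0;
}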

Another simple solution is to memory map the file, but do it in large
multi-page blocks instead of mapping the whole thing.

> --
> greg

dann



Re: No merge sort?

From
cbbrowne@cbbrowne.com
Date:
Ron Peacetree wrote:
> AFAIK, there are only 3 general purpose internal sorting techniques
> that have O(n) behavior:

Hum?  NO "general purpose" sorting technique has O(n) behaviour.

The theoretical best scenario, _in general_, is O(n log n).

Insertion sort is expected to provide O(n^2) behaviour, and radix-like
schemes get arbitrarily memory hungry and have bad pathological results.

> All of the above have potentially nasty trade-offs in comparision to
> the standard heavily tuned median-of-three quicksort used by the sort
> unix command.  Nonetheless, I could see some value in providing all of
> these with PostgeSQL (including a decent port of the unix sort routine
> for the Win folks).  I'll note in passing that quicksort's Achille's
> heel is that it's unstable (all of the rest are stable), which can be
> a problem in a DB.

Making one sort algorithm work very efficiently is quite likely to be a
lot more effective than frittering away time trying to get some special
cases (that you can't regularly use) to work acceptably.

> All of this seems to imply that instead of mergesort (which is
> stable), PostgreSQL might be better off with the 4 sorts I've listed
> plus a viciously efficient merge utility for combining partial results
> that do fit into memory into result files on disk that don't.
> 
> Or am I crazy?

More than likely.  It is highly likely that it will typically take more
computational effort to figure out that one of the 4 sorts provided
/any/ improvement than any computational effort they would save.

That's a /very/ common problem.  There's also a fair chance, seen in
practice, that the action of collecting additional statistics to improve
query optimization will cost more than the savings provided by the
optimizations.
--
(concatenate 'string "cbbrowne" "@acm.org")
http://www3.sympatico.ca/cbbrowne/wp.html
When ever in doubt consult a song. --JT Fletcher 



Re: No merge sort?

From
"Jason M. Felice"
Date:
On Mon, Apr 07, 2003 at 03:36:10PM -0400, Greg Stark wrote:
> "Ron Peacetree" <rjpeace@earthlink.net> writes:
> 
> > AFAIK, there are only 3 general purpose internal sorting techniques
> > that have O(n) behavior:
> 
> Strictly speaking there are no sorting algorithms that have worst-case time
> behaviour better than O(nlog(n)). Period.
> 

Not true.

http://www.elsewhere.org/jargon/html/entry/bogo-sort.html

-Jay 'Eraserhead' Felice

P.S. <g>



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: cbbrowne@cbbrowne.com [mailto:cbbrowne@cbbrowne.com]
> Sent: Monday, April 07, 2003 1:20 PM
> To: Ron Peacetree
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
> Ron Peacetree wrote:
> > AFAIK, there are only 3 general purpose internal sorting techniques
> > that have O(n) behavior:
>
> Hum?  NO "general purpose" sorting technique has O(n) behaviour.
>
> The theoretical best scenario, _in general_, is O(n log n).

The ACM has published papers on several techniques that are O(n log log n).
Radix and counting sort can be totally generalized in the same way that
comparison sorts can by creating a callback function.

If I have strings of 50 characters in length, I can count all RADIX
characters in an array of unsigned long counts[RADIX][50] in a single
pass.  As the key strings become successively larger, radix sorts fall
flat compared to comparison techniques, but a counting pass can also
determine whether distribution or comparison sorting will be better.
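
A minimal sketch of that single counting pass, assuming fixed-width
50-character keys (the sample keys here are invented); an LSD radix sort
could then derive the bucket offsets for every character position from this
one table without re-reading the data:

#include <stdio.h>
#include <string.h>

#define WIDTH 50
#define RADIX 256

static unsigned long counts[RADIX][WIDTH];

static void count_all_positions(char keys[][WIDTH], size_t n)
{
    memset(counts, 0, sizeof counts);
    for (size_t i = 0; i < n; i++)                 /* one pass over the data */
        for (int pos = 0; pos < WIDTH; pos++)
            counts[(unsigned char)keys[i][pos]][pos]++;
}

int main(void)
{
    char keys[2][WIDTH] = {"ACME      EAST      PLANT7",
                           "ACME      WEST      PLANT2"};
    count_all_positions(keys, 2);
    printf("keys starting with 'A': %lu\n", counts['A'][0]);   /* 2 */
    printf("11th char 'E': %lu\n", counts['E'][10]);           /* 1 */
    printf("11th char 'W': %lu\n", counts['W'][10]);           /* 1 */
    return 0;
}
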
> Insertion sort is expected to provide O(n^2) behaviour, and
> radix-like schemes get arbitrarily memory hungry and have bad
> pathological results.

However, the pathological cases can be detected.  A typical pathological
case for real database work is an array where the initial data is
identical for a long distance.  This is not unusual in an ORDER BY/GROUP
BY scenario.
> > All of the above have potentially nasty trade-offs in comparison to
> > the standard heavily tuned median-of-three quicksort used by the sort
> > unix command.  Nonetheless, I could see some value in providing all of
> > these with PostgreSQL (including a decent port of the unix sort routine
> > for the Win folks).  I'll note in passing that quicksort's Achilles'
> > heel is that it's unstable (all of the rest are stable), which can be
> > a problem in a DB.

A better sort can make your database dominatingly better than others for
some operations.  It can be an enormous technical advantage.
> Making one sort algorithm work very efficiently is quite
> likely to be a lot more effective than frittering away time
> trying to get some special cases (that you can't regularly
> use) to work acceptably.

I don't agree.  In fact, this is never true.  A decent sorting routine
usually involves several techniques.  Consider the library qsort
routine.  Usually, it will use quicksort with several special tests and
a randomized median selection until the set size becomes small enough.
Then, we switch to insertion sort or shellsort.
Or consider the STL and Plauger's library sort routine.  It is
introspective sort and is a hybrid of three algorithms (quicksort,
heapsort, and insertion sort).
> > All of this seems to imply that instead of mergesort (which is
> > stable), PostgreSQL might be better off with the 4 sorts I've listed
> > plus a viciously efficient merge utility for combining partial results
> > that do fit into memory into result files on disk that don't.
> >
> > Or am I crazy?
>
> More than likely.  It is highly likely that it will typically
> take more computational effort to figure out that one of the
> 4 sorts provided /any/ improvement than any computational
> effort they would save.

It is highly unlikely that this is true.  In fact, every good sorting
routine does *exactly* this.
> That's a /very/ common problem.  There's also a fair chance,
> seen in practice, that the action of collecting additional
> statistics to improve query optimization will cost more than
> the savings provided by the optimizations.
> --
> (concatenate 'string "cbbrowne" "@acm.org")
> http://www3.sympatico.ca/cbbrowne/wp.html
> When ever in doubt
> consult a song. --JT Fletcher
>
>



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Jason M. Felice [mailto:jfelice@cronosys.com]
> Sent: Monday, April 07, 2003 1:10 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
> On Mon, Apr 07, 2003 at 03:36:10PM -0400, Greg Stark wrote:
> > "Ron Peacetree" <rjpeace@earthlink.net> writes:
> >
> > > AFAIK, there are only 3 general purpose internal sorting techniques
> > > that have O(n) behavior:
> >
> > Strictly speaking there are no sorting algorithms that have worst-case
> > time behaviour better than O(nlog(n)). Period.
> >
>
> Not true.
>
http://www.elsewhere.org/jargon/html/entry/bogo-sort.html

He said "better than" not "worse than".

For comparison based sorting it is _provably_ true that you cannot sort
faster than log(n!) which is O(n*(log(n))).



Re: No merge sort?

From
Greg Stark
Date:
> floats are typically 64 - 96 bytes, bigints can be arbitrarily large.
> 
> Floats are generally 8 bytes.  

Er, I meant "bits" there. oops.

I'm still reading the other stuff.

Most of this usually comes down to differences between theory and practice.
The constants often matter a whole lot, and the cache behaviour and memory
usage can matter a whole lot. quicksort after all is actually O(n^2) worst
case...

--
greg



Re: No merge sort?

From
Steve Wampler
Date:
On Mon, 2003-04-07 at 13:39, Dann Corbit wrote:
> > -----Original Message-----
> > From: Jason M. Felice [mailto:jfelice@cronosys.com] 
> > Sent: Monday, April 07, 2003 1:10 PM
> > To: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] No merge sort?
> > 
> > 
> > On Mon, Apr 07, 2003 at 03:36:10PM -0400, Greg Stark wrote:
> > > "Ron Peacetree" <rjpeace@earthlink.net> writes:
> > > 
> > > > AFAIK, there are only 3 general purpose internal sorting techniques
> > > > that have O(n) behavior:
> > > 
> > > Strictly speaking there are no sorting algorithms that have worst-case
> > > time behaviour better than O(nlog(n)). Period.
> > > 
> > 
> > Not true.
> > 
> http://www.elsewhere.org/jargon/html/entry/bogo-sort.html
> 
> He said "better than" not "worse than".
> 
> For comparison based sorting it is _provably_ true that you cannot sort
> faster than log(n!) which is O(n*(log(n))).

You didn't read far enough.  The 2nd form of the bogo sort is O(1),
at least in *one* universe...

-- 
Steve Wampler <swampler@noao.edu>
National Solar Observatory



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Greg Stark [mailto:gsstark@mit.edu]
> Sent: Monday, April 07, 2003 1:50 PM
> To: Dann Corbit
> Cc: Greg Stark; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
>
> > floats are typically 64 - 96 bytes, bigints can be
> arbitrarily large.
> >
> > Floats are generally 8 bytes.
>
> Er, I meant "bits" there. oops.
>
> I'm still reading the other stuff.
>
> Most of this usually comes down to differences between theory
> and practice. The constants often matter a whole lot, and the
> cache behaviour and memory usage can matter a whole lot.
> quicksort after all is actually O(n^2) worst case...

By the use of a randomized sample of min(n,max(3,log(n))) elements, the
probability of choosing the worst case median is so close to zero as to
never happen in practice.  In real life, a well written quicksort will
beat the pants off of heapsort and mergesort.
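
For illustration, a small quicksort sketch that draws its pivot from the
median of a random sample of roughly max(3, log2(n)) elements; rand() stands
in for a real RNG, and a production version would add the introsort-style
fallback mentioned elsewhere in the thread:

#include <stdio.h>
#include <stdlib.h>

static int sample_median_pivot(const int *a, int n)
{
    int k = 0, sample[64];
    for (int t = n; t > 1; t >>= 1)         /* k is about log2(n) */
        k++;
    if (k < 3) k = 3;
    if (k > n) k = n;
    if (k > 64) k = 64;                     /* cap the sample buffer */
    for (int i = 0; i < k; i++)
        sample[i] = a[rand() % n];          /* random sample of candidate pivots */
    for (int i = 1; i < k; i++) {           /* insertion-sort the tiny sample */
        int v = sample[i], j = i - 1;
        while (j >= 0 && sample[j] > v) { sample[j + 1] = sample[j]; j--; }
        sample[j + 1] = v;
    }
    return sample[k / 2];                   /* median of the sample */
}

static void quicksort(int *a, int n)
{
    if (n < 2)
        return;
    int pivot = sample_median_pivot(a, n);
    int i = 0, j = n - 1;
    while (i <= j) {                        /* standard two-way partition */
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
    }
    quicksort(a, j + 1);
    quicksort(a + i, n - i);
}

int main(void)
{
    int a[] = {5, 1, 4, 1, 5, 9, 2, 6};
    quicksort(a, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", a[i]);                /* 1 1 2 4 5 5 6 9 */
    printf("\n");
    return 0;
}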

In database work, the disk I/O is the most important issue.  Hence,
replacement selection or a priority-queue based approach should be used.

When sorting data, if all the information fits into memory any good
O(n*log(n)) technique is probably good enough. For one million records,
n*log2(n) is just 20 million.  20 million comparison and exchange
operations are an eye-blink on fast, modern hardware when performed in
memory.  The thing we should worry about is what happens when disk I/O
rears its ugly head.  You have a better solution to that situation, and
you have a better sorting routine for a database.



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Steve Wampler [mailto:swampler@noao.edu]
> Sent: Monday, April 07, 2003 1:58 PM
> To: Dann Corbit
> Cc: Jason M. Felice; Postgres-hackers
> Subject: Re: [HACKERS] No merge sort?
> On Mon, 2003-04-07 at 13:39, Dann Corbit wrote:
> > > -----Original Message-----
> > > From: Jason M. Felice [mailto:jfelice@cronosys.com]
> > > Sent: Monday, April 07, 2003 1:10 PM
> > > To: pgsql-hackers@postgresql.org
> > > Subject: Re: [HACKERS] No merge sort?
> > >
> > >
> > > On Mon, Apr 07, 2003 at 03:36:10PM -0400, Greg Stark wrote:
> > > > "Ron Peacetree" <rjpeace@earthlink.net> writes:
> > > >
> > > > > AFAIK, there are only 3 general purpose internal sorting techniques
> > > > > that have O(n) behavior:
> > > >
> > > > Strictly speaking there are no sorting algorithms that have worst-case
> > > > time behaviour better than O(nlog(n)). Period.
> > > >
> > >
> > > Not true.
> > >
> > http://www.elsewhere.org/jargon/html/entry/bogo-sort.html
> >
> > He said "better than" not "worse than".
> >
> > For comparison based sorting it is _provably_ true that you cannot
> > sort faster than log(n!) which is O(n*(log(n))).
>
> You didn't read far enough.  The 2nd form of the bogo sort is
> O(1), at least in *one* universe...

An array holding all possible outcomes is an implementation of this
algorithm.
Quite frankly, it's rather silly (as it is intended to be).



Re: No merge sort?

From
Greg Stark
Date:
"Dann Corbit" <DCorbit@connx.com> writes:

> By the use of a randomized sample of min(n,max(3,log(n))) elements, the
> probability of choosing the worst case median is so close to zero as to
> never happen in practice.  

Not exactly. In fact the probability is no different than otherwise, but it
won't be repeatable and it won't be on common data sets like pre-sorted data.

> When sorting data, if all the information fits into memory any good
> O(n*log(n)) technique is probably good enough. For one million records,
> n*log2(n) is just 20 million.  20 million comparison and exchange
> operations are an eye-blink on fast, modern hardware when performed in
> memory.  

Woah, this is extremely dangerous thinking. For all you know the application
needs to execute this sort 200 times per second... 

Applications that can execute their queries entirely in memory are frequently
OLTP applications that need response within a hard upper limit. Often that
upper limit is measured in milliseconds...

> The thing we should worry about is what happens when disk I/O rears its ugly
> head. You have a better solution to that situation, and you have a better
> sorting routine for a database.

Not "better" just different. When dealing with disk I/O the database has to
use an algorithm that doesn't require too much random access reads. Disk is
quite fast at reading and writing sequentially but extremely slow at random
access.


While I/O bandwidth is often the bottleneck on databases it's also a lot
easier to deal with. You can make huge stripesets and use super fast i/o
controllers, to the point of saturating the machine's internal bus. It's much
harder to increase the amount of cpu cycles available.

Actually, in my experience cpu has been a bottleneck as often as disk i/o. But
my experience is quite heavily skewed to OLTP applications.

--
greg



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Greg Stark [mailto:gsstark@mit.edu]
> Sent: Monday, April 07, 2003 3:06 PM
> To: Dann Corbit
> Cc: Greg Stark; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
> "Dann Corbit" <DCorbit@connx.com> writes:
>
> > By the use of a randomized sample of min(n,max(3,log(n))) elements,
> > the probability of choosing the worst case median is so close to zero
> > as to never happen in practice.
>
> Not exactly. In fact the probability is no different than
> otherwise, but it won't be repeatable and it won't be on
> common data sets like pre-sorted data.

This is wrong.  If I choose a single median, based on a pattern (e.g.
first/last/middle elements) then the probability is very high that I
have made a bad choice.  If I choose the median from a sample of 3, then
it is less likely that I have made a bad choice.  If I choose the median
from a sample of 32 elements chosen at random, it becomes extremely
unlikely that I have made a stupid choice.  If I choose the median based
on examination of every element, the chance that I have chosen the wrong
pivot is zero.  This technique [choosing from several sample pivots from
a random sample] clearly makes failure due to perverse cases less
likely.  Also, a good quicksort algorithm will also make a scan for
sequential or reversed data ordering on the partition.

By using a sample of many medians, the chances of choosing the wrong
median go down.  By randomizing the selection, the chances of being
fooled by a pattern go down.  Most real data has some pattern to it.
> > When sorting data, if all the information fits into memory any good
> > O(n*log(n)) technique is probably good enough. For one million
> > records,
> > n*log2(n) is just 20 million.  20 million comparison and exchange
> > operations are an eye-blink on fast, modern hardware when performed in
> > memory.
>
> Woah, this is extremely dangerous thinking. For all you know
> the application needs to execute this sort 200 times per second...

We are talking about SQL database applications.  If they need to sort
one million records 20 times per second, they have a serious design
flaw, either in the database or the application.
> Applications that can execute their queries entirely in
> memory are frequently OLTP applications that need response
> within a hard upper limit. Often that upper limit is measured
> in milliseconds...

The choice of internal sort routine (for any of the good ones) will only
change the speed by a small, constant factor for a given distribution.
For any choice of algorithm, there will be some data sets where any of
the alternatives would have been better.

These algorithms are all about the same:
Quicksort (as modified by Richard Singleton -- the original is useless
for serious work)
Heapsort
Mergesort

The heapsort algorithm has the highest constant cost of the three, but
never goes perverse and does not require additional memory.
The mergesort algorithm requires additional memory and the allocation
can fail.  This means that it is not certain that it can be performed.
The quicksort algorithm has three perverse cases for the standard
algorithm (sorted/reversed/organpipe) but all of these can be detected
and prevented.
> > The thing we should worry about is what happens when disk I/O rears
> > its ugly head. You have a better solution to that situation, and you
> > have a better sorting routine for a database.
>
> Not "better" just different. When dealing with disk I/O the
> database has to use an algorithm that doesn't require too
> much random access reads. Disk is quite fast at reading and
> writing sequentially but extremely slow at random access.
>
>
> While I/O bandwidth is often the bottleneck on databases it's
> also a lot easier to deal with. You can make huge stripesets
> and use super fast i/o controllers, to the point of
> saturating the machine's internal bus. It's much harder to
> increase the amount of cpu cycles available.
>
> Actually, in my experience cpu has been a bottleneck as often
> as disk i/o. But my experience is quite heavily skewed to
> OLTP applications.
>
> --
> greg
>
>



Re: No merge sort?

From
"Ron Peacetree"
Date:
<cbbrowne@cbbrowne.com> wrote in message
news:20030407202001.1EC3C58E0D@cbbrowne.com...
> Ron Peacetree wrote:
> > AFAIK, there are only 3 general purpose internal sorting techniques
> > that have O(n) behavior:
>
> Hum?  NO "general purpose" sorting technique has O(n) behaviour.
>
> The theoretical best scenario, _in general_, is O(n log n).
The O(log(n!)) bound is only for comparison based sorts operating on
arbitrarily disordered input.  There are general techniques that can
sort in O(n) time if we can "break" either assumption.  In a DB, we
often are dealing with data that is only slightly disordered, or we
have enough meta knowledge that we can use more powerful ordering
operators than simple comparisons, or both.


> Insertion sort is expected to provide O(n^2) behaviour, and radix-like
> schemes get arbitrarily memory hungry and have bad pathological results.
>
None of these comments is accurate.
The sources for the following discussion are
A= Vol 3 of Knuth 2nd ed, (ISBN 0-201-89685-0)
B= Sedgewick's Algorithms in C, (0-201-51425-7)
C= Handbook of Algorithms and Data Structures 2nd ed by Gonnet and
Baeza-Yates. (0-201-41607-7)

1= Insertion sort is O(n) for "almost sorted" input.
p103 of Sedgewick.  There's also discussion on this in Knuth.

2= Distribution Sort and its "cousins" which use more powerful
ordering operators than simply comparisons are a= reasonably general,
and b= O(n).
Look in all three references.

3= A proper implementation of straight Radix sort using 8b or 16b at a
time a= is NOT pathological, and b= is O(n).
Sedgewick is the most blunt about it on p142-143, but Knuth discusses
this as well.

All of the above are stable, which quicksort is not.  There are no
"pessimal" inputs for any of the above that will force worst case
behavior.  For quicksort there are (they are =very= unlikely however).
In real world terms, if you can use any of these approaches you should
be able to internally sort your data between 2X and 3X faster.


Unfortunately, most of us do not have the luxury of working with
Memory Resident DB's.  In the real world, disk access is an important
part of our DB sorting efforts, and that changes things.  Very fast
internal sorting routines combined with multidisk merging algorithms
that minimize overall disk I/O while maximizing the sequential nature
of that I/O are the best we can do overall in such a situation.



Re: No merge sort?

From
"Ron Peacetree"
Date:
<cbbrowne@cbbrowne.com> wrote in message
news:20030407202001.1EC3C58E0D@cbbrowne.com...
> It is highly likely that it will typically take more
> computational effort to figure out that one of the 4 sorts provided
> /any/ improvement than any computational effort they would save.
>
> That's a /very/ common problem.  There's also a fair chance, seen in
> practice, that the action of collecting additional statistics to improve
> query optimization will cost more than the savings provided by the
> optimizations.
>
"Back in the Day" I heard similar arguments when discussing whether
there should be support for hashing [O(n)], interpolation search
[O(lglg(n))], binary search [O(lg(n))], and sequential search [O(n)]
or for only some subset of these for a DB system I was working on.  To
make a long story short, it was worth it to have support for all of
them because the "useful domain" of each was reasonably large and
reasonably unique compared to the others.

I submit a similar situation exists for sorting.  (and yes, I've been
here before too ;-)

Giving end users of PostgreSQL a reasonable set of tools for building
high performance applications is just good business.



Re: No merge sort?

From
"Ron Peacetree"
Date:
"Ron Peacetree" <rjpeace@earthlink.net> wrote in message
news:%cqka.13403$ey1.1154620@newsread1.prod.itd.earthlink.net...
> <cbbrowne@cbbrowne.com> wrote in message
> news:20030407202001.1EC3C58E0D@cbbrowne.com...
> > It is highly likely that it will typically take more
> > computational effort to figure out that one of the 4 sorts provided
> > /any/ improvement than any computational effort they would save.
> >
> > That's a /very/ common problem.  There's also a fair chance, seen in
> > practice, that the action of collecting additional statistics to improve
> > query optimization will cost more than the savings provided by the
> > optimizations.
> >
> "Back in the Day" I heard similar arguments when discussing whether
> there should be support for hashing [O(n)], interpolation search
TYPO ALERT:  hashing is, of course, O(1)



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> Sent: Monday, April 07, 2003 6:33 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
> <cbbrowne@cbbrowne.com> wrote in message
> news:20030407202001.1EC3C58E0D@cbbrowne.com...
> > Ron Peacetree wrote:
> > > AFAIK, there are only 3 general purpose internal sorting techniques
> > > that have O(n) behavior:
> >
> > Hum?  NO "general purpose" sorting technique has O(n) behaviour.
> >
> > The theoretical best scenario, _in general_, is O(n log n).
> The O(log(n!)) bound is only for comparison based sorts
> operating on arbitarily disordered input.  There are general
> techniques that can sort in O(n) time if we can "break"
> either assumption.  In a DB, we often are dealing with data
> that is only slightly disordered, or we are have enough meta
> knowledge that we can use more powerful ordering operators
> than simple comparisons, or both.
>
>
> > Insertion sort is expected to provide O(n^2) behaviour, and radix-like
> > schemes get arbitrarily memory hungry and have bad pathological results.
> >
> None of these comments is accurate.
> The sources for the following discussion are
> A= Vol 3 of Knuth 2nd ed, (ISBN 0-201-89685-0)
> B= Sedgewick's Algorithms in C, (0-201-51425-7)
> C= Handbook of Algorithms and Data Structures 2nd ed by
> Gonnet and Baeza-Yates. (0-201-41607-7)
>
> 1= Insertion sort is O(n) for "almost sorted" input.
> p103 of Sedgewick.  There's also discussion on this in Knuth.
>
> > 2= Distribution Sort and its "cousins" which use more
> powerful ordering operators than simply comparisons are a=
> reasonably general, and b= O(n). Look in all three references.
>
> 3= A proper implementation of straight Radix sort using 8b or
> 16b at a time a= is NOT pathological, and b= is O(n).
> Sedgewick is the most blunt about it on p142-143, but Knuth
> discusses this as well.

There can be a problem here with 2 & 3.  Imagine the following query:
SELECT company, division, plant, sum(revenue) FROM corporate GROUP BY
company, division, plant ORDER BY company, division, plant

Let's suppose further that company, division, and plant are all char 20.
For a given section of company + division, the first 40 characters will
be the same.  So a radix or distribution sort will need 40 passes over
the identical data before it even gets to the differentiating field
plant.

Now, a comparison sort will be O(n*log(n)).  The counting passes alone
for MSD radix sort will be 40 passes * N.  How large will n have to be
before Radix sort outperforms a comparison based sort?
2^40 = 1,099,511,627,776
About one trillion items appears to be break even.

If they are fixed width, we could use LSD radix sort and count all the
bins in one pass.  But that technique will fail with varchar.
Naturally, most long character fields will be varchar.  The basic upshot
is that radix and distribution sorts work best with short keys.
> All of the above are stable, which quicksort is not.  There
> are no "pessimal" inputs for any of the above that will force
> worst case behavior.  For quicksort there are (they are
> =very= unlikely however). In real world terms, if you can use
> any of these approaches you should be able to internally sort
> your data between 2X and 3X faster.
>
>
> Unfortunately, most of us do not have the luxury of working
> with Memory Resident DB's.  In the real world, disk access is
> an important part of our DB sorting efforts, and that changes
> things.  Very fast internal sorting routines combined with
> multidisk merging algorithms that minimize overall disk I/O
> while maximizing the sequential nature of that I/O are the
> best we can do overall in such a situation.
>
>



Re: No merge sort?

From
"Ron Peacetree"
Date:
""Dann Corbit"" <DCorbit@connx.com> wrote in message
news:D90A5A6C612A39408103E6ECDD77B829408AC4@voyager.corporate.connx.com...
> There can be a problem here with Distribution & Radix sorts.  Imagine the
> following query:
> SELECT company, division, plant, sum(revenue) FROM corporate GROUP BY
> company, division, plant ORDER BY company, division, plant
>
> Let's suppose further that company, division, and plant are all char 20.
> For a given section of company + division, the first 40 characters will
> be the same.  So a radix or distribution sort will need 40 passes over
> the identical data before it even gets to the differentiating field
> plant.
>
1= Distribution sort does not have this problem.  Any data that can be
constrained by the "pigeon hole principle" can be sorted in o(n) time.
2= For Radix sort, that's iff ("if and only if") you use =characters=
as the radix of a radix sort and do a MSB aka partition-exchange sort.
The appropriate radix here is a =field= not a character.  Since there
are 3 fields vs 60 characters the problem becomes 2/3 wasted passes
instead of 40/60.  We can do even better by making one field for
company+division and one for plant.  Now we have no wasted passes and
an O(n) sort.  Since PostgreSQL has wonderful built in support for
string handling and regexp, this is clearly an easy to implement and
superior approach.

Since a comparison based sorting algorithm would be best implemented
by working on the same atomic units, the fair comparison becomes the
log(n!) behavior of that sort vs the O(n) behavior of Distribution or
Radix sort on data of that atomic size.  The constants involved in
each ordering operation are bigger for the two O(n) sorts, but despite
that you'll find that for reasonable sizes of real world data, you'll
complete the O(n) sorts 2-3X faster.

Of course, internal sorting performance is vastly dominated by disk
I/O costs if there is even a moderate amount
disk I/O involved.  However, there are ways to combine decent internal
sorting with efficient disk merging, getting the best possible out of
both subsystems.


> If they are fixed width, we could use LSD radix sort and count all the
> bins in one pass.  But that technique will fail with varchar.
> Naturally, most long character fields will be varchar.  The basic upshot
> is that radix and distribution sorts work best with short keys.
>
More accurate would be to say that such sorts work best with keys that
have as little in common with each other as possible.  Length isn't the
problem, =the number of unique keys= and having few equal keys is.
Large quantities of equal keys can trash the performance of many
sorting algorithms, including quicksort, unless steps are taken to
either modify the algorithm to handle it or the data to minimize such
an occurrence.



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> Sent: Tuesday, April 08, 2003 3:07 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
> ""Dann Corbit"" <DCorbit@connx.com> wrote in message
> news:D90A5A6C612A39408103E6ECDD77B829408AC4@voyager.corporate.connx.co
> m...
> > There can be a problem here Distribution & Radix sorts.  Imagine the
> following query:
> > SELECT company, division, plant, sum(revenue) FROM corporate GROUP
> BY
> > company, division, plant ORDER BY company, division, plant
> >
> > Let's suppose further that company, division, and plant are all char
> 20.
> > For a given section of company + division, the first 40 characters
> will
> > be the same.  So a radix or distribution sort will need 40 passes
> over
> > the identical data before it even gets to the differentiating field
> > plant.
> >
> 1= Distribution sort does not have this problem.  Any data
> that can be constrained by the "pigeon hole principle" can be
> sorted in o(n) time.

Distribution sort is not an algorithm.  It is a general technique.  Both
counting sort and radix sort are distribution sorts.  I think you are
talking about counting sort here.  In order to insert a technique into a
database, you must solve the general case.

> 2= For Radix sort, that's iff ("if and
> only if") you use =characters= as the radix of a radix sort
> and do a MSB aka partition-exchange sort. The appropriate
> radix here is a =field= not a character.  Since there are 3
> fields vs 60 characters the problem becomes 2/3 wasted passes
> instead of 40/60.

This is lunacy.  If you count the field or use it as a radix, you will
need a radix of 40*8 bits = 320 bits.  That means you will have 2^320 =
2.136e96 bins.

>We can do even better by making one field for
> company+division and one for plant.  Now we have no wasted passes and
> an O(n) sort.  Since PostgreSQL has wonderful built in
> support for string handling and regexp, this is clearly an
> easy to implement and superior approach.

You have no idea what you are talking about.
> Since a comparison based sorting algorithm would be best
> implemented by working on the same atomic units, the fair
> comparison becomes the
> log(n!) behavior of that sort vs the O(n) behavior of
> Distribution or Radix sort on data of that atomic size.  The
> constants involved in each ordering operation are bigger for
> the two O(n) sorts, but despite that you'll find that for
> reasonable sizes of real world data, you'll complete the O(n)
> sorts 2-3X faster.

With 2e96 * 4 space requirement (assuming that 4 billion is the largest
row count allowed in the database).  If we could build a memory that
large, how long would it take to reach the last bin?  There are an
estimated 10^80 elementary particles in the universe.
> Of course, internal sorting performance is vastly dominated
> by disk I/O costs if there is even a moderate amount disk I/O
> involved.  However, there are ways to combine decent internal
> sorting with efficient disk merging, getting the best
> possible out of both subsystems.

You are confusing terms here.  An internal sort is performed 100% in
memory.  It is an external sort that uses disk space.

> > If they are fixed width, we could use LSD radix sort and count all
> the
> > bins in one pass.  But that technique will fail with varchar.
> > Naturally, most long character fields will be varchar.  The basic
> upshot
> > is that radix and distribution sorts work best with short keys.
> >
> More accurate would be say that such searches work best with
> keys that have as little in common as possible to each other.
>  Length isn't the problem, =the number of unique keys= and
> having few equal keys is. Large quantities of equal keys can
> trash the performance of many sorting algorithms, including
> quicksort, unless steps are taken to either modify the
> algorithm to handle it or the data to minimize such an occurrence.

I would guess that you have never done any sorting work.



Re: No merge sort?

From
"Ron Peacetree"
Date:
""Dann Corbit"" <DCorbit@connx.com> wrote in message
news:D90A5A6C612A39408103E6ECDD77B829408AC6@voyager.corporate.connx.com...
> Distribution sort is not an algorithm.  It is a general technique.  Both
> counting sort and radix sort are distribution sorts.  I think you are
> talking about counting sort here.  In order to insert a technique into a
> database, you must solve the general case.
>
Terminology problem.  Please look in the references I posted.


> > 2= For Radix sort, that's iff ("if and only if") you use
> > =characters= as the radix of a radix sort
> > and do a MSB aka partition-exchange sort. The appropriate
> > radix here is a =field= not a character.  Since there are 3
> > fields vs 60 characters the problem becomes 2/3 wasted passes
> > instead of 40/60.
>
> This is lunacy.  If you count the field or use it as a radix, you will
> need a radix of 40*8 bits = 320 bits.  That means you will have 2^320 =
> 2.136e96 bins.
>
You only need as many bins as you have unique key values silly ;-)
Remember, at its core Radix sort is just a distribution counting sort
(that's the name I learned for the general technique).  The simplest
implementation uses bits as the atomic unit, but there's nothing
saying you have to...  In a DB, we know all the values of the fields
we currently have stored in the DB.  We can take advantage of that.

As you correctly implied, the goal is to minimize the number of unique
bins you have to, err, distribute, items into.  That and having as few
duplicates as feasible are the important points.

If, for example, we have <= 255 possible values for each radix
(Using your previous example, let's say you have <= 255 unique values
for the combined field "Company+Division" in the DB and the same for
"Plant" ), and 4 sets of radix to sort over, it doesn't matter if we
are sorting a 32b quantity using a radix of 8b, or a 4 field quantity
using a radix of one field (TBF, we should use key and index
techniques to minimize the amount of actual data we retrieve and
manipulate from disk during the sort.  We want to minimize disk I/O,
particularly seeking disk I/O).  The situations are analogous.

Note TANSTAAFL (as Heinlein would've said):  We have to use extra
space for mapping the radix values to the radix keys, and our "inner
loop" for the sorting operation is considerably more complicated than
that for a quicksort or a mergesort.  Hence the fact that even though
this is an O(n) technique, in real world terms you can expect only a
2-3X performance improvement over say quicksort for realistic amounts
of data.

Oh and FTR, IME for most "interesting" sorts in the DB domain, even
for "internal" sorting techniques, the time to read data from disk and
(possibly) write results back to disk tends to dominate the time spent
actually doing the internal sort...

I learned tricks like this 20 years ago.  I thought this stuff was
part of general lore in the DB field... *shrug*



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> Sent: Wednesday, April 09, 2003 12:38 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
> ""Dann Corbit"" <DCorbit@connx.com> wrote in message
> news:D90A5A6C612A39408103E6ECDD77B829408AC6@voyager.corporate.connx.co
> m...
> > Distribution sort is not an algorithm.  It is a general technique.
> Both
> > counting sort and radix sort are distribution sorts.  I think you
> are
> > talking about counting sort here.  In order to insert a technique
> into a
> > database, you must solve the general case.
> >
> Terminology problem.  Please look in the references I posted.
>
>
> > > 2= For Radix sort, that's iff ("if and only if") you use
> > > =characters= as the radix of a radix sort and do a MSB aka
> > > partition-exchange sort. The appropriate radix here is a
> > > =field= not a character.  Since there are 3 fields vs 60
> > > characters the problem becomes 2/3 wasted passes instead of 40/60.
> >
> > This is lunacy.  If you count the field or use it as a radix, you
> > will need a radix of 40*8 bits = 320 bits.  That means you will
> > have 2^320 = 2.136e96 bins.
> >
> You only need as many bins as you have unique key values
> silly ;-) Remember, at its core Radix sort is just a
> distribution counting sort (that's the name I learned for the
> general technique).  The simplest implementation uses bits as
> the atomic unit, but there's nothing saying you have to...
> In a DB, we know all the values of the fields we currently
> have stored in the DB.  We can take advantage of that.

By what magic do we know this?  If a database knew a-priori what all the
distinct values were, it would indeed be excellent magic.
> As you correctly implied, the goal is to minimize the number
> of unique bins you have to, err, distribute, items into.
> That and having as few duplicates as feasible are the
> important points.
>
> If, for example, we have <= 255 possible values for the each
> radix (Using your previous example, let's say you have <= 255
> unique values for the combined field "Company+Division" in
> the DB and the same for "Plant" ), and 4 sets of radix to
> sort over, it doesn't matter if we are sorting a 32b quantity
> using a radix of 8b, or a 4 field quantity using a radix of
> one field (TBF, we should use key and index techniques to
> minimize the amount of actual data we retrieve and manipulate
> from disk during the sort.  We want to minimize disk I/O,
> particularly seeking disk I/O).  The situations are analogous.

With 80 bytes, you have 2^320 possible values.  There is no way around
that.  If you are going to count them or use them as a radix, you will
have to classify them.  The only way you will know how many unique
values you have in "Company+Division" is to ... Either sort them or by
some other means discover all that are distinct.  The database does not
know how many there are beforehand.  Indeed, there could be anywhere
from zero to 2^320 (given enough space) distinct values.

I would be interested to see your algorithm that performs a counting or
radix sort on 320 bit bins and that works in the general case without
using extra space.

> Note TANSTAAFL (as Heinlein would've said):  We have to use
> extra space for mapping the radix values to the radix keys,
> and our "inner loop" for the sorting operation is
> considerably more complicated than that for a quicksort or a
> mergesort.  Hence the fact that even though this is an O(n)
> technique, in real world terms you can expect only a 2-3X
> performance improvement over say quicksort for realistic
> amounts of data.
>
> Oh and FTR, IME for most "interesting" sorts in the DB
> domain, even for "internal" sorting techniques, the time to
> read data from disk and
> (possibly) write results back to disk tends to dominate the
> time spent actually doing the internal sort...
>
> I learned tricks like this 20 years ago.  I thought this
> stuff was part of general lore in the DB field... *shrug*

As my grandfather used to say:
"Picklesmoke."




Re: No merge sort?

From
"Ron Peacetree"
Date:
"Ron Peacetree" <rjpeace@earthlink.net> wrote in message
news:uqbma.22946$4P1.2023021@newsread2.prod.itd.earthlink.net...
>  If we are using a mapping of <= 2^16 keys, the extra space in
> question is ~2.5MB.
EDIT:  add "...for 320b keys"



Re: No merge sort?

From
"Ron Peacetree"
Date:
""Dann Corbit"" <DCorbit@connx.com> wrote in message
news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co
m...
> > From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> > Sent: Wednesday, April 09, 2003 12:38 PM
> > You only need as many bins as you have unique key values
> > silly ;-) Remember, at its core Radix sort is just a
> > distribution counting sort (that's the name I learned for the
> > general technique).  The simplest implementation uses bits as
> > the atomic unit, but there's nothing saying you have to...
> > In a DB, we know all the values of the fields we currently
> > have stored in the DB.  We can take advantage of that.
>
> By what magic do we know this?  If a database knew a-priori what
> all the distinct values were, it would indeed be excellent magic.
For any table already in the DB, this is self evident.  If you are
talking about sorting data =before= it gets put into the DB (say for a
load), then things are different, and the best you can do is used
comparison based methods (and probably some reasonably sophisticated
external merging routines in addition if the data set to be sorted and
loaded is big enough).  The original question as I understood it was
about a sort as part of a query.  That means everything to be sorted
is in the DB, and we can take advantage of what we know.


> With 80 bytes, you have 2^320 possible values.  There is no way
> around that.  If you are going to count them or use them as a
> radix, you will have to classify them.  The only way you will
> know how many unique values you have in
> "Company+Division" is to ...
> Either sort them or by some means discover all that are distinct
Ummm, Indexes, particularly Primary Key Indexes, anyone?  Finding the
unique values in an index should be perceived as trivial... ...and you
often have the index in memory for other reasons already.
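
To make that concrete: once the keys arrive in index order, one linear
pass collects the distinct values.  A minimal C sketch, assuming fixed
40-byte (320b) keys; the names here are purely illustrative, not
PostgreSQL internals:

    #include <string.h>

    #define KEY_LEN 40              /* 320-bit key (assumed fixed width) */

    /* Collect the distinct keys from an index-ordered stream in one
     * linear pass; returns how many distinct values were written. */
    size_t collect_distinct(const unsigned char (*keys)[KEY_LEN], size_t n,
                            unsigned char (*distinct)[KEY_LEN])
    {
        size_t m = 0, i;

        for (i = 0; i < n; i++)
            if (m == 0 || memcmp(keys[i], distinct[m - 1], KEY_LEN) != 0)
                memcpy(distinct[m++], keys[i], KEY_LEN);
        return m;
    }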


> .  The database does not know how many there are
> beforehand.  Indeed, there could be anywhere
> from zero to 2^320 (given enough space) distinct values.
>
> I would be interested to see your algorithm that
> performs a counting or radix sort on 320 bit bins and that
> works in the general case without using extra space.
>
1= See below.  I clearly stated that we need to use extra space
2= If your definition of "general" is "we know nothing about the
data", then of course any method based on using more sophisticated
ordering operators than comparisons is severely hampered, if not
doomed.  This is not a comparison based method.  You have to know some
things about the data being sorted before you can use it.  I've been
=very= clear about that throughout this.


> > Note TANSTAAFL (as Heinlein would've said):  We have to use
> > extra space for mapping the radix values to the radix keys,
> > and our "inner loop" for the sorting operation is
> > considerably more complicated than that for a quicksort or a
> > mergesort.  Hence the fact that even though this is an O(n)
> > technique, in real world terms you can expect only a 2-3X
> > performance improvement over say quicksort for realistic
> > amounts of data.
> >
> > Oh and FTR, IME for most "interesting" sorts in the DB
> > domain, even for "internal" sorting techniques, the time to
> > read data from disk and
> > (possibly) write results back to disk tends to dominate the
> > time spent actually doing the internal sort...
> >
Since you don't believe me, and have implied you wouldn't believe me
even if I posted results of efforts on my part, go sort some 2GB files
by implementing the algorithms in the sources I've given and as
mentioned in this thread.  (I'm assuming that you have access to
"real" machines that can perform this as an internal sort, or we
wouldn't be bothering to have this discussion). Then come back to the
table if you still think there are open issues to be discussed.



Re: No merge sort?

From
"Ron Peacetree"
Date:
"Ron Peacetree" <rjpeace@earthlink.net> wrote in message
news:EHIla.19833$ey1.1702921@newsread1.prod.itd.earthlink.net...
> ""Dann Corbit"" <DCorbit@connx.com> wrote in message
>
news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co
> m...
> > With 80 bytes, you have 2^320 possible values.  There is no way
> > around that.  If you are going to count them or use them as a
> > radix, you will have to classify them.  The only way you will
> > know how many unique values you have in
> > "Company+Division" is to ...
> > Either sort them or by some means discover all that are distinct
> Ummm, Indexes, particularly Primary Key Indexes, anyone?  Finding
> the unique values in an index should be perceived as trivial...
> ...and you often have the index in memory for other reasons already.
>
Interesting Note: DB2 and Oracle have switches that turn on a table
feature that keeps track of all the unique values of a field and a
counter for how often each of those unique values occurs.  The
implications for speeding up non-write queries involving those tables
should be obvious (again, TANSTAAFL: writes are now going to have the
extra overhead of updating this information)...

Wonder how hard this would be to put into PostgreSQL?



Re: No merge sort?

From
"Ron Peacetree"
Date:
""Dann Corbit"" <DCorbit@connx.com> wrote in message
news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co
m...
> I would be interested to see your algorithm that performs a counting
> or radix sort on 320 bit bins and that works in the general case
> without using extra space.
>
Since I can't be sure of my audience, I'm writing for the widest
reasonably possible audience, and I apologize in advance to those who
know most of this.

First, let's use some examples for the data to be sorted.  I'll
consider two cases, 1= each 320b entity is a row, and 2= the original
example where there are 3 fields per row; the first being 320b, with
each of the others being 160b.
For case 1, a 2GB file has 2^34/320= 53,687,091 records.  Let's call
it 50x10^6 to keep things simple.  Case 2 is 26,843,545 records.
Let's call that 25x10^6.

Let's examine the non space issues first, then explore space
requirements.

The most space and time efficient comparison based sorting algorithm
(median-of-three partitioning Qsort with Isort used when almost
sorted) will on average execute in ~(32/3)*n*ln(n) tyme units
(~9.455x10^9 for case 1; ~4.543x10^9 for case 2).  The use of "tyme"
here is not a typo or a spice.  It's a necessary abstraction because
real time units are very HW and SW context dependant.

Now let's consider a radix address calculation sort based approach.
The best implementation does an Rsort on the MSBs of the data to the
point of guaranteeing that the data is "almost sorted" then uses Isort
as a "finishing touch".  The Rsort part of the algorithm will execute
on average in 11*p*n-n+O(p*m) tyme units, where p is the number of
passes used and m is the number of unique states in the radix (2^16
for a 16b radix in which we can't prune any possibilities).

For a 320b data value that we know nothing about, we must consider all
possible values for any length radix we use.  The function
11*p*n-n+O(p*m) has a minimum when we use a 16b radix:
11*20*50x10^6-50x10^6+O(20*2^16)= ~10.951x10^9 tyme units.  If we use
enough Rsort passes to get the data "almost ordered" and then use
Isort to finish things we can do a bit better.  Since Qsort needs an
average of ~9.455x10^9 tyme units to sort the same data, it's not
clear that the effort to code Rsort is worth it for this case.
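
For reference, the Rsort being costed here is, in outline, 20 stable
counting passes with a 16b digit.  A sketch only, assuming fixed
40-byte keys stored big-endian so byte order matches sort order, and
leaving out the Isort finishing pass; names are illustrative:

    #include <stdlib.h>
    #include <string.h>

    #define KEY_LEN    40                          /* 320-bit key     */
    #define RADIX_BITS 16
    #define BINS       (1u << RADIX_BITS)          /* 65,536 counters */
    #define PASSES     (KEY_LEN * 8 / RADIX_BITS)  /* 20 passes       */

    typedef struct { unsigned char key[KEY_LEN]; } Row;

    /* pass 0 = least significant 16 bits of a big-endian key */
    static unsigned digit(const Row *r, int pass)
    {
        int off = KEY_LEN - 2 * (pass + 1);
        return ((unsigned) r->key[off] << 8) | r->key[off + 1];
    }

    void lsd_radix_sort(Row *a, Row *tmp, size_t n)
    {
        size_t *count = malloc(BINS * sizeof(size_t));
        int pass;

        for (pass = 0; pass < PASSES; pass++)
        {
            size_t i;

            memset(count, 0, BINS * sizeof(size_t));
            for (i = 0; i < n; i++)              /* histogram this digit */
                count[digit(&a[i], pass)]++;
            for (i = 1; i < BINS; i++)           /* running offsets      */
                count[i] += count[i - 1];
            for (i = n; i-- > 0; )               /* stable back-to-front */
                tmp[--count[digit(&a[i], pass)]] = a[i];
            memcpy(a, tmp, n * sizeof(Row));
        }
        free(count);
    }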

However, look what happens if we can assume that the number of
distinct key values is small enough that we can sort the data in a
relatively few number of passes: if the 320b key has <= 2^16 distinct
values, we can sort them using one pass in  ~500x10^6+O(2^16) tyme
units.  Even noting that O(2^16) has just become much larger since we
are now manipulating key values rather than bits, this is stupendously
better than Qsort and there should be no question that Rsort is worth
considering in such cases.  FTR, this shows that my naive (AKA "I
didn't do the math") instinct was wrong.  This technique is FAR better
than I originally was estimating under these circumstances.
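
Concretely, that single pass is just: map each key to its ordinal among
the distinct values, count, prefix-sum, and scatter.  A sketch, assuming
the m distinct 40-byte keys are already available in sorted order (say,
pulled from an index); with the binary-search lookup shown the pass is
really O(n*log(m) + m), and a hash lookup would bring it down to the
O(n + m) behavior described above.  Names are illustrative:

    #include <stdlib.h>
    #include <string.h>

    #define KEY_LEN 40              /* 320-bit key (assumed fixed width) */

    typedef struct { unsigned char key[KEY_LEN]; /* ...other columns... */ } Row;

    /* Ordinal of key k among the m distinct values (binary search);
     * assumes k is present, e.g. because "distinct" came from an index. */
    static size_t key_ordinal(const unsigned char *k,
                              const unsigned char (*distinct)[KEY_LEN],
                              size_t m)
    {
        size_t lo = 0, hi = m;

        while (lo < hi)
        {
            size_t mid = lo + (hi - lo) / 2;

            if (memcmp(distinct[mid], k, KEY_LEN) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    /* One distribution pass: count, prefix-sum, stable scatter. */
    void counting_sort(const Row *in, Row *out, size_t n,
                       const unsigned char (*distinct)[KEY_LEN], size_t m)
    {
        size_t *count = calloc(m + 1, sizeof(size_t));
        size_t i;

        for (i = 0; i < n; i++)        /* occurrences of each distinct key  */
            count[key_ordinal(in[i].key, distinct, m) + 1]++;
        for (i = 1; i <= m; i++)       /* starting offset of each key's run */
            count[i] += count[i - 1];
        for (i = 0; i < n; i++)        /* place each row in its final slot  */
            out[count[key_ordinal(in[i].key, distinct, m)]++] = in[i];

        free(count);
    }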

As long as we can store a "map" of unique keys in a relatively small
amount of space and use relatively few radix passes, this technique is
far better than comparison based sorting methods.  Said mapping is
easily derived from table indexes, and some DB systems (DB2 and
Oracle) will generate and save this data for you as part of their set
of optimization tweaks.  Even if none of that is true, an extra full
scan of a memory resident data set to derive this information is cheap
if the gamble pays off and you find out that the number of distinct
key values is a small subset of the potential universe of said.

Now let's consider that "extra space".  At the least Qsort is going to
need extra space proportional to lg(n) for its call tree (explicit or
system level to support recursion).  If we are using a 16b radix, the
extra space in question is 64KB.  In either case, it's in the noise.
If we are using a mapping of <= 2^16 keys, the extra space in
question is ~2.5MB.  Considering we are sorting an ~2GB file and the
incredible improvement in running time, this seems to be a very
worthwhile investment.

This would be worth adding support for in PostgreSQL.



Re: No merge sort?

From
terr@terralogic.net
Date:
My 2 cents worth.

The issue in my mind is that qsort cannot sort while the data is being read off the disk.  By using a balanced tree the
data can be sorted into partitions while it is being read, then these partitions can be merged.

IMHO this will give the best performance.

I wrote the code to do this years ago and if people want to play with it I suppose I can GPL it.  A language conversion
would be required because I didn't have C available - but the code is runnable in its present form.

Last time I benched it was against the DEC system sort routines on a VAX 7800 and it ran quite a bit faster... and this
is in spite of the fact that I had to do many rather UGLY things to fit this code into the constraints I had...  IE - it
had to run under 5 operating systems including MS-DOS and like I said - it <unfortunately> isn't written in C.
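
In outline, the merge half of what's being described is a k-way merge
over the sorted runs built while reading.  A minimal C sketch (toy int
elements and a linear scan over the run heads; for many runs a heap
over the heads would replace the scan):

    #include <stddef.h>

    typedef struct
    {
        const int *data;    /* the run's elements, already sorted */
        size_t     len;     /* number of elements in the run      */
        size_t     pos;     /* next element not yet consumed      */
    } Run;

    /* Merge k sorted runs into out; returns how many elements were written. */
    size_t merge_runs(Run *runs, int k, int *out)
    {
        size_t written = 0;

        for (;;)
        {
            int best = -1, i;

            for (i = 0; i < k; i++)   /* find the run with the smallest head */
                if (runs[i].pos < runs[i].len &&
                    (best < 0 || runs[i].data[runs[i].pos] <
                                 runs[best].data[runs[best].pos]))
                    best = i;
            if (best < 0)
                break;                /* every run is exhausted */
            out[written++] = runs[best].data[runs[best].pos++];
        }
        return written;
    }

The balanced-tree (or replacement-selection) half would be what fills
in each Run as the data streams off the disk.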




Re: No merge sort?

From
cbbrowne@cbbrowne.com
Date:
Ron Peacetree wrote:
> ""Dann Corbit"" <DCorbit@connx.com> wrote in message
> news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co
> m...
> > > From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> > > Sent: Wednesday, April 09, 2003 12:38 PM
> > > You only need as many bins as you have unique key values
> > > silly ;-) Remember, at its core Radix sort is just a
> > > distribution counting sort (that's the name I learned for the
> > > general technique).  The simplest implementation uses bits as
> > > the atomic unit, but there's nothing saying you have to...
> > > In a DB, we know all the values of the fields we currently
> > > have stored in the DB.  We can take advantage of that.
> >
> > By what magic do we know this?  If a database knew a-priori what
> > all the distinct values were, it would indeed be excellent magic.
> For any table already in the DB, this is self evident.  If you are
> talking about sorting data =before= it gets put into the DB (say for a
> load), then things are different, and the best you can do is used
> comparison based methods (and probably some reasonably sophisticated
> external merging routines in addition if the data set to be sorted and
> loaded is big enough).  The original question as I understood it was
> about a sort as part of a query.  That means everything to be sorted
> is in the DB, and we can take advantage of what we know.
> 
> > With 80 bytes, you have 2^320 possible values.  There is no way
> > around that.  If you are going to count them or use them as a
> > radix, you will have to classify them.  The only way you will
> > know how many unique values you have in
> > "Company+Division" is to ...
> > Either sort them or by some means discover all that are distinct

> Ummm, Indexes, particularly Primary Key Indexes, anyone?  Finding the
> unique values in an index should be perceived as trivial... ...and you
> often have the index in memory for other reasons already.

But this does not remove the fact that a radix sort on an 80 byte field
requires O(2^320) space, that is, O(N), where N is the size of the state
space of the key...

Perhaps you need to reread Gonnet; it tells you that...
--
output = reverse("moc.enworbbc@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/multiplexor.html
Rules of  the Evil  Overlord #28. "My  pet monster  will be kept  in a
secure cage  from which it  cannot escape and  into which I  could not
accidentally stumble." <http://www.eviloverlord.com/>



Re: No merge sort?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> Sent: Friday, April 11, 2003 5:02 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
>
> ""Dann Corbit"" <DCorbit@connx.com> wrote in message
> news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co
> m...
> > > From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> > > Sent: Wednesday, April 09, 2003 12:38 PM
> > > You only need as many bins as you have unique key values silly
> > > ;-) Remember, at its core Radix sort is just a distribution
> > > counting sort (that's the name I learned for the general
> > > technique).  The simplest implementation uses bits as the atomic
> > > unit, but there's nothing saying you have to...  In a DB, we
> > > know all the values of the fields we currently have stored in
> > > the DB.  We can take advantage of that.
> >
> > By what magic do we know this?  If a database knew a-priori what
> > all the distinct values were, it would indeed be excellent magic.
> For any table already in the DB, this is self evident.  If
> you are talking about sorting data =before= it gets put into
> the DB (say for a load), then things are different, and the
> best you can do is used comparison based methods (and
> probably some reasonably sophisticated external merging
> routines in addition if the data set to be sorted and loaded
> is big enough).  The original question as I understood it was
> about a sort as part of a query.  That means everything to be
> sorted is in the DB, and we can take advantage of what we know.

The database does not know all the distinct values that it contains.

> > With 80 bytes, you have 2^320 possible values.  There is no way
> > around that.  If you are going to count them or use them as a
> > radix, you will have to classify them.  The only way you will know
> > how many unique values you have in "Company+Division" is to ...
> > Either sort them or by some means discover all that are distinct
> Ummm, Indexes, particularly Primary Key Indexes, anyone?
> Finding the unique values in an index should be perceived as
> trivial... ...and you often have the index in memory for
> other reasons already.

If we have a unique clustered primary key index, why bother to sort?
The order by will be faster without sorting.

> > .  The database does not know how many there are
> > beforehand.  Indeed, there could be anywhere
> > from zero to 2^320 (given enough space) distinct values.
> >
> > I would be interested to see your algorithm that
> > performs a counting or radix sort on 320 bit bins and that works in
> > the general case without using extra space.
> >
> 1= See below.  I clearly stated that we need to use extra
> space 2= If your definition of "general" is "we know nothing
> about the data", then of course any method based on using
> more sophisticated ordering operators than comparisons is
> severely hampered, if not doomed.  This is not a comparison
> based method.  You have to know some things about the data
> being sorted before you can use it.  I've been =very= clear
> about that throughout this.
>
>
> > > Note TANSTAAFL (as Heinlein would've said):  We have to use
> > > extra space for mapping the radix values to the radix keys,
> > > and our "inner loop" for the sorting operation is
> > > considerably more complicated than that for a quicksort or a
> > > mergesort.  Hence the fact that even though this is an O(n)
> > > technique, in real world terms you can expect only a 2-3X
> > > performance improvement over say quicksort for realistic
> > > amounts of data.
> > >
> > > Oh and FTR, IME for most "interesting" sorts in the DB
> > > domain, even for "internal" sorting techniques, the time to
> > > read data from disk and
> > > (possibly) write results back to disk tends to dominate the
> > > time spent actually doing the internal sort...
> > >
> Since you don't believe me, and have implied you wouldn't
> believe me even if I posted results of efforts on my part, go
> sort some 2GB files by implementing the algorithms in the
> sources I've given and as mentioned in this thread.  (I'm
> assuming that you have access to "real" machines that can
> perform this as an internal sort, or we wouldn't be bothering
> to have this discussion). Then come back to the table if you
> still think there are open issues to be discussed.

I don't believe you and I am not going to try to code a sort that does
not work [and is demonstrably impossible].  But I will tell you what I
will do.  I will post the binary for a Win32 sorting routine that I
wrote which will sort files of any size.  If you can write a tool that
sorts faster (and does not rely on a primary key which would render the
task of sorting moot) then I will believe you.
Here is the routine:
ftp://cap.connx.com/chess-engines/new-approach/External.exe

The syntax to sort a text file with it is:
External <input_file> <output_file> <internal_sorting_buffer_size>

So (for instance) to sort a 100 gigabyte file with 1 gig of physical
memory allowed for the sort, the syntax would be:
External 100gig.dat 100gig.sor 1000000000