Thread: No merge sort?
I tried general, but no response. Can anyone here shed some light on the issue? Do I need to code merge sort into PostgreSQL?

----- Forwarded message from Taral <taral@taral.net> -----

From: Taral <taral@taral.net>
To: pgsql-general@postgresql.org
Date: Wed, 12 Mar 2003 17:54:35 -0600
Subject: [GENERAL] No merge sort?
Message-ID: <20030312235435.GA3007@taral.net>

I have a table "test" that looks like this:

CREATE TABLE test (
    id BIGINT,
    time INTEGER
);

There is an index:

CREATE INDEX idx ON test(id, time);

The table has been loaded with 2M rows, where time ranges sequentially from 0 to 1999999 and id takes random values from 0 to 49999.

This query:

SELECT * FROM test WHERE id IN (...) AND time > 198000 AND time < 199800
ORDER BY time DESC LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit  (cost=3635.28..3635.28 rows=20 width=12) (actual time=22.94..22.96 rows=14 loops=1)
  ->  Sort  (cost=3635.28..3635.28 rows=23 width=12) (actual time=22.93..22.93 rows=14 loops=1)
        ->  Index Scan using idx, idx, ..., idx, idx on test  (cost=0.00..3634.77 rows=23 width=12) (actual time=1.01..22.10 rows=14 loops=1)
Total runtime: 29.12 msec

This query:

SELECT * FROM test WHERE id IN (...) AND time < 199800
ORDER BY time DESC LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit  (cost=14516.46..14516.46 rows=20 width=12) (actual time=1448.83..1448.86 rows=20 loops=1)
  ->  Sort  (cost=14516.46..14516.46 rows=2527 width=12) (actual time=1448.82..1448.83 rows=21 loops=1)
        ->  Index Scan using idx, idx, ..., idx, idx on test  (cost=0.00..14373.67 rows=2527 width=12) (actual time=0.14..1437.33 rows=2048 loops=1)
Total runtime: 1454.62 msec

Since the index will output 'time'-sorted data for each 'id', why isn't a merge sort being used here? A merge sort would reduce the execution time back to 30 ms.

-- 
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me
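(The merge Taral is describing can be sketched as follows: each per-id index scan already yields rows in descending time order, so a k-way merge under a LIMIT can stop after 20 rows instead of sorting every candidate. This is an illustrative Python sketch, not PostgreSQL code; the streams and tuples are invented for the example.)

```python
import heapq

def merged_limit(streams, limit):
    """Merge streams already sorted by time descending; stop after `limit` rows."""
    # Each row is a (id, time) tuple; the key makes heapq.merge treat
    # larger times as smaller keys, preserving the descending order.
    merged = heapq.merge(*streams, key=lambda row: -row[1])
    return [row for _, row in zip(range(limit), merged)]

# Three per-id streams, each presorted by time descending, as an index
# scan on (id, time) would produce them:
s1 = [(1, 90), (1, 50), (1, 10)]
s2 = [(2, 80), (2, 60)]
s3 = [(3, 70), (3, 40)]
print(merged_limit([s1, s2, s3], 4))  # [(1, 90), (2, 80), (3, 70), (2, 60)]
```

Only four rows are ever pulled off the merge; the remaining rows in each stream are never touched, which is the win Taral is after.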
Taral <taral@taral.net> writes:
> Do I need to code merge sort into postgresql?

Seems like a waste of effort to me. I find this example less than compelling --- the case that could be sped up is quite narrow, and the potential performance gain not all that large. (A sort is a sort however you slice it, with O(N log N) runtime...)

Also, the direction we'd likely be going in in future is to merge multiple indexscans using bitmap techniques, so that the output ordering of the scans couldn't be counted on anyway.

			regards, tom lane
On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
> Seems like a waste of effort to me. I find this example less than
> compelling --- the case that could be sped up is quite narrow,
> and the potential performance gain not all that large. (A sort
> is a sort however you slice it, with O(N log N) runtime...)

Actually, it's O(N) time. The index can produce "time"-sorted data for each "id" in linear time, and the merge sort can merge them in linear time. Also, the existing system insists on loading _all_ candidate rows, whereas this method can benefit from the limit.

If you don't want to code it, I will. I need it for the livejournal mysql->postgresql transition. (No, mysql doesn't do it right either.) But a few pointers to the right places to look in the code would be helpful.

> Also, the direction we'd likely be going in in future is to merge
> multiple indexscans using bitmap techniques, so that the output
> ordering of the scans couldn't be counted on anyway.

I don't understand this. What do these bitmap techniques do?

-- 
Taral <taral@taral.net>
Taral <taral@taral.net> writes:
> On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
>> Seems like a waste of effort to me. I find this example less than
>> compelling --- the case that could be sped up is quite narrow,
>> and the potential performance gain not all that large. (A sort
>> is a sort however you slice it, with O(N log N) runtime...)

> Actually, it's O(N) time.

Only if you assume a fixed number of input streams.

>> Also, the direction we'd likely be going in in future is to merge
>> multiple indexscans using bitmap techniques, so that the output
>> ordering of the scans couldn't be counted on anyway.

> I don't understand this. What do these bitmap techniques do?

The idea is you look at the index to make a list of main-table tuple positions you are interested in, which you represent compactly as a compressed bitmap. (There is some finagling needed because PG actually uses block/line number rather than a pure tuple number to identify tuples, but you can fake it with a reasonably small amount of overhead.) Then you can combine multiple index inputs by ANDing or ORing bitmaps (the OR case applies to your example). Finally, you traverse the heap, accessing the desired rows in heap-location order. This loses in terms of producing presorted output --- but it often saves enough in I/O costs to more than justify doing the sort in memory.

			regards, tom lane
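(A toy version of the bitmap technique Tom describes, with Python ints standing in for compressed bitmaps: each scan's tuple positions become set bits, the bitmaps are ORed, and the result is walked in ascending heap-location order. All names and positions here are invented for illustration.)

```python
def scan_to_bitmap(positions):
    """Turn a list of heap positions from one index scan into a bitmap."""
    bm = 0
    for p in positions:
        bm |= 1 << p
    return bm

def heap_order(bitmap):
    """Yield set-bit positions in ascending (heap-location) order."""
    pos = 0
    while bitmap:
        if bitmap & 1:
            yield pos
        bitmap >>= 1
        pos += 1

scan_a = scan_to_bitmap([3, 17, 42])   # hypothetical matches for one id
scan_b = scan_to_bitmap([5, 17, 99])   # hypothetical matches for another id
combined = scan_a | scan_b             # the OR case: id IN (...)
print(list(heap_order(combined)))      # [3, 5, 17, 42, 99]
```

Note the output comes back in heap order, not time order, which is exactly why Tom says a sort must follow, and why the ordering of the individual scans can no longer be counted on.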
On Thu, Mar 13, 2003 at 10:30:27PM -0500, Tom Lane wrote:
> The idea is you look at the index to make a list of main-table tuple
> positions you are interested in, which you represent compactly as a
> compressed bitmap. (There is some finagling needed because PG actually
> uses block/line number rather than a pure tuple number to identify
> tuples, but you can fake it with a reasonably small amount of overhead.)
> Then you can combine multiple index inputs by ANDing or ORing bitmaps
> (the OR case applies to your example). Finally, you traverse the heap,
> accessing the desired rows in heap-location order. This loses in terms
> of producing presorted output --- but it often saves enough in I/O costs
> to more than justify doing the sort in memory.

And it loses bigtime in the case of LIMIT. If the unlimited query returns 4,000 records and I only want 20, you're retrieving 200x too much data from disk.

-- 
Taral <taral@taral.net>
Same setup, different query:

test=> explain select max(time) from test where id = '1';
NOTICE:  QUERY PLAN:

Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
  ->  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)

Since the index is (id, time), why isn't the index being used to retrieve the maximum value?

On Thu, Mar 13, 2003 at 03:10:49PM -0600, Taral wrote:
> I have a table "test" that looks like this:
>
> CREATE TABLE test (
>     id BIGINT,
>     time INTEGER
> );
>
> There is an index:
>
> CREATE INDEX idx ON test(id, time);
>
> The table has been loaded with 2M rows, where time ranges sequentially
> from 0 to 1999999 and id takes random values from 0 to 49999.

-- 
Taral <taral@taral.net>
Taral <taral@taral.net> writes:
> On Thu, Mar 13, 2003 at 10:30:27PM -0500, Tom Lane wrote:
>> The idea is you look at the index to make a list of main-table tuple
>> positions you are interested in, which you represent compactly as a
>> compressed bitmap. [snip]

> And it loses bigtime in the case of LIMIT. If the unlimited query
> returns 4,000 records and I only want 20, you're retrieving 200x too
> much data from disk.

Sure. That's why we have a planner that distinguishes between startup cost and total cost, and interpolates when a LIMIT is involved. But if this mergesort idea only helps for small-limit cases, that's another restriction on its scope of usefulness...

			regards, tom lane
On Fri, Mar 14, 2003 at 10:43:30PM -0500, Tom Lane wrote:
> Sure. That's why we have a planner that distinguishes between startup
> cost and total cost, and interpolates when a LIMIT is involved. But
> if this mergesort idea only helps for small-limit cases, that's another
> restriction on its scope of usefulness...

I don't think so, since even in the non-limit case it avoids having to do a full sort if the number of initial streams is finite and small (as in the case I demonstrated), reducing time complexity to O(N).

-- 
Taral <taral@taral.net>
On Fri, Mar 14, 2003 at 14:19:46 -0600, Taral <taral@taral.net> wrote:
> Same setup, different query:
>
> test=> explain select max(time) from test where id = '1';
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
>   ->  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
>
> Since the index is (id, time), why isn't the index being used to
> retrieve the maximum value?

It looks like an index scan is being done.

If the index was on (time, id) instead of (id, time), then you could get a further speed up by rewriting the query as:

select time from test where id = '1' order by time desc limit 1;
On Sat, Mar 15, 2003 at 09:23:28AM -0600, Bruno Wolff III wrote:
> On Fri, Mar 14, 2003 at 14:19:46 -0600, Taral <taral@taral.net> wrote:
>> Same setup, different query:
>>
>> test=> explain select max(time) from test where id = '1';
>> NOTICE:  QUERY PLAN:
>>
>> Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
>>   ->  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
>>
>> Since the index is (id, time), why isn't the index being used to
>> retrieve the maximum value?
>
> It looks like an index scan is being done.
>
> If the index was on (time, id) instead of (id, time), then you could get
> a further speed up by rewriting the query as:
> select time from test where id = '1' order by time desc limit 1;

Yes, that's exactly it. It's an index _scan_. It should simply be able to read the maximum straight from the btree.

-- 
Taral <taral@taral.net>
On Mon, Mar 17, 2003 at 11:23:47 -0600, Taral <taral@taral.net> wrote:
> On Sat, Mar 15, 2003 at 09:23:28AM -0600, Bruno Wolff III wrote:
>> It looks like an index scan is being done.
>>
>> If the index was on (time, id) instead of (id, time), then you could get
>> a further speed up by rewriting the query as:
>> select time from test where id = '1' order by time desc limit 1;
>
> Yes, that's exactly it. It's an index _scan_. It should simply be able
> to read the maximum straight from the btree.

max and min don't use indexes. They are generic aggregate functions, and postgres doesn't have the special knowledge that for those aggregate functions an index can be used. You can get around this by rewriting the query as I previously indicated. For more details on why things are this way, search the archives; this topic comes up a lot.

I was also mistaken about having to switch the index around for this case. It should work the way you have it (if you rewrite the query).
On Mon, Mar 17, 2003 at 11:23:47AM -0600, Taral wrote:
> Yes, that's exactly it. It's an index _scan_. It should simply be able
> to read the maximum straight from the btree.

Still doesn't work, even with the rewritten query. It does a Limit(Sort(Index Scan)), with 1333 rows being pulled from the index.

-- 
Taral <taral@taral.net>
AFAIK, there are only 3 general purpose internal sorting techniques that have O(n) behavior:

1= Insertion Sort for "almost sorted" lists. Since "almost sorted" is a fuzzy concept, let's make the first approximation definition that no more than n^(1/2) of the elements can be disordered. There are better definitions in the literature, but I don't remember them off the top of my head.

2= Sorts from the class of Address Calculation, Counting, or Distribution Sort. These need to be able to carry out something more complex than simply a comparison in order to achieve O(n), and therefore have high constants in their execution. For large enough n, though, these are the performance kings.

3= Straight Radix Sort, where you minimize the number of passes by using a base much greater than two for the radix. Usually octal or hexadecimal. On a 32b or 64b system, this approach will let you sort in 2 passes.

All of the above have potentially nasty trade-offs in comparison to the standard heavily tuned median-of-three quicksort used by the sort unix command. Nonetheless, I could see some value in providing all of these with PostgreSQL (including a decent port of the unix sort routine for the Win folks). I'll note in passing that quicksort's Achilles' heel is that it's unstable (all of the rest are stable), which can be a problem in a DB.

In the general case, there are a few papers out there that state you can sort in O(n) if you can throw O(n^2) space at the problem. That implies to me that for DBs, we are not going to be able to use O(n) algorithms for most of our needs.

As for external sorts, everything I've ever read says that some sort of merge technique is used: balanced, multiway, or polyphase. In all cases, I've seen comments to the effect that reading some part of the data into internal buffers, sorting them, and then merging them with already sorted data is the best practice.
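(Technique 3= above, straight radix sort with a large base, can be sketched as follows: one stable bucket pass per byte, least-significant digit first. This is an illustrative sketch for small non-negative integers, not production code; the 4-byte width is an assumption for the example.)

```python
def radix_sort(values, width=4):
    """LSD radix sort, base 256: one stable pass per byte of `width` bytes.

    Works on non-negative ints that fit in `width` bytes."""
    for shift in range(0, width * 8, 8):
        buckets = [[] for _ in range(256)]
        for v in values:
            # Distribute by the current byte; appending preserves the
            # order established by earlier passes, keeping the sort stable.
            buckets[(v >> shift) & 0xFF].append(v)
        values = [v for b in buckets for v in b]
    return values

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

With base 256 a 32-bit key needs only 4 passes (2 with base 65536), which is the pass-minimization Ron is pointing at, at the cost of 256 (or 65536) buckets per pass.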
All of this seems to imply that instead of mergesort (which is stable), PostgreSQL might be better off with the 4 sorts I've listed plus a viciously efficient merge utility for combining partial results that do fit into memory into result files on disk that don't. Or am I crazy? Ron Peacetree
"Ron Peacetree" <rjpeace@earthlink.net> writes:

> AFAIK, there are only 3 general purpose internal sorting techniques
> that have O(n) behavior:

Strictly speaking there are no sorting algorithms that have worst-case time behaviour better than O(nlog(n)). Period. The typical kind-of O(n) sorts involve either special-case inputs (almost sorted), or only work if you ignore some aspect of the input size (radix sort).

So, for example, for radix sort the log(n) factor comes precisely from having to have log(n) passes. If you use octal that might go to log(n)/8 but it's still O(log(n)).

If you assume your input fields are limited to a fixed size then O() notation loses meaning. You can always sort in linear time by just storing bit flags in a big vector and then scanning your (fixed-size) vector to read out the values.

However databases cannot assume fixed size inputs. Regardless of whether it's on a 32 or 64 bit system postgres still has to sort much larger data structures. floats are typically 64 - 96 bytes, bigints can be arbitrarily large. In fact postgres can sort user-defined datatypes as long as they have < and = operators. (Actually I'm not sure on the precise constraint.)

Oh, and just to add one last fly in the ointment, postgres has to be able to sort large datasets that don't even fit in memory. That means storing temporary data on disk and minimizing the times data has to move from disk to memory and back. Some algorithms are better at that than others.

-- 
greg
> -----Original Message-----
> From: Greg Stark [mailto:gsstark@mit.edu]
> Sent: Monday, April 07, 2003 12:36 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
> "Ron Peacetree" <rjpeace@earthlink.net> writes:
>
>> AFAIK, there are only 3 general purpose internal sorting techniques
>> that have O(n) behavior:
>
> Strictly speaking there are no sorting algorithms that have
> worst-case time behaviour better than O(nlog(n)). Period.

Provably true if the sort uses comparison. Definitely false for distribution sorts or distribution sort hybrids.

> The typical kind-of O(n) sorts involve either special-case
> inputs (almost sorted), or only work if you ignore some
> aspect of the input size (radix sort).

There is no need to ignore any aspect of the input with radix sort. Radix sort can also be generalized in the same way that quicksort can. The callback function returns the "bucket" of 'radix' significant bits. So, for unsigned char, you just return the 'kth' character when the 'kth' bucket is requested if the radix is 256 for 8 bits. For an example of dealing with other sorts of data, see chapter 13 of "C Unleashed": http://users.powernet.co.uk/eton/unleashed/

> So, for example, for radix sort the log(n) factor comes
> precisely from having to have log(n) passes. If you use octal
> that might go to log(n)/8 but it's still O(log(n)).

You do not have to make log(n) passes at all. In fact, with LSD radix sort, all the buckets can be counted in a single pass. Demonstration in the named book (with source code).

> If you assume your input fields are limited to a fixed size
> then O() notation loses meaning. You can always sort in
> linear time by just storing bit flags in a big vector and
> then scanning your (fixed-size) vector to read out the values.
>
> However databases cannot assume fixed size inputs. Regardless
> of whether it's on a 32 or 64 bit system postgres still has
> to sort much larger data structures.
> floats are typically 64 - 96 bytes, bigints can be arbitrarily large.

Floats are generally 8 bytes. If you use hfloat on a VAX, it is 16 bytes. That is the largest hardware floating point in common use. Bigints can be arbitrarily large, but PostgreSQL does not handle anything larger than 16 byte integers, so there is no need to worry about larger sizes. A numeric type can be arbitrarily large (though a thousand digits is pure foolishness). However, these numbers can be sorted by standard means. A typical scenario would include an MSD radix sort until the subsets were 200K rows, then Singleton's sort until 50 rows, then insertion sort or perhaps Shellsort. I write sorting routines for a database company, and millions of rows can be sorted in just a few seconds.

> In fact postgres can sort user-defined datatypes as long as
> they have < and = operators. (Actually I'm not sure on the
> precise constraint.)
>
> Oh, and just to add one last fly in the ointment, postgres
> has to be able to sort large datasets that don't even fit in
> memory. That means storing temporary data on disk and
> minimizing the times data has to move from disk to memory and
> back. Some algorithms are better at that than others.

A very important advancement in this area is a priority queue based external sort. It allows the use of the fastest possible internal sort and a single pass to read, a single pass to merge, and a single pass to write all of the data. In this regard, it can be enormously faster than even replacement selection. For replacement selection, the advantage is that the runs are twice as long as normal, allowing log(n)/2 passes instead of log(n) passes. However, the priority queue method only has one read pass, one write pass, and one merge pass. It works like this:

1. Read blocks of data into memory and sort them.
2. Write the sorted blocks to disk.
3. On the merge phase, we create a priority queue of file pointers, and read the first item from each.
The priority queue sorts on the first item in the file list. When we extract an item from the minimal queue, we write it to disk. If the item changes when we read the next item from that file, we adjust the priority. We continue until all the files are empty. The same book cited previously has a model implementation to demonstrate the technique. Since disk I/O totally dominates external sorts, this technique is a titanic win over conventional sorting means.

Another simple solution is to memory map the file, but do it in large multi-page blocks instead of mapping the whole thing.
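(The merge phase described in steps 1-3 above can be modeled in a few lines: sorted runs stand in for the run files on disk, and a heap keyed on each run's current head item drives the merge. This is a toy illustration of the priority-queue idea, not the implementation from the cited book.)

```python
import heapq

def merge_runs(runs):
    """Merge presorted runs (lists standing in for run files) with a heap."""
    # Heap entries: (current head value, run index, position within run).
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)          # extract the minimal head
        out.append(value)                          # "write it to the output file"
        if j + 1 < len(runs[i]):                   # adjust the priority: read
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))  # next from that run
    return out

runs = [[1, 4, 9], [2, 3, 8], [5, 6, 7]]
print(merge_runs(runs))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each input element is read once and written once, which is why the method needs only one read pass, one merge pass, and one write pass regardless of the number of runs (as long as one buffer per run fits in memory).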
Ron Peacetree wrote:
> AFAIK, there are only 3 general purpose internal sorting techniques
> that have O(n) behavior:

Hum? NO "general purpose" sorting technique has O(n) behaviour. The theoretical best scenario, _in general_, is O(n log n). Insertion sort is expected to provide O(n^2) behaviour, and radix-like schemes get arbitrarily memory hungry and have bad pathological results.

> All of the above have potentially nasty trade-offs in comparison to
> the standard heavily tuned median-of-three quicksort used by the sort
> unix command. Nonetheless, I could see some value in providing all of
> these with PostgreSQL (including a decent port of the unix sort routine
> for the Win folks). I'll note in passing that quicksort's Achilles'
> heel is that it's unstable (all of the rest are stable), which can be
> a problem in a DB.

Making one sort algorithm work very efficiently is quite likely to be a lot more effective than frittering away time trying to get some special cases (that you can't regularly use) to work acceptably.

> All of this seems to imply that instead of mergesort (which is
> stable), PostgreSQL might be better off with the 4 sorts I've listed
> plus a viciously efficient merge utility for combining partial results
> that do fit into memory into result files on disk that don't.
>
> Or am I crazy?

More than likely. It is highly likely that it will typically take more computational effort to figure out that one of the 4 sorts provided /any/ improvement than any computational effort they would save. That's a /very/ common problem. There's also a fair chance, seen in practice, that the action of collecting additional statistics to improve query optimization will cost more than the savings provided by the optimizations.

-- 
(concatenate 'string "cbbrowne" "@acm.org")
http://www3.sympatico.ca/cbbrowne/wp.html
"Whenever in doubt, consult a song." --JT Fletcher
On Mon, Apr 07, 2003 at 03:36:10PM -0400, Greg Stark wrote:
> "Ron Peacetree" <rjpeace@earthlink.net> writes:
>
>> AFAIK, there are only 3 general purpose internal sorting techniques
>> that have O(n) behavior:
>
> Strictly speaking there are no sorting algorithms that have worst-case
> time behaviour better than O(nlog(n)). Period.

Not true. http://www.elsewhere.org/jargon/html/entry/bogo-sort.html

-Jay 'Eraserhead' Felice

P.S. <g>
> -----Original Message-----
> From: cbbrowne@cbbrowne.com [mailto:cbbrowne@cbbrowne.com]
> Sent: Monday, April 07, 2003 1:20 PM
> To: Ron Peacetree
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
> Ron Peacetree wrote:
>> AFAIK, there are only 3 general purpose internal sorting techniques
>> that have O(n) behavior:
>
> Hum? NO "general purpose" sorting technique has O(n) behaviour.
>
> The theoretical best scenario, _in general_, is O(n log n).

The ACM has published papers on several techniques that are O(n log log n). Radix and counting sort can be totally generalized in the same way that comparison sorts can, by creating a callback function. If I have strings of 50 characters in length, I can count all RADIX characters in an array of unsigned long counts[RADIX][50] in a single pass. As the key strings become successively larger, radix sorts fall flat compared to comparison techniques, but a counting pass can also determine whether distribution or comparison sorting will be better.

> Insertion sort is expected to provide O(n^2) behaviour, and
> radix-like schemes get arbitrarily memory hungry and have bad
> pathological results.

However, the pathological cases can be detected. A typical pathological case for real database work is an array where the initial data is identical for a long distance. This is not unusual in an ORDER BY/GROUP BY scenario.

>> All of the above have potentially nasty trade-offs in comparison to
>> the standard heavily tuned median-of-three quicksort used by the sort
>> unix command. Nonetheless, I could see some value in providing all of
>> these with PostgreSQL (including a decent port of the unix sort routine
>> for the Win folks). I'll note in passing that quicksort's Achilles'
>> heel is that it's unstable (all of the rest are stable), which can be
>> a problem in a DB.

A better sort can make your database dominatingly better than others for some operations.
It can be an enormous technical advantage.

> Making one sort algorithm work very efficiently is quite
> likely to be a lot more effective than frittering away time
> trying to get some special cases (that you can't regularly
> use) to work acceptably.

I don't agree. In fact, this is never true. A decent sorting routine usually involves several techniques. Consider the library qsort routine. Usually, it will use quicksort with several special tests and a randomized median selection until the set size becomes small enough. Then, we switch to insertion sort or Shellsort. Or consider the STL and Plauger's library sort routine. It is introspective sort and is a hybrid of three algorithms (quicksort, heapsort, and insertion sort).

>> All of this seems to imply that instead of mergesort (which is
>> stable), PostgreSQL might be better off with the 4 sorts I've listed
>> plus a viciously efficient merge utility for combining partial results
>> that do fit into memory into result files on disk that don't.
>>
>> Or am I crazy?
>
> More than likely. It is highly likely that it will typically
> take more computational effort to figure out that one of the
> 4 sorts provided /any/ improvement than any computational
> effort they would save.

It is highly unlikely that this is true. In fact, every good sorting routine does *exactly* this.
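(The hybrid structure Dann describes, quicksort with a randomized pivot down to small partitions, then insertion sort to finish, can be sketched in miniature. The cutoff of 16 is an arbitrary illustrative choice, and this is a teaching sketch, not any library's actual qsort.)

```python
import random

CUTOFF = 16  # arbitrary small-partition threshold for illustration

def hybrid_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place: quicksort large partitions, insertion-sort small ones."""
    if hi is None:
        hi = len(a) - 1
    while hi - lo > CUTOFF:
        pivot = a[random.randint(lo, hi)]      # randomized pivot selection
        i, j = lo, hi
        while i <= j:                          # Hoare-style partition
            while a[i] < pivot: i += 1
            while a[j] > pivot: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1; j -= 1
        hybrid_sort(a, lo, j)                  # recurse on the left part
        lo = i                                 # loop (tail-call) on the right part
    for k in range(lo + 1, hi + 1):            # insertion sort finishes the
        v, m = a[k], k                         # remaining small partition
        while m > lo and a[m - 1] > v:
            a[m] = a[m - 1]; m -= 1
        a[m] = v
    return a

data = [random.randint(0, 1000) for _ in range(500)]
assert hybrid_sort(data[:]) == sorted(data)
```

The switch-over is where the constant factors Dann mentions live: quicksort's partitioning overhead dominates on tiny partitions, where insertion sort's tight inner loop wins.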
> -----Original Message-----
> From: Jason M. Felice [mailto:jfelice@cronosys.com]
> Sent: Monday, April 07, 2003 1:10 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
> On Mon, Apr 07, 2003 at 03:36:10PM -0400, Greg Stark wrote:
>> Strictly speaking there are no sorting algorithms that have
>> worst-case time behaviour better than O(nlog(n)). Period.
>
> Not true.
> http://www.elsewhere.org/jargon/html/entry/bogo-sort.html

He said "better than", not "worse than". For comparison based sorting it is _provably_ true that you cannot sort faster than log(n!), which is O(n*log(n)).
>> floats are typically 64 - 96 bytes, bigints can be arbitrarily large.
>
> Floats are generally 8 bytes.

Er, I meant "bits" there. oops.

I'm still reading the other stuff.

Most of this usually comes down to differences between theory and practice. The constants often matter a whole lot, and the cache behaviour and memory usage can matter a whole lot. quicksort after all is actually O(n^2) worst case...

-- 
greg
On Mon, 2003-04-07 at 13:39, Dann Corbit wrote:
> On Mon, Apr 07, 2003 at 03:36:10PM -0400, Greg Stark wrote:
>> Strictly speaking there are no sorting algorithms that have
>> worst-case time behaviour better than O(nlog(n)). Period.
>
> He said "better than" not "worse than".
>
> For comparison based sorting it is _provably_ true that you cannot sort
> faster than log(n!) which is O(n*log(n)).

You didn't read far enough. The 2nd form of the bogo sort is O(1), at least in *one* universe...

-- 
Steve Wampler <swampler@noao.edu>
National Solar Observatory
> -----Original Message-----
> From: Greg Stark [mailto:gsstark@mit.edu]
> Sent: Monday, April 07, 2003 1:50 PM
> To: Dann Corbit
> Cc: Greg Stark; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
> Er, I meant "bits" there. oops.
>
> I'm still reading the other stuff.
>
> Most of this usually comes down to differences between theory
> and practice. The constants often matter a whole lot, and the
> cache behaviour and memory usage can matter a whole lot.
> quicksort after all is actually O(n^2) worst case...

By the use of a randomized sample of min(n,max(3,log(n))) elements, the probability of choosing the worst case median is so close to zero as to never happen in practice. In real life, a well written quicksort will beat the pants off of heapsort and mergesort.

In database work, the disk I/O is the most important issue. Hence, replacement selection or a priority-queue based approach should be used.

When sorting data, if all the information fits into memory, any good O(n*log(n)) technique is probably good enough. For one million records, n*log2(n) is just 20 million. 20 million comparison and exchange operations are an eye-blink on fast, modern hardware when performed in memory. The thing we should worry about is what happens when disk I/O rears its ugly head. You have a better solution to that situation, and you have a better sorting routine for a database.
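(The pivot-selection rule Dann sketches, the median of a random sample of min(n, max(3, log n)) elements, looks roughly like this. An illustrative sketch only; the function name and use of log base 2 are assumptions for the example.)

```python
import math
import random

def sample_median_pivot(a, lo, hi):
    """Pick a pivot for a[lo..hi]: median of a random sample of
    min(n, max(3, log2 n)) elements."""
    n = hi - lo + 1
    k = min(n, max(3, int(math.log2(n))))
    sample = random.sample(range(lo, hi + 1), k)   # k distinct random indices
    sample.sort(key=lambda idx: a[idx])            # order sample by key value
    return a[sample[len(sample) // 2]]             # median of the sample

data = list(range(1000))
pivot = sample_median_pivot(data, 0, 999)
assert pivot in data
```

The point of both the sampling and the randomization is the one argued in the surrounding messages: a fixed rule (first/last/middle) can be defeated by patterned input, while an adversary cannot reliably arrange data against a pivot chosen from a random sample.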
> -----Original Message-----
> From: Steve Wampler [mailto:swampler@noao.edu]
> Sent: Monday, April 07, 2003 1:58 PM
> To: Dann Corbit
> Cc: Jason M. Felice; Postgres-hackers
> Subject: Re: [HACKERS] No merge sort?
>
> On Mon, 2003-04-07 at 13:39, Dann Corbit wrote:
>> He said "better than" not "worse than".
>>
>> For comparison based sorting it is _provably_ true that you cannot
>> sort faster than log(n!) which is O(n*(log(n))).
>
> You didn't read far enough. The 2nd form of the bogo sort is
> O(1), at least in *one* universe...

An array holding all possible outcomes is an implementation of this algorithm. Quite frankly, it's rather silly (as it is intended to be).
"Dann Corbit" <DCorbit@connx.com> writes:

> By the use of a randomized sample of min(n,max(3,log(n))) elements, the
> probability of choosing the worst case median is so close to zero as to
> never happen in practice.

Not exactly. In fact the probability is no different than otherwise, but it won't be repeatable and it won't be on common data sets like pre-sorted data.

> When sorting data, if all the information fits into memory any good
> O(n*log(n)) technique is probably good enough. For one million records,
> n*log2(n) is just 20 million. 20 million comparison and exchange
> operations are an eye-blink on fast, modern hardware when performed in
> memory.

Woah, this is extremely dangerous thinking. For all you know the application needs to execute this sort 200 times per second... Applications that can execute their queries entirely in memory are frequently OLTP applications that need response within a hard upper limit. Often that upper limit is measured in milliseconds...

> The thing we should worry about is what happens when disk I/O rears its
> ugly head. You have a better solution to that situation, and you have a
> better sorting routine for a database.

Not "better", just different. When dealing with disk I/O the database has to use an algorithm that doesn't require too many random access reads. Disk is quite fast at reading and writing sequentially but extremely slow at random access.

While I/O bandwidth is often the bottleneck on databases, it's also a lot easier to deal with. You can make huge stripesets and use super fast i/o controllers, to the point of saturating the machine's internal bus. It's much harder to increase the amount of cpu cycles available. Actually, in my experience cpu has been a bottleneck as often as disk i/o. But my experience is quite heavily skewed to OLTP applications.

-- 
greg
> -----Original Message----- > From: Greg Stark [mailto:gsstark@mit.edu] > Sent: Monday, April 07, 2003 3:06 PM > To: Dann Corbit > Cc: Greg Stark; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] No merge sort? > > "Dann Corbit" <DCorbit@connx.com> writes: > > > By the use of a randomized sample of min(n,max(3,log(n))) elements, > > the probability of choosing the worst case median is so > close to zero > > as to never happen in practice. > > Not exactly. In fact the probability is no different than > otherwise, but it won't be repeatable and it won't be on > common data sets like pre-sorted data. This is wrong. If I choose a single pivot based on a pattern (e.g. first/last/middle elements), then the probability is very high that I have made a bad choice. If I choose the median from a sample of 3, then it is less likely that I have made a bad choice. If I choose the median from a sample of 32 elements chosen at random, it becomes extremely unlikely that I have made a stupid choice. If I choose the median by examination of every element, the chance that I have chosen the wrong pivot is zero. This technique [choosing from several sample pivots from a random sample] clearly makes failure due to perverse cases less likely. A good quicksort implementation will also scan the partition for sequential or reversed data ordering. By using a sample of many medians, the chances of choosing the wrong median go down. By randomizing the selection, the chances of being fooled by a pattern go down. Most real data has some pattern to it. > > When sorting data, if all the information fits into memory any good > > O(n*log(n)) technique is probably good enough. For one million > > records, > > n*log2(n) is just 20 million. 20 million comparison and exchange > > operations are an eye-blink on fast, modern hardware when > performed in > > memory. > > Woah, this is extremely dangerous thinking. 
> For all you know the application needs to execute this sort 200 times per second... We are talking about SQL database applications. If they need to sort one million records 20 times per second, they have a serious design flaw, either in the database or the application. > Applications that can execute their queries entirely in > memory are frequently OLTP applications that need response > within a hard upper limit. Often that upper limit is measured > in milliseconds... The choice of internal sort routine (for any of the good ones) will only change the speed by a small, constant factor for a given distribution. For any choice of algorithm, there will be some data sets where any of the alternatives would have been better. These algorithms are all about the same:
Quicksort (as modified by Richard Singleton -- the original is useless for serious work)
Heapsort
Mergesort
The heapsort algorithm has the highest constant cost of the three, but never goes perverse and does not require additional memory. The mergesort algorithm requires additional memory, and the allocation can fail. This means that it is not certain that it can be performed. The quicksort algorithm has three perverse cases for the standard algorithm (sorted/reversed/organ-pipe), but all of these can be detected and prevented. > > The thing we should worry about is what happens when disk I/O rears > > its ugly head. You have a better solution to that > situation, and you > > have a better sorting routine for a database. > > Not "better", just different. When dealing with disk I/O the > database has to use an algorithm that doesn't require too > many random-access reads. Disk is quite fast at reading and > writing sequentially but extremely slow at random access. > > > While I/O bandwidth is often the bottleneck on databases it's > also a lot easier to deal with. You can make huge stripesets > and use super fast i/o controllers, to the point of > saturating the machine's internal bus. 
> It's much harder to increase the number of cpu cycles available. > > Actually, in my experience cpu has been a bottleneck as often > as disk i/o. But my experience is quite heavily skewed to > OLTP applications. > > -- > greg
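To make the sampled-pivot idea being argued about concrete, here is a minimal sketch (illustrative Python, not PostgreSQL code; the function name and default sample size are invented for the example):

```python
import random

def sampled_pivot(a, lo, hi, k=9):
    """Pick a quicksort pivot for a[lo:hi] as the median of k elements
    chosen at random.  A patterned input (sorted, reversed, organ-pipe)
    can no longer force a bad pivot, because the sampled positions do
    not follow any pattern in the data."""
    k = min(hi - lo, k)
    sample = [a[random.randrange(lo, hi)] for _ in range(k)]
    sample.sort()          # k is tiny, so this is effectively free
    return sample[k // 2]  # median of the sample
```

A larger k makes a catastrophically bad split less likely, at the cost of a few extra comparisons per partition -- exactly the trade-off under discussion here.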
<cbbrowne@cbbrowne.com> wrote in message news:20030407202001.1EC3C58E0D@cbbrowne.com... > Ron Peacetree wrote: > > AFAIK, there are only 3 general purpose internal sorting techniques > > that have O(n) behavior: > > Hum? NO "general purpose" sorting technique has O(n) behaviour. > > The theoretical best scenario, _in general_, is O(n log n). The O(log(n!)) bound is only for comparison based sorts operating on arbitrarily disordered input. There are general techniques that can sort in O(n) time if we can "break" either assumption. In a DB, we often are dealing with data that is only slightly disordered, or we have enough meta-knowledge that we can use more powerful ordering operators than simple comparisons, or both. > Insertion sort is expected to provide O(n^2) behaviour, and radix-like > schemes get arbitrarily memory hungry and have bad pathological results. > None of these comments is accurate. The sources for the following discussion are A= Vol 3 of Knuth 2nd ed, (ISBN 0-201-89685-0) B= Sedgewick's Algorithms in C, (0-201-51425-7) C= Handbook of Algorithms and Data Structures 2nd ed by Gonnet and Baeza-Yates. (0-201-41607-7) 1= Insertion sort is O(n) for "almost sorted" input. p103 of Sedgewick. There's also discussion on this in Knuth. 2= Distribution Sort and its "cousins", which use more powerful ordering operators than simple comparisons, are a= reasonably general, and b= O(n). Look in all three references. 3= A proper implementation of straight Radix sort using 8b or 16b at a time a= is NOT pathological, and b= is O(n). Sedgewick is the most blunt about it on p142-143, but Knuth discusses this as well. All of the above are stable, which quicksort is not. There are no "pessimal" inputs for any of the above that will force worst case behavior. For quicksort there are (they are =very= unlikely however). In real world terms, if you can use any of these approaches you should be able to internally sort your data between 2X and 3X faster. 
Unfortunately, most of us do not have the luxury of working with Memory Resident DB's. In the real world, disk access is an important part of our DB sorting efforts, and that changes things. Very fast internal sorting routines combined with multidisk merging algorithms that minimize overall disk I/O while maximizing the sequential nature of that I/O are the best we can do overall in such a situation.
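Point 1 above is easy to see in code: insertion sort does O(n + d) work, where d is the number of inversions, so "almost sorted" input really is linear (a textbook sketch, not PostgreSQL code):

```python
def insertion_sort(a):
    """Sort the list in place and return it.  Each element moves only
    as far left as it is out of place, so input that is already almost
    sorted costs little more than a single pass over the data."""
    for i in range(1, len(a)):
        x = a[i]
        j = i
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]  # shift larger elements one slot right
            j -= 1
        a[j] = x
    return a
```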
<cbbrowne@cbbrowne.com> wrote in message news:20030407202001.1EC3C58E0D@cbbrowne.com... > It is highly likely that it will typically take more > computational effort to figure out that one of the 4 sorts provided > /any/ improvement than any computational effort they would save. > > That's a /very/ common problem. There's also a fair chance, seen in > practice, that the action of collecting additional statistics to improve > query optimization will cost more than the savings provided by the > optimizations. > "Back in the Day" I heard similar arguments when discussing whether there should be support for hashing [O(n)], interpolation search [O(lglg(n))], binary search [O(lg(n))], and sequential search [O(n)] or for only some subset of these for a DB system I was working on. To make a long story short, it was worth it to have support for all of them because the "useful domain" of each was reasonably large and reasonably unique compared to the others. I submit a similar situation exists for sorting. (and yes, I've been here before too ;-) Giving end users of PostgreSQL a reasonable set of tools for building high performance applications is just good business.
"Ron Peacetree" <rjpeace@earthlink.net> wrote in message news:%cqka.13403$ey1.1154620@newsread1.prod.itd.earthlink.net... > <cbbrowne@cbbrowne.com> wrote in message > news:20030407202001.1EC3C58E0D@cbbrowne.com... > > It is highly likely that it will typically take more > > computational effort to figure out that one of the 4 sorts provided > > /any/ improvement than any computational effort they would save. > > > > That's a /very/ common problem. There's also a fair chance, seen in > > practice, that the action of collecting additional statistics to > improve > > query optimization will cost more than the savings provided by the > > optimizations. > > > "Back in the Day" I heard similar arguments when discussing whether > there should be support for hashing [O(n)], interpolation search TYPO ALERT: hashing is, of course, O(1)
> -----Original Message----- > From: Ron Peacetree [mailto:rjpeace@earthlink.net] > Sent: Monday, April 07, 2003 6:33 PM > To: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] No merge sort? > > > <cbbrowne@cbbrowne.com> wrote in message > news:20030407202001.1EC3C58E0D@cbbrowne.com... > > Ron Peacetree wrote: > > > AFAIK, there are only 3 general purpose internal sorting > techniques > > > that have O(n) behavior: > > > > Hum? NO "general purpose" sorting technique has O(n) behaviour. > > > > The theoretical best scenario, _in general_, is O(n log n). > The O(log(n!)) bound is only for comparison based sorts > operating on arbitarily disordered input. There are general > techniques that can sort in O(n) time if we can "break" > either assumption. In a DB, we often are dealing with data > that is only slightly disordered, or we are have enough meta > knowledge that we can use more powerful ordering operators > than simple comparisons, or both. > > > > Insertion sort is expected to provide O(n^2) behaviour, and > radix-like > > schemes get arbitrarily memory hungry and have bad pathological > results. > > > None of these comments is accurate. > The sources for the following discussion are > A= Vol 3 of Knuth 2nd ed, (ISBN 0-201-89685-0) > B= Sedgewick's Algorithms in C, (0-201-51425-7) > C= Handbook of Algorithms and Data Structures 2nd ed by > Gonnet and Baeza-Yates. (0-201-41607-7) > > 1= Insertion sort is O(n) for "almost sorted" input. > p103 of Sedgewick. There's also discussion on this in Knuth. > > 2= Distribution Sort and it's "cousins" which use more > powerful ordering operators than simply comparisons are a= > reasonably general, and b= O(n). Look in all three references. > > 3= A proper implementation of straight Radix sort using 8b or > 16b at a time a= is NOT pathological, and b= is O(n). > Sedgewick is the most blunt about it on p142-143, but Knuth > discusses this as well. There can be a problem here with 2 & 3. 
Imagine the following query: SELECT company, division, plant, sum(revenue) FROM corporate GROUP BY company, division, plant ORDER BY company, division, plant Let's suppose further that company, division, and plant are all char 20. For a given section of company + division, the first 40 characters will be the same. So a radix or distribution sort will need 40 passes over the identical data before it even gets to the differentiating field plant. Now, a comparison sort will be O(n*log(n)). The counting passes alone for MSD radix sort will be 40 passes * N. How large will n have to be before Radix sort outperforms a comparison based sort? 2^40 = 1,099,511,627,776 About one trillion items appears to be break even. If they are fixed width, we could use LSD radix sort and count all the bins in one pass. But that technique will fail with varchar. Naturally, most long character fields will be varchar. The basic upshot is that radix and distribution sorts work best with short keys. > All of the above are stable, which quicksort is not. There > are no "pessimal" inputs for any of the above that will force > worst case behavior. For quicksort there are (they are > =very= unlikely however). In real world terms, if you can use > any of these approaches you should be able to internally sort > your data between 2X and 3X faster. > > > Unfortunately, most of us do not have the luxury of working > with Memory Resident DB's. In the real world, disk access is > an important part of our DB sorting efforts, and that changes > things. Very fast internal sorting routines combined with > multidisk merging algorithms that minimize overall disk I/O > while maximizing the sequential nature of that I/O are the > best we can do overall in such a situation. > > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
""Dann Corbit"" <DCorbit@connx.com> wrote in message news:D90A5A6C612A39408103E6ECDD77B829408AC4@voyager.corporate.connx.co m... > There can be a problem here Distribution & Radix sorts. Imagine the following query: > SELECT company, division, plant, sum(revenue) FROM corporate GROUP BY > company, division, plant ORDER BY company, division, plant > > Let's suppose further that company, division, and plant are all char 20. > For a given section of company + division, the first 40 characters will > be the same. So a radix or distribution sort will need 40 passes over > the identical data before it even gets to the differentiating field > plant. > 1= Distribution sort does not have this problem. Any data that can be constrained by the "pigeon hole principle" can be sorted in o(n) time. 2= For Radis sort, that's iff ("if and only if") you use =characters= as the radix of a radix sort and do a MSB aka partition-exchange sort. The appropriate radix here is a =field= not a character. Since there are 3 fields vs 60 characters the problem becomes 2/3 wasted passes instead of 40/60. We can do even better by making one field for company+division and one for plant. Now we have no wasted passes and an O(n) sort. Since PostgreSQL has wonderful built in support for string handling and regexp, this is clearly an easy to implement and superior approach. Since a comparison based sorting algorithm would be best implemented by working on the same atomic units, the fair comparison becomes the log(n!) behavior of that sort vs the O(n) behavior of Distribution or Radix sort on data of that atomic size. The constants involved in each ordering operation are bigger for the two O(n) sorts, but despite that you'll find that for reasonable sizes of real world data, you'll complete the O(n) sorts 2-3X faster. Of course, internal sorting performance is vastly dominated by disk I/O costs if there is even a moderate amount disk I/O involved. 
However, there are ways to combine decent internal sorting with efficient disk merging, getting the best possible out of both subsystems. > If they are fixed width, we could use LSD radix sort and count all the > bins in one pass. But that technique will fail with varchar. > Naturally, most long character fields will be varchar. The basic upshot > is that radix and distribution sorts work best with short keys. > More accurate would be to say that such sorts work best with keys that have as little in common as possible with each other. Length isn't the problem, =the number of unique keys= and having few equal keys is. Large quantities of equal keys can trash the performance of many sorting algorithms, including quicksort, unless steps are taken to either modify the algorithm to handle it or the data to minimize such an occurrence.
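The usual quicksort modification for duplicate-heavy input is three-way ("fat pivot") partitioning, which puts every key equal to the pivot in its final place in one pass, so equal runs are never re-examined. A sketch of the technique (illustrative only):

```python
def partition3(a, lo, hi, pivot):
    """Dutch-national-flag partition of a[lo:hi] into three bands:
    keys < pivot, keys == pivot, keys > pivot.  Returns (lt, gt) such
    that the equal band is a[lt:gt]; quicksort then recurses only on
    the outer bands, so massive duplication stops hurting."""
    lt, i, gt = lo, lo, hi
    while i < gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            gt -= 1
            a[gt], a[i] = a[i], a[gt]
        else:
            i += 1
    return lt, gt
```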
> -----Original Message----- > From: Ron Peacetree [mailto:rjpeace@earthlink.net] > Sent: Tuesday, April 08, 2003 3:07 PM > To: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] No merge sort? > > > ""Dann Corbit"" <DCorbit@connx.com> wrote in message > news:D90A5A6C612A39408103E6ECDD77B829408AC4@voyager.corporate.connx.co > m... > > There can be a problem here Distribution & Radix sorts. Imagine the > following query: > > SELECT company, division, plant, sum(revenue) FROM corporate GROUP > BY > > company, division, plant ORDER BY company, division, plant > > > > Let's suppose further that company, division, and plant are all char > 20. > > For a given section of company + division, the first 40 characters > will > > be the same. So a radix or distribution sort will need 40 passes > over > > the identical data before it even gets to the differentiating field > > plant. > > > 1= Distribution sort does not have this problem. Any data > that can be constrained by the "pigeon hole principle" can be > sorted in O(n) time. Distribution sort is not an algorithm. It is a general technique. Both counting sort and radix sort are distribution sorts. I think you are talking about counting sort here. In order to insert a technique into a database, you must solve the general case. > 2= For Radix sort, that's iff ("if and > only if") you use =characters= as the radix of a radix sort > and do a MSB aka partition-exchange sort. The appropriate > radix here is a =field= not a character. Since there are 3 > fields vs 60 characters the problem becomes 2/3 wasted passes > instead of 40/60. This is lunacy. If you count the field or use it as a radix, you will need a radix of 40*8 bits = 320 bits. That means you will have 2^320 = 2.136e96 bins. > We can do even better by making one field for > company+division and one for plant. Now we have no wasted passes and > an O(n) sort. 
> Since PostgreSQL has wonderful built in > support for string handling and regexp, this is clearly an > easy to implement and superior approach. You have no idea what you are talking about. > Since a comparison based sorting algorithm would be best > implemented by working on the same atomic units, the fair > comparison becomes the > log(n!) behavior of that sort vs the O(n) behavior of > Distribution or Radix sort on data of that atomic size. The > constants involved in each ordering operation are bigger for > the two O(n) sorts, but despite that you'll find that for > reasonable sizes of real world data, you'll complete the O(n) > sorts 2-3X faster. With a 2e96 * 4 space requirement (assuming that 4 billion is the largest row count allowed in the database). If we could build a memory that large, how long would it take to reach the last bin? There are an estimated 10^80 elementary particles in the universe. > Of course, internal sorting performance is vastly dominated > by disk I/O costs if there is even a moderate amount disk I/O > involved. However, there are ways to combine decent internal > sorting with efficient disk merging, getting the best > possible out of both subsystems. You are confusing terms here. An internal sort is performed 100% in memory. It is an external sort that uses disk space. > > If they are fixed width, we could use LSD radix sort and count all > the > > bins in one pass. But that technique will fail with varchar. > > Naturally, most long character fields will be varchar. The basic > upshot > > is that radix and distribution sorts work best with short keys. > > > More accurate would be say that such searches work best with > keys that have as little in common as possible to each other. > Length isn't the problem, =the number of unique keys= and > having few equal keys is. 
> Large quantities of equal keys can > trash the performance of many sorting algorithms, including > quicksort, unless steps are taken to either modify the > algorithm to handle it or the data to minimize such an occurrence. I would guess that you have never done any sorting work.
""Dann Corbit"" <DCorbit@connx.com> wrote in message news:D90A5A6C612A39408103E6ECDD77B829408AC6@voyager.corporate.connx.co m... > Distribution sort is not an algorithm. It is a general technique. Both > counting sort and radix sort are distribution sorts. I think you are > talking about counting sort here. In order to insert a technique into a > database, you must solve the general case. > Terminology problem. Please look in the references I posted. > > 2= For Radix sort, that's iff ("if and only if") you use > > =characters= as the radix of a radix sort > > and do a MSB aka partition-exchange sort. The appropriate > > radix here is a =3Dfield=3D not a character. Since there are 3 > > fields vs 60 characters the problem becomes 2/3 wasted passes > > instead of 40/60. > > This is lunacy. If you count the field or use it as a radix, you will > need a radix of 40*8 bits= 320 bits. That means you will have 2^320 > 2.136e96 bins. > You only need as many bins as you have unique key values silly ;-) Remember, at its core Radix sort is just a distribution counting sort (that's the name I learned for the general technique). The simplest implementation uses bits as the atomic unit, but there's nothing saying you have to... In a DB, we know all the values of the fields we currently have stored in the DB. We can take advantage of that. As you correctly implied, the goal is to minimize the number of unique bins you have to, err, distribute, items into. That and having as few duplicates as feasible are the important points. 
If, for example, we have <= 255 possible values for each radix (Using your previous example, let's say you have <= 255 unique values for the combined field "Company+Division" in the DB and the same for "Plant" ), and 4 sets of radix to sort over, it doesn't matter if we are sorting a 32b quantity using a radix of 8b, or a 4-field quantity using a radix of one field (TBF, we should use key and index techniques to minimize the amount of actual data we retrieve and manipulate from disk during the sort. We want to minimize disk I/O, particularly seeking disk I/O). The situations are analogous. Note TANSTAAFL (as Heinlein would've said): We have to use extra space for mapping the radix values to the radix keys, and our "inner loop" for the sorting operation is considerably more complicated than that for a quicksort or a mergesort. Hence the fact that even though this is an O(n) technique, in real world terms you can expect only a 2-3X performance improvement over say quicksort for realistic amounts of data. Oh and FTR, IME for most "interesting" sorts in the DB domain, even for "internal" sorting techniques, the time to read data from disk and (possibly) write results back to disk tends to dominate the time spent actually doing the internal sort... I learned tricks like this 20 years ago. I thought this stuff was part of general lore in the DB field... *shrug*
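Under the assumption above that each field has few distinct values, the per-field pass reduces to a stable counting sort over small integer codes (a hypothetical sketch; the dictionaries of distinct values are assumed to exist already, e.g. derived from an index):

```python
def counting_sort_by_code(rows, code, k):
    """Stable distribution (counting) sort of rows by code(row), where
    code() maps each row to an integer 0..k-1 -- e.g. the position of a
    field's value in a precomputed list of distinct values.  One
    counting pass plus one placement pass: O(n + k).  Because it is
    stable, it can be applied field by field, least significant first."""
    counts = [0] * (k + 1)
    for r in rows:
        counts[code(r) + 1] += 1
    for c in range(k):              # prefix sums -> starting offsets
        counts[c + 1] += counts[c]
    out = [None] * len(rows)
    for r in rows:                  # placement in input order: stable
        out[counts[code(r)]] = r
        counts[code(r)] += 1
    return out

# Hypothetical two-field example: sort by plant, then by company+division.
plants = {"P1": 0, "P2": 1}
compdivs = {"Acme/East": 0, "Acme/West": 1}
rows = [("Acme/West", "P2"), ("Acme/East", "P2"), ("Acme/West", "P1")]
rows = counting_sort_by_code(rows, lambda r: plants[r[1]], len(plants))
rows = counting_sort_by_code(rows, lambda r: compdivs[r[0]], len(compdivs))
```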
> -----Original Message----- > From: Ron Peacetree [mailto:rjpeace@earthlink.net] > Sent: Wednesday, April 09, 2003 12:38 PM > To: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] No merge sort? > > > ""Dann Corbit"" <DCorbit@connx.com> wrote in message > news:D90A5A6C612A39408103E6ECDD77B829408AC6@voyager.corporate.connx.co > m... > > Distribution sort is not an algorithm. It is a general technique. > Both > > counting sort and radix sort are distribution sorts. I think you > are > > talking about counting sort here. In order to insert a technique > into a > > database, you must solve the general case. > > > Terminology problem. Please look in the references I posted. > > > > > 2= For Radix sort, that's iff ("if and only if") you use > > > =characters= as the radix of a radix sort and do a MSB aka > > > partition-exchange sort. The appropriate radix here is a > =3Dfield=3D > > > not a character. Since there are 3 fields vs 60 characters the > > > problem becomes 2/3 wasted passes instead of 40/60. > > > > This is lunacy. If you count the field or use it as a radix, you > will > > need a radix of 40*8 bits= 320 bits. That means you will > have 2^320 > > 2.136e96 bins. > > > You only need as many bins as you have unique key values > silly ;-) Remember, at its core Radix sort is just a > distribution counting sort (that's the name I learned for the > general technique). The simplest implementation uses bits as > the atomic unit, but there's nothing saying you have to... > In a DB, we know all the values of the fields we currently > have stored in the DB. We can take advantage of that. By what magic do we know this? If a database knew a-priori what all the distinct values were, it would indeed be excellent magic. > As you correctly implied, the goal is to minimize the number > of unique bins you have to, err, distribute, items into. > That and having as few duplicates as feasible are the > important points. 
> > If, for example, we have <= 255 possible values for the each > radix (Using your previous example, let's say you have <= 255 > unique values for the combined field "Company+Division" in > the DB and the same for "Plant" ), and 4 sets of radix to > sort over, it doesn't matter if we are sorting a 32b quantity > using a radix of 8b, or a 4 field quantity using a radix of > one field (TBF, we should use key and index techniques to > minimize the amount of actual data we retrieve and manipulate > from disk during the sort. We want to minimize disk I/O, > particularly seeking disk I/O). The situations are analogous. With 80 bytes, you have 2^320 possible values. There is no way around that. If you are going to count them or use them as a radix, you will have to classify them. The only way you will know how many unique values you have in "Company+Division" is to ... Either sort them or by some other means discover all that are distinct. The database does not know how many there are beforehand. Indeed, there could be anywhere from zero to 2^320 (given enough space) distinct values. I would be interested to see your algorithm that performs a counting or radix sort on 320 bit bins and that works in the general case without using extra space. > Note TANSTAAFL (as Heinlein would've said): We have to use > extra space for mapping the radix values to the radix keys, > and our "inner loop" for the sorting operation is > considerably more complicated than that for a quicksort or a > mergesort. Hence the fact that even though this is an O(n) > technique, in real world terms you can expect only a 2-3X > performance improvement over say quicksort for realistic > amounts of data. > > Oh and FTR, IME for most "interesting" sorts in the DB > domain, even for "internal" sorting techniques, the time to > read data from disk and > (possibly) write results back to disk tends to dominate the > time spent actually doing the internal sort... > > I learned tricks like this 20 years ago. 
> I thought this > stuff was part of general lore in the DB field... *shrug* As my grandfather used to say: "Picklesmoke."
"Ron Peacetree" <rjpeace@earthlink.net> wrote in message news:uqbma.22946$4P1.2023021@newsread2.prod.itd.earthlink.net... > If we are using a mapping of <= 2^16 keys, the extra space in > question is ~2.5MB. EDIT: add "...for 320b keys"
""Dann Corbit"" <DCorbit@connx.com> wrote in message news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co m... > > From: Ron Peacetree [mailto:rjpeace@earthlink.net]=20 > > Sent: Wednesday, April 09, 2003 12:38 PM > > You only need as many bins as you have unique key values=20 > > silly ;-) Remember, at its core Radix sort is just a=20 > > distribution counting sort (that's the name I learned for the=20 > > general technique). The simplest implementation uses bits as=20 > > the atomic unit, but there's nothing saying you have to...=20=20 > > In a DB, we know all the values of the fields we currently=20 > > have stored in the DB. We can take advantage of that. > > By what magic do we know this? If a database knew a-priori what all the > distinct values were, it would indeed be excellent magic. For any table already in the DB, this is self evident. If you are talking about sorting data =before= it gets put into the DB (say for a load), then things are different, and the best you can do is used comparison based methods (and probably some reasonably sophisticated external merging routines in addition if the data set to be sorted and loaded is big enough). The original question as I understood it was about a sort as part of a query. That means everything to be sorted is in the DB, and we can take advantage of what we know. > With 80 bytes, you have 2^320 possible values. There is no way > around that. If you are going to count them or use them as a > radix, you will have to classify them. The only way you will > know how many unique values you have in > "Company+Division" is to ... > Either sort them or by some means discover all that are distinct Ummm, Indexes, particularly Primary Key Indexes, anyone? Finding the unique values in an index should be perceived as trivial... ...and you often have the index in memory for other reasons already. > . The database does not know how many there are > beforehand. 
> Indeed, there could be anywhere > from zero to 2^320 (given enough space) distinct values. > > I would be interested to see your algorithm that > performs a counting or radix sort on 320 bit bins and that > works in the general case without using extra space. > 1= See below. I clearly stated that we need to use extra space. 2= If your definition of "general" is "we know nothing about the data", then of course any method based on using more sophisticated ordering operators than comparisons is severely hampered, if not doomed. This is not a comparison based method. You have to know some things about the data being sorted before you can use it. I've been =very= clear about that throughout this. > > Note TANSTAAFL (as Heinlein would've said): We have to use > > extra space for mapping the radix values to the radix keys, > > and our "inner loop" for the sorting operation is > > considerably more complicated than that for a quicksort or a > > mergesort. Hence the fact that even though this is an O(n) > > technique, in real world terms you can expect only a 2-3X > > performance improvement over say quicksort for realistic > > amounts of data. > > > > Oh and FTR, IME for most "interesting" sorts in the DB > > domain, even for "internal" sorting techniques, the time to > > read data from disk and > > (possibly) write results back to disk tends to dominate the > > time spent actually doing the internal sort... > > Since you don't believe me, and have implied you wouldn't believe me even if I posted results of efforts on my part, go sort some 2GB files by implementing the algorithms in the sources I've given and as mentioned in this thread. (I'm assuming that you have access to "real" machines that can perform this as an internal sort, or we wouldn't be bothering to have this discussion). Then come back to the table if you still think there are open issues to be discussed.
"Ron Peacetree" <rjpeace@earthlink.net> wrote in message news:EHIla.19833$ey1.1702921@newsread1.prod.itd.earthlink.net... > ""Dann Corbit"" <DCorbit@connx.com> wrote in message > news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co > m... > > With 80 bytes, you have 2^320 possible values. There is no way > > around that. If you are going to count them or use them as a > > radix, you will have to classify them. The only way you will > > know how many unique values you have in > > "Company+Division" is to ... > > Either sort them or by some means discover all that are distinct > Ummm, Indexes, particularly Primary Key Indexes, anyone? Finding the > unique values in an index should be perceived as trivial... ...and you > often have the index in memory for other reasons already. > Interesting Note: DB2 and Oracle have switches that turn on a table feature that keeps track of all the unique values of a field and a counter for how often each of those unique values occurs. The implications for speeding up non write querys involving those tables should be obvious (again, TANSTAAFL: writes are now going to have the extra overhead of updating this information)... Wonder how hard this would be to put into PostgreSQL?
""Dann Corbit"" <DCorbit@connx.com> wrote in message news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.co m... > I would be interested to see your algorithm that performs a counting > or radix sort on 320 bit bins and that works in the general case > without using extra space. > Since I can't be sure of my audience, I'm writing for the widest reasonably possible audience, and I apologize in advance to those who know most of this. First, let's use some examples for the data to be sorted. I'll consider two cases, 1= each 320b entity is a row, and 2= the original example where there are 3 fields per row; the first being 320b, with each of the others being 160b. For case 1, a 2GB file has 2^34/320= 53,687,091 records. Let's call it 50x10^6 to keep things simple. Case 2 is 26,843,545 records. Let's call that 25x10^6. Let's examine the non space issues first, then explore space requirements. The most space and time efficient comparison based sorting algorithm (median-of-three partitioning Qsort with Isort used when almost sorted) will on average execute in ~(32/3)*n*ln(n) tyme units (~9.455x10^9 for case 1; ~4.543x10^9 for case 2). The use of "tyme" here is not a typo or a spice. It's a necessary abstraction because real time units are very HW and SW context dependant. Now let's consider a radix address calculation sort based approach. The best implementation does a Rsort on the MSBs of the data to the point of guaranteeing that the data is "almost sorted" then uses Isort as a "finishing touch". The Rsort part of the algorithm will execute on average in 11*p*n-n+O(p*m) tyme units, where p is the number of passes used and m is the number of unique states in the radix (2^16 for a 16b radix in which we can't prune any possibilities). For a 320b data value that we know nothing about, we must consider all possible values for any length radix we use. 
The function 11*p*n-n+O(p*m) has a minimum when we use a 16b radix: 11*20*50x10^6-50x10^6+O(20*2^16) = ~10.951x10^9 tyme units. If we use enough Rsort passes to get the data "almost ordered" and then use Isort to finish things, we can do a bit better. Still, since Qsort needs an average of ~9.455x10^9 tyme units to sort the same data, it's not clear that the effort to code Rsort is worth it for this case.

However, look what happens if we can assume that the number of distinct key values is small enough that we can sort the data in relatively few passes: if the 320b key has <= 2^16 distinct values, we can sort them using one pass in ~500x10^6+O(2^16) tyme units. Even noting that the O(2^16) term has just become much larger, since we are now manipulating key values rather than bits, this is stupendously better than Qsort, and there should be no question that Rsort is worth considering in such cases. FTR, this shows that my naive (AKA "I didn't do the math") instinct was wrong. This technique is FAR better than I originally estimated under these circumstances.

As long as we can store a "map" of unique keys in a relatively small amount of space and use relatively few radix passes, this technique is far better than comparison-based sorting methods. Said mapping is easily derived from table indexes, and some DB systems (DB2 and Oracle) will generate and save this data for you as part of their set of optimization tweaks. Even if none of that is true, an extra full scan of a memory-resident data set to derive this information is cheap if the gamble pays off and you find out that the number of distinct key values is a small subset of the potential universe of such values.

Now let's consider that "extra space". At the least, Qsort is going to need extra space proportional to lg(n) for its call tree (explicit, or system level to support recursion). If we are using a 16b radix, the extra space in question is 64KB. In either case, it's in the noise.
If we are using a mapping of <= 2^16 keys, the extra space in question is ~2.5MB. Considering that we are sorting a ~2GB file, and given the incredible improvement in running time, this seems to be a very worthwhile investment. This would be worth adding support for in PostgreSQL.
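The "small distinct-key set" case argued for above is just a distribution counting sort keyed on the known values. A minimal sketch (function names are mine; a real implementation would work over on-disk tuples rather than a Python list) shows the two-pass shape: count occurrences, turn counts into offsets, then place each record. Its cost is O(n + k) for k distinct keys, regardless of how wide (e.g. 320b) each key is.

```python
def counting_sort_by_key(records, key):
    """Distribution counting sort: one pass to count, one pass to place.
    Stable, and O(n + k) where k is the number of distinct key values."""
    # Pass 1: count occurrences of each distinct key value
    counts = {}
    for rec in records:
        k = key(rec)
        counts[k] = counts.get(k, 0) + 1

    # Turn counts into starting offsets, in sorted key order.
    # This sorted() runs over the *distinct* keys only -- cheap when few.
    offsets, offset = {}, 0
    for k in sorted(counts):
        offsets[k] = offset
        offset += counts[k]

    # Pass 2: place each record at its slot
    out = [None] * len(records)
    for rec in records:
        k = key(rec)
        out[offsets[k]] = rec
        offsets[k] += 1
    return out

data = [(3, "c"), (1, "a"), (3, "d"), (2, "b"), (1, "e")]
print(counting_sort_by_key(data, key=lambda r: r[0]))
# → [(1, 'a'), (1, 'e'), (2, 'b'), (3, 'c'), (3, 'd')]
```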
My 2 cents' worth. The issue in my mind is that qsort cannot sort while the data is being read off the disk. By using a balanced tree, the data can be sorted into partitions while it is being read; then these partitions can be merged. IMHO this will give the best performance.

I wrote the code to do this years ago, and if people want to play with it I suppose I can GPL it. A language conversion would be required because I didn't have C available - but the code is runnable in its present form. Last time I benched it, it was against the DEC system sort routines on a VAX 7800, and it ran quite a bit faster... and this is in spite of the fact that I had to do many rather UGLY things to fit this code into the constraints I had... IE - it had to run under 5 operating systems, including MS-DOS, and like I said - it <unfortunately> isn't written in C.
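The "sort partitions while reading, then merge" scheme described above is the classic external merge sort shape. The post used a balanced tree to build runs; the sketch below (names and run sizes are mine) uses plain sorted runs plus a k-way merge, which shows the same structure: run construction can overlap the disk read, and the merge touches each element only O(log k) times.

```python
import heapq

def external_sort(stream, run_size):
    """Sort a stream larger than memory: build sorted runs of at most
    run_size items as the data arrives, then k-way merge the runs."""
    runs = []    # in a real external sort, each run is a temp file on disk
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) >= run_size:
            buf.sort()            # in practice, overlapped with reading
            runs.append(buf)
            buf = []
    if buf:                       # flush the final partial run
        buf.sort()
        runs.append(buf)
    # heapq.merge performs a k-way merge of already-sorted runs
    return list(heapq.merge(*runs))

print(external_sort(iter([5, 2, 9, 1, 7, 3, 8, 4, 6]), run_size=3))
# → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```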
Ron Peacetree wrote:
> ""Dann Corbit"" <DCorbit@connx.com> wrote in message
> news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.com...
> > > From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> > > Sent: Wednesday, April 09, 2003 12:38 PM
> > > You only need as many bins as you have unique key values,
> > > silly ;-) Remember, at its core Radix sort is just a
> > > distribution counting sort (that's the name I learned for the
> > > general technique). The simplest implementation uses bits as
> > > the atomic unit, but there's nothing saying you have to...
> > > In a DB, we know all the values of the fields we currently
> > > have stored in the DB. We can take advantage of that.
> >
> > By what magic do we know this? If a database knew a priori what all
> > the distinct values were, it would indeed be excellent magic.
> For any table already in the DB, this is self-evident. If you are
> talking about sorting data =before= it gets put into the DB (say for a
> load), then things are different, and the best you can do is to use
> comparison-based methods (and probably some reasonably sophisticated
> external merging routines in addition, if the data set to be sorted and
> loaded is big enough). The original question as I understood it was
> about a sort as part of a query. That means everything to be sorted
> is in the DB, and we can take advantage of what we know.
> > With 80 bytes, you have 2^320 possible values. There is no way
> > around that. If you are going to count them or use them as a
> > radix, you will have to classify them. The only way you will
> > know how many unique values you have in
> > "Company+Division" is to ...
> > Either sort them or by some means discover all that are distinct
> Ummm, Indexes, particularly Primary Key Indexes, anyone? Finding the
> unique values in an index should be perceived as trivial... ...and you
> often have the index in memory for other reasons already.
But this does not remove the fact that a radix sort on an 80 byte field requires O(2^320) space, that is, O(N), where N is the size of the state space of the key... Perhaps you need to reread Gonnet; it tells you that...
--
output = reverse("moc.enworbbc@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/multiplexor.html
Rules of the Evil Overlord #28. "My pet monster will be kept in a secure cage from which it cannot escape and into which I could not accidentally stumble." <http://www.eviloverlord.com/>
> -----Original Message-----
> From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> Sent: Friday, April 11, 2003 5:02 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] No merge sort?
>
> ""Dann Corbit"" <DCorbit@connx.com> wrote in message
> news:D90A5A6C612A39408103E6ECDD77B829408ACB@voyager.corporate.connx.com...
> > > From: Ron Peacetree [mailto:rjpeace@earthlink.net]
> > > Sent: Wednesday, April 09, 2003 12:38 PM
> > > You only need as many bins as you have unique key values,
> > > silly ;-) Remember, at its core Radix sort is just a
> > > distribution counting sort (that's the name I learned for the
> > > general technique). The simplest implementation uses bits as
> > > the atomic unit, but there's nothing saying you have to...
> > > In a DB, we know all the values of the fields we currently
> > > have stored in the DB. We can take advantage of that.
> >
> > By what magic do we know this? If a database knew a priori what all
> > the distinct values were, it would indeed be excellent magic.
> For any table already in the DB, this is self-evident. If
> you are talking about sorting data =before= it gets put into
> the DB (say for a load), then things are different, and the
> best you can do is to use comparison-based methods (and
> probably some reasonably sophisticated external merging
> routines in addition, if the data set to be sorted and loaded
> is big enough). The original question as I understood it was
> about a sort as part of a query. That means everything to be
> sorted is in the DB, and we can take advantage of what we know.

The database does not know all the distinct values that it contains.

> > With 80 bytes, you have 2^320 possible values. There is no
> > way around that. If you are going to count them or use them as a
> > radix, you will have to classify them. The only way you will
> > know how many unique values you have in "Company+Division" is to ...
> > Either sort them or by some means discover all that are distinct
> Ummm, Indexes, particularly Primary Key Indexes, anyone?
> Finding the unique values in an index should be perceived as
> trivial... ...and you often have the index in memory for
> other reasons already.

If we have a unique clustered primary key index, why bother to sort? The ORDER BY will be faster without sorting.

> > . The database does not know how many there are
> > beforehand. Indeed, there could be anywhere
> > from zero to 2^320 (given enough space) distinct values.
> >
> > I would be interested to see your algorithm that
> > performs a counting or radix sort on 320 bit bins and that works in
> > the general case without using extra space.
> 1= See below. I clearly stated that we need to use extra
> space. 2= If your definition of "general" is "we know nothing
> about the data", then of course any method based on using
> more sophisticated ordering operators than comparisons is
> severely hampered, if not doomed. This is not a comparison-based
> method. You have to know some things about the data
> being sorted before you can use it. I've been =very= clear
> about that throughout this.
> > > Note TANSTAAFL (as Heinlein would've said): We have to use
> > > extra space for mapping the radix values to the radix keys, and
> > > our "inner loop" for the sorting operation is considerably more
> > > complicated than that for a quicksort or a mergesort. Hence the
> > > fact that even though this is an O(n) technique, in real world
> > > terms you can expect only a 2-3X performance improvement over say
> > > quicksort for realistic amounts of data.
> > >
> > > Oh and FTR, IME for most "interesting" sorts in the DB
> > > domain, even for "internal" sorting techniques, the time to
> > > read data from disk and (possibly) write results back to disk
> > > tends to dominate the time spent actually doing the internal sort...
> Since you don't believe me, and have implied you wouldn't
> believe me even if I posted results of efforts on my part, go
> sort some 2GB files by implementing the algorithms in the
> sources I've given and as mentioned in this thread. (I'm
> assuming that you have access to "real" machines that can
> perform this as an internal sort, or we wouldn't be bothering
> to have this discussion.) Then come back to the table if you
> still think there are open issues to be discussed.

I don't believe you, and I am not going to try to code a sort that does not work [and is demonstrably impossible]. But I will tell you what I will do. I will post the binary for a Win32 sorting routine that I wrote which will sort files of any size. If you can write a tool that sorts faster (and does not rely on a primary key, which would render the task of sorting moot), then I will believe you.

Here is the routine:
ftp://cap.connx.com/chess-engines/new-approach/External.exe

The syntax to sort a text file with it is:
External <input_file> <output_file> <internal_sorting_buffer_size>

So (for instance) to sort a 100 gigabyte file with 1 gig of physical memory allowed for the sort, the syntax would be:
External 100gig.dat 100gig.sor 1000000000