Re: qsort again (was Re: [PERFORM] Strange Create - Mailing list pgsql-hackers
From: Ron
Subject: Re: qsort again (was Re: [PERFORM] Strange Create
Msg-id: 7.0.1.0.2.20060216100954.035b5ee0@earthlink.net
In response to: Re: qsort again (was Re: [PERFORM] Strange Create Index (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers
At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote:

>On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote:
>> 3= Especially in modern systems where the gap between internal CPU
>> bandwidth and memory bandwidth is so great, the overhead of memory
>> accesses for comparisons and moves is the majority of the overhead
>> for both the pivot choosing and the partitioning algorithms within
>> quicksort. Particularly random memory accesses. The reason (#GPRs -
>> 1) is a magic constant is that it's the most you can compare and move
>> using only register-to-register operations.
>
>But how much of this applies to us? We're not sorting arrays of
>integers, we're sorting pointers to tuples. So while moves cost very
>little, a comparison costs hundreds, maybe thousands of cycles. A tuple
>can easily be two or three cachelines and you're probably going to
>access all of it, not to mention the Fmgr structures and the Datums
>themselves.

Pointers are simply fixed-size 32-bit or 64-bit quantities. They are essentially integers, so comparing and moving pointers, or fixed-size keys attached to those pointers, is exactly the same problem as comparing and moving integers. Comparing =or= moving the actual data structures is a much more expensive and variable-cost proposition.

I'm sure that pg's sort functionality is written intelligently enough that the only real data moves are done in a final pass, after the exact desired order has been found using pointer compares and (re)assignments during the sorting process. That's a standard technique for sorting data whose key or pointer is much smaller than the datum itself.

Your cost comment basically agrees with mine regarding the cost of random memory accesses. The good news is that the number of datums to be examined during pivot choosing is small enough that they can fit into CPU cache while the pointers to them are assigned to registers, making pivot choosing +very+ fast when done correctly.
As you've noted, actual partitioning is going to be more expensive, since it involves accessing enough actual datums that they can't all fit into CPU cache. The good news is that QuickSort has a very sequential access pattern within its inner loop, so while we must go to memory for compares, we at least keep their cost to a minimum. In addition, said access pattern is very prefetch- and cache-hierarchy-friendly.

>None of this is cache friendly. The actual tuples themselves could be
>spread all over memory (I don't think any particular effort is expended
>trying to minimize fragmentation).

It probably would be worth spending some effort on memory layout, just as we do for HD layout.

>Do these algorithms discuss the case where a comparison is more than
>1000 times the cost of a move?

A move is always more expensive than a compare when the datum is larger than its pointer or key. A move is always more expensive than a compare when it involves memory-to-memory movement rather than CPU-location-to-CPU-location movement. A move is especially more expensive than a compare when it involves both factors, and most moves do involve both. What I suspect you meant is that a key comparison that involves accessing the data in memory is more expensive than reassigning the pointers associated with those keys. That is certainly true. And yes, the problem has been extensively studied. ;-)

>Where this does become interesting is where we can convert a datum to
>an integer such that if f(A) > f(B) then A > B. Then we can sort on
>f(X) first with just integer comparisons and then do a full tuple
>comparison only if f(A) = f(B). This would be much more cache-coherent
>and make these algorithms much more applicable in my mind.

In fact we can do better.
Using hash codes or what-not to map datums to keys, then sorting just the keys and the pointers to those datums, followed by an optional final pass where we do the actual data movement, is also a standard technique for handling large data structures. Regardless of what tweaks beyond the basic algorithms we use, the algorithms themselves have been well studied and their performance well established. QuickSort is the best-performing of the O(nlgn) comparison-based sorts, and it uses fewer resources than HeapSort or MergeSort.

Ron