Re: Merge algorithms for large numbers of "tapes" - Mailing list pgsql-hackers
| From | Simon Riggs |
| --- | --- |
| Subject | Re: Merge algorithms for large numbers of "tapes" |
| Date | |
| Msg-id | 1141823356.27729.689.camel@localhost.localdomain |
| In response to | Merge algorithms for large numbers of "tapes" (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Merge algorithms for large numbers of "tapes" |
| List | pgsql-hackers |
On Tue, 2006-03-07 at 18:14 -0500, Tom Lane wrote:
> BTW, I was just looking over Knuth's discussion of sorting again, and
> realized that there is still something more that could be done within
> the existing sort framework. We currently use standard polyphase merge
> (his Algorithm 5.4.2D), which IIRC I chose because it was simple and
> for relatively small numbers of tapes T it was about as good as anything
> else. Knuth spends a great deal of energy on minimizing tape rewind
> time which of course is of no interest to us, and I had supposed that
> all of his more-complex algorithms were really only of interest if you
> needed to consider rewind time. However, now that we've changed the
> code to prefer large numbers of tapes, it's not at all clear that
> Algorithm D is still the right one to use. In particular I'm looking at
> cascade merge, Algorithm 5.4.3C, which appears to use significantly
> fewer passes when T is large.

Ah! Well spotted. Yes, it looks like it would improve performance a good
deal, so it is definitely a TODO item.

> Do you want to try that?

Cascade merge rewrites the way logical tapes are selected and the way
the runs are merged. It doesn't appear to change run forming at all,
which would still use heapsort. So the only effect is when we have more
runs than "tapes", which puts the sizes at which we would begin to
notice any benefit at roughly:

work_mem = 1 GB     benefit at 8 TB
work_mem = 256 MB   benefit at 0.5 TB
work_mem = 8 MB     benefit at 256 MB
work_mem = 1 MB     benefit at 12 MB (min 7 tapes)

(based upon runs averaging twice the size of memory, and each logical
tape requiring 256 KB of memory, i.e. min(work_mem/4, 6) * work_mem * 2,
which for work_mem > 2 MB gives 0.5 * work_mem^2)

Which means we only see the benefit when we have for some reason been
unable to give the sort enough memory, or have not set the parameters
correctly. So it is still a concern, but it makes me think about two
other issues first:

1. Earlier we had some results showing that the heapsorts got slower
when work_mem was higher, and that concerns me most of all right now.
It's possible you'll have reduced that considerably with the
pull-out-the-first-attr patch; I'll look for some test results to show
whether it has gone away. We also have Nyberg et al telling us that as
of 1994 they had established that heapsort would always be slower than
qsort, as a result of CPU cache locality effects. An improvement here
would affect all sorts larger than work_mem.

2. Improving the way we do overall memory allocation, so that we would
not have the problem of under-setting work_mem that we currently
experience. If we solved this problem we would have faster sorts in
*all* cases, not just extremely large ones. Dynamically setting work_mem
higher when possible would be very useful. I've looked at this a few
times and have some suggestions, but perhaps it's worth asking for ideas
in this area?

Best Regards,
Simon Riggs
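For illustration, here is a minimal back-of-envelope sketch of the
threshold arithmetic above. It assumes each logical tape costs 256 KB of
work_mem with a floor of six tapes, and that replacement selection
produces runs averaging twice work_mem; those constants, and the names
TAPE_BUFFER_KB, MIN_TAPES and single_pass_limit_kb, are assumptions made
for this sketch rather than anything taken from the PostgreSQL source.
Under those assumptions it reproduces the 1 MB, 256 MB and 1 GB rows of
the table in the message (the 8 MB row comes out at 512 MB rather than
256 MB, so the exact constants behind that row may differ).

    /*
     * Rough estimate (not PostgreSQL code) of the input size at which an
     * external sort stops fitting in a single merge pass, i.e. where the
     * number of initial runs first exceeds what one merge can absorb and
     * extra passes -- the case cascade merge would help -- begin.
     */
    #include <stdio.h>

    #define TAPE_BUFFER_KB  256   /* assumed work_mem cost per logical tape */
    #define MIN_TAPES       6     /* assumed floor on the number of tapes */

    static double
    single_pass_limit_kb(double work_mem_kb)
    {
        double  tapes = work_mem_kb / TAPE_BUFFER_KB;

        if (tapes < MIN_TAPES)
            tapes = MIN_TAPES;

        /* runs average ~2 * work_mem, so one merge pass covers: */
        return tapes * 2.0 * work_mem_kb;
    }

    int
    main(void)
    {
        double  settings_kb[] = {1024.0, 8192.0, 262144.0, 1048576.0};
        int     n = sizeof(settings_kb) / sizeof(settings_kb[0]);

        for (int i = 0; i < n; i++)
        {
            double  limit_mb = single_pass_limit_kb(settings_kb[i]) / 1024.0;

            printf("work_mem = %8.0f KB -> extra merge passes above ~%.0f MB\n",
                   settings_kb[i], limit_mb);
        }
        return 0;
    }

The printed crossover is simply (number of tapes) * (average run size);
above that input size a single merge pass can no longer absorb every
run, which is where a merge pattern needing fewer passes would start to
pay off.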