On Sun, Dec 28, 2014 at 12:45 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Sun, Dec 28, 2014 at 12:37 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Do others have similar numbers? I'm quite surprised at how little
> work_mem seems to matter for these plans (HashJoin might be a different
> story though). I feel like I made a mistake -- can someone please do a
> sanity check on my numbers?
I have seen external sorts that were quicker than internal sorts before. With my abbreviated key patch, under certain circumstances external sorts are faster, while presumably the same thing is true of int4 attribute sorts today. Actually, I saw a 10MB work_mem setting that was marginally faster than a multi-gigabyte one that fit the entire sort in memory. It probably has something to do with caching effects dominating over the expense of more comparisons, since higher work_mem settings that still resulted in an external sort were slower than the 10MB setting.
I was surprised by this too, but it has been independently reported by Jeff Janes.
I don't recall (at the moment) seeing our external sort actually run faster than quicksort, but I've very reliably seen external sorts get faster with less memory than with more. It is almost certainly a CPU caching issue. Very large simple binary heaps are horrible on the CPU cache. And for sort-by-reference values, quicksort is also pretty bad.
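To put a rough shape on the heap point, here is a toy model in Python (my own illustration, not anything taken from tuplesort.c). In an array-backed binary heap, a root-to-leaf walk visits indices that roughly double at each level, so past the first few levels every step lands on a different cache line. The 16-byte element size and 64-byte cache line are assumptions, and the function name is hypothetical.

```python
def cache_lines_on_leftmost_path(n_elems, elem_size=16, line_size=64):
    """Count distinct cache lines touched walking root -> leaf via left children.

    Assumes an implicit (array-backed) binary heap whose array starts on a
    cache-line boundary; elem_size and line_size are illustrative guesses.
    """
    lines = set()
    i = 0
    while i < n_elems:
        # Which cache line holds element i?
        lines.add((i * elem_size) // line_size)
        i = 2 * i + 1  # left child in a 0-based implicit heap
    return len(lines)
```

Under these assumptions, a heap of 2**20 entries has a 21-node path that touches 19 distinct cache lines, so nearly every level is a potential cache miss once the heap no longer fits in cache.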
With a slow enough data bus between the CPU and main memory, I don't doubt that a 'tapesort' with small work_mem could actually be faster than quicksort with large work_mem, though I don't recall seeing it myself. I would be surprised, however, if a tapesort as currently implemented were faster than a quicksort when the tapesort is using just one byte less memory than the quicksort.
But to Jeff Davis's question: yes, tapesort is not very sensitive to work_mem, and to the extent it is sensitive, it goes in the opposite direction, with more memory being worse. Once work_mem is so small that the merge takes multiple passes over the data, too little memory would really be a problem. But on modern hardware you have to get to pretty absurd settings before that happens.
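As a back-of-the-envelope sketch of when the merge stops being single-pass, the Python below models a plain balanced k-way merge. This is a simplification and not PostgreSQL's actual accounting: it assumes each initial run is exactly work_mem in size and that each input tape needs one fixed-size buffer during the merge. The 256 KiB per-tape buffer and the function name are assumptions chosen for illustration.

```python
def merge_passes(data_bytes, work_mem_bytes, per_tape_buffer_bytes=256 * 1024):
    """Estimate merge passes for a simple balanced k-way external merge sort.

    Simplifying assumptions (not PostgreSQL's real bookkeeping): every
    initial run is exactly work_mem in size, and each input tape needs one
    fixed-size buffer while merging.
    """
    if data_bytes <= work_mem_bytes:
        return 0  # fits in memory: internal sort, nothing to merge
    runs = -(-data_bytes // work_mem_bytes)  # ceiling division
    # How many runs can be merged at once with the memory available.
    fan_in = max(2, work_mem_bytes // per_tape_buffer_bytes)
    passes = 0
    while runs > 1:
        runs = -(-runs // fan_in)  # each pass merges fan_in runs into one
        passes += 1
    return passes
```

In this model, sorting 1 GiB with 64 MiB of work_mem takes a single merge pass, while the same data with 1 MiB of work_mem takes five. The number of runs shrinks and the fan-in grows as work_mem rises, so the multi-pass region falls away quickly.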