
From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Msg-id CAM3SWZTk2Pv0MdD-0A1x5k_aF_m98-Xm2BcUqMgdeDx3=Kpjhw@mail.gmail.com
In response to Re: Using quicksort for every external sort run  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Thu, Aug 20, 2015 at 11:56 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> This will give more runs, so merging those needs some thought. It will also
> give a more predictable number of runs, so we'll be able to predict any
> merging issues ahead of time. We can more easily find out the min/max tuple
> in each run, so we only merge overlapping runs.

I think that merging runs can be optimized to reduce the number of
cache misses. Poul-Henning Kamp, the FreeBSD guy, has described the
problems that binary heaps have with cache misses [1], and I think we
could use his cache-aware heap layout for merging. But we should
definitely still quicksort runs.
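
To be concrete about the kind of thing I have in mind, here is a
minimal sketch of a siftdown for a 4-ary merge heap ("mergetup" and
the comparator are stand-ins, not anything in tuplesort today).
Storing all of a node's children contiguously means one cache line
fetch can cover every child, which is in the spirit of PHK's
cache-aware layout:

static void
siftdown4(mergetup *heap, int nmemb, int root,
          int (*cmp) (const mergetup *, const mergetup *))
{
    for (;;)
    {
        int         first = 4 * root + 1;   /* leftmost child */
        int         best = root;
        int         c;

        for (c = first; c < first + 4 && c < nmemb; c++)
        {
            if (cmp(&heap[c], &heap[best]) < 0)
                best = c;
        }

        if (best == root)
            break;              /* heap property restored */

        /* swap with the smallest child and keep sinking */
        {
            mergetup    tmp = heap[root];

            heap[root] = heap[best];
            heap[best] = tmp;
        }
        root = best;
    }
}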

> Using a heapsort is known to be poor for large heaps. We previously
> discussed the idea of quicksorting the first chunk of memory, then
> reallocating the heap as a smaller chunk for the rest of the sort. That
> would solve the cache miss problem.
>
> I'd like to see some discussion of how we might integrate aggregation and
> sorting. A heap might work quite well for that, whereas quicksort doesn't
> sound like it would work as well.

If you're talking about deduplicating within tuplesort, then there
are known techniques for that. I don't think it needs to be an
up-front priority of this work, though.
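
For instance, one obvious technique (sketched here with made-up
helpers, not tuplesort's actual interface) is to suppress duplicates
as tuples are popped off the merge heap. That works because the merge
output is globally sorted, so duplicates are always adjacent:

SortTuple   last;
bool        have_last = false;

while (!merge_heap_empty(state))
{
    SortTuple   cur = merge_heap_pop(state);

    /* drop tuples equal to the last one emitted */
    if (have_last && compare_tuples(&last, &cur, state) == 0)
        continue;

    emit_tuple(&cur);
    last = cur;
    have_last = true;
}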

> I think it's premature to retire that algorithm - I think we should keep it
> for a while yet. I suspect it may serve well in cases where we have low
> memory, though I accept that is no longer the case for larger servers that
> we would now call typical.

I have given one case where I think the first run should still use
replacement selection: where that enables a "quicksort with
spillover". For that reason, I wouldn't say that I have actually
proposed to retire the algorithm. In principle, I also agree with
using it under any other circumstances where it is likely to be
appreciably faster; it's just that there is no evidence of any other
such case. I did look at all the traditionally sympathetic cases, as
I described, and replacement selection still did not seem worth it.
But by all means, if you think I missed something, please show me a
test case.
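
Just to restate what "quicksort with spillover" buys us, the shape of
the logic is roughly this (all names hypothetical, and the cutoff is
arbitrary): when the input only barely exceeds workMem, replacement
selection lets us spill just the overflow as a single small run,
quicksort everything still in memory, and merge the two, so that the
great majority of tuples never hit disk:

/*
 * memtuples/memtupcount: tuples that fit in workMem; spill_run: the
 * single small run the heap produced for the overflow.
 */
if (ntuples_spilled <= (int64) (memtupcount * SPILLOVER_MAX_FRAC))
{
    qsort_tuples(memtuples, memtupcount);   /* in-memory portion */
    return merge_array_with_run(memtuples, memtupcount, spill_run);
}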

> This could cause particular issues in optimization, since heap sort is
> wonderfully predictable. We'd need a cost_sort() that was slightly
> pessimistic to cover the risk that a quicksort might not be as fast as we
> hope.

Wonderfully predictable? Really? It's totally sensitive to CPU cache
characteristics. I wouldn't say that at all. If you're alluding to the
quicksort worst case, that seems like the wrong thing to worry about.
The risk around that is often overstated, or based on experience from
third-rate implementations that don't follow various widely accepted
recommendations from the research community.
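
By "widely accepted recommendations" I mean things like median-of-three
pivot selection, falling back to insertion sort for tiny partitions,
and bounding stack depth by only recursing into one partition while
iterating on the other -- the kind of thing our own pg_qsort, which
follows Bentley & McIlroy's "Engineering a Sort Function", already
does. For illustration, median-of-three over an int array is just:

/* returns the index of the median of a[i], a[j] and a[k] */
static size_t
med3(const int *a, size_t i, size_t j, size_t k)
{
    if (a[i] < a[j])
        return (a[j] < a[k]) ? j : ((a[i] < a[k]) ? k : i);
    else
        return (a[i] < a[k]) ? i : ((a[j] < a[k]) ? k : j);
}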

> I'd like to see a more general and concise plan for how sorting evolves. We
> are close to having the infrastructure to perform intermediate aggregation,
> which would allow that to happen during sorting when required (aggregation,
> sort distinct). We also agreed some time back that parallel sorting would be
> the first incarnation of parallel operations, so we need to consider that
> also.

I agree with everything you say here, I think. I think it's
appropriate that this work anticipate adding a number of other
optimizations in the future, at least including:

* Parallel sort using worker processes.

* Memory prefetching.

* Offset-value coding of runs, a compression technique that was used
in System R, IIRC. This can speed up merging a lot, and will save I/O
bandwidth on dumping out runs (see the sketch just after this list).

* Asynchronous I/O.
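
On the offset-value coding point above, the write side could look
something like this byte-level simplification (real offset-value
coding works on key columns, and every name here is made up): each
tuple in a run stores only the offset at which it diverges from its
predecessor, plus the differing suffix.

static void
write_coded(RunWriter *out,
            const char *prev, Size prevlen,
            const char *cur, Size curlen)
{
    Size        off = 0;

    while (off < prevlen && off < curlen && prev[off] == cur[off])
        off++;

    write_uint32(out, (uint32) off);            /* shared prefix length */
    write_uint32(out, (uint32) (curlen - off)); /* suffix length */
    write_bytes(out, cur + off, curlen - off);  /* differing suffix */
}

The I/O savings on dumping runs fall out directly; the merge speedup
comes from the full scheme's ability to resolve many comparisons from
the codes alone, without touching the tuples themselves.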

There should be an integrated approach to applying every possible
optimization, or at least leaving the possibility open. A lot of these
techniques are complementary. For example, the "onlyKey" optimization
can now be used with external sorts, to significant benefit, and that
comes for free from using quicksort for runs. In short, I
am absolutely on-board with the idea that these things need to be
anticipated at the very least. For another speculative example, offset
coding makes the merge step cheaper, but the work of doing the offset
coding can be offloaded to worker processes, whereas the merge step
proper cannot really be effectively parallelized -- those two
techniques together are greater than the sum of their parts. One big
problem that I see with replacement selection is that it makes most of
these things impossible.

In general, I think that parallel sort should be an external sort
technique first and foremost. If you can only parallelize an internal
sort, then running out of road when there isn't enough memory to do
the sort in memory becomes a serious issue. Besides, you need to
partition the input anyway, and external sorting naturally does that,
while still allowing runs to never actually be dumped to disk.

[1] http://queue.acm.org/detail.cfm?id=1814327
-- 
Peter Geoghegan


