Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Using quicksort for every external sort run
Date
Msg-id CANP8+jJZaSz9cRqqTAYz58DwD_L-CKnepjV+=UkRU-YiTmG2Vg@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 20 August 2015 at 18:41, Peter Geoghegan <pg@heroku.com> wrote:
On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 20 August 2015 at 03:24, Peter Geoghegan <pg@heroku.com> wrote:
>>
>>
>> The patch is ~3.25x faster than master
>
>
> I've tried to read this post twice and both times my work_mem overflowed.
> ;-)
>
> Can you summarize what this patch does? I understand clearly what it doesn't
> do...

The most important thing that it does is always quicksort runs, that
are formed by simply filling work_mem with tuples in no particular
order, rather than trying to make runs that are twice as large as
work_mem on average. That's what the ~3.25x improvement concerned.
That's actually a significantly simpler algorithm than replacement
selection, and appears to be much faster.

Then I think this is fine, not least because it seems like a first step towards parallel sort.

This will give more runs, so merging those needs some thought. It will also give a more predictable number of runs, so we'll be able to predict any merging issues ahead of time. We can more easily find out the min/max tuple in each run, so we only merge overlapping runs.
 
You might even say that it's
a dumb algorithm, because it is less sophisticated than replacement
selection. However, replacement selection tends to use CPU caches very
poorly, while its traditional advantages have become dramatically less
important due to large main memory sizes in particular. Also, it hurts
that we don't currently dump tuples in batches, for several reasons.
Better to do memory intense operations in batch, rather than having a
huge inner loop, in order to minimize or prevent instruction cache
misses. And we can better take advantage of asynchronous I/O.

The complicated aspect of considering the patch is whether or not it's
okay to not use replacement selection anymore -- is that an
appropriate trade-off?

Using a heapsort is known to be poor for large heaps. We previously discussed the idea of quicksorting the first chunk of memory, then reallocating the heap as a smaller chunk for the rest of the sort. That would solve the cache miss problem.

I'd like to see some discussion of how we might integrate aggregation and sorting. A heap might work quite well for that, whereas quicksort doesn't sound like it would work as well.
 
The reason that the code has not actually been simplified by this
patch is that I still want to use replacement selection for one
specific case: when it is anticipated that a "quicksort with
spillover" can occur, which is only possible with incremental
spilling. That may avoid most I/O, by spilling just a few tuples using
a heap/priority queue, and quicksorting everything else. That's
compelling when you can manage it, but no reason to always use
replacement selection for the first run in the common case where there
well be several runs in total.

I think its premature to retire that algorithm - I think we should keep it for a while yet. I suspect it may serve well in cases where we have low memory, though I accept that is no longer the case for larger servers that we would now call typical.

This could cause particular issues in optimization, since heap sort is wonderfully predictable. We'd need a cost_sort() that was slightly pessimistic to cover the risk that a quicksort might not be as fast as we hope.
 
Is that any clearer? 

Yes, thank you.

I'd like to see a more general and concise plan for how sorting evolves. We are close to having the infrastructure to perform intermediate aggregation, which would allow that to happen during sorting when required (aggregation, sort distinct). We also agreed some time back that parallel sorting would be the first incarnation of parallel operations, so we need to consider that also.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Pavan Deolasee
Date:
Subject: Re: Declarative partitioning
Next
From: Stephen Frost
Date:
Subject: Re: Warnings around booleans