Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZTAZEhF+trC9BOLHO8sD8WVXXJqvTP=b5uPK62c7x6yQg@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Wed, Nov 25, 2015 at 4:10 AM, Greg Stark <stark@mit.edu> wrote:
> That's precisely why it's valuable to see a whole series of data
> points rather than just one. Often when you see the shape of the
> curve, especially any breaks or changes in the behaviour that helps
> understand the limitations of the model. Perhaps it would be handy to
> find a machine with a very small amount of physical memory so you
> could run more reasonably sized tests on it. A VM would be fine if you
> could be sure the storage layer isn't caching.

I have access to the Power7 system that Robert and others sometimes
use for this stuff. I'll try to come up a variety of tests.

> In short, I think you're right in theory and I want to make sure
> you're right in practice. I'm afraid if we just look at a few data
> points we'll miss out on a bug or a factor we didn't anticipate that
> could have been addressed.

I am in favor of being comprehensive.

> Just to double check though. My understanding is that your quicksort
> algorithm is to fill work_mem with tuples, quicksort them, write out a
> run, and repeat. When the inputs are done read work_mem/runs worth of
> tuples from each run into memory and run a merge (using a heap?) like
> we do currently. Is that right?

Yes, that's basically what I'm doing.

There are basically two extra bits:

* Without changing how merging actually works, I am clever about
allocating memory for the final on-the-fly merge. Allocation is done
once, in one huge batch. Importantly, I exploit locality by having
every "tuple proper" (e.g. IndexTuple) in contiguous memory, in sorted
(tape) order, per tape. This also greatly reduces palloc() overhead
for the final on-the-fly merge step.

* We do something special when we're just over work_mem, to avoid most
I/O -- "quicksort with spillover". This is a nice trick, but it's
certain way less important than the basic idea of simply always
quicksorting runs. I could easily not do this. This is why the heap
code was not significantly simplified to only cover the merge cases,
though -- this uses essentially the same replacement selection style
heap to incrementally spill to get us enough memory to mostly complete
the sort internally.

> Incidentally one of the reasons abandoning the heap to generate runs
> is attractive is that it opens up other sorting algorithms for us.
> Instead of quicksort we might be able to plug in a GPU sort for
> example.

Yes, it's true that we automatically benefit from optimizations for
the internal sort case now. That's already happening with the patch,
actually -- the "onlyKey" optimization (a more specialized quicksort
specialization, used in the one attribute heap case, and datum case)
is now automatically used. That was where the best 2012 numbers for
SortSupport were seen, so that makes a significant difference. As you
say, something like that could easily happen again.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: New email address
Next
From: Pavel Stehule
Date:
Subject: Re: proposal: multiple psql option -c