On Wed, Nov 25, 2015 at 2:31 AM, Peter Geoghegan <pg@heroku.com> wrote:
>
> There already was a test case involving a 1TB/16 billion tuple sort
> [1] (well, a 1TB gensort Postgres table [2]). Granted, I don't have a
> large number of similar test cases across a variety of scales, but
> there are only so many hours in the day. Disappointingly, the results
> at that scale were merely good, not great, but there was probably
> various flaws in how representative the hardware used was.
That's precisely why it's valuable to see a whole series of data
points rather than just one. Often it's the shape of the curve,
especially any breaks or changes in the behaviour, that helps you
understand the limitations of the model. Perhaps it would be handy to
find a machine with a very small amount of physical memory so you
could run more reasonably sized tests on it. A VM would be fine if you
could be sure the storage layer isn't caching.
In short, I think you're right in theory and I want to make sure
you're right in practice. I'm afraid that if we only look at a few
data points we'll miss a bug, or some factor we didn't anticipate
that could have been addressed.
Just to double-check, though: my understanding is that your quicksort
approach is to fill work_mem with tuples, quicksort them, write out a
run, and repeat. Then, when the input is exhausted, read work_mem/runs
worth of tuples from each run into memory and merge them (using a
heap?), as we do currently. Is that right?
Incidentally, one reason abandoning the heap for run generation is
attractive is that it opens up other sorting algorithms for us.
Instead of quicksort we might be able to plug in a GPU sort, for
example.
--
greg