Home > mailing lists

Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Using quicksort for every external sort run
Date	November 25, 2015 15:11:35
Msg-id	CAM-w4HMd6SpqRRRipHwz9QS0L8eJh6Mo-pEB8sGuwYw=UKyy_A@mail.gmail.com Whole thread Raw
In response to	Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Using quicksort for every external sort run
List	pgsql-hackers

Tree view

On Wed, Nov 25, 2015 at 2:31 AM, Peter Geoghegan <pg@heroku.com> wrote:
>
> There already was a test case involving a 1TB/16 billion tuple sort
> [1] (well, a 1TB gensort Postgres table [2]). Granted, I don't have a
> large number of similar test cases across a variety of scales, but
> there are only so many hours in the day. Disappointingly, the results
> at that scale were merely good, not great, but there was probably
> various flaws in how representative the hardware used was.

That's precisely why it's valuable to see a whole series of data
points rather than just one. Often when you see the shape of the
curve, especially any breaks or changes in the behaviour that helps
understand the limitations of the model. Perhaps it would be handy to
find a machine with a very small amount of physical memory so you
could run more reasonably sized tests on it. A VM would be fine if you
could be sure the storage layer isn't caching.

In short, I think you're right in theory and I want to make sure
you're right in practice. I'm afraid if we just look at a few data
points we'll miss out on a bug or a factor we didn't anticipate that
could have been addressed.

Just to double check though. My understanding is that your quicksort
algorithm is to fill work_mem with tuples, quicksort them, write out a
run, and repeat. When the inputs are done read work_mem/runs worth of
tuples from each run into memory and run a merge (using a heap?) like
we do currently. Is that right?

Incidentally one of the reasons abandoning the heap to generate runs
is attractive is that it opens up other sorting algorithms for us.
Instead of quicksort we might be able to plug in a GPU sort for
example.

-- 
greg

pgsql-hackers by date:

From: Thom Brown
Date: 25 November 2015, 14:37:26
Subject: Re: Optimization for updating foreign tables in Postgres FDW

From: Greg Stark
Date: 25 November 2015, 15:25:30
Subject: Re: New email address

Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

Previous

Next