Home > mailing lists

Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From	Feng Tian
Subject	Re: Using quicksort for every external sort run
Date	August 20, 2015 23:28:37
Msg-id	CAFWGqnvHRDFxvxjjEgvnsTrgU3rLPN8W17O7Y-UbrCGAYka=mA@mail.gmail.com Whole thread Raw
In response to	Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>)
List	pgsql-hackers

Tree view

On Thu, Aug 20, 2015 at 1:16 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian <ftian@vitessedata.com> wrote:
> Just a quick anecdotal evidence. I did similar experiment about three years
> ago. The conclusion was that if you have SSD, just do quick sort and
> forget the longer runs, but if you are using hard drives, longer runs is the
> winner (and safer, to avoid cliffs). I did not experiment with RAID0/5 on
> many spindles though.
>
> Not limited to sort, more generally, SSD is different enough from HDD,
> therefore it may worth the effort for backend to "guess" what storage device
> it has, then choose the right thing to do.

The devil is in the details. I cannot really comment on such a general
statement.

I would be willing to believe that that's true under
unrealistic/unrepresentative conditions. Specifically, when multiple
passes are required with a sort-merge strategy where that isn't the
case with replacement selection. This could happen with a tiny
work_mem setting (tiny in an absolute sense more than a relative
sense). With an HDD, where sequential I/O is so much faster, this
could be enough to make replacement selection win, just as it would
have in the 1970s with magnetic tapes.

As I've said, the solution is to simply avoid multiple passes, which
should be possible in virtually all cases because of the quadratic
growth in a classic hybrid sort-merge strategy's capacity to avoid
multiple passes (growth relative to work_mem's growth). Once you
ensure that, then you probably have a mostly I/O bound workload, which
can be made faster by adding sequential I/O capacity (or, on the
Postgres internals side, adding asynchronous I/O, or with memory
prefetching). You cannot really buy a faster CPU to make a degenerate
heapsort faster.

--
Peter Geoghegan

Agree everything in principal,except one thing -- no, random IO on HDD in 2010s (relative to CPU/Memory/SSD), is not any faster than tape in 1970s. :-)

pgsql-hackers by date:

From: Peter Geoghegan
Date: 20 August 2015, 23:16:41
Subject: Re: Using quicksort for every external sort run

From: Peter Geoghegan
Date: 20 August 2015, 23:31:22
Subject: Re: Using quicksort for every external sort run

Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

Previous

Next