Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZRTBkwOYsnJQAa=XTVaRcH7u9_926bEHGvEeM55qRacTQ@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Feng Tian <ftian@vitessedata.com>)
Responses Re: Using quicksort for every external sort run  (Feng Tian <ftian@vitessedata.com>)
List pgsql-hackers
On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian <ftian@vitessedata.com> wrote:
> Just a quick anecdotal evidence.  I did similar experiment about three years
> ago.   The conclusion was that if you have SSD, just do quick sort and
> forget the longer runs, but if you are using hard drives, longer runs is the
> winner (and safer, to avoid cliffs).    I did not experiment with RAID0/5 on
> many spindles though.
>
> Not limited to sort, more generally, SSD is different enough from HDD,
> therefore it may worth the effort for backend to "guess" what storage device
> it has, then choose the right thing to do.

The devil is in the details. I cannot really comment on such a general
statement.

I would be willing to believe that that's true under
unrealistic/unrepresentative conditions. Specifically, when multiple
passes are required with a sort-merge strategy where that isn't the
case with replacement selection. This could happen with a tiny
work_mem setting (tiny in an absolute sense more than a relative
sense). With an HDD, where sequential I/O is so much faster, this
could be enough to make replacement selection win, just as it would
have in the 1970s with magnetic tapes.

As I've said, the solution is to simply avoid multiple passes, which
should be possible in virtually all cases because of the quadratic
growth in a classic hybrid sort-merge strategy's capacity to avoid
multiple passes (growth relative to work_mem's growth). Once you
ensure that, then you probably have a mostly I/O bound workload, which
can be made faster by adding sequential I/O capacity (or, on the
Postgres internals side, adding asynchronous I/O, or with memory
prefetching). You cannot really buy a faster CPU to make a degenerate
heapsort faster.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: DBT-3 with SF=20 got failed
Next
From: Feng Tian
Date:
Subject: Re: Using quicksort for every external sort run