Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Using quicksort for every external sort run
Date
Msg-id CA+TgmoZanxHonZU+NUWX3Xqurh6QGdvQvU4H_yZ7tKjHh-C1YA@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Re: Using quicksort for every external sort run  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Fri, Jan 29, 2016 at 12:46 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Jan 29, 2016 at 9:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I feel like this could be data driven.  I mean, the cost model is
>> based mainly on the tuple width and the size of the SortTuple array.
>> So, it should be possible to tests of both algorithms on 32, 64, 96,
>> 128, ... byte tuples with a SortTuple array that is 256MB, 512MB,
>> 768MB, 1GB, ...  Then we can judge how closely the cost model comes to
>> mimicking the actual behavior.
>
> You would also need to represent how much of the input actually ended
> up being sorted with the heap in each case. Maybe that could be tested
> at 50% (bad for "quicksort with spillover"), 25% (better), and 5%
> (good).
>
> An alternative approach that might be acceptable is to add a generic,
> conservative 90% threshold (so 10% of tuples sorted by heap).

I don't quite know what you mean by these numbers.  Add a generic,
conservative threshold to what?

Thinking about this some more, I really think we should think hard
about going back to the strategy which you proposed and discarded in
your original post: always generate the first run using replacement
selection, and every subsequent run by quicksorting.  In that post you
mention powerful advantages of this method: "even without a strong
logical/physical correlation, the algorithm tends to produce runs that
are about twice the size of work_mem. (It's also notable that
replacement selection only produces one run with mostly presorted
input, even where input far exceeds work_mem, which is a neat trick.)"You went on to dismiss that strategy, saying that
"despitethese
 
upsides, replacement selection is obsolete, and should usually be
avoided."  But I don't see that you've justified that statement.  It
seems pretty easy to construct cases where this technique regresses,
and a large percentage of those cases are precisely those where
replacement selection would have produced a single run, avoiding the
merge step altogether.  I think those cases are extremely important.
I'm quite willing to run somewhat more slowly than in other cases to
be certain of not regressing the case of completely or
almost-completely ordered input.  Even if that didn't seem like a
sufficient reason unto itself, I'd be willing to go that way just so
we don't have to depend on a cost model that might easily go wrong due
to bad input even if it were theoretically perfect in every other
respect (which I'm pretty sure is not true here anyway).

I also have another idea that might help squeeze more performance out
of your approach and avoid regressions.  Suppose that we add a new GUC
with a name like sort_mem_stretch_multiplier or something like that,
with a default value of 2.0 or 4.0 or whatever we think is reasonable.
When we've written enough runs that a polyphase merge will be
required, or when we're actually performing a polyphase merge, the
amount of memory we're allowed to use increases by this multiple.  The
idea is: we hope that people will set work_mem appropriately and
consequently won't experience polyphase merges at all, but it might.
However, it's almost certain not to happen very frequently.
Therefore, using extra memory in such cases should be acceptable,
because while you might have every backend in the system using 1 or
more copies of work_mem for something if the system is very busy, it
is extremely unlikely that you will have more than a handful of
processes doing polyphase merges.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Sequence Access Method WIP
Next
From: Robert Haas
Date:
Subject: Re: Sequence Access Method WIP