Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZROc8KSVexHnrpxJraOCmDNd1-Mw19cs0+LSyzOWn8hTQ@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Using quicksort for every external sort run  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Dec 23, 2015 at 9:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> But yes, let me concede more clearly: the cost model is based on
>> frobbing. But at least it's relatively honest about that, and is
>> relatively simple. I think it might be possible to make it simpler,
>> but I have a feeling that anything we can come up with will basically
>> have the same quality that you so dislike. I don't know how to do
>> better. Frankly, I'd rather be roughly correct than exactly wrong.
>
> Sure, but the fact that the model has huge discontinuities - perhaps
> most notably a case where adding a single tuple to the estimated
> cardinality changes the crossover point by a factor of two - suggests
> that you are probably wrong.  The actual behavior does not change
> sharply when the size of the SortTuple array crosses 1GB, but the
> estimates do.

Here is some fairly interesting analysis of Quicksort vs. Heapsort,
from Bentley, coauthor of our own Quicksort implementation:

https://youtu.be/QvgYAQzg1z8?t=16m15s

(This link picks up at the right point to see the comparison, complete
with an interesting graph).

It probably doesn't tell you much that you didn't already know, at
least at this exact point, but it's nice to see Bentley's graph. This
perhaps gives you some idea of why my "quicksort with spillover" cost
model had a cap of MaxAllocSize of SortTuples, past which we always
needed a very compelling case. That was my rough guess of where the
Heapsort graph takes a sharp upward turn. Before then, Bentley shows
that it's close enough to a straight line.

Correct me if I'm wrong, but I think that the only outstanding issue
with all patches posted here so far is the "quicksort with spillover"
cost model. Hopefully this can be cleared up soon. As I've said, I am
very receptive to other people's suggestions about how that should
work.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Re: Patch: fix lock contention for HASHHDR.mutex
Next
From: Amit Kapila
Date:
Subject: WAL Re-Writes