Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZRZ_ChN6FOOnOR8fc-DcdJB2GJtbfrpOrrh41HfcbWaBQ@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Re: Using quicksort for every external sort run  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Nov 20, 2015 at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> That's what I was asking about.  It seemed to me that you were saying
> we could ignore those cases, which doesn't seem to me to be true.

I've been around for long enough to know that there are very few cases
that can be ignored.  :-)

>> The latter 16MB work_mem example query/table is still faster with a
>> 12MB work_mem than master, even with multiple passes. Quite a bit
>> faster, in fact: about 37 seconds on master, to about 24.7 seconds
>> with the patches (same for higher settings short of 16MB).
>
> Is this because we save enough by quicksorting rather than heapsorting
> to cover the cost of the additional merge phase?
>
> If not, then why is it happening like this?

I think it's because of caching effects alone, but I am not 100% sure
of that. I concede that it might not be enough to make up for the
additional I/O on some systems or platforms. The fact remains,
however, that the patch was faster on the unsympathetic case I ran on
the machine I had available (which has an SSD), and that I really have
not managed to find a case that is regressed after some effort.

>> I should point out that there is no evidence that any case has been
>> regressed, let alone written off entirely or ignored. I looked. I
>> probably have not been completely exhaustive, and I'd be willing to
>> believe there is something that I've missed, but it's still quite
>> possible that there is no downside to any of this.
>
> If that's so, it's excellent news.

As I mentioned up-thread, maybe I shouldn't have brought all the
theoretical justifications for killing replacement selection into the
discussion so early. Those observations on replacement selection
(which are not my own original insights) happen to be what spurred
this work. I spent so much time talking about how irrelevant
multi-pass merging was that people imagined that that was severely
regressed, when it really was not. That just happened to be the way I
came at the problem.

The numbers speak for themselves here. I just want to be clear about
the disadvantages of what I propose, even if it's well worth it
overall in most (all?) cases.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Using quicksort for every external sort run
Next
From: Kouhei Kaigai
Date:
Subject: Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)