Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZTEgaOoWnBNUY55+MXS+3=Wtb28tB=K52MBwwMJS=3M7Q@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Thu, Nov 19, 2015 at 2:53 PM, Peter Geoghegan <pg@heroku.com> wrote:
> The latter 16MB work_mem example query/table is still faster with a
> 12MB work_mem than master, even with multiple passes. Quite a bit
> faster, in fact: about 37 seconds on master, to about 24.7 seconds
> with the patches (same for higher settings short of 16MB).

I made the same comparison with work_mem sizes of 2MB and 6MB for
master/patch, and the patch *still* came out ahead, often by over 10%.
This was more than fair, though, because sometimes the final
on-the-fly merge for the master branch starting at a point at which
the patch series has already completed its sort. (Of course, I don't
believe that any user would ever be well served with such a low
work_mem setting for these queries -- I'm looking for a bad case,
though).

I guess this is a theoretical downside of my approach, that is more
than made up for elsewhere (even leaving aside the final, unrelated
patch in the series, addressing the merge bottleneck directly). So, to
summarize such downsides (downsides of a hybrid sort-merge strategy as
compared to replacement selection):

* As mentioned just now, the fact that there are more runs -- merging
can be slower (although tuples can be returned earlier, which could
also help with CREATE INDEX). This is more of a problem when random
I/O is expensive, and less of a problem when the OS cache buffers
things nicely.

* One run can be created with replacement selection, where a
hyrbid-sort merge strategy needs to create and then merge many runs.
When I started work on this patch, I was pretty sure that case would
be noticeably regressed. I was wrong.

* Abbreviated key comparisons are used less because runs are smaller.
This is why sorts of types like numeric are not especially sympathetic
to the patch. Still, we manage to come out well ahead overall.

You can perhaps show the patch to be almost as slow as the master
branch with a very unsympathetic case involving the union of all these
3. I couldn't regress a case with integers with just the first two,
though.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: ROLES and ALTER DEFAULT PRIVILEGES
Next
From: Robert Haas
Date:
Subject: Re: Typo in file header comment of replorigindesc.c