Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: Using quicksort for every external sort run
Msg-id: CAM3SWZRY9vQx1P+YKPedKyNwd6cf4_KhDntfir=ReKW60C5L3g@mail.gmail.com
In response to: Re: Using quicksort for every external sort run (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
On Mon, Nov 30, 2015 at 9:51 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> As I said, it seems a little bit unfair to hand-tune work_mem or
>> maintenance_work_mem like that. Who can afford to do that? I think you
>> agree that it's untenable to have DBAs allocate work_mem differently
>> for cases where an internal sort or external sort is expected;
>> workloads are just far too complicated and changeable.
>
> Right, I agree with all that.  But I think it is important to know
> where the benefits come from.  It looks like about half comes from
> being more robust to overly-large memory usage, and half from absolute
> improvements which you get at each implementation's own best setting.
> Also, if someone had previously restricted work_mem (or more likely
> maintenance_work_mem) simply to avoid the large memory penalty, they
> need to know to revisit that decision. Although they still don't get
> any actual benefit from using too much memory, just a reduced penalty.

Well, to be clear, they do get a benefit with much larger memory
sizes. It's just that the benefit does not continue indefinitely. I
agree with this assessment, though.

> I'm kind of curious as to why the optimal for the patched code appears
> at 1GB and not lower.  If I get a chance to rebuild the test, I will
> look into that more.

I think that the availability of abbreviated keys (or anything else
that allows most comparisons made by quicksort/the heap to be resolved
at the SortTuple level) could make a big difference for things like
this. Bear in mind that the merge phase has better cache
characteristics when many attributes must be compared, rather than
mostly just leading attributes. AlphaSort [1] merges in-memory runs
(built with quicksort) to create on-disk runs for this reason. (I
tried that, and it didn't help -- maybe I already get that benefit
from merging on-disk runs, since modern machines have so much more
memory than they did in 1994.)
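
To illustrate what I mean by resolving comparisons at the SortTuple
level, here is a rough sketch (illustrative names only, nothing like
the real tuplesort.c code): the common case is decided by a small
abbreviated datum stored inline in the array element, and only ties
pay for chasing a pointer to the full key.

#include <string.h>
#include <stdint.h>

typedef struct SortTupleSketch
{
    uint32_t    abbrev;     /* abbreviated key, e.g. first 4 bytes of a text key */
    const char *fullkey;    /* full key (complete string), examined only on ties */
} SortTupleSketch;

static int
compare_sketch(const SortTupleSketch *a, const SortTupleSketch *b)
{
    /* Fast path: resolved entirely within the in-memory array element */
    if (a->abbrev < b->abbrev)
        return -1;
    if (a->abbrev > b->abbrev)
        return 1;

    /* Tie on the abbreviated key: chase the pointer and compare fully */
    return strcmp(a->fullkey, b->fullkey);
}

When most comparisons take the fast path, the quicksort stays within
cache-friendly array accesses, which is why abbreviated keys matter so
much here.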

> It has a Perc H710 RAID controller with 15,000 RPM drives, but it is
> also a virtualized system that has other stuff going on.  The disks
> are definitely better than your average household computer, but I
> don't think they are anything special as far as real database hardware
> goes.

What I meant was that it's better than my laptop. :-)

> What would be next in reviewing the patches?  Digging into the C-level
> implementation?

Yes, certainly, but let me post a revised version first. I have
improved the comments and consolidated some commits.

Also, I am going to get a bunch of test results from the POWER7
system. I think I might see more benefit at higher
maintenance_work_mem settings than you saw, primarily because my case
can mostly just use abbreviated keys during the quicksort operations.
Also, I find it very encouraging that while (for example) your 3GB
test case was slower than your 1GB test case, it was only 5% slower. I
have a lot of hope that we can have a cost model for sizing an
effective maintenance_work_mem for this reason -- the consequences of
being wrong are really not that severe. It's unfortunate that we
currently waste so much memory by blindly adhering to
work_mem/maintenance_work_mem. This will matter a lot more once we
have parallel sort.
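
To give a rough idea of why the penalty for a "wrong" setting is so
gentle, here is a back-of-the-envelope sketch (my own illustration,
not a proposed cost model): with quicksorted runs of roughly work_mem
size, the number of merge passes grows only logarithmically in
(input size / work_mem), so halving or doubling memory changes total
I/O volume fairly little.

#include <math.h>

static int
estimate_merge_passes(double input_bytes, double work_mem_bytes, int merge_order)
{
    /* one quicksorted run per memory-load of input */
    double nruns = ceil(input_bytes / work_mem_bytes);

    if (nruns <= 1.0)
        return 0;               /* fits in memory: internal sort, no merging */

    /* each merge pass reduces the run count by a factor of merge_order */
    return (int) ceil(log(nruns) / log((double) merge_order));
}

For example, with a 100GB input and a merge order of 100, both 1GB and
3GB of memory need only a single merge pass; the difference shows up
in run length and cache behavior, not in extra passes over the data.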

[1] http://www.cs.berkeley.edu/~rxin/db-papers/alphasort.pdf
-- 
Peter Geoghegan


