Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Using quicksort for every external sort run
Msg-id CAMkU=1wufLoS-rL_+tnBzNG_LpMj1rp1aMqnCEwL0qXxmRgmyA@mail.gmail.com
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Sat, Nov 28, 2015 at 4:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Sat, Nov 28, 2015 at 2:04 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
...
>>
>> The final merging is intermixed with whatever other work goes on to
>> build the actual index files out of the sorted data, so I don't know
>> exactly what the timing of just the merge part was.  But it was
>> certainly a minority of the time, even if you assume the actual index
>> build were free.  For the patched code, the majority of the time goes
>> to the quick sorting stages.
>
> I'm not sure what you mean here.

I had no point to make here; I was just trying to answer one of your
questions about how much time was spent merging. I don't know exactly,
because the merge is interleaved with, and not separately instrumented
from, the index build itself.
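
For reference, the instrumented builds were roughly along these lines; a
minimal sketch, where the table, index, and column names are just
placeholders and trace_sort assumes a server built with TRACE_SORT (the
default):

    -- Minimal sketch of an instrumented index build; names are placeholders.
    SET trace_sort = on;                  -- log per-phase sort timings
    SET client_min_messages = log;        -- echo those LOG lines to the client
    SET maintenance_work_mem = '64MB';    -- memory budget for the external sort
    CREATE INDEX test_idx ON test_table (sort_col);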

>
> I would generally expect that the merge phase takes significantly less
> than sorting runs, regardless of how we sort runs, unless parallelism
> is involved, where merging could dominate. The master branch has a
> faster merge step, at least proportionally, because it has larger
> runs.
>
>> When I test each version of the code at its own most efficient
>> maintenance_work_mem, I get
>> 3007.2 seconds at 1GB for patched and 3836.46 seconds at 64MB for unpatched.
>
> As I said, it seems a little bit unfair to hand-tune work_mem or
> maintenance_work_mem like that. Who can afford to do that? I think you
> agree that it's untenable to have DBAs allocate work_mem differently
> for cases where an internal sort or external sort is expected;
> workloads are just far too complicated and changeable.

Right, I agree with all that.  But I think it is important to know
where the benefits come from.  It looks like about half comes from
being more robust to overly-large memory usage, and half from absolute
improvements you get at each implementation's own best setting.
Also, if someone had previously restricted work_mem (or, more likely,
maintenance_work_mem) simply to avoid the large-memory penalty, they
need to know to revisit that decision, although even then they don't
get any actual benefit from using too much memory, just a reduced penalty.
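
For example, someone in that position could revisit the setting per
session rather than globally; a rough sketch, with a placeholder index
name:

    -- Hypothetical example: raise the ceiling for one maintenance
    -- operation only, instead of changing postgresql.conf for everyone.
    SET maintenance_work_mem = '1GB';
    REINDEX INDEX some_large_index;    -- placeholder index name
    RESET maintenance_work_mem;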

I'm kind of curious as to why the optimum for the patched code appears
at 1GB and not lower.  If I get a chance to rebuild the test, I will
look into that more.
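
One simple way to probe that, assuming \timing in psql and again using a
placeholder index name, would be to repeat the same build at a few
settings and compare:

    -- Sketch of a sweep over maintenance_work_mem; index name is a placeholder.
    \timing on
    SET maintenance_work_mem = '256MB';
    REINDEX INDEX some_large_index;
    SET maintenance_work_mem = '512MB';
    REINDEX INDEX some_large_index;
    SET maintenance_work_mem = '1GB';
    REINDEX INDEX some_large_index;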


>
>> I'm attaching the trace_sort output from the client log for all 4 of
>> those scenarios.  "sort_0005" means all 5 of your patches were
>> applied, "origin" means none of them were.
>
> Thanks for looking at this. This is very helpful. It looks like the
> server you used here had fairly decent disks, and that we tended to be
> CPU bound more often than not. That's a useful testing ground.

It has a Perc H710 RAID controller with 15,000 RPM drives, but it is
also a virtualized system that has other stuff going on.  The disks
are definitely better than what's in your average household computer,
but I don't think they are anything special as far as real database
hardware goes.  It is hard to saturate the disks for sequential reads.
It will be interesting to see what parallel builds can do.


What would be next in reviewing the patches?  Digging into the C-level
implementation?

Cheers,

Jeff


