Re: Merge algorithms for large numbers of "tapes" - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Merge algorithms for large numbers of "tapes"
Date
Msg-id 1141823356.27729.689.camel@localhost.localdomain
In response to Merge algorithms for large numbers of "tapes"  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Merge algorithms for large numbers of "tapes"
List pgsql-hackers
On Tue, 2006-03-07 at 18:14 -0500, Tom Lane wrote:
> BTW, I was just looking over Knuth's discussion of sorting again, and
> realized that there is still something more that could be done within
> the existing sort framework.  We currently use standard polyphase merge
> (his Algorithm 5.4.2D), which IIRC I chose because it was simple and
> for relatively small numbers of tapes T it was about as good as anything
> else.  Knuth spends a great deal of energy on minimizing tape rewind
> time which of course is of no interest to us, and I had supposed that
> all of his more-complex algorithms were really only of interest if you
> needed to consider rewind time.  However, now that we've changed the
> code to prefer large numbers of tapes, it's not at all clear that
> Algorithm D is still the right one to use.  In particular I'm looking at
> cascade merge, Algorithm 5.4.3C, which appears to use significantly
> fewer passes when T is large.  

Ah! Well spotted. Yes, it looks like it will improve performance a good
deal. So, yes, definitely a TODO item.
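Just to get a feel for how many merge phases Algorithm D actually
needs, here is a throwaway illustration (not backend code; it assumes
Knuth's perfect distributions and no dummy runs): it builds the
generalized-Fibonacci distributions for T tapes and counts how many
polyphase levels cover a given number of initial runs. Cascade merge
would reach the same run counts in fewer levels once T is large, which
is exactly your point.

#include <stdio.h>
#include <stdlib.h>

/*
 * Count polyphase merge levels (phases) needed to handle num_runs
 * initial runs with num_tapes tapes, assuming the perfect
 * (generalized Fibonacci) distributions and no dummy runs.
 */
static int
polyphase_levels(long num_runs, int num_tapes)
{
    int     p = num_tapes - 1;      /* merge order: one tape is the output */
    long   *d = malloc(p * sizeof(long));
    long    total = p;
    int     level = 1;
    int     k;

    for (k = 0; k < p; k++)
        d[k] = 1;                   /* level 1: one run on each input tape */

    while (total < num_runs)
    {
        long    d0 = d[0];

        /* next perfect distribution: d'[k] = d[0] + d[k+1], d'[p-1] = d[0] */
        for (k = 0; k < p - 1; k++)
            d[k] = d0 + d[k + 1];
        d[p - 1] = d0;

        total = 0;
        for (k = 0; k < p; k++)
            total += d[k];
        level++;
    }
    free(d);
    return level;
}

int
main(void)
{
    int     tapes[] = {7, 32, 256, 4096};
    int     i;

    for (i = 0; i < 4; i++)
        printf("%4d tapes: 10000 runs need %d polyphase levels\n",
               tapes[i], polyphase_levels(10000, tapes[i]));
    return 0;
}
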

> Do you want to try that?

The cascade merge rewrites the way logical tapes are selected and how
the runs are merged. It doesn't seem to do anything at all about
run-forming, which would still use heapsort. So it only has an effect
when we have more runs than "tapes", which means the limits at which we
would begin noticing any benefit are roughly:
  work_mem = 1 GB      benefit at 8 TB
  work_mem = 256 MB    benefit at 0.5 TB
  work_mem = 8 MB      benefit at 256 MB
  work_mem = 1 MB      benefit at 12 MB (min 7 tapes)
(based upon runs averaging twice the size of memory and each logical
tape requiring about 256KB of memory, i.e. roughly work_mem/256KB
tapes, minimum 7, each contributing a run of 2 * work_mem to a single
merge pass, so the threshold grows with the square of work_mem)
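To spell out that arithmetic, here is a quick throwaway sketch of the
same back-of-envelope model (my rough assumptions, not the exact
tuplesort.c accounting): about 256KB of work_mem per tape, a merge
order of tapes minus one with a minimum of 6, and runs averaging twice
work_mem. It gives 12 MB at work_mem = 1 MB and about 8 TB at 1 GB,
with the limit growing roughly quadratically in between.

#include <stdio.h>

int
main(void)
{
    double  work_mem_mb[] = {1, 8, 256, 1024};
    int     i;

    for (i = 0; i < 4; i++)
    {
        double  wm_mb = work_mem_mb[i];
        double  order = wm_mb * 1024.0 / 256.0 - 1;   /* tapes minus one */

        if (order < 6)
            order = 6;                                /* min 7 tapes */
        printf("work_mem %4.0f MB -> merge order %4.0f, one-pass limit ~%.0f MB\n",
               wm_mb, order, order * 2.0 * wm_mb);
    }
    return 0;
}
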

Which means we only see a benefit when we have, for some reason, been
unable to give the sort enough space, or have not set parameters
correctly. So it is still a concern, but it makes me think about two
other issues first:

1. Earlier we had some results showing that the heapsorts got slower
as work_mem increased, and that concerns me most of all right now.
It's possible you'll have reduced that considerably with the
pull-out-the-first-attr patch; I'll look into some test results to show
whether that has gone away. We also have Nyberg et al telling us that,
as of 1994, heapsort would always be slower than qsort as a result of
CPU cache locality effects. An improvement here would affect all sorts
larger than work_mem.
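As a quick illustration of the Nyberg point, here is a throwaway
micro-benchmark (not backend code): it sorts the same array of random
integers once with a plain binary-heap sort and once with qsort(),
which on typical current hardware shows the heap losing once the array
is much larger than the CPU caches.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int
cmp_int(const void *a, const void *b)
{
    int     ia = *(const int *) a;
    int     ib = *(const int *) b;

    return (ia > ib) - (ia < ib);
}

/* restore the max-heap property below node i */
static void
sift_down(int *h, int n, int i)
{
    for (;;)
    {
        int     l = 2 * i + 1, r = l + 1, m = i, t;

        if (l < n && h[l] > h[m])
            m = l;
        if (r < n && h[r] > h[m])
            m = r;
        if (m == i)
            return;
        t = h[i];
        h[i] = h[m];
        h[m] = t;
        i = m;
    }
}

static void
heap_sort(int *a, int n)
{
    int     i, t;

    for (i = n / 2 - 1; i >= 0; i--)        /* build the heap */
        sift_down(a, n, i);
    for (i = n - 1; i > 0; i--)             /* pop the max repeatedly */
    {
        t = a[0];
        a[0] = a[i];
        a[i] = t;
        sift_down(a, i, 0);
    }
}

int
main(void)
{
    int     n = 10 * 1000 * 1000;
    int    *a = malloc(n * sizeof(int));
    int    *b = malloc(n * sizeof(int));
    int     i;
    clock_t t0;

    srand(12345);
    for (i = 0; i < n; i++)
        a[i] = b[i] = rand();

    t0 = clock();
    heap_sort(a, n);
    printf("heapsort: %.2fs\n", (double) (clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    qsort(b, n, sizeof(int), cmp_int);
    printf("qsort:    %.2fs\n", (double) (clock() - t0) / CLOCKS_PER_SEC);

    free(a);
    free(b);
    return 0;
}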

2. Improvement in the way we do overall memory allocation, so that we
would not have the problem of under-setting work_mem that we currently
experience. If we solved this problem we would have faster sorts in
*all* cases, not just extremely large ones. Dynamically setting
work_mem higher when possible would be very useful. I've looked at
this a few times and have some suggestions, but perhaps it's worth
asking for ideas in this area?
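To make that concrete, here is a purely hypothetical sketch (none of
these names or GUCs exist today): sorts would start at work_mem and
then try to grow their allowance out of a shared budget while it still
has headroom.

#include <stdio.h>

/* Hypothetical only: a shared budget that running sorts can draw on. */
typedef struct SortMemPool
{
    long    total_kb;       /* e.g. a new GUC, say "sort_pool_mem" */
    long    reserved_kb;    /* currently handed out to running sorts */
} SortMemPool;

/* Ask for extra memory beyond work_mem; returns how much was granted. */
static long
sortpool_request(SortMemPool *pool, long extra_kb)
{
    long    avail = pool->total_kb - pool->reserved_kb;
    long    grant = (extra_kb < avail) ? extra_kb : avail;

    if (grant < 0)
        grant = 0;
    pool->reserved_kb += grant;
    return grant;
}

/* Give the memory back when the sort finishes. */
static void
sortpool_release(SortMemPool *pool, long kb)
{
    pool->reserved_kb -= kb;
}

int
main(void)
{
    SortMemPool pool = {512 * 1024, 0};     /* say, a 512MB shared budget */
    long    extra = sortpool_request(&pool, 128 * 1024);

    printf("granted an extra %ldKB beyond work_mem\n", extra);
    sortpool_release(&pool, extra);
    return 0;
}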

Best Regards, Simon Riggs



