Re: Memory usage during sorting - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Memory usage during sorting
Msg-id CA+Tgmob3k_m=ietqXccJZgoqeZxjj7WJmnV7HawXT0nQc-VKeA@mail.gmail.com
In response to Re: Memory usage during sorting  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Wed, Mar 21, 2012 at 8:57 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> ...
>>
>> One thing that seems inefficient to me about our current algorithm is
>> the use of the run number as a leading column in the sort key.
>> There's no real reason why the tuples destined for the next run need
>> to be maintained in heap order; we could just store them unordered and
>> heapify the whole lot of them when it's time to start the next run.
>> That ought to save comparison cycles for the current run, since the
>> heap will be less deep, and there are other possible savings as well -
>> in particular, an unordered array of tuples can be heapify'd in linear
>> time, whereas our current algorithm is O(n lg n).  However, when I
>> actually implemented this, I found that doing it this way was a loser.
>>  In fact, excluding the next-run tuples from the heap for the current
>> run was a BIG loser even before you add in the time required to
>> re-heapify.  This confused the daylights out of me for a while,
>> because it's hard to understand how insert and siftup can be slower on
>> a small heap than a large one.
>>
>> My working theory is that, even though we must necessarily do more
>> comparisons when manipulating a larger heap, many of those comparisons
>> are resolved by comparing run numbers, and that's a lot cheaper than
>> invoking the real comparator.  For example, if we insert into a heap
>> whose rightmost child is in the next run, and the new tuple goes into
>> the current run, and the current size of the heap is such that the
>> initial position of the new element is a descendant of that rightmost node,
>> it will very quickly crash through all of the next-run tuples and only
>> need one REAL comparison - against the root.  Either the new element
>> ends up as the root, or it ends up as the direct child of the root;
>> now we remove the root and, perhaps, replace it with its rightmost
>> child, meaning that the next element we read in can do the exact same
>> thing all over again.  If we exclude the next-run elements from the
>> heap, then the average depth of the heap when inserting a new element
>> is smaller, but all the elements are in the same run, and we have to
>> invoke the real comparator every time.  In other words, those next-run
>> tuples act as dummies which effectively create a heap of uneven depth,
>> and the average depth encountered when inserting tuples turns out to
>> be less than what we get by pulling out the dummies and making the
>> depth uniform.
>>
>> This makes me kind of nervous, because I don't understand why things
>> should work out so well as all that.  If throwing some dummy elements
>> into the heap makes things work better, then maybe we should throw in
>> some more.  Or maybe it would work better to take some but not all of
>> them out.  There certainly doesn't seem to be any indication in the
>> code that this is an anticipated effect, and it seems an awfully
>> providential accident.
>
> Is there a patch or a git repo available for this change?  I'd like to
> see if I can analyze that if I get some spare time.

Sorry for the slow response to this.  Patch is attached
(crummy-external-sort.patch).
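
To make the mechanics concrete for anyone following along, here is a
simplified sketch of the two schemes being compared.  The names are
made up for illustration; this is not the code in tuplesort.c or in
the attached patch.

/*
 * Simplified sketch only: SortTuple, real_compare, etc. are stand-ins,
 * not the real tuplesort.c definitions.
 */
#include <stddef.h>

typedef struct
{
    int     run;                /* run this tuple is destined for */
    int     key;                /* stand-in for the real sort key */
} SortTuple;

/* The expensive call we are trying to avoid. */
static int
real_compare(const SortTuple *a, const SortTuple *b)
{
    return (a->key > b->key) - (a->key < b->key);
}

/*
 * Current scheme: the run number is the leading sort key, so a
 * comparison against a next-run tuple is settled by an integer test
 * and never reaches real_compare().
 */
static int
heap_compare(const SortTuple *a, const SortTuple *b)
{
    if (a->run != b->run)
        return (a->run > b->run) ? 1 : -1;
    return real_compare(a, b);
}

/* Standard sift-down for a binary min-heap of SortTuples. */
static void
sift_down(SortTuple *heap, size_t n, size_t i)
{
    for (;;)
    {
        size_t  child = 2 * i + 1;

        if (child >= n)
            break;
        if (child + 1 < n &&
            heap_compare(&heap[child + 1], &heap[child]) < 0)
            child++;            /* take the smaller child */
        if (heap_compare(&heap[i], &heap[child]) <= 0)
            break;
        SortTuple tmp = heap[i];
        heap[i] = heap[child];
        heap[child] = tmp;
        i = child;
    }
}

/*
 * Alternative scheme discussed above: keep next-run tuples unordered
 * and heapify the whole array in O(n) (Floyd's bottom-up construction)
 * when the next run starts.
 */
static void
build_heap(SortTuple *heap, size_t n)
{
    for (size_t i = n / 2; i-- > 0;)
        sift_down(heap, n, i);
}

With heap_compare(), a current-run tuple descending through a subtree
full of next-run tuples is resolved by the cheap run-number test at
every level and rarely reaches real_compare(), which is the effect
described in the quoted text; evicting the next-run tuples makes the
heap shallower but turns every remaining comparison into a real one.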

> Clever.  Rather than doing a binary search of the path, it might make
> sense to do a reverse search of it.  The tuple is likely to end up
> somewhere near the bottom of the heap, both because that is where most
> of the data is, and because the tuple being reinserted just came from
> the bottom so it is likely to be biased that way.

Patches for these approaches also attached (siftup-using-bsearch.patch
and siftup-reverse-linear.patch).  Neither approach seemed terribly
promising in my testing, IIRC.
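
For reference, here is a simplified sketch of the idea behind those
two approaches, with made-up names and plain int keys rather than
SortTuples; it is an illustration, not the code in the attached
patches.

#include <stddef.h>

/* Key comparison stand-in for the real (expensive) comparator. */
static int
cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/*
 * Re-insert 'elem' after the root of the min-heap heap[0..n-1] has
 * been removed (heap[0] is a hole).  First trace the path of smaller
 * children from the root down to a leaf; the keys along that path are
 * non-decreasing, so elem's final slot on it can be found either by
 * binary search (siftup-using-bsearch) or by a linear scan upward from
 * the leaf (siftup-reverse-linear).
 */
static void
reinsert(int *heap, size_t n, int elem, int use_bsearch)
{
    size_t  path[64];           /* hole positions, root first */
    size_t  depth = 0;
    size_t  i = 0;
    size_t  m;                  /* path index where elem ends up */

    path[depth++] = 0;
    while (2 * i + 1 < n)
    {
        size_t  child = 2 * i + 1;

        if (child + 1 < n && cmp(heap[child + 1], heap[child]) < 0)
            child++;            /* take the smaller child */
        path[depth++] = child;
        i = child;
    }

    if (use_bsearch)
    {
        /* Find the first key on the path that is >= elem. */
        size_t  lo = 1, hi = depth;

        while (lo < hi)
        {
            size_t  mid = lo + (hi - lo) / 2;

            if (cmp(heap[path[mid]], elem) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        m = lo - 1;
    }
    else
    {
        /* Walk up from the leaf, where the data is likely to be. */
        m = depth - 1;
        while (m > 0 && cmp(heap[path[m]], elem) >= 0)
            m--;
    }

    /* Pull the keys between the root and elem's slot up one level,
     * then place elem. */
    for (size_t t = 1; t <= m; t++)
        heap[path[t - 1]] = heap[path[t]];
    heap[path[m]] = elem;
}

Both variants trace the same path and do the same data movement; they
differ only in how the final slot on that path is located.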

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

