Re: Is tuplesort_heap_siftup() a misnomer? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Is tuplesort_heap_siftup() a misnomer?
Date
Msg-id CAM3SWZTEENdg1TvjOLRpDk0sA2tPDc9tNrk6_Oq-GjCvd2f9Yw@mail.gmail.com
Whole thread Raw
In response to Re: Is tuplesort_heap_siftup() a misnomer?  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Is tuplesort_heap_siftup() a misnomer?
Re: Is tuplesort_heap_siftup() a misnomer?
List pgsql-hackers
On Thu, Sep 8, 2016 at 12:01 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I still think tuplesort_heap_siftup is a confusing name, although I'm not
> sure that Peter's "compact" is much better. I suggest that we rename it to
> tuplesort_heap_delete_top(). In comments within the function, explain that
> the *loop* corresponds to the "siftup" in Knuth's book.

I'm also fine with that name.

> Interestingly, as Knuth explains the siftup algorithm, it performs a
> "replace the top" operation, rather than "remove the top". But we use it to
> remove the top, by moving the last element to the top and running the
> algorithm after that. Peter's other patch, to change the way we currently
> replace the top node, to do it in one pass rather than as delete+insert, is
> actually Knuth's siftup algorithm.

Knuth must have a strange sense of humor.

> Peter, looking at your "displace" patch in this light, I think
> tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm
> calling it now), should share a common subroutine. Displace replaces the top
> node with the new node passed as argument, and then calls the subroutine to
> push it down to the right place. Delete_top replaces the top node with the
> last value in the heap, and then calls the subroutine. Or perhaps delete_top
> should remove the last value from the heap, and then call displace() with
> it, to re-insert it.

I can see the value in that, but I'd still rather not. The code reuse
win isn't all that large, and having a separate
tuplesort_heap_root_displace() routine allows us to explain what's
going on for merging, to assert what we're able to assert with tape
numbers vs. heap position, and to lose the HEAPCOMPARE()/checkIndex
cruft that the existing routines need (if only barely) to support
replacement selection. I would be surprised if with a very tight inner
loop like this, HEAPCOMPARE() has an appreciable overhead (maybe it
increases pipeline stalls).

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Claudio Freire
Date:
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Next
From: Peter Geoghegan
Date:
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)