Re: Is tuplesort_heap_siftup() a misnomer? - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Is tuplesort_heap_siftup() a misnomer?
Date
Msg-id 0b4f0f78-28d3-e99f-70bb-962cf8a8025c@iki.fi
Whole thread Raw
In response to Re: Is tuplesort_heap_siftup() a misnomer?  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Is tuplesort_heap_siftup() a misnomer?
List pgsql-hackers
On 09/08/2016 03:36 AM, Peter Geoghegan wrote:
> On Wed, Sep 7, 2016 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason it's called siftup is that that's what Knuth calls it.
>> See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of
>> Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.
>
> I see that Knuth explains that these steps form the siftup procedure.
> What steps does tuplesort_heap_insert correspond to, if any?

Hmm. The loop in tuplesort_heap_siftup() indeed looks the same as 
Knuth's steps H3-H8. But the start is different. Knuth's algorithm 
begins with the situation that the top node of the sub-heap is in wrong 
place, and it pushes it downwards, until the whole sub-heap satisfies 
the heap condition again. But tuplesort_heap_siftup() begins by moving 
the last value in the heap to the top, and then it does the H3-H8 steps 
to move it to the right place.

Using Wikipedia's terms, tuplesort_heap_siftup() function as whole is 
the Extract operation. And the loop inside it corresponds to the 
Max-Heapify operation, which is the same as Knuth's "siftup".

I still think tuplesort_heap_siftup is a confusing name, although I'm 
not sure that Peter's "compact" is much better. I suggest that we rename 
it to tuplesort_heap_delete_top(). In comments within the function, 
explain that the *loop* corresponds to the "siftup" in Knuth's book.

Interestingly, as Knuth explains the siftup algorithm, it performs a 
"replace the top" operation, rather than "remove the top". But we use it 
to remove the top, by moving the last element to the top and running the 
algorithm after that. Peter's other patch, to change the way we 
currently replace the top node, to do it in one pass rather than as 
delete+insert, is actually Knuth's siftup algorithm.

Peter, looking at your "displace" patch in this light, I think 
tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm 
calling it now), should share a common subroutine. Displace replaces the 
top node with the new node passed as argument, and then calls the 
subroutine to push it down to the right place. Delete_top replaces the 
top node with the last value in the heap, and then calls the subroutine. 
Or perhaps delete_top should remove the last value from the heap, and 
then call displace() with it, to re-insert it.

- Heikki




pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Suggestions for first contribution?
Next
From: "Tsunakawa, Takayuki"
Date:
Subject: Re: Supporting SJIS as a database encoding