Thread: Is tuplesort_heap_siftup() a misnomer?
Doesn't tuplesort_heap_siftup() actually shift-down?

The Wikipedia article on heaps [1] lists "shift-down" (never "sift down", FWIW) as a common operation on a heap:

"shift-down: move a node down in the tree, similar to shift-up; used to restore heap condition after deletion or replacement."

This seems to be what tuplesort_heap_siftup() actually does (among other things; its job is to compact the heap after the caller returns the root, where the caller leaves a "hole" at the root position 0). As it happens, the node that this tuplesort.c function "moves down in the tree" is the memtuple that was previously positioned physically last in the heap portion of the memtuples array. This node is then considered as a "candidate root" (i) for each subheap as we actually shift down, starting from i = 0. Conceptually, we immediately "fill the hole" that the caller left in the root with this other tuple/node (the physically last tuple), which is then shifted down as needed. (I say "conceptually" because we don't actually bother with an eager swap into the "hole" position, but that's just to avoid a potentially useless swap.)

Now, tuplesort_heap_insert() actually does sift (or shift up, if you prefer), which is necessary because it doesn't assume that there is an existing "hole" anywhere in the heap (it just assumes the availability of space to insert the caller's tuple, at the end of the heap's memtuple portion). Or, as Wikipedia describes it:

"shift-up: move a node up in the tree, as long as needed; used to restore heap condition after insertion. Called "sift" because node moves up the tree until it reaches the correct level, as in a sieve."

I think that tuplesort_heap_insert() is correct, because it doesn't claim to shift at all (or sift, or whatever). I'm just contrasting it with tuplesort_heap_siftup() to make a point.
I think that tuplesort_heap_siftup() should probably be renamed to describe its purpose at a higher level, rather than how it shifts, which in any case is an implementation detail. Specifically, I suggest we rename tuplesort_heap_siftup() to tuplesort_heap_compact(), which suggests that it's roughly the opposite of tuplesort_heap_insert(). I think that the contrast between these two functions adds clarity. The tuplesort_heap_siftup() comments will also need to be updated, since "sift up" is mentioned there.

Should I write a patch?

[1] https://en.wikipedia.org/wiki/Heap_(data_structure)

--
Peter Geoghegan
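For reference, the operation described above can be sketched on a plain min-heap of ints (hypothetical names, not the actual tuplesort.c code): the caller has already taken the root, leaving a "hole" at position 0, and the physically last node is the "candidate root" that gets shifted down, with no eager swap into the hole.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch only (heap_compact is a hypothetical name).
 * The caller has consumed a[0]; move the physically last node down
 * from the root position until the min-heap invariant holds again.
 */
static void
heap_compact(int *a, size_t *n)
{
    size_t  i = 0;
    int     cand;

    assert(*n > 0);
    cand = a[--(*n)];               /* physically last node */

    for (;;)
    {
        size_t  child = 2 * i + 1;

        if (child >= *n)
            break;
        if (child + 1 < *n && a[child + 1] < a[child])
            child++;                /* pick the smaller child */
        if (cand <= a[child])
            break;                  /* candidate reached its level */
        a[i] = a[child];            /* pull child up into the hole */
        i = child;
    }
    a[i] = cand;                    /* finally fill the hole */
}
```

Note how the hole "descends" the heap while the candidate is only stored once, at the end, which is the swap-avoidance Peter mentions.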
On Fri, Aug 12, 2016 at 4:30 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Doesn't tuplesort_heap_siftup() actually shift-down?
>
> The Wikipedia article on heaps [1] lists "shift-down" (never "sift down", FWIW) as a common operation on a heap:
>
> "shift-down: move a node down in the tree, similar to shift-up; used to restore heap condition after deletion or replacement."
>
> This seems to be what tuplesort_heap_siftup() actually does (among other things; its job is to compact the heap following caller returning the root, where caller leaves a "hole" at the root position 0).

Heikki (CC'd) and I discussed this privately today, and we were in agreement that this needs to be fixed, so I wrote a patch, which I attach here.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Fri, Aug 12, 2016 at 4:30 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Doesn't tuplesort_heap_siftup() actually shift-down?

The reason it's called siftup is that that's what Knuth calls it. See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.

> Heikki (CC'd) and I discussed this privately today, and we were in agreement that this needs to be fixed, so I wrote a patch, which I attach here.

I think this patch is misguided, as it unnecessarily moves the code away from standard nomenclature.

			regards, tom lane
On Wed, Sep 7, 2016 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The reason it's called siftup is that that's what Knuth calls it. See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.

I see that Knuth explains that these steps form the siftup procedure. What steps does tuplesort_heap_insert correspond to, if any?

5.2.3.H is about heapsort, and so the Wikipedia article on heapsort (not the one on heaps in general, which I referenced first) may be a useful reference here. It's also freely available. Consider the Wikipedia pseudocode for siftup [1], from classic heapsort. That version goes from child to parent each iteration (in the first iteration, "child" is initialized to "end" -- not root). So, it *ascends* the heap ("child" is assigned from "parent" for any subsequent iterations). OTOH, tuplesort_heap_siftup always starts from the root of tuplesort's top-level heap (or the "hole" at the root position, if you prefer), and *descends* the heap.

> I think this patch is misguided, as it unnecessarily moves the code away from standard nomenclature.

As I mentioned, the patch is guided by the descriptions of fundamental operations on heaps from Wikipedia (I didn't think much about heapsort until now). I'm not really proposing to change things in a way that is inconsistent with Knuth (regardless of how the term siftup is understood). I'm proposing to change things in a way that seems clearer overall, particularly given the way that these various routines are used in fairly distinct contexts.

The terminology in this area can be confusing. My Sedgewick/Wayne Algorithms book never uses the term shift or sift when discussing heaps, even once in passing. They call shift up "swim", and shift down "sink", possibly because they'd like to avoid any baggage that other terminology has.
I propose, as a compromise, to not introduce the term "shift down" at all in this patch, but to still rename tuplesort_heap_siftup to tuplesort_heap_compact. Even assuming that I'm wrong about siftup here, I think that that still represents an improvement. Would that be acceptable to you?

[1] https://en.wikipedia.org/wiki/Heapsort#Pseudocode

--
Peter Geoghegan
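For contrast with the shift-down above, insertion works by shift-up from the end of the array (the tuplesort_heap_insert() analogue). A minimal sketch on a min-heap of ints, with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative shift-up insertion (hypothetical names, not
 * tuplesort.c itself).  There is no pre-existing "hole": the new
 * node starts in the free slot at the end of the heap and ascends,
 * child to parent, until it reaches its correct level.
 */
static void
heap_insert(int *a, size_t *n, int newval)
{
    size_t  i = (*n)++;             /* free slot at the end */

    while (i > 0)
    {
        size_t  parent = (i - 1) / 2;

        if (a[parent] <= newval)
            break;                  /* correct level reached */
        a[i] = a[parent];           /* pull the larger parent down */
        i = parent;
    }
    a[i] = newval;
}
```

This is the direction the Wikipedia heapsort "siftup" pseudocode moves in, which is the crux of the ambiguity being discussed.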
On 09/08/2016 03:36 AM, Peter Geoghegan wrote:
> On Wed, Sep 7, 2016 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason it's called siftup is that that's what Knuth calls it. See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.
>
> I see that Knuth explains that these steps form the siftup procedure. What steps does tuplesort_heap_insert correspond to, if any?

Hmm. The loop in tuplesort_heap_siftup() indeed looks the same as Knuth's steps H3-H8. But the start is different. Knuth's algorithm begins with the situation that the top node of the sub-heap is in the wrong place, and it pushes it downwards, until the whole sub-heap satisfies the heap condition again. But tuplesort_heap_siftup() begins by moving the last value in the heap to the top, and then it does the H3-H8 steps to move it to the right place.

Using Wikipedia's terms, the tuplesort_heap_siftup() function as a whole is the Extract operation. And the loop inside it corresponds to the Max-Heapify operation, which is the same as Knuth's "siftup".

I still think tuplesort_heap_siftup is a confusing name, although I'm not sure that Peter's "compact" is much better. I suggest that we rename it to tuplesort_heap_delete_top(). In comments within the function, explain that the *loop* corresponds to the "siftup" in Knuth's book.

Interestingly, as Knuth explains the siftup algorithm, it performs a "replace the top" operation, rather than "remove the top". But we use it to remove the top, by moving the last element to the top and running the algorithm after that. Peter's other patch, to change the way we currently replace the top node, to do it in one pass rather than as delete+insert, is actually Knuth's siftup algorithm.

Peter, looking at your "displace" patch in this light, I think tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm calling it now) should share a common subroutine.
Displace replaces the top node with the new node passed as argument, and then calls the subroutine to push it down to the right place. Delete_top replaces the top node with the last value in the heap, and then calls the subroutine. Or perhaps delete_top should remove the last value from the heap, and then call displace() with it, to re-insert it.

- Heikki
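A rough sketch of the sharing Heikki proposes, on an illustrative min-heap of ints (names like sift_down are placeholders, not committed code):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Common subroutine: push the value destined for the root down to
 * its proper place.  This loop is what corresponds to Knuth's
 * "siftup" (steps H3-H8).
 */
static void
sift_down(int *a, size_t n, int val)
{
    size_t  i = 0;

    while (2 * i + 1 < n)
    {
        size_t  child = 2 * i + 1;

        if (child + 1 < n && a[child + 1] < a[child])
            child++;
        if (val <= a[child])
            break;
        a[i] = a[child];
        i = child;
    }
    a[i] = val;
}

/* "Displace": replace the root with a caller-supplied value, one pass. */
static void
heap_root_displace(int *a, size_t n, int newval)
{
    sift_down(a, n, newval);
}

/* "Delete top": the last value fills the root's hole; the heap shrinks. */
static void
heap_delete_top(int *a, size_t *n)
{
    (*n)--;
    sift_down(a, *n, a[*n]);
}
```

With this factoring, delete_top really is displace called with the last element, per Heikki's second suggestion.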
On Thu, Sep 8, 2016 at 12:01 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I still think tuplesort_heap_siftup is a confusing name, although I'm not sure that Peter's "compact" is much better. I suggest that we rename it to tuplesort_heap_delete_top(). In comments within the function, explain that the *loop* corresponds to the "siftup" in Knuth's book.

I'm also fine with that name.

> Interestingly, as Knuth explains the siftup algorithm, it performs a "replace the top" operation, rather than "remove the top". But we use it to remove the top, by moving the last element to the top and running the algorithm after that. Peter's other patch, to change the way we currently replace the top node, to do it in one pass rather than as delete+insert, is actually Knuth's siftup algorithm.

Knuth must have a strange sense of humor.

> Peter, looking at your "displace" patch in this light, I think tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm calling it now), should share a common subroutine. Displace replaces the top node with the new node passed as argument, and then calls the subroutine to push it down to the right place. Delete_top replaces the top node with the last value in the heap, and then calls the subroutine. Or perhaps delete_top should remove the last value from the heap, and then call displace() with it, to re-insert it.

I can see the value in that, but I'd still rather not. The code reuse win isn't all that large, and having a separate tuplesort_heap_root_displace() routine allows us to explain what's going on for merging, to assert what we're able to assert with tape numbers vs. heap position, and to lose the HEAPCOMPARE()/checkIndex cruft that the existing routines need (if only barely) to support replacement selection. I wouldn't be surprised if, with a very tight inner loop like this, HEAPCOMPARE() has an appreciable overhead (maybe it increases pipeline stalls).

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Thu, Sep 8, 2016 at 12:01 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I still think tuplesort_heap_siftup is a confusing name, although I'm not sure that Peter's "compact" is much better. I suggest that we rename it to tuplesort_heap_delete_top(). In comments within the function, explain that the *loop* corresponds to the "siftup" in Knuth's book.

> I'm also fine with that name.

I can live with it too.

>> Interestingly, as Knuth explains the siftup algorithm, it performs a "replace the top" operation, rather than "remove the top". But we use it to remove the top, by moving the last element to the top and running the algorithm after that. Peter's other patch, to change the way we currently replace the top node, to do it in one pass rather than as delete+insert, is actually Knuth's siftup algorithm.

> Knuth must have a strange sense of humor.

I'm too lazy to get the book back down off the shelf to check, but I think that Knuth uses sift-up to describe two different algorithms that have essentially the same inner loop.

			regards, tom lane
On Thu, Sep 8, 2016 at 10:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On Thu, Sep 8, 2016 at 12:01 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> I still think tuplesort_heap_siftup is a confusing name, although I'm not sure that Peter's "compact" is much better. I suggest that we rename it to tuplesort_heap_delete_top(). In comments within the function, explain that the *loop* corresponds to the "siftup" in Knuth's book.
>
>> I'm also fine with that name.
>
> I can live with it too.

Attached patch does it that way, then. I stuck with the reference to "shift down", though, since I think we all agree that that is unambiguous.

--
Peter Geoghegan
On Thu, Sep 8, 2016 at 4:20 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Sep 8, 2016 at 10:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> On Thu, Sep 8, 2016 at 12:01 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>>> I still think tuplesort_heap_siftup is a confusing name, although I'm not sure that Peter's "compact" is much better. I suggest that we rename it to tuplesort_heap_delete_top(). In comments within the function, explain that the *loop* corresponds to the "siftup" in Knuth's book.
>>
>>> I'm also fine with that name.
>>
>> I can live with it too.
>
> Attached patch does it that way, then. I stuck with the reference to "shift down", though, since I think we all agree that that is unambiguous.

I believe the term is "sift" not "shift"
Sift means shift up. There is no such thing as sift down, though, only shift down. That is my understanding, based on the Wikipedia article on heaps.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> Attached patch does it that way, then. I stuck with the reference to "shift down", though, since I think we all agree that that is unambiguous.

I dunno. What you've now got is

 /*
  * The tuple at state->memtuples[0] has been removed from the heap.
- * Decrement memtupcount, and sift up to maintain the heap invariant.
+ * Decrement memtupcount, and shift down to maintain the heap invariant.
  */
 static void
-tuplesort_heap_siftup(Tuplesortstate *state, bool checkIndex)
+tuplesort_heap_delete_top(Tuplesortstate *state, bool checkIndex)

I don't find this clearer at all, because

(1) As noted in the comment, the heap top has *already* been removed, we're just trying to re-establish the heap invariant. So maybe "delete_top" isn't the best name after all.

(2) "shift down" seems a bit weird, since we're moving data closer to the heap top, not further away from it; why isn't that direction "up"?

(3) "shift" makes it sound like it ought to be a simple memmove kind of operation, which it surely is not.

			regards, tom lane
On Thu, Sep 8, 2016 at 12:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  /*
>   * The tuple at state->memtuples[0] has been removed from the heap.
> - * Decrement memtupcount, and sift up to maintain the heap invariant.
> + * Decrement memtupcount, and shift down to maintain the heap invariant.
>   */
>  static void
> -tuplesort_heap_siftup(Tuplesortstate *state, bool checkIndex)
> +tuplesort_heap_delete_top(Tuplesortstate *state, bool checkIndex)
>
> I don't find this clearer at all, because
>
> (1) As noted in the comment, the heap top has *already* been removed, we're just trying to re-establish the heap invariant. So maybe "delete_top" isn't the best name after all.

This is why I suggested tuplesort_heap_compact(). When merging, there is a comment associated with a call to the function, "compact the heap". I based my name off of that, thinking that that was really no different to any other caller.

> (2) "shift down" seems a bit weird, since we're moving data closer to the heap top, not further away from it; why isn't that direction "up"?

Well, the fact that the routine does that makes it a higher level thing than something like a sift or a shift. It might just as easily be some other tuple (e.g., from the caller), as far as "shifting down" goes. (I wrote a patch where for merging, it *is* from the caller, and the heap stays the same size, which Heikki mentioned on this thread.)

> (3) "shift" makes it sound like it ought to be a simple memmove kind of operation, which it surely is not.

Then, how about the Sedgewick terminology, "sink"? I'm really not attached to that aspect. I do think that "shift down" is unambiguous, but I'd just as soon use "sink", or even explicitly spell it out.

--
Peter Geoghegan
Hi,

Is this issue really worth keeping several hackers busy?

Andres
On Thu, Sep 8, 2016 at 1:14 PM, Andres Freund <andres@anarazel.de> wrote:
> Is this issue really worth keeping several hackers busy?

I don't think it's fair to put it to me specifically that I'm doing that. That said, I would like to see this resolved without further bikeshedding.

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes:
> On Thu, Sep 8, 2016 at 1:14 PM, Andres Freund <andres@anarazel.de> wrote:
>> Is this issue really worth keeping several hackers busy?

> I don't think it's fair to put it to me specifically that I'm doing that. That said, I would like to see this resolved without further bikeshedding.

Well, my vote is that it ain't broke and we shouldn't fix it.

			regards, tom lane
On Thu, Sep 8, 2016 at 2:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, my vote is that it ain't broke and we shouldn't fix it.

To take a step back, what prompted this whole discussion is the patch that I wrote that shifts down, replacing calls to tuplesort_heap_siftup() and tuplesort_heap_insert() with one new call to a function I've called tuplesort_heap_root_displace() (today, Claudio reports that it makes some of his test queries go 25% faster). This new function shifts down. It's not clear what I'm supposed to say about that, given the current understanding. So, in a sense, it's blocking on this.

--
Peter Geoghegan
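The merging context Peter refers to is roughly the following (an illustrative k-way merge of sorted int runs; hypothetical names, a sketch rather than the patch itself). Displacing the root refills the heap in a single shift-down pass, instead of a delete-top followed by an insert:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative heap entry: a value tagged with the run it came from. */
typedef struct
{
    int     val;                /* current value from this run */
    int     run;                /* index of the run it came from */
} HeapEnt;

/* Shift the entry at position i down until the min-heap invariant holds. */
static void
sift_down(HeapEnt *h, size_t n, size_t i)
{
    HeapEnt e = h[i];

    for (;;)
    {
        size_t  c = 2 * i + 1;

        if (c >= n)
            break;
        if (c + 1 < n && h[c + 1].val < h[c].val)
            c++;
        if (e.val <= h[c].val)
            break;
        h[i] = h[c];
        i = c;
    }
    h[i] = e;
}

/*
 * Merge nruns sorted runs into out[].  Each step emits the heap root,
 * then either "displaces" it with the next value from the same run
 * (one shift-down, heap size unchanged) or deletes the top once that
 * run is exhausted.
 */
static void
merge_runs(const int **runs, const size_t *lens, size_t nruns, int *out)
{
    HeapEnt h[16];              /* assumes nruns <= 16 for this sketch */
    size_t  pos[16];
    size_t  n = 0, outpos = 0, r;

    for (r = 0; r < nruns; r++)
    {
        pos[r] = 0;
        if (lens[r] > 0)
        {
            h[n].val = runs[r][0];
            h[n].run = (int) r;
            n++;
            pos[r] = 1;
        }
    }
    for (r = n / 2; r-- > 0;)   /* heapify bottom-up */
        sift_down(h, n, r);

    while (n > 0)
    {
        int     run = h[0].run;

        out[outpos++] = h[0].val;
        if (pos[run] < lens[run])
        {
            /* root_displace: refill from the same run, one pass */
            h[0].val = runs[run][pos[run]++];
            sift_down(h, n, 0);
        }
        else
        {
            /* delete_top: last entry fills the hole, heap shrinks */
            h[0] = h[--n];
            sift_down(h, n, 0);
        }
    }
}
```

The point of the displace path is that the heap size never shrinks and grows back within one merge step, which is where the savings over delete+insert come from.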
On Thu, Sep 8, 2016 at 2:19 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Peter, looking at your "displace" patch in this light, I think tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm calling it now), should share a common subroutine. Displace replaces the top node with the new node passed as argument, and then calls the subroutine to push it down to the right place. Delete_top replaces the top node with the last value in the heap, and then calls the subroutine. Or perhaps delete_top should remove the last value from the heap, and then call displace() with it, to re-insert it.
>
> I can see the value in that, but I'd still rather not. The code reuse win isn't all that large, and having a separate tuplesort_heap_root_displace() routine allows us to explain what's going on for merging, to assert what we're able to assert with tape numbers vs. heap position, and to lose the HEAPCOMPARE()/checkIndex cruft that the existing routines need (if only barely) to support replacement selection. I wouldn't be surprised if, with a very tight inner loop like this, HEAPCOMPARE() has an appreciable overhead (maybe it increases pipeline stalls).

BTW, regarding this: since I read in some other thread that it was OK to use static inline functions, I believe the compiler is smart enough to optimize out the constant true/false in checkIndex for inlined calls, so that may be viable (on modern compilers).