Re: [PATCH] binary heap implementation - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] binary heap implementation
Date
Msg-id 17109.1353452470@sss.pgh.pa.us
Whole thread Raw
In response to Re: [PATCH] binary heap implementation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [PATCH] binary heap implementation  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> Here's a revised patch that takes the approach of just changing void *
> to Datum; it includes a bunch of other cleanups that I felt were
> worthwhile.

The comment in binaryheap.h says
* For a max-heap, the comparator must return -1 iff a < b, 0 iff a == b,* and +1 iff a > b.  For a min-heap, the
conditionsare reversed.
 

Is it actually required that the output be exactly these three values,
or is the specification really <0, 0, >0 ?  If the latter, it should
say that, rather than overdefine the implementation of comparison.

Also, wouldn't it be reasonable to const-ify the comparator function, ie

typedef int (*binaryheap_comparator) (const binaryheap_node *a, const binaryheap_node *b);
*        nodes            the first of a list of "space" nodes

s/list/array/, no?  Also, it's standard to mark this sort of hack with
/* VARIABLE LENGTH ARRAY */, or even better use FLEXIBLE_ARRAY_MEMBER
(which would require fixing the space calculation in
binaryheap_allocate, not that that would be a bad thing anyway).

left_offset and friends are defined as taking size_t and returning int,
which is broken on machines where size_t is wider than int, or even on
machines where they're the same width since int is signed.  Personally
I'd replace pretty much every single usage of size_t in this module with
int, as I am unconvinced either that we need 8-byte indexes or that the
logic is correct if it tries to form index -1 (as would happen for
example with binaryheap_build applied to an empty heap).

The Assert() in binaryheap_add_unordered fills me with no confidence.
If we're not going to support expansion of the array (which might be a
better idea anyway) then this needs to be an actual run-time check, not
something that won't be there in production.

What I think *would* be worth asserting for is whether the heap is
correctly ordered or not --- that is, I think you should add a flag that
is cleared by binaryheap_add_unordered and set by binaryheap_build, and
then the functions that presume correct ordering should assert it's set.
An Assert in binaryheap_replace_first that the heap is not empty seems
appropriate as well.

This in binaryheap_replace_first is simply bizarre:
if (key)    heap->nodes[0].key = key;if (val)    heap->nodes[0].value = val;

Why are these assignments conditional?  If they're sane at all, they
should not be testing the Datums directly in any case, but the result of
DatumGetSomething.
 static int32
! heap_compare_slots(binaryheap_node *a, binaryheap_node *b) {
+     MergeAppendState *node = (MergeAppendState *) a->value;

This should use DatumGetPointer(a->value), and the function result
should be int not int32 to comport with the typedef for it.

While I'm thinking about it, why are the fields of a binaryheap_node
called "key" and "value"?  That implies a semantics not actually used
here.  Maybe "value1" and "value2" instead?
        regards, tom lane



pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: StrategyGetBuffer questions
Next
From: Merlin Moncure
Date:
Subject: Re: StrategyGetBuffer questions