Re: [PATCH] binary heap implementation - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PATCH] binary heap implementation |
Date | |
Msg-id | 17109.1353452470@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [PATCH] binary heap implementation (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [PATCH] binary heap implementation
|
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > Here's a revised patch that takes the approach of just changing void * > to Datum; it includes a bunch of other cleanups that I felt were > worthwhile. The comment in binaryheap.h says * For a max-heap, the comparator must return -1 iff a < b, 0 iff a == b,* and +1 iff a > b. For a min-heap, the conditionsare reversed. Is it actually required that the output be exactly these three values, or is the specification really <0, 0, >0 ? If the latter, it should say that, rather than overdefine the implementation of comparison. Also, wouldn't it be reasonable to const-ify the comparator function, ie typedef int (*binaryheap_comparator) (const binaryheap_node *a, const binaryheap_node *b); * nodes the first of a list of "space" nodes s/list/array/, no? Also, it's standard to mark this sort of hack with /* VARIABLE LENGTH ARRAY */, or even better use FLEXIBLE_ARRAY_MEMBER (which would require fixing the space calculation in binaryheap_allocate, not that that would be a bad thing anyway). left_offset and friends are defined as taking size_t and returning int, which is broken on machines where size_t is wider than int, or even on machines where they're the same width since int is signed. Personally I'd replace pretty much every single usage of size_t in this module with int, as I am unconvinced either that we need 8-byte indexes or that the logic is correct if it tries to form index -1 (as would happen for example with binaryheap_build applied to an empty heap). The Assert() in binaryheap_add_unordered fills me with no confidence. If we're not going to support expansion of the array (which might be a better idea anyway) then this needs to be an actual run-time check, not something that won't be there in production. What I think *would* be worth asserting for is whether the heap is correctly ordered or not --- that is, I think you should add a flag that is cleared by binaryheap_add_unordered and set by binaryheap_build, and then the functions that presume correct ordering should assert it's set. An Assert in binaryheap_replace_first that the heap is not empty seems appropriate as well. This in binaryheap_replace_first is simply bizarre: if (key) heap->nodes[0].key = key;if (val) heap->nodes[0].value = val; Why are these assignments conditional? If they're sane at all, they should not be testing the Datums directly in any case, but the result of DatumGetSomething. static int32 ! heap_compare_slots(binaryheap_node *a, binaryheap_node *b) { + MergeAppendState *node = (MergeAppendState *) a->value; This should use DatumGetPointer(a->value), and the function result should be int not int32 to comport with the typedef for it. While I'm thinking about it, why are the fields of a binaryheap_node called "key" and "value"? That implies a semantics not actually used here. Maybe "value1" and "value2" instead? regards, tom lane
pgsql-hackers by date: