Re: [PATCH] binary heap implementation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [PATCH] binary heap implementation
Date
Msg-id CA+TgmoaajxCvcuJBQBx0+Kvz4bq=cuKwxggLi-v+3+NPKgwzGQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] binary heap implementation  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: [PATCH] binary heap implementation
List pgsql-hackers
On Tue, Nov 20, 2012 at 2:29 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2012-11-20 14:03:12 -0500, Robert Haas wrote:
>> On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
>> > [ new patch ]
>>
>> I went over this again today with a view to getting it committed, but
>> discovered some compiler warnings that look like this:
>>
>> warning: cast to pointer from integer of different size
>>
>> The problem seems to be that the binary heap implementation wants the
>> key and value to be a void *, but the way it's being used in
>> nodeMergeAppend.c the value is actually an int.  I don't think we want
>> to commit a binary heap implementation which has an impedance mismatch
>> of this type with its only client.  The obvious solution seems to be
>> to make the key and value each a Datum, and then clients can use
>> WhateverGetDatum and DatumGetWhatever to pass whatever built-in data
>> type they happen to have.  I'm trying that approach now and will post
>> an updated patch based on that approach if it seems to work OK and
>> nobody suggests something better between now and then.
>
> Sounds like a good plan, one other alternative would have been declaring
> the parameters a size_t (and thus explicitly always big enough to store
> a pointer, but also valid to store inline data) instead of using a
> void*. But given there is DatumGetPointer/PointerGetDatum et al, using
> the postgres-y datatype seems fine to me.
>
> In the LCR case those really are pointers, so preserving that case is
> important to me ;), but your proposal does that, so ...

On further reflection, I'm wondering if it wouldn't be a smarter idea
to just get rid of the binaryheap_node concept altogether.  Instead,
when you allocate a binary heap, you specify an "element size" as well
as a capacity.  Then, the binary heap can truly treat the elements as
an opaque array of bytes rather than a struct of any particular type.
Of course, the disadvantage of this approach is that it's likely to be
slightly slower, because we'll need to do a multiplication by a
variable rather than simple bit shift.  But the extra flexibility
might be worth it, because for example for merge append this is kind
of inefficient from a storage perspective.  We only really need N+1
pointers, but we've got storage for 2N.  With the above change plus a
user-specified third argument for the comparator function (passed as
yet another argument to binaryheap_allocate) we could get around that
while preserving the option for LCR or any other client code to do as
it sees fit.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: WIP: index support for regexp search
Next
From: Andres Freund
Date:
Subject: Re: [PATCH] binary heap implementation