Re: [PATCH] binary heap implementation - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PATCH] binary heap implementation
Date
Msg-id 20121120201938.GB26019@awork2.anarazel.de
Whole thread Raw
In response to Re: [PATCH] binary heap implementation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [PATCH] binary heap implementation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2012-11-20 15:06:58 -0500, Robert Haas wrote:
> On Tue, Nov 20, 2012 at 2:29 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2012-11-20 14:03:12 -0500, Robert Haas wrote:
> >> On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> >> > [ new patch ]
> >>
> >> I went over this again today with a view to getting it committed, but
> >> discovered some compiler warnings that look like this:
> >>
> >> warning: cast to pointer from integer of different size
> >>
> >> The problem seems to be that the binary heap implementation wants the
> >> key and value to be a void *, but the way it's being used in
> >> nodeMergeAppend.c the value is actually an int.  I don't think we want
> >> to commit a binary heap implementation which has an impedance mismatch
> >> of this type with its only client.  The obvious solution seems to be
> >> to make the key and value each a Datum, and then clients can use
> >> WhateverGetDatum and DatumGetWhatever to pass whatever built-in data
> >> type they happen to have.  I'm trying that approach now and will post
> >> an updated patch based on that approach if it seems to work OK and
> >> nobody suggests something better between now and then.
> >
> > Sounds like a good plan, one other alternative would have been declaring
> > the parameters a size_t (and thus explicitly always big enough to store
> > a pointer, but also valid to store inline data) instead of using a
> > void*. But given there is DatumGetPointer/PointerGetDatum et al, using
> > the postgres-y datatype seems fine to me.
> >
> > In the LCR case those really are pointers, so preserving that case is
> > important to me ;), but your proposal does that, so ...

> On further reflection, I'm wondering if it wouldn't be a smarter idea
> to just get rid of the binaryheap_node concept altogether.  Instead,
> when you allocate a binary heap, you specify an "element size" as well
> as a capacity.  Then, the binary heap can truly treat the elements as
> an opaque array of bytes rather than a struct of any particular type.
> Of course, the disadvantage of this approach is that it's likely to be
> slightly slower, because we'll need to do a multiplication by a
> variable rather than simple bit shift.  But the extra flexibility
> might be worth it, because for example for merge append this is kind
> of inefficient from a storage perspective.  We only really need N+1
> pointers, but we've got storage for 2N.  With the above change plus a
> user-specified third argument for the comparator function (passed as
> yet another argument to binaryheap_allocate) we could get around that
> while preserving the option for LCR or any other client code to do as
> it sees fit.

> Thoughts?

I don't have a fundamental problem with it, but I don't think it's worth
the cost. If you have variable sized (from the point of the compiler)
values you either have more complex offset computations to the
individual elements or one more indirection due to a pointer array.

Do you feel strongly about it? If so, go ahead, otherwise I would prefer
the current way (with parameters changed to be Datum's of course).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [PATCH] binary heap implementation
Next
From: Alvaro Herrera
Date:
Subject: Re: foreign key locks