Re: [PATCH] binary heap implementation - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PATCH] binary heap implementation
Date
Msg-id 20121120214617.GB4171@awork2.anarazel.de
Whole thread Raw
In response to Re: [PATCH] binary heap implementation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2012-11-20 15:47:25 -0500, Robert Haas wrote:
> On Tue, Nov 20, 2012 at 3:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I don't have a fundamental problem with it, but I don't think it's worth
> > the cost. If you have variable sized (from the point of the compiler)
> > values you either have more complex offset computations to the
> > individual elements or one more indirection due to a pointer array.
> >
> > Do you feel strongly about it? If so, go ahead, otherwise I would prefer
> > the current way (with parameters changed to be Datum's of course).
>
> Here's a revised patch that takes the approach of just changing void *
> to Datum; it includes a bunch of other cleanups that I felt were
> worthwhile.  I tested this using the following setup:
>
> CREATE TABLE sample (a int, b numeric);
> DO $$
> BEGIN
>     FOR i IN 1 .. 1000 LOOP
>                 EXECUTE 'CREATE TABLE sample' || i || ' (CHECK (b = ' || i
>                         || ')) INHERITS (sample)';
>                 EXECUTE 'INSERT INTO sample' || i
>                         || ' SELECT g, ' || i || ' FROM
> generate_series(1,100000) g;';
>                 EXECUTE 'ALTER TABLE sample' || i
>                         || ' ADD PRIMARY KEY (a)';
>         END LOOP;
> END
> $$;
> VACUUM ANALYZE;
>
> And this test query:
>
> select sum(1) from (select * from sample order by a limit 250) x;
>
> On my system, on repeated execution, this takes about 113 ms with
> unpatched master and about 114 ms with the patch.  I'm inclined to
> think that's not worth getting excited about, as it could be simply
> code shifting around across cache line boundaries and not indicative
> of an actual regression due to the difference in logic.  Even if there
> is a regression, it's pretty darn small: most people will not have
> 1000 partitions, and those that do will often find query execution
> time dominated by other factors.  Against that, there is positive
> value in having this code somewhat more abstracted.

To make sure were not regressing I ran some more testcases that I could
think of/find and didn't find anything problematic. Either both are
equally fast or the new code is faster.

FWIW for me the new code is very slightly faster in your example, but
only if I apply it ontop of fklocks (where I applied it accidentally at
first), not if ontop of 273986bf0d39e5166eb15ba42ebff4803e23a688 where
its reversed, so your code-layout theory sounds likely.

> With respect to the question of the API, no, I don't feel
> overwhelmingly strongly that we have to whack the API with a bat
> that's larger than the one I've used here.  On the flip side, it
> wouldn't surprise me if, as the code acquires more users, we find that
> we actually do want to make some changes, either the one I already
> proposed or something else.  Fortunately, it's not a lot of code, so
> if we end up refactoring it some more later, it shouldn't be a big
> deal.  So I'm happy enough to commit this the way I have it and sort
> out the rest as we go.  If there are no objections I'll go ahead and
> do that.

Looks good to me. Adapting when we have the actual requirements sounds
fine & sensible as well...

Thanks,

Andres


--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Do we need so many hint bits?
Next
From: Tom Lane
Date:
Subject: Re: Do we need so many hint bits?