Re: [PATCH] binary heap implementation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [PATCH] binary heap implementation
Date
Msg-id CA+TgmobvR7XW9fjj2RNY7sKK-VAG5nahfai_zV51rHVLDNvaBg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] binary heap implementation  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: [PATCH] binary heap implementation
Re: [PATCH] binary heap implementation
List pgsql-hackers
On Tue, Nov 20, 2012 at 3:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I don't have a fundamental problem with it, but I don't think it's worth
> the cost. If you have variable sized (from the point of the compiler)
> values you either have more complex offset computations to the
> individual elements or one more indirection due to a pointer array.
>
> Do you feel strongly about it? If so, go ahead, otherwise I would prefer
> the current way (with parameters changed to be Datum's of course).

Here's a revised patch that takes the approach of just changing void *
to Datum; it includes a bunch of other cleanups that I felt were
worthwhile.  I tested this using the following setup:

CREATE TABLE sample (a int, b numeric);
DO $$
BEGIN
    FOR i IN 1 .. 1000 LOOP
                EXECUTE 'CREATE TABLE sample' || i || ' (CHECK (b = ' || i
                        || ')) INHERITS (sample)';
                EXECUTE 'INSERT INTO sample' || i
                        || ' SELECT g, ' || i || ' FROM
generate_series(1,100000) g;';
                EXECUTE 'ALTER TABLE sample' || i
                        || ' ADD PRIMARY KEY (a)';
        END LOOP;
END
$$;
VACUUM ANALYZE;

And this test query:

select sum(1) from (select * from sample order by a limit 250) x;

On my system, on repeated execution, this takes about 113 ms with
unpatched master and about 114 ms with the patch.  I'm inclined to
think that's not worth getting excited about, as it could be simply
code shifting around across cache line boundaries and not indicative
of an actual regression due to the difference in logic.  Even if there
is a regression, it's pretty darn small: most people will not have
1000 partitions, and those that do will often find query execution
time dominated by other factors.  Against that, there is positive
value in having this code somewhat more abstracted.

With respect to the question of the API, no, I don't feel
overwhelmingly strongly that we have to whack the API with a bat
that's larger than the one I've used here.  On the flip side, it
wouldn't surprise me if, as the code acquires more users, we find that
we actually do want to make some changes, either the one I already
proposed or something else.  Fortunately, it's not a lot of code, so
if we end up refactoring it some more later, it shouldn't be a big
deal.  So I'm happy enough to commit this the way I have it and sort
out the rest as we go.  If there are no objections I'll go ahead and
do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: foreign key locks
Next
From: Pavel Stehule
Date:
Subject: review: plpgsql return a row-expression