Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Inlining comparators as a performance optimisation
Date
Msg-id CA+TgmoYr8O_fK3WvOPXzU6bzZ0utCfTxUZPN3=nYon3Wfh+KHQ@mail.gmail.com
Whole thread Raw
In response to Re: Inlining comparators as a performance optimisation  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: Inlining comparators as a performance optimisation
Re: Inlining comparators as a performance optimisation
Re: Inlining comparators as a performance optimisation
List pgsql-hackers
On Sun, Sep 25, 2011 at 10:12 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> I've produced something much neater than my first patch, attached,
> although I still consider this to be at the POC stage, not least since
> I'm not exactly sure how I should be selecting the right
> specialisation in tuplesort_performsort() (a hack is used right now
> that does a direct pg_proc OID comparison), and also because I haven't
> implemented anything other than qsort for heap tuples yet (a minor
> detail, I suppose). I'm pleased to say that a much clearer picture of
> what's really going on here has emerged.
>
> Statistics: Total runtime according to explain analyze for query
> "select * from orderlines order by prod_id" (dellstore2 sample db), at
> GCC 4.5's -02 optimisation level, after warming the cache, on my
> desktop:
>
> Without the patch:
>
> ~82ms
>
> With the patch, but with the "inline" keyword commented out for all
> new functions/meta-functions:
>
> ~60ms
>
> with the patch, unmodified:
>
> ~52ms

Reviewing away when I should be sleeping, wahoo...!

I think that we should really consider doing with this patch what Tom
suggested upthread; namely, looking for a mechanism to allow
individual datatypes to offer up a comparator function that doesn't
require bouncing through FunctionCall2Coll().  It seems to me that
duplicating the entire qsort() algorithm is a little iffy.  Sure, in
this case it works out to a win.  But it's only a small win -
three-quarters of it is in the uncontroversial activity of reducing
the impedance mismatch - and how do we know it will always be a win?
Adding more copies of the same code can be an anti-optimization if it
means that a smaller percentage of it fits in the instruction cache,
and sometimes small changes in runtime are caused by random shifts in
the layout of memory that align things more or less favorably across
cache lines rather than by real effects.  Now it may well be that this
is a real effect, but will it still look as good when we do this for
10 data types?  For 100 data types?

In contrast, it seems to me that reducing the impedance mismatch is
something that we could go and do across our entire code base, and
every single data type would benefit from it.  It would also be
potentially usable by other sorting algorithms, not just quick sort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: RangeVarGetRelid()
Next
From: Etsuro Fujita
Date:
Subject: Re: WIP: Collecting statistics on CSV file data