Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Inlining comparators as a performance optimisation
Date
Msg-id CA+TgmoYusPUmR2yNjrryHStAu7oeYVpWxV8NmPm_tdT-_jJPSw@mail.gmail.com
Whole thread Raw
In response to Re: Inlining comparators as a performance optimisation  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Nov 18, 2011 at 3:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> We have no proof that we need to do this for 10 or 100 data types. We
> only currently have proof that there is gain for the most common
> types.

Well, that's kind of my point.  I think this needs more work before we
decide what the best approach is.  So far, the ONLY test result we
have that compares inlining to not inlining shows a speedup from 60 ms
to 52 ms.  I think that an 8 ms speedup on one test with one datatype
on one platform/compiler combination isn't sufficient evidence to
conclude that this is the best possible approach.

I think the way to look at this is that this patch contains two
possibly good ideas: one of which is to make the qsort() argument
match the qsort() calling convention, and the other of which is to
have multiple copies of the qsort() logic that inline the comparators
for their respective datatypes.  Tom hypothesized that most of the
benefit was in the first idea, and the numbers that Peter posted seem
to support that conclusion.  The first idea is also less invasive and
more broadly applicable, so to me that seems like the first thing to
pursue.

Now, that doesn't mean that we shouldn't consider the second idea as
well, but I don't believe that the evidence presented thus far is
sufficient to prove that we should go that route.  It seems entirely
possible that inlining any non-trivial comparator function could work
out to a loss, or that the exact choice of compiler flags could affect
whether inlining works out to a plus or a minus (what happens with -O3
vs. -O2?  what about -O1?  what about -O2 -fno-omit-frame-pointer?
what if they're using HP's aCC or Intel's icc rather than gcc?).
There's no point in installing an optimization that could easily be a
pessimization on some other workload, and that hasn't been tested, or
at least no results have been posted here.  On the other hand,
matching the calling convention of the comparator function to what
qsort() wants and eliminating the trampoline seems absolutely certain
to be a win in every case, and based on the work Peter has done it
seems like it might be a quite large win.

In fact, you have to ask yourself just exactly how much our
function-calling convention is costing us in general.  We use that
mechanism an awful lot and whatever loss of cycles is involved would
be spread all over the code base where oprofile or gprof won't easily
be able to pick it out.  Even the cost at any particular call site
will be split between the caller and the callee.  There might be more
stuff we could optimize here than just sorting...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: RangeVarGetRelid()
Next
From: Thom Brown
Date:
Subject: Re: [GENERAL] VACUUM touching file but not updating relation