Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Inlining comparators as a performance optimisation
Date
Msg-id CAEYLb_U1oY=HzU-v14+s=cDkY1st0L7kvKaOLv2hMdh0jF_uhQ@mail.gmail.com
In response to Re: Inlining comparators as a performance optimisation  (Peter Geoghegan <peter@2ndquadrant.com>)
Responses Re: Inlining comparators as a performance optimisation
Re: Inlining comparators as a performance optimisation
List pgsql-hackers
Attached are the results from performing a similar process to the
prior benchmark, but on Greg Smith's high-end server, and with an
orderlines table that has been "doubled-up" until it is 1538 MB,
making the same old query perform a quicksort that's over 3GB. Short
version: HEAD is 20468.0ms, with my patch it's 13689.698ms, so these
gains hold up very well for very large in-memory sorts, possibly even
perfectly.

Note that the "doubling up" had an interesting and desirable effect
for the purposes of this patch review: it approaches a worst case,
because of the low cardinality of the product + quantity columns, and
that worst case crops up only with the non-inlined functions
(int4regqsort_arg, etc.). All too frequently, evaluating the first
scankey results in equality, and so we cannot elide the SQL function
call machinery. This is going to dampen the improvement for the
multiple-scankey case considerably (and indeed the relative
improvement here is smaller than when the orderlines table was quite
a lot smaller). As I said from the outset, no worst case for the
single-scankey case occurs to me. Multiple scankeys are likely to be
a problem for any effort to offer user-defined, per-type sort
functions. I could probably make int4regqsort_arg and friends perform
a bit better than we see here by increasing the cardinality of those
two columns for the second query, so that the first scankey
comparison would frequently suffice.
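
To make the worst case concrete, here is a minimal standalone sketch
(not code from the patch; the struct, the product/quantity field names
and the btint4cmp_like stand-in are invented for illustration) of a
two-key comparator where the leading comparison can be inlined, but a
tie forces an indirect call that stands in for the SQL function call
machinery:

/*
 * Standalone illustration only -- not from the patch.  When the
 * leading key ties, we cannot avoid the indirect call, which here
 * stands in for the SQL function call machinery.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int     product;        /* low-cardinality leading key */
    int     quantity;       /* low-cardinality second key */
} line;

/* stand-in for a comparator reached through fmgr-style indirection */
static int
btint4cmp_like(int a, int b)
{
    return (a > b) - (a < b);
}

static int  (*second_key_cmp) (int, int) = btint4cmp_like;
static long fallback_calls = 0;

static int
cmp_two_keys(const void *a, const void *b)
{
    const line *la = (const line *) a;
    const line *lb = (const line *) b;

    /* first scankey: a comparison the compiler can inline directly */
    if (la->product != lb->product)
        return (la->product > lb->product) ? 1 : -1;

    /* tie on the leading key: the indirect call cannot be elided */
    fallback_calls++;
    return second_key_cmp(la->quantity, lb->quantity);
}

int
main(void)
{
    int         n = 1000000;
    line       *lines = malloc(sizeof(line) * n);

    /* low cardinality in both keys, as with the doubled-up orderlines */
    for (int i = 0; i < n; i++)
    {
        lines[i].product = rand() % 100;
        lines[i].quantity = rand() % 10;
    }

    qsort(lines, n, sizeof(line), cmp_two_keys);
    printf("second-key fallbacks: %ld\n", fallback_calls);
    free(lines);
    return 0;
}

With so few distinct values in the leading key, the fallback counter
is hit on a large fraction of comparisons, which is roughly what the
doubled-up orderlines table does to int4regqsort_arg and friends.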

Incidentally, I'm pretty sceptical of the idea of any effort being
made to provide user-defined, per-type sort functions or anything like
that. No one has come forward with a novel sorting implementation that
looks useful, although there has been some talk of such things. It's
relatively easy to hack on tuplesort to build a prototype, so I don't
think the lack of this functionality is the blocker here. Besides, we
will probably end up doing a terrible job of anticipating how invasive
such a facility would need to be to support these novel sorting
implementations.

While I'm very encouraged by these results, I must admit that there
are a few things that I cannot currently explain. I don't know why the
second query on the "Optimized" sheet outperforms the same query on
the "Not inlined" sheet. When you consider how small the standard
deviation is for each set of results, it's fairly obvious that it
cannot be explained by noise. I thought it was weird, so I re-ran both
benchmarks, with much the same outcome. It may be that the compiler
gets lucky for the optimized case, resulting in a bit of an extra
gain, and that effect is ridiculously magnified. In all cases, the
regression tests pass.

The single-key/inlining optimisation seems to account for a big part
of the gains. The "Not inlined" figures for the first query would
seem to suggest that the inlining accounts for more than half of the
gains seen, whereas it was previously assumed to account for less.
What I think is happening here is that we're benefiting both from the
inlining and from not having to consider multiple scankeys at all
(ordinarily we have to at least consider them on every comparison,
even if we know in advance from the nScankeys variable that there'll
never be multiple scankeys). Again, if the cardinality were higher,
we'd probably see "Not inlined" do better here.
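
To show what I mean about not even having to consider multiple
scankeys, here is another standalone sketch (again invented for
illustration, not taken from the patch, and none of it is actual
PostgreSQL code): a generic comparator that loops over an array of
scankey-like entries and reaches each comparison through a function
pointer, next to a specialised single-key comparator that has neither
the loop nor the indirection, so the int4 comparison can be inlined:

#include <stdlib.h>

typedef int (*cmp_fn) (int, int);

typedef struct
{
    int         attno;      /* which column to compare */
    cmp_fn      cmp;        /* indirect comparison function */
} sortkey;

typedef struct
{
    int         cols[2];
} tuple;

static sortkey keys[2];
static int  nkeys;

static int
int4cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/* generic case: loop over the keys, calling through a pointer each time */
static int
comparetup_generic(const void *a, const void *b)
{
    const tuple *ta = (const tuple *) a;
    const tuple *tb = (const tuple *) b;

    for (int i = 0; i < nkeys; i++)
    {
        int         r = keys[i].cmp(ta->cols[keys[i].attno],
                                    tb->cols[keys[i].attno]);

        if (r != 0)
            return r;
    }
    return 0;
}

/* specialised case: single int4 key, no loop, no indirection */
static int
comparetup_int4_onekey(const void *a, const void *b)
{
    int         x = ((const tuple *) a)->cols[0];
    int         y = ((const tuple *) b)->cols[0];

    return (x > y) - (x < y);
}

int
main(void)
{
    int         n = 1000000;
    tuple      *tuples = malloc(sizeof(tuple) * n);

    for (int i = 0; i < n; i++)
    {
        tuples[i].cols[0] = rand();
        tuples[i].cols[1] = rand();
    }

    /* even with nkeys = 1, the loop and the indirect call are still paid */
    nkeys = 1;
    keys[0].attno = 0;
    keys[0].cmp = int4cmp;
    qsort(tuples, n, sizeof(tuple), comparetup_generic);

    /* the single-key specialisation avoids both */
    qsort(tuples, n, sizeof(tuple), comparetup_int4_onekey);

    free(tuples);
    return 0;
}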

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
