Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Inlining comparators as a performance optimisation |
Date | |
Msg-id | CAEYLb_U1oY=HzU-v14+s=cDkY1st0L7kvKaOLv2hMdh0jF_uhQ@mail.gmail.com Whole thread Raw |
In response to | Re: Inlining comparators as a performance optimisation (Peter Geoghegan <peter@2ndquadrant.com>) |
Responses |
Re: Inlining comparators as a performance
optimisation
Re: Inlining comparators as a performance optimisation |
List | pgsql-hackers |
Attached are the results from performing a similar process to the prior benchmark, but on Greg Smith's high-end server, and with an orderlines table that has been "doubled-up" until it is 1538 MB, making the same old query perform a quicksort that's over 3GB. Short version: HEAD is 20468.0ms, with my patch it's 13689.698ms, so these gains hold-up very well for very large in-memory sorts, possibly even perfectly. Note that the "doubling up" had an interesting and desirable effect for the purposes of this patch review: It's approaching a worst case due to the low cardinality of the product + quantity columns, which crops up with the non-inlined functions only (int4regqsort_arg, etc). All too frequently, evaluating the first scankey results in equality, and so we cannot elide the SQL function call machinery. This is going to dampen the improvement for the multiple scanKey case considerably (and it looks like a smaller relative improvement than when the orderlines table was quite a lot smaller). As I said from the outset, there is no worst case for the single scanKey case that occurs to me. Multiple scanKeys are likely to be a problem for any effort to offer user-defined, per-type sort functions. I could probably make int4regqsort_arg and friends perform a bit better than we see here by increasing the cardinality of those two columns for the second query, so the first scanKey comparison would frequently suffice. Incidentally, I'm pretty sceptical of the idea of any effort being made to provide user-defined per-type sort functions or anything like that. No one has come forward with a novel sorting implementation that looks useful, although there has been some talk of some. It's relatively easy to hack on tuplesort to build a prototype, so I don't think the lack of this functionality is the blocker here. Besides, we will probably end up doing a terrible job of anticipating how invasive such a facility needs to be to facilitate these novel sorting implementations. While I'm very encouraged by these results, I must admit that there are a few things that I cannot currently explain. I don't know why the second query on the "Optimized" sheet outperforms the same query on the "Not inlined" page. When you consider how small the standard deviation is for each set of results, it's fairly obvious that it cannot be explained by noise. I thought it was weird, so I re-ran both benchmarks, with much the same outcome. It may be that the compiler gets lucky for the optimized case, resulting in a bit of an extra gain, and that effect is ridiculously magnified. In all cases, the regression tests pass. The single key/inlining optimisation seems to account for a big part of the gains, and the "Not inlined" figures for the first query would seem to suggest that the inlining can account for more than half of gains seen, whereas that was previously assumed to be less. What I think is happening here is that we're benefiting both from inlining as well as not having to even consider multiple scanKeys (we have to at least consider them even if we know in advance from the nScankeys variable that there'll never be multiple scanKeys). Again, if the cardinality was higher, we'd probably see "Not inlined" do better here. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
pgsql-hackers by date: