Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Inlining comparators as a performance optimisation |
Date | |
Msg-id | CAEYLb_Us1cSxn94G2CyhSp+r1TxrghV6sj70htAtwL1K5cWLHg@mail.gmail.com Whole thread Raw |
In response to | Re: Inlining comparators as a performance optimisation (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Inlining comparators as a performance optimisation
[PATCH] POC: inline int4 comparison in tuplesort Re: Inlining comparators as a performance optimisation |
List | pgsql-hackers |
On 20 September 2011 03:51, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Considering that -O2 is our standard optimization level, that > observation seems to translate to "this patch will be useless in > practice". I think you had better investigate that aspect in some > detail before spending more effort. I don't think that the fact that that happens is at all significant at this early stage, and it never even occurred to me that you'd think that it might be. I was simply disclosing a quirk of this POC patch. The workaround is probably to use a macro instead. For the benefit of those that didn't follow the other threads, the macro-based qsort implementation, which I found to perform significantly better than regular qsort(), runs like this on my laptop when I built at 02 with GCC 4.6 just now: C stdlib quick-sort time elapsed: 2.092451 seconds Inline quick-sort time elapsed: 1.587651 seconds Does *that* look attractive to you? I've attached source code of the program that produced these figures, which has been ported to C from C++. When I #define LARGE_SIZE 100000000, here's what I see: [peter@peter inline_compar_test]$ ./a.out C stdlib quick-sort time elapsed: 23.659411 seconds Inline quick-sort time elapsed: 18.470611 seconds Here, sorting with the function pointer/stdlib version takes about 1.28 times as long. In the prior test (with the smaller LARGE_SIZE), it took about 1.32 times as long. Fairly predictable, linear, and not to be sniffed at. The variance I'm seeing across runs is low - a couple of hundredths of a second at most. This is a Fedora 15 " Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz" machine. I'm not sure right now why the inline quick-sort is less of a win than on my old Fedora 14 desktop (where it was 3.24 Vs 2.01), but it's still a significant win. Perhaps others can build this simple program and tell me what they come up with. >> This performance patch differs from most in that it's difficult in >> principle to imagine a performance regression occurring. > > Really? N copies of the same code could lead to performance loss just > due to code bloat (ie, less of a query's inner loops fitting in CPU > cache). I did consider that. Of course inlining has an overhead, and I'll be testing that each instance of inlining has a net benefit. I just meant that many other performance patches have an obvious worst case, and I think that it is policy to focus on that case, but I can't think of one here. Does anyone else have any ideas? > Not to mention the clear regression in maintainability. So > I'm disinclined to consider this sort of change without a significantly > bigger win than you're suggesting above Sure, there'll be some sort of regression in maintainability - I think that HOT had a clear regression in maintainability too. The important questions are obviously "how big is the loss of maintainability?", and "is it worth it?". We'll know more when this work is actually shaped into a proper patch. Perhaps I should have waited until I had something along those lines before making an announcement, but I wanted community input as early as possible. I think that there's plenty of tweaking that can be done to get additional performance improvements - all I've done so far is demonstrate that those improvements are real and worth thinking about, in the fastest possible way, partly because you expressed skepticism of the benefits of inlining comparators to Greg Stark in an earlier thread. Performance and maintainability are often somewhat in tension, but we cannot ignore performance. If this work can bring us an improvement in performance approaching the isolated macro Vs qsort() function pointer benchmark, that's a *big* win. Sorting integers and floats is very common and important. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
pgsql-hackers by date: