Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Inlining comparators as a performance optimisation |
Date | |
Msg-id | CAEYLb_W64XrbzOnQVeQ4BO8CjGjrRB2rToORTUgLDiAeDWdC1w@mail.gmail.com |
In response to | Re: Inlining comparators as a performance optimisation (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Inlining comparators as a performance optimisation
|
List | pgsql-hackers |
On 19 November 2011 02:55, Robert Haas <robertmhaas@gmail.com> wrote:
> Maybe we should look at trying to isolate that a bit better.

Indeed. Fortunately, GCC has options to disable each optimisation. Here are the potentially relevant flags that we're already implicitly using at -O2:

-finline-small-functions <-- this is the one that inlined functions without an "inline" keyword
-findirect-inlining
-finline-functions-called-once
-fearly-inlining
-fipa-sra (Perform interprocedural scalar replacement of aggregates, removal of unused parameters and replacement of parameters passed by reference by parameters passed by value).

In an effort to better isolate the effects of inlining, I tried this:

./configure CFLAGS="-fno-inline -fno-inline-small-functions"

(I could have disabled more -O2 optimisations, but this proved sufficient to make my point.)

Unsurprisingly, this makes things slower than regular -O2.

With the patch, the same query (once again, using explain analyze) with the same fast path quicksort stabilises around 92ms +/- 5ms (recall that the original figure was ~82ms). Our gains evaporate, and then some. Take away the additional CFLAGS, and we're predictably back to big gains, with the query taking ~52ms just as before.

What happens to this query when we build an unmodified postgres with these same CFLAGS? Well, we see the query take ~116ms after a few runs. It seems that the impedance mismatch matters, but inlining and other optimisations look to be at least as important. This isn't surprising to me, given what I was able to do with the isolated test. Maybe I should have tried it with the other optimisations named above disabled as well, but that would perhaps have made things less clear.

I'd probably have been better off directly measuring qsort speed and only passing those flags when compiling tuplesort.c (maybe impedance mismatch issues would have proven to be even less relevant), but I wanted to do something that could be easily recreated, plus it's late.
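For readers following along, here is a minimal C sketch of the distinction being measured. It is not code from the patch, and every function name in it is hypothetical. The generic path hands qsort() a comparator through a function pointer, which the compiler generally cannot inline into the sort loop. The specialised path makes the comparison visible at the call site, so an option such as -finline-small-functions can apply to it. A deliberately simple insertion sort stands in for the specialised quicksort to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Generic path: qsort() reaches this only through a function pointer,
 * so each comparison costs an indirect call. */
static int
cmp_int32_generic(const void *a, const void *b)
{
    int32_t x = *(const int32_t *) a;
    int32_t y = *(const int32_t *) b;

    return (x > y) - (x < y);
}

static void
sort_int32_generic(int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    qsort(v, n, sizeof(int32_t), cmp_int32_generic);
}

/* Specialised path: the comparison is a small function whose body the
 * compiler can see, and so may inline, at the call site. */
static inline int
cmp_int32(int32_t x, int32_t y)
{
    return (x > y) - (x < y);
}

/* Insertion sort used as a stand-in for a type-specialised quicksort
 * (an assumption made for brevity, not the patch's algorithm). */
static void
sort_int32_specialised(int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
    {
        int32_t key = v[i];
        size_t  j = i;

        /* Shift larger elements right until key's slot is found. */
        while (j > 0 && cmp_int32(v[j - 1], key) > 0)
        {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}
```

Both functions produce the same ordering; only the way the comparison is reached differs, which is the part the -fno-inline build takes away.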
> It strikes me that we could probably create an API that would support
> doing either of these things depending on the wishes of the underlying
> datatype. For example, imagine that we're sorting with <(int4, int4).
> We associate a PGPROC-callable function with that operator that
> returns "internal", really a pointer to a struct. The first element
> of the struct is a pointer to a comparison function that qsort() (or a
> tape sort) can invoke without a trampoline; the second is a wholesale
> replacement for qsort(); either or both can be NULL. Given that, it
> seems to me that we could experiment with this pretty easily, and if
> it turns out that only one of them is worth doing, it's easy to drop
> one element out of the structure.
>
> Or do you have another plan for how to do this?

I haven't given it much thought. Let me get back to you on that next week.

> Have you done any benchmarks where this saves seconds or minutes,
> rather than milliseconds? That would certainly make it more exciting,
> at least to me.

Right. Well, I thought I'd use pgbench to generate a large table in a re-creatable way. That is:

pgbench -i -s 60

This puts pgbench_accounts at 769MB. Then, having increased work_mem to 1GB (enough for an in-memory qsort) and maintenance_work_mem to 756MB, I decided to test this query with the patch:

explain analyze select * from pgbench_accounts order by abalance;

This stabilised at ~3450ms over repeated executions.

How does this compare to unpatched postgres? Well, it stabilised at ~3780ms for the same query.

This patch is obviously less of a win as the number of tuples to sort goes up. That's probably partly explained by the cost of everything else going up at a greater rate than the number of comparisons. I suspect that if we measure qsort in isolation, we'll see better results, so we may still see a good win on index creation time as a result of this work.
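The struct that Robert describes in the quoted text could be sketched roughly as follows. This is only an illustration under stated assumptions: SortHooks, int4_sort_hooks, sort_with_hooks and the function-pointer typedefs are hypothetical names, not PostgreSQL API, and the fallback argument merely stands in for the existing trampoline-based comparison path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types: a directly callable comparator and a wholesale
 * replacement for qsort(). */
typedef int  (*raw_comparator) (const void *a, const void *b);
typedef void (*sort_replacement) (void *base, size_t nelems);

typedef struct SortHooks
{
    raw_comparator   cmp;        /* callable without a trampoline, or NULL */
    sort_replacement full_sort;  /* replaces qsort() entirely, or NULL */
} SortHooks;

static int
int4_cmp(const void *a, const void *b)
{
    int32_t x = *(const int32_t *) a;
    int32_t y = *(const int32_t *) b;

    return (x > y) - (x < y);
}

/* What the function attached to <(int4, int4) might hand back: here
 * only the comparator is supplied, the full sort is left NULL. */
static const SortHooks int4_hooks = { int4_cmp, NULL };

static const SortHooks *
int4_sort_hooks(void)
{
    return &int4_hooks;
}

/* Caller side: prefer the wholesale replacement, then the direct
 * comparator, then the fallback comparator. */
static void
sort_with_hooks(void *base, size_t n, size_t width,
                const SortHooks *hooks, raw_comparator fallback)
{
    assert(fallback != NULL);

    if (hooks != NULL && hooks->full_sort != NULL)
        hooks->full_sort(base, n);
    else if (hooks != NULL && hooks->cmp != NULL)
        qsort(base, n, width, hooks->cmp);
    else
        qsort(base, n, width, fallback);
}
```

Because either member may be NULL, dropping one of the two ideas later only means removing a field and one branch in the caller.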
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services