Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Inlining comparators as a performance optimisation |
Date | |
Msg-id | CA+TgmoYr8O_fK3WvOPXzU6bzZ0utCfTxUZPN3=nYon3Wfh+KHQ@mail.gmail.com Whole thread Raw |
In response to | Re: Inlining comparators as a performance optimisation (Peter Geoghegan <peter@2ndquadrant.com>) |
Responses |
Re: Inlining comparators as a performance optimisation
Re: Inlining comparators as a performance optimisation Re: Inlining comparators as a performance optimisation |
List | pgsql-hackers |
On Sun, Sep 25, 2011 at 10:12 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > I've produced something much neater than my first patch, attached, > although I still consider this to be at the POC stage, not least since > I'm not exactly sure how I should be selecting the right > specialisation in tuplesort_performsort() (a hack is used right now > that does a direct pg_proc OID comparison), and also because I haven't > implemented anything other than qsort for heap tuples yet (a minor > detail, I suppose). I'm pleased to say that a much clearer picture of > what's really going on here has emerged. > > Statistics: Total runtime according to explain analyze for query > "select * from orderlines order by prod_id" (dellstore2 sample db), at > GCC 4.5's -02 optimisation level, after warming the cache, on my > desktop: > > Without the patch: > > ~82ms > > With the patch, but with the "inline" keyword commented out for all > new functions/meta-functions: > > ~60ms > > with the patch, unmodified: > > ~52ms Reviewing away when I should be sleeping, wahoo...! I think that we should really consider doing with this patch what Tom suggested upthread; namely, looking for a mechanism to allow individual datatypes to offer up a comparator function that doesn't require bouncing through FunctionCall2Coll(). It seems to me that duplicating the entire qsort() algorithm is a little iffy. Sure, in this case it works out to a win. But it's only a small win - three-quarters of it is in the uncontroversial activity of reducing the impedance mismatch - and how do we know it will always be a win? Adding more copies of the same code can be an anti-optimization if it means that a smaller percentage of it fits in the instruction cache, and sometimes small changes in runtime are caused by random shifts in the layout of memory that align things more or less favorably across cache lines rather than by real effects. Now it may well be that this is a real effect, but will it still look as good when we do this for 10 data types? For 100 data types? In contrast, it seems to me that reducing the impedance mismatch is something that we could go and do across our entire code base, and every single data type would benefit from it. It would also be potentially usable by other sorting algorithms, not just quick sort. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: