Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Inlining comparators as a performance optimisation
Date
Msg-id 595E409C76394413878F7F572F5DE0CB@china.huawei.com
Whole thread Raw
In response to Inlining comparators as a performance optimisation  (Peter Geoghegan <peter@2ndquadrant.com>)
List pgsql-hackers
<div class="Section1"><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">>Recent discussions on the threads "Double sorting split patch" and</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">>"CUDA sorting" raised the possibility that there could be significant</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">>performance optimisation "low-hanging fruit" picked by having the</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">>executor treat integers and floats as a special case during sorting,</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">>avoiding going to the trouble of calling a comparator using the</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">>built-in SQL function machinery</span></font><p class="MsoPlainText"><font face="Courier New"
size="2"><spanstyle="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">Why only for integers and floats why not for char/varchar?</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">But I believe this can make code less maintainable as similar things can be done at other places to avoid SQL
functionmachinery.</span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">>Once the cache has been warmed, explain analyze very consistently</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">>reports a runtime of 123ms for this query on master/HEAD, which varies</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">>+/- 1 ms, with a few outliers of maybe +/- 2ms. However, when I apply</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">>this patch, that goes down to 107ms +/- 1ms at -O0.</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">Time 123ms which is without your change is with which optimization –O2 or O0?</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">***************************************************************************************</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">This e-mail and attachments contain confidential information from HUAWEI, which is intended only for the person
orentity whose address is listed above. Any use of the information contained herein in any way (including, but not
limitedto, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient's)
isprohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete
it!</span></font><pclass="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">-----Original Message-----<br /> From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Peter Geoghegan<br /> Sent: Tuesday, September 20, 2011 7:26
AM<br/> To: PG Hackers<br /> Subject: [HACKERS] Inlining comparators as a performance optimisation</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">Recent discussions on the threads "Double sorting split patch" and</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">"CUDA sorting" raised the possibility that there could be significant</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">performance optimisation "low-hanging fruit" picked by having the</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">executor treat integers and floats as a special case during sorting,</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">avoiding going to the trouble of calling a comparator using the</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">built-in SQL function machinery, and taking advantage of inlining of</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">the comparator, which has been shown to have a considerable</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">performance advantage (at least compared to a general purpose c stdlib</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">qsort(), that takes a function pointer as its comparator, much like</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">tuplesort).</span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">I've hacked together a sloppy POC implementation in a hurry</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">(basically, some code is shifted around) , which is attached - I felt</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">that it would be useful to determine if the community feels that this</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">is a worth-while undertaking in advance of a business trip that I'm</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">leaving on tomorrow lasting until Friday, during which I will be</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">mostly unavailable. The patch breaks the Postgres sorting executor</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">node (at least when it quicksorts) for any type other than int4. I</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">apologise for how rough the patch is, but the code itself isn't</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">important right now - the idea is. I anticipate that the value</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">state->datumType or something similar will be set in a near future</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">revision, so that tuplesort_performsort will know which type-specific</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">optimisation it can use for the type, while falling back on the</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">existing generic qsort_arg + qsort_arg_comparator, and sorting won't</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">actually be broken.</span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span
style="font-size:
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">I've been doing some preliminary testing using the dell store 2 sample</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">database. I increase work_mem to '50MB', to ensure that a quicksort</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">will be performed for sorting (otherwise, I'm using the</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">postgresql.conf that initdb gave me). The query is:</span></font><p class="MsoPlainText"><font face="Courier
New"size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">explain analyze select * from orderlines order by prod_id;</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">Once the cache has been warmed, explain analyze very consistently</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">reports a runtime of 123ms for this query on master/HEAD, which varies</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">+/- 1 ms, with a few outliers of maybe +/- 2ms. However, when I apply</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">this patch, that goes down to 107ms +/- 1ms at -O0. I think that</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">that's a pretty good start. Funnily enough, the difference/advantage</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">vanishes at -O2 (I'm guessing that the higher optimisation level of</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">GCC 4.5 hyper-corrects away the inlining, but I don't have time to</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">check that right now).</span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span
style="font-size:
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">I imagine the version that I actually submit for patch review will</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">have a macro-based infrastructure for inlining the sorting of various</span></font><p
class="MsoPlainText"><fontface="Courier New" size="2"><span style="font-size:
 
10.0pt">built-in types, initially integers and floats. It will most likely</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">have some other optimisations - I haven't even used a profiler yet.</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">This performance patch differs from most in that it's difficult in</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt">principle to imagine a performance regression occurring.</span></font><p class="MsoPlainText"><font
face="CourierNew" size="2"><span style="font-size:
 
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">Thoughts?</span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt"> </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">-- </span></font><p class="MsoPlainText"><font face="Courier New" size="2"><span style="font-size:
10.0pt">Peter Geoghegan       http://www.2ndQuadrant.com/</span></font><p class="MsoPlainText"><font face="Courier New"
size="2"><spanstyle="font-size:
 
10.0pt">PostgreSQL Development, 24x7 Support, Training and Services</span></font></div>

pgsql-hackers by date:

Previous
From: Marco Stornelli
Date:
Subject: Re: Improve lseek scalability v3
Next
From: Dan McGee
Date:
Subject: [PATCH] POC: inline int4 comparison in tuplesort