Off-Topic: Float point division academia? (was: Re: Spinlock backoff algorithm) - Mailing list pgsql-hackers

From Mark Mielke
Subject Off-Topic: Float point division academia? (was: Re: Spinlock backoff algorithm)
Date
Msg-id 473DA249.1040303@mark.mielke.cc
Whole thread Raw
In response to Re: Spinlock backoff algorithm  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Martijn van Oosterhout wrote: <blockquote cite="mid:20071116084542.GB31271@svana.org" type="cite"><pre wrap="">On Wed,
Nov14, 2007 at 06:57:18PM -0800, Josh Berkus wrote: </pre><blockquote type="cite"><pre wrap="">The question is, for our
mostcommon platforms (like AMD and Intel) is the FPU 
 
notably slower/more contended than integer division?  I'd the impression that 
it was, but my knowledge of chip architectures is liable to be out of date.

Can we have a hardware geek speak up?   </pre></blockquote><pre wrap="">I did some searching and the best figures I can
findfor integer
 
division is 15-40 cycles whereas for floating point the best I found was 5
cycles. The FP units on modern processer as very highly optimsed, but
the figures quoted are for pipelined division, which is not what we're
doing here. </pre></blockquote> I agree with Tom that the although the user might be experiencing odd issues as
PostgreSQLwas not designed for 32 light-weight cores like the Niagara, that this particular issue is unlikely to be the
primaryissue.<br /><br /> In response to the academic discussion above, did you also look up the time to convert from
INT-> FLOAT, and then FLOAT -> INT?<br /><br /> Also, if divided by a constant, there are automatic optimization
tricksperformed, such as:<br /><br /> $ cat a.c<br /> int f(int i)<br /> {<br />     return i / 7;<br /> }<br /> $ gcc
-O3-S a.c<br /> $ cat a.s<br /> ...<br /> .globl f<br />         .type   f, @function<br /> f:<br /> .LFB2:<br />
       movl    %edi, %eax<br />         movl    $-1840700269, %edx<br />         imull   %edx<br />         leal   
(%rdx,%rdi),%eax<br />         sarl    $31, %edi<br />         sarl    $2, %eax<br />         subl    %edi, %eax<br />
       ret<br /><br /> No divide. No need to convert from INT -> FLOAT -> INT.<br /><br /> Here is non-conclusive
evidenceboth that integer division by a constant may still be faster on a fairly modern processor (AMD X2 3800+), and
that,as Tom believes, this issue is a waste of time:<br /><br /> $ cat a.c<br /> #define MAX_RANDOM_VALUE 
(0x7FFFFFFF)<br/><br /> int old_f(int i)<br /> {<br />     return ((double) i / (double) MAX_RANDOM_VALUE) + 0.5;<br />
}<br/><br /> int new_f(int i)<br /> {<br />     return (i + MAX_RANDOM_VALUE - 1) / MAX_RANDOM_VALUE;<br /> }<br /> $
catold.c<br /> int old_f(int);<br /> int new_f(int);<br /><br /> int main()<br /> {<br />     for (int i = 0; i <
100000000;i++)<br />         old_f(50);<br />     return 0;<br /> }<br /> $ cat new.c<br /> int old_f(int);<br /> int
new_f(int);<br/><br /> int main()<br /> {<br />     for (int i = 0; i < 100000000; i++)<br />         new_f(50);<br
/>    return 0;<br /> }<br /> $ gcc -o old -O3 -std=c99 old.c a.c<br /> $ gcc -o new -O3 -std=c99 new.c a.c<br /> $
time./old<br /> ./old  1.58s user 0.00s system 99% cpu 1.590 total<br /> $ time ./old<br /> ./old  1.31s user 0.00s
system99% cpu 1.314 total<br /> $ time ./old<br /> ./old  1.14s user 0.00s system 99% cpu 1.142 total<br /> $ time
./old<br/> ./old  1.46s user 0.00s system 99% cpu 1.461 total<br /> $ time ./new                       <br /> ./new 
0.68suser 0.00s system 99% cpu 0.682 total<br /> $ time ./new<br /> ./new  0.70s user 0.01s system 99% cpu 0.703
total<br/> $ time ./new<br /> ./new  0.70s user 0.00s system 99% cpu 0.705 total<br /> $ time ./new<br /> ./new  0.70s
user0.00s system 99% cpu 0.703 total<br /><br /><br /> I say non-conclusive, because:<br /> 1) my CPU is probably doing
branchprediction and may possibly be doing out-of-order execution. Since it has more integer units that floating point
units,this would work in the favour of integers. The contention case described by the original poster does not allow
formultiple integer units to be in use at the same time.<br /> 2) as Tom has already said, 100 million divides in
either0.7 seconds or 1.5 seconds, is still orders of magnitude more than the contended case would ever be called.<br
/><br/> For the future, please consider doing integer division when working with integers and the rvalue is a constant.
Forthis case, it doesn't seem like it is worth it to risk breaking the code.<br /><br /> Cheers,<br /> mark<br /><br
/><preclass="moz-signature" cols="72">-- 
 
Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a>
</pre>

pgsql-hackers by date:

Previous
From: Zdenek Kotala
Date:
Subject: AllocSetStats uses fprintf instead of elog
Next
From: Tom Lane
Date:
Subject: Re: Javascript support in the backend, i.e. PL/JS