Re: In PG12, query with float calculations is slower than PG11 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: In PG12, query with float calculations is slower than PG11
Date
Msg-id 0C1EE524-CCB0-4230-9384-C11C48B8C7B0@anarazel.de
Whole thread Raw
In response to Re: In PG12, query with float calculations is slower than PG11  (keisuke kuroda <keisuke.kuroda.3862@gmail.com>)
Responses Re: In PG12, query with float calculations is slower than PG11
List pgsql-hackers
Hi,

On February 6, 2020 11:42:30 PM PST, keisuke kuroda <keisuke.kuroda.3862@gmail.com> wrote:
>Hi,
>
>I have been testing with newer compiler (clang-7)
>and result is a bit different at least with clang-7.
>Compiling PG 12.1 (even without patch) with clang-7
>results in __isinf() no longer being a bottleneck,
>that is, you don't see it in profiler at all.

I don't think that's necessarily the right conclusion. What's quite possibly happening is that you do not see the
externalisinf function anymore, because it is implemented as an intrinsic,  but that there still are more computations
beingdone. Due to the changed order of the isinf checks. You'd have to compare with 11 using the same compiler. 

Andres


>* result(PostgreSQL 12.1 (even without patch))
>
>postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
> select (2 * a) , (2 * b) , (2 * c), (2 * d),  (2 * e)
> from realtest;
>
>QUERY PLAN

>-----------------------------------------------------------------------------------------------------------------------
>Seq Scan on public.realtest  (cost=0.00..288697.59 rows=10000115
>width=40)
>(actual time=0.012..3878.284 rows=10000001 loops=1)
>   Output: ('2'::double precision * a), ('2'::double precision * b),
>('2'::double precision * c), ('2'::double precision * d), ('2'::double
>precision * e)
>   Buffers: shared hit=63695
> Planning Time: 0.038 ms
> Execution Time: 4533.767 ms
>(5 rows)
>
>Samples: 5K of event 'cpu-clock', Event count (approx.): 1275000000
>Overhead  Command   Shared Object      Symbol
>  33.92%  postgres  postgres           [.] ExecInterpExpr
>  13.27%  postgres  postgres           [.] float84mul
>  10.86%  postgres  [vdso]             [.] __vdso_clock_gettime
>   5.49%  postgres  postgres           [.] tts_buffer_heap_getsomeattrs
>   3.96%  postgres  postgres           [.] ExecScan
>   3.25%  postgres  libc-2.17.so       [.] __clock_gettime
>   3.16%  postgres  postgres           [.] heap_getnextslot
>   2.41%  postgres  postgres           [.] tts_virtual_clear
>   2.39%  postgres  postgres           [.] SeqNext
>   2.22%  postgres  postgres           [.] InstrStopNode
>
>Best Regards,
>Keisuke Kuroda
>
>2020年2月7日(金) 3:48 Andres Freund <andres@anarazel.de>:
>
>> Hi,
>>
>> On 2020-02-06 11:03:51 -0500, Tom Lane wrote:
>> > Andres seems to be of the opinion that the compiler should be
>willing
>> > to ignore the semantic requirements of the C standard in order
>> > to rearrange the code back into the cheaper order.  That sounds
>like
>> > wishful thinking to me ... even if it actually works on his
>compiler,
>> > it certainly isn't going to work for everyone.
>>
>> Sorry, but, uh, what are you talking about?  Please tell me which
>single
>> standards violation I'm advocating for?
>>
>> I was asking about the inlining bit because the first email of the
>topic
>> explained that as the problem, which I don't believe can be the full
>> explanation - and it turns out it isn't. As Amit Langote's followup
>> email explained, there's the whole issue of the order of checks being
>> inverted - which is clearly bad. And wholly unrelated to inlining.
>>
>> And I asked about __isinf() being used because there are issues with
>> accidentally ending up with the non-intrinsic version of isinf() when
>> not using gcc, due to badly written standard library headers.
>>
>>
>> > The patch looks unduly invasive to me, but I think that it might be
>> > right that we should go back to a macro-based implementation,
>because
>> > otherwise we don't have a good way to be certain that the function
>> > parameter won't get evaluated first.
>>
>> I'd first like to see some actual evidence of this being a problem,
>> rather than just the order of the checks.
>>
>>
>> > (Another reason to do so is so that the file/line numbers generated
>> > for the error reports go back to being at least a little bit
>useful.)
>> > We could use local variables within the macro to avoid double
>evals,
>> > if anyone thinks that's actually important --- I don't.
>>
>> I don't think that's necessarily a good idea. In fact, I think we
>should
>> probably do the exact opposite, and move the error messages further
>out
>> of line. All these otherwise very small functions having their own
>> ereports makes them much bigger. Our low code density, and the
>resulting
>> rate of itlb misses, is pretty significant cost (cf [1]).
>>
>> master:
>>    text    data     bss     dec     hex filename
>>   36124      44      65   36233    8d89 float.o
>> error messages moved out of line:
>>    text    data     bss     dec     hex filename
>>   32883      44      65   32992    80e0 float.o
>>
>> Taking int4pl as an example - solely because it is simpler assembly
>to
>> look at - we get:
>>
>> master:
>>    0x00000000004ac190 <+0>:     mov    0x30(%rdi),%rax
>>    0x00000000004ac194 <+4>:     add    0x20(%rdi),%eax
>>    0x00000000004ac197 <+7>:     jo     0x4ac19c <int4pl+12>
>>    0x00000000004ac199 <+9>:     cltq
>>    0x00000000004ac19b <+11>:    retq
>>    0x00000000004ac19c <+12>:    push   %rbp
>>    0x00000000004ac19d <+13>:    lea    0x1a02c4(%rip),%rsi        #
>> 0x64c468
>>    0x00000000004ac1a4 <+20>:    xor    %r8d,%r8d
>>    0x00000000004ac1a7 <+23>:    lea    0x265da1(%rip),%rcx        #
>> 0x711f4f <__func__.26823>
>>    0x00000000004ac1ae <+30>:    mov    $0x30b,%edx
>>    0x00000000004ac1b3 <+35>:    mov    $0x14,%edi
>>    0x00000000004ac1b8 <+40>:    callq  0x586060 <errstart>
>>    0x00000000004ac1bd <+45>:    lea    0x147e0e(%rip),%rdi        #
>> 0x5f3fd2
>>    0x00000000004ac1c4 <+52>:    xor    %eax,%eax
>>    0x00000000004ac1c6 <+54>:    callq  0x5896a0 <errmsg>
>>    0x00000000004ac1cb <+59>:    mov    $0x3000082,%edi
>>    0x00000000004ac1d0 <+64>:    mov    %eax,%ebp
>>    0x00000000004ac1d2 <+66>:    callq  0x589540 <errcode>
>>    0x00000000004ac1d7 <+71>:    mov    %eax,%edi
>>    0x00000000004ac1d9 <+73>:    mov    %ebp,%esi
>>    0x00000000004ac1db <+75>:    xor    %eax,%eax
>>    0x00000000004ac1dd <+77>:    callq  0x588fb0 <errfinish>
>>
>> out-of-line error:
>>    0x00000000004b04e0 <+0>:     mov    0x30(%rdi),%rax
>>    0x00000000004b04e4 <+4>:     add    0x20(%rdi),%eax
>>    0x00000000004b04e7 <+7>:     jo     0x4b04ec <int4pl+12>
>>    0x00000000004b04e9 <+9>:     cltq
>>    0x00000000004b04eb <+11>:    retq
>>    0x00000000004b04ec <+12>:    push   %rax
>>    0x00000000004b04ed <+13>:    callq  0x115e17 <out_of_range_err>
>>
>> With the out-of-line error, we can fit multiple of these functions
>into one
>> cache line. With the inline error, not even one.
>>
>> Greetings,
>>
>> Andres Freund
>>
>> [1] https://twitter.com/AndresFreundTec/status/1214305610172289024
>>

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



pgsql-hackers by date:

Previous
From: keisuke kuroda
Date:
Subject: Re: In PG12, query with float calculations is slower than PG11
Next
From: Fujii Masao
Date:
Subject: Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (butnot seq_tup_read)