Re: In PG12, query with float calculations is slower than PG11 - Mailing list pgsql-hackers
From | keisuke kuroda |
---|---|
Subject | Re: In PG12, query with float calculations is slower than PG11 |
Date | |
Msg-id | CANDwggKPuZDUQROEMjuxsDE29p8+AgQ-7CCUiBGM=M_jc1a34w@mail.gmail.com Whole thread Raw |
In response to | Re: In PG12, query with float calculations is slower than PG11 (Andres Freund <andres@anarazel.de>) |
Responses |
Re: In PG12, query with float calculations is slower than PG11
|
List | pgsql-hackers |
Hi,
I have been testing with newer compiler (clang-7)
and result is a bit different at least with clang-7.
Compiling PG 12.1 (even without patch) with clang-7
results in __isinf() no longer being a bottleneck,
that is, you don't see it in profiler at all.
So, there is no issue for people who use the modern clang toolchain,
but maybe that's not everyone.
So there would still be some interest in doing something about this.
* clang
bash-4.2$ which clang
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang
bash-4.2$ clang -v
clang version 7.0.1 (tags/RELEASE_701/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rh/llvm-toolset-7.0/root/usr/bin
Found candidate GCC installation: /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7
Found candidate GCC installation: /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation: /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
** pg_config
---
CONFIGURE = '--prefix=/var/lib/pgsql/pgsql/12.1' 'CC=/opt/rh/llvm-toolset-7.0/root/usr/bin/clang' 'PKG_CONFIG_PATH=/opt/rh/llvm-toolset-7.0/root/usr/lib64/pkgconfig'
CC = /opt/rh/llvm-toolset-7.0/root/usr/bin/clang
---
* result(PostgreSQL 12.1 (even without patch))
postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..288697.59 rows=10000115 width=40) (actual time=0.012..3878.284 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
Buffers: shared hit=63695
Planning Time: 0.038 ms
Execution Time: 4533.767 ms
(5 rows)
Samples: 5K of event 'cpu-clock', Event count (approx.): 1275000000
Overhead Command Shared Object Symbol
33.92% postgres postgres [.] ExecInterpExpr
13.27% postgres postgres [.] float84mul
10.86% postgres [vdso] [.] __vdso_clock_gettime
5.49% postgres postgres [.] tts_buffer_heap_getsomeattrs
3.96% postgres postgres [.] ExecScan
3.25% postgres libc-2.17.so [.] __clock_gettime
3.16% postgres postgres [.] heap_getnextslot
2.41% postgres postgres [.] tts_virtual_clear
2.39% postgres postgres [.] SeqNext
2.22% postgres postgres [.] InstrStopNode
Best Regards,
Keisuke Kuroda
I have been testing with newer compiler (clang-7)
and result is a bit different at least with clang-7.
Compiling PG 12.1 (even without patch) with clang-7
results in __isinf() no longer being a bottleneck,
that is, you don't see it in profiler at all.
So, there is no issue for people who use the modern clang toolchain,
but maybe that's not everyone.
So there would still be some interest in doing something about this.
* clang
bash-4.2$ which clang
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang
bash-4.2$ clang -v
clang version 7.0.1 (tags/RELEASE_701/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rh/llvm-toolset-7.0/root/usr/bin
Found candidate GCC installation: /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7
Found candidate GCC installation: /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation: /opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
** pg_config
---
CONFIGURE = '--prefix=/var/lib/pgsql/pgsql/12.1' 'CC=/opt/rh/llvm-toolset-7.0/root/usr/bin/clang' 'PKG_CONFIG_PATH=/opt/rh/llvm-toolset-7.0/root/usr/lib64/pkgconfig'
CC = /opt/rh/llvm-toolset-7.0/root/usr/bin/clang
---
* result(PostgreSQL 12.1 (even without patch))
postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..288697.59 rows=10000115 width=40) (actual time=0.012..3878.284 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
Buffers: shared hit=63695
Planning Time: 0.038 ms
Execution Time: 4533.767 ms
(5 rows)
Samples: 5K of event 'cpu-clock', Event count (approx.): 1275000000
Overhead Command Shared Object Symbol
33.92% postgres postgres [.] ExecInterpExpr
13.27% postgres postgres [.] float84mul
10.86% postgres [vdso] [.] __vdso_clock_gettime
5.49% postgres postgres [.] tts_buffer_heap_getsomeattrs
3.96% postgres postgres [.] ExecScan
3.25% postgres libc-2.17.so [.] __clock_gettime
3.16% postgres postgres [.] heap_getnextslot
2.41% postgres postgres [.] tts_virtual_clear
2.39% postgres postgres [.] SeqNext
2.22% postgres postgres [.] InstrStopNode
Best Regards,
Keisuke Kuroda
2020年2月7日(金) 3:48 Andres Freund <andres@anarazel.de>:
Hi,
On 2020-02-06 11:03:51 -0500, Tom Lane wrote:
> Andres seems to be of the opinion that the compiler should be willing
> to ignore the semantic requirements of the C standard in order
> to rearrange the code back into the cheaper order. That sounds like
> wishful thinking to me ... even if it actually works on his compiler,
> it certainly isn't going to work for everyone.
Sorry, but, uh, what are you talking about? Please tell me which single
standards violation I'm advocating for?
I was asking about the inlining bit because the first email of the topic
explained that as the problem, which I don't believe can be the full
explanation - and it turns out it isn't. As Amit Langote's followup
email explained, there's the whole issue of the order of checks being
inverted - which is clearly bad. And wholly unrelated to inlining.
And I asked about __isinf() being used because there are issues with
accidentally ending up with the non-intrinsic version of isinf() when
not using gcc, due to badly written standard library headers.
> The patch looks unduly invasive to me, but I think that it might be
> right that we should go back to a macro-based implementation, because
> otherwise we don't have a good way to be certain that the function
> parameter won't get evaluated first.
I'd first like to see some actual evidence of this being a problem,
rather than just the order of the checks.
> (Another reason to do so is so that the file/line numbers generated
> for the error reports go back to being at least a little bit useful.)
> We could use local variables within the macro to avoid double evals,
> if anyone thinks that's actually important --- I don't.
I don't think that's necessarily a good idea. In fact, I think we should
probably do the exact opposite, and move the error messages further out
of line. All these otherwise very small functions having their own
ereports makes them much bigger. Our low code density, and the resulting
rate of itlb misses, is pretty significant cost (cf [1]).
master:
text data bss dec hex filename
36124 44 65 36233 8d89 float.o
error messages moved out of line:
text data bss dec hex filename
32883 44 65 32992 80e0 float.o
Taking int4pl as an example - solely because it is simpler assembly to
look at - we get:
master:
0x00000000004ac190 <+0>: mov 0x30(%rdi),%rax
0x00000000004ac194 <+4>: add 0x20(%rdi),%eax
0x00000000004ac197 <+7>: jo 0x4ac19c <int4pl+12>
0x00000000004ac199 <+9>: cltq
0x00000000004ac19b <+11>: retq
0x00000000004ac19c <+12>: push %rbp
0x00000000004ac19d <+13>: lea 0x1a02c4(%rip),%rsi # 0x64c468
0x00000000004ac1a4 <+20>: xor %r8d,%r8d
0x00000000004ac1a7 <+23>: lea 0x265da1(%rip),%rcx # 0x711f4f <__func__.26823>
0x00000000004ac1ae <+30>: mov $0x30b,%edx
0x00000000004ac1b3 <+35>: mov $0x14,%edi
0x00000000004ac1b8 <+40>: callq 0x586060 <errstart>
0x00000000004ac1bd <+45>: lea 0x147e0e(%rip),%rdi # 0x5f3fd2
0x00000000004ac1c4 <+52>: xor %eax,%eax
0x00000000004ac1c6 <+54>: callq 0x5896a0 <errmsg>
0x00000000004ac1cb <+59>: mov $0x3000082,%edi
0x00000000004ac1d0 <+64>: mov %eax,%ebp
0x00000000004ac1d2 <+66>: callq 0x589540 <errcode>
0x00000000004ac1d7 <+71>: mov %eax,%edi
0x00000000004ac1d9 <+73>: mov %ebp,%esi
0x00000000004ac1db <+75>: xor %eax,%eax
0x00000000004ac1dd <+77>: callq 0x588fb0 <errfinish>
out-of-line error:
0x00000000004b04e0 <+0>: mov 0x30(%rdi),%rax
0x00000000004b04e4 <+4>: add 0x20(%rdi),%eax
0x00000000004b04e7 <+7>: jo 0x4b04ec <int4pl+12>
0x00000000004b04e9 <+9>: cltq
0x00000000004b04eb <+11>: retq
0x00000000004b04ec <+12>: push %rax
0x00000000004b04ed <+13>: callq 0x115e17 <out_of_range_err>
With the out-of-line error, we can fit multiple of these functions into one
cache line. With the inline error, not even one.
Greetings,
Andres Freund
[1] https://twitter.com/AndresFreundTec/status/1214305610172289024
pgsql-hackers by date: