Thread: PostgreSQLv14 TPC-H performance GCC vs Clang

PostgreSQLv14 TPC-H performance GCC vs Clang

From
arjun shetty
Date:
Hi,
I built PostgreSQL v14 from source with GCC v11.2 and with Clang v12 (without JIT), using optimisation flags like -O3, and tested both builds with HammerDB.
On TPC-H, the GCC build performs better than the Clang build (without JIT). The difference is about 22%, and I also noticed differences in the generated assembly code (e.g. GCC inlines more functions than Clang).

Environment details:
————————-
OS :RHEL8.4
Bare metal : Apple/AMD EPYC/IBM
Test(TPC-H) Benchmark Environment:HammerDB

Is the performance difference mainly due to the following point?
1. Data overflow handling and calculations such as int128 (int128.c), and C arithmetic operations (functions included in float.h, e.g. float4_mul)

And please suggest any other functionality or code points I should check regarding the performance difference.

Re: PostgreSQLv14 TPC-H performance GCC vs Clang

From
Imre Samu
Date:
> .. optimisation flags like O3
> And please suggest ...  to check on the performance difference 

Phoronix has tested PostgreSQL 13 with Clang 12 and GCC 11.1 on Xeon Ice Lake:
  "The CFLAGS/CXXFLAGS set throughout testing were "-O3 -march=native -flto" 
  as would be common for HPC systems when building performance sensitive code."
and the results (Postgres only) are in "GCC 11 vs. LLVM Clang 12 Benchmarks On Xeon Ice Lake".
  Maybe you can replicate the Phoronix results (but they only cover GCC 11.1!):
  "Compare your own system(s) to this result file with the Phoronix Test Suite 
    by running the command: phoronix-test-suite benchmark 2105299-IB-COMPILERT91"

Regards.
  Imre

In reply to arjun shetty <arjunshetty955@gmail.com>, Tue, Nov 2, 2021, 18:13.

Re: PostgreSQLv14 TPC-H performance GCC vs Clang

From
arjun shetty
Date:
Hi 

@imre: Thank you for sharing the links to the Phoronix tests of PostgreSQL 13.
I compared my test results with the Phoronix Test Suite results. They deviate too much (maybe because of the hardware environment and the PostgreSQL version).
I think PostgreSQL v13 may have issues with autovacuum, and I am currently using PostgreSQL v14.


In my environment GCC performs better than Clang (LLVM). The reason may be that int128 performs better with GCC than with Clang:
1. With Clang, __int128 requires four additional functions (__divti3, __modti3, __udivti3, __umodti3), and these are not required with GCC. This may lead to the performance drop with Clang.
2. __int128 aligned on 16-byte boundaries (MAXALIGN) is supported in GCC, and this may not be supported in Clang.

@postgresql-performance: kindly let me know your views on these two points.





In reply to Imre Samu <pella.samu@gmail.com>, Wednesday, November 3, 2021.

Re: PostgreSQLv14 TPC-H performance GCC vs Clang

From
Tomas Vondra
Date:
Hi,

IMO this thread provides so little information it's almost impossible to 
answer the question. There's almost no information about the hardware, 
scale of the test, configuration of the Postgres instance, the exact 
build flags, differences in generated asm code, etc.

I find it hard to believe merely switching from clang to gcc yields 22% 
speedup - that's way higher than any differences we've seen in the past.

In my experience, the speedup is unlikely to be "across the board". 
There will be a handful of affected queries, while most remaining 
queries will be about the same. In that case you need to focus on those 
queries, see if the plans are the same, do some profiling, etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



PostgreSQLv14 TPC-H performance GCC vs Clang

From
arjun shetty
Date:
Yes, I am currently focusing on the affected queries as well.
In the meantime, from hardware-level analysis and small sample programs, I noticed:
1. GCC performs better than Clang on int128.
2. Clang performs better than GCC on long long.
3. GCC is enabled with "-fexcess-precision=standard" (precision casts for floating point).

Could these three points explain the performance difference between GCC and Clang for PostgreSQL v14 in the Apple/AMD environments (the Intel environment still needs to be checked)? In these environments int128 is enabled in PostgreSQL v14.

In reply to Tomas Vondra <tomas.vondra@enterprisedb.com>, Friday, November 5, 2021.

Re: PostgreSQLv14 TPC-H performance GCC vs Clang

From
Imre Samu
Date:
GCC vs Clang 

Related:
As far as I can see, with LLVM/Clang 14.0 (x86_64, -O3) a ~12% performance increase is expected from the new optimisation (probably adapted from GCC).

In reply to arjun shetty <arjunshetty955@gmail.com>, Tue, Nov 16, 2021, 11:10.

Re: PostgreSQLv14 TPC-H performance GCC vs Clang

From
arjun shetty
Date:
Hi All,

I checked with LLVM/Clang 14.0 (x86-64, -O3) in the Mac/AMD EPYC environment, but I still see GCC performing better than Clang 14.
Clang 14: https://github.com/llvm/llvm-project (main branch, pulled at commit ID 3f3fe4a5cfa1797..)
[Attachment: image.png]
Preliminary analysis, GCC vs Clang:
 (1) GCC inlines more functions than Clang in PostgreSQL.
 (2) A few functions are not inlined by GCC but are inlined by Clang:
       postgresqlv14/src/include/utils/float.h: float8_mul(), float8_div() (arithmetic functions)
       postgresqlv14/src/backend/utils/adt/geo_ops.c: point_xxx()
 (3) GCC performs better than Clang on the int128 data type (this still needs to be cross-checked at the instruction/assembly level on the hardware).
 (4) Following point (2), I removed "inline" from those functions in float.h and geo_ops.c and observed a 6% performance improvement with Clang compared to the inlined version.

regards,
Arjun 


In reply to Imre Samu <pella.samu@gmail.com>, Fri, Dec 10, 2021 at 11:51 PM.