Why JIT speed improvement is so modest? - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Why JIT speed improvement is so modest?
Msg-id: 809c295d-9d0b-6a8f-c579-8b0ffe565cdc@postgrespro.ru
Responses: Re: Why JIT speed improvement is so modest?
           Re: Why JIT speed improvement is so modest?
List: pgsql-hackers
Right now JIT provides about a 30% improvement on TPC-H query Q1:
https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/

I wonder why, even on this query, which seems to be an ideal use case for JIT, we get such a modest improvement. I raised this question several years ago, but at that time JIT was assumed to be at an early stage of development and performance aspects were less critical than the required infrastructure changes. Now JIT seems to be stable enough and is switched on by default.

Vitesse DB reports an 8x speedup on Q1, and the ISP-RAS JIT version provides a 3x speedup on Q1:
https://www.pgcon.org/2017/schedule/attachments/467_PGCon%202017-05-26%2015-00%20ISPRAS%20Dynamic%20Compilation%20of%20SQL%20Queries%20in%20PostgreSQL%20Using%20LLVM%20JIT.pdf

According to this presentation, Q1 spends 6% of its time in ExecQual and 75% in ExecAgg. VOPS provides a 10x improvement on Q1.

I have a hypothesis that the difference is caused by the way aggregates are calculated. Postgres uses the Youngs-Cramer algorithm, while both the ISP-RAS JIT version and my VOPS just accumulate results in a variable of type double. I rewrote VOPS to use the same algorithm as Postgres, but VOPS is still about 10 times faster.
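To make that comparison concrete, here is a standalone toy sketch of the two transition functions (my own code, not the actual float8_accum source; the real Postgres implementation may differ in details): plain accumulation in a double versus a Youngs-Cramer style update over an {N, Sx, Sxx} state similar to what Postgres keeps for float4/float8 aggregates.

/* Toy comparison, not the actual Postgres float8_accum code. */
#include <stdio.h>

typedef struct { double N, Sx, Sxx; } yc_state;

/* plain accumulation: what the ISP-RAS JIT and the original VOPS code do */
static void plain_accum(double *sum, double v)
{
    *sum += v;
}

/* Youngs-Cramer style update: also maintains the running sum of
 * squared deviations in a numerically stable way */
static void yc_accum(yc_state *st, double v)
{
    double tmp;

    st->N += 1.0;
    st->Sx += v;
    if (st->N > 1.0)
    {
        tmp = v * st->N - st->Sx;
        st->Sxx += tmp * tmp / (st->N * (st->N - 1.0));
    }
}

int main(void)
{
    double vals[] = {1.0, 2.0, 3.0, 4.0};
    double sum = 0.0;
    yc_state st = {0.0, 0.0, 0.0};
    int i;

    for (i = 0; i < 4; i++)
    {
        plain_accum(&sum, vals[i]);
        yc_accum(&st, vals[i]);
    }
    /* both give avg = 2.5; the YC state additionally has Sxx = 5.0 */
    printf("plain avg=%g, yc avg=%g, yc sxx=%g\n",
           sum / 4.0, st.Sx / st.N, st.Sxx);
    return 0;
}

The extra work per value is a handful of multiplications and one division, which by itself should not explain a 10x gap.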
Results of Q1 on scale factor 10 TPC-H data at my desktop with parallel execution enabled:

no-JIT:                          5640 msec
JIT:                             4590 msec
VOPS:                             452 msec
VOPS + Youngs-Cramer algorithm:   610 msec

Below are the tops of the profiles (functions taking more than 1% of time):

JIT:
  10.98%  postgres  postgres  [.] float4_accum
   8.40%  postgres  postgres  [.] float8_accum
   7.51%  postgres  postgres  [.] HeapTupleSatisfiesVisibility
   5.92%  postgres  postgres  [.] ExecInterpExpr
   5.63%  postgres  postgres  [.] tts_minimal_getsomeattrs
   4.35%  postgres  postgres  [.] lookup_hash_entries
   3.72%  postgres  postgres  [.] TupleHashTableHash.isra.8
   2.93%  postgres  postgres  [.] tuplehash_insert
   2.70%  postgres  postgres  [.] heapgettup_pagemode
   2.24%  postgres  postgres  [.] check_float8_array
   2.23%  postgres  postgres  [.] hash_search_with_hash_value
   2.10%  postgres  postgres  [.] ExecScan
   1.90%  postgres  postgres  [.] hash_uint32
   1.57%  postgres  postgres  [.] tts_minimal_clear
   1.53%  postgres  postgres  [.] FunctionCall1Coll
   1.47%  postgres  postgres  [.] pg_detoast_datum
   1.39%  postgres  postgres  [.] heapgetpage
   1.37%  postgres  postgres  [.] TupleHashTableMatch.isra.9
   1.35%  postgres  postgres  [.] ExecStoreBufferHeapTuple
   1.06%  postgres  postgres  [.] LookupTupleHashEntry
   1.06%  postgres  postgres  [.] AggCheckCallContext

no-JIT:
  26.82%  postgres  postgres  [.] ExecInterpExpr
  15.26%  postgres  postgres  [.] tts_buffer_heap_getsomeattrs
   8.27%  postgres  postgres  [.] float4_accum
   7.51%  postgres  postgres  [.] float8_accum
   5.26%  postgres  postgres  [.] HeapTupleSatisfiesVisibility
   2.78%  postgres  postgres  [.] TupleHashTableHash.isra.8
   2.63%  postgres  postgres  [.] tts_minimal_getsomeattrs
   2.54%  postgres  postgres  [.] lookup_hash_entries
   2.05%  postgres  postgres  [.] tuplehash_insert
   1.97%  postgres  postgres  [.] heapgettup_pagemode
   1.72%  postgres  postgres  [.] hash_search_with_hash_value
   1.57%  postgres  postgres  [.] float48mul
   1.55%  postgres  postgres  [.] check_float8_array
   1.48%  postgres  postgres  [.] ExecScan
   1.26%  postgres  postgres  [.] hash_uint32
   1.04%  postgres  postgres  [.] tts_minimal_clear
   1.00%  postgres  postgres  [.] FunctionCall1Coll

VOPS:
  44.25%  postgres  vops.so   [.] vops_avg_state_accumulate
  11.76%  postgres  vops.so   [.] vops_float4_avg_accumulate
   6.14%  postgres  postgres  [.] ExecInterpExpr
   5.89%  postgres  vops.so   [.] vops_float4_sub_lconst
   4.89%  postgres  vops.so   [.] vops_float4_mul
   4.30%  postgres  vops.so   [.] vops_int4_le_rconst
   2.57%  postgres  vops.so   [.] vops_float4_add_lconst
   2.31%  postgres  vops.so   [.] vops_count_accumulate
   2.24%  postgres  postgres  [.] tts_buffer_heap_getsomeattrs
   1.97%  postgres  postgres  [.] heap_page_prune_opt
   1.72%  postgres  postgres  [.] HeapTupleSatisfiesVisibility
   1.67%  postgres  postgres  [.] AllocSetAlloc
   1.47%  postgres  postgres  [.] hash_search_with_hash_value

In theory, by eliminating interpretation overhead, JIT should provide performance comparable with a vectorized executor. In most programming languages, using a JIT compiler instead of a byte-code interpreter gives about a 10x speed improvement. Certainly a DBMS engine is very different from a traditional interpreter, and a lot of time is spent in tuple packing/unpacking (although JIT is also used there), in heap traversal, and so on.
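As a toy illustration of where I expect that overhead to come from (again my own sketch, not actual executor or VOPS code): invoking the transition function once per tuple through a function pointer, as the row-at-a-time executor does, versus one call per tile of values with a tight loop inside, which is roughly what the VOPS accumulate functions do.

/* Toy sketch, not actual executor or VOPS code. */
#include <stdio.h>
#include <stddef.h>

typedef struct { double n, sx; } avg_state;

/* transition function called once per row */
static void accum_one(avg_state *st, double v)
{
    st->n += 1.0;
    st->sx += v;
}

typedef void (*transfn) (avg_state *, double);

static double avg_row_at_a_time(const double *col, size_t nrows, transfn fn)
{
    avg_state st = {0.0, 0.0};
    size_t  i;

    for (i = 0; i < nrows; i++)
        fn(&st, col[i]);        /* one indirect call per tuple */
    return st.sx / st.n;
}

/* one call per tile of values, with the loop inside */
static double avg_tile_at_a_time(const double *col, size_t nrows)
{
    double  n = 0.0, sx = 0.0;
    size_t  i;

    for (i = 0; i < nrows; i++)
    {
        n += 1.0;
        sx += col[i];
    }
    return sx / n;
}

int main(void)
{
    double col[64];
    size_t i;

    for (i = 0; i < 64; i++)
        col[i] = (double) i;

    printf("row-at-a-time avg=%g, tile-at-a-time avg=%g\n",
           avg_row_at_a_time(col, 64, accum_one),
           avg_tile_at_a_time(col, 64));
    return 0;
}

JIT should be able to remove exactly this kind of per-tuple dispatch, which is why the remaining gap puzzles me.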
But it is still unclear to me why, if the ISP-RAS measurements were correct and we actually spend 75% of Q1 time in aggregation, JIT was not able to speed up Q1 significantly (several times). The experiment with VOPS shows that the aggregation algorithm itself is not the bottleneck. The profiles also give no answer to this question.

Any ideas?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company