
From Konstantin Knizhnik
Subject Why JIT speed improvement is so modest?
Date
Msg-id 809c295d-9d0b-6a8f-c579-8b0ffe565cdc@postgrespro.ru
Right now JIT provides about a 30% improvement on the TPC-H Q1 query:

https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/

I wonder why, even on this query, which seems to be an ideal use case
for JIT, we get such a modest improvement.
I raised this question several years ago, but at that time JIT was
assumed to be at an early development stage, and performance aspects
were less critical than the required infrastructure changes. But right
now JIT seems to be stable enough and is switched on by default.
Vitesse DB reports an 8x speedup on Q1, and the ISP-RAS JIT version
provides a 3x speedup on Q1:


https://www.pgcon.org/2017/schedule/attachments/467_PGCon%202017-05-26%2015-00%20ISPRAS%20Dynamic%20Compilation%20of%20SQL%20Queries%20in%20PostgreSQL%20Using%20LLVM%20JIT.pdf

According to this presentation, Q1 spends 6% of its time in ExecQual
and 75% in ExecAgg.

VOPS provides a 10x improvement on Q1.

I had a hypothesis that such a difference is caused by the way
aggregates are calculated.
Postgres uses the Youngs-Cramer algorithm, while both the ISP-RAS JIT
version and my VOPS just accumulate results in a variable of type double.
I rewrote VOPS to use the same algorithm as Postgres, but VOPS is still
about 10 times faster.
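
To make the comparison concrete, here is a minimal standalone sketch of
the two transition strategies. This is not the actual float8_accum or
VOPS code, just roughly the per-value arithmetic each approach performs;
all names in it are mine:

#include <stdio.h>

/* Naive accumulation: just a running sum and a count, which is what
 * VOPS and the ISP-RAS prototype originally did for AVG. */
typedef struct { double sum; long count; } naive_state;

static void naive_accum(naive_state *st, double x)
{
    st->sum += x;
    st->count++;
}

/* Youngs-Cramer accumulation, roughly what Postgres' float8_accum does
 * for AVG/STDDEV: track N, sum(X) and a running sum of squared
 * deviations. Numerically more stable, but more work per value. */
typedef struct { double N, Sx, Sxx; } yc_state;

static void yc_accum(yc_state *st, double x)
{
    double N  = st->N + 1.0;
    double Sx = st->Sx + x;

    if (N > 1.0)
    {
        double tmp = x * N - Sx;
        st->Sxx += tmp * tmp / (N * (N - 1.0));
    }
    st->N  = N;
    st->Sx = Sx;
}

int main(void)
{
    naive_state ns = {0.0, 0};
    yc_state    ys = {0.0, 0.0, 0.0};

    for (int i = 1; i <= 1000000; i++)
    {
        naive_accum(&ns, (double) i);
        yc_accum(&ys, (double) i);
    }
    printf("naive avg = %f, yc avg = %f\n",
           ns.sum / ns.count, ys.Sx / ys.N);
    return 0;
}

The Youngs-Cramer version costs a few extra floating-point operations,
a division and a branch per value, which is consistent with the VOPS
numbers below (452 ms vs. 610 ms), but it is clearly not a 10x factor.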

Results of Q1 on scale factor 10 TPC-H data on my desktop with parallel
execution enabled:
no-JIT:                         5640 msec
JIT:                            4590 msec
VOPS:                            452 msec
VOPS + Youngs-Cramer algorithm:  610 msec

Below are the tops of the profiles (functions taking more than 1% of
the time):

JIT:
   10.98%  postgres  postgres            [.] float4_accum
    8.40%  postgres  postgres            [.] float8_accum
    7.51%  postgres  postgres            [.] HeapTupleSatisfiesVisibility
    5.92%  postgres  postgres            [.] ExecInterpExpr
    5.63%  postgres  postgres            [.] tts_minimal_getsomeattrs
    4.35%  postgres  postgres            [.] lookup_hash_entries
    3.72%  postgres  postgres            [.] TupleHashTableHash.isra.8
    2.93%  postgres  postgres            [.] tuplehash_insert
    2.70%  postgres  postgres            [.] heapgettup_pagemode
    2.24%  postgres  postgres            [.] check_float8_array
    2.23%  postgres  postgres            [.] hash_search_with_hash_value
    2.10%  postgres  postgres            [.] ExecScan
    1.90%  postgres  postgres            [.] hash_uint32
    1.57%  postgres  postgres            [.] tts_minimal_clear
    1.53%  postgres  postgres            [.] FunctionCall1Coll
    1.47%  postgres  postgres            [.] pg_detoast_datum
    1.39%  postgres  postgres            [.] heapgetpage
    1.37%  postgres  postgres            [.] TupleHashTableMatch.isra.9
    1.35%  postgres  postgres            [.] ExecStoreBufferHeapTuple
    1.06%  postgres  postgres            [.] LookupTupleHashEntry
    1.06%  postgres  postgres            [.] AggCheckCallContext

no-JIT:
   26.82%  postgres  postgres            [.] ExecInterpExpr
   15.26%  postgres  postgres            [.] tts_buffer_heap_getsomeattrs
    8.27%  postgres  postgres            [.] float4_accum
    7.51%  postgres  postgres            [.] float8_accum
    5.26%  postgres  postgres            [.] HeapTupleSatisfiesVisibility
    2.78%  postgres  postgres            [.] TupleHashTableHash.isra.8
    2.63%  postgres  postgres            [.] tts_minimal_getsomeattrs
    2.54%  postgres  postgres            [.] lookup_hash_entries
    2.05%  postgres  postgres            [.] tuplehash_insert
    1.97%  postgres  postgres            [.] heapgettup_pagemode
    1.72%  postgres  postgres            [.] hash_search_with_hash_value
    1.57%  postgres  postgres            [.] float48mul
    1.55%  postgres  postgres            [.] check_float8_array
    1.48%  postgres  postgres            [.] ExecScan
    1.26%  postgres  postgres            [.] hash_uint32
    1.04%  postgres  postgres            [.] tts_minimal_clear
    1.00%  postgres  postgres            [.] FunctionCall1Coll

VOPS:
   44.25%  postgres  vops.so            [.] vops_avg_state_accumulate
   11.76%  postgres  vops.so            [.] vops_float4_avg_accumulate
    6.14%  postgres  postgres           [.] ExecInterpExpr
    5.89%  postgres  vops.so            [.] vops_float4_sub_lconst
    4.89%  postgres  vops.so            [.] vops_float4_mul
    4.30%  postgres  vops.so            [.] vops_int4_le_rconst
    2.57%  postgres  vops.so            [.] vops_float4_add_lconst
    2.31%  postgres  vops.so            [.] vops_count_accumulate
    2.24%  postgres  postgres           [.] tts_buffer_heap_getsomeattrs
    1.97%  postgres  postgres           [.] heap_page_prune_opt
    1.72%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
    1.67%  postgres  postgres           [.] AllocSetAlloc
    1.47%  postgres  postgres           [.] hash_search_with_hash_value


In theory, by eliminating interpretation overhead, JIT should provide
performance comparable with a vectorized executor.
In most programming languages, using a JIT compiler instead of a
byte-code interpreter gives about a 10x speed improvement.
Certainly a DBMS engine is very different from a traditional
interpreter, and a lot of time is spent in tuple packing/unpacking
(although JIT is also used here), in heap traversal, ... But it is
still unclear to me why, if the ISP-RAS measurements are correct and we
really spend 75% of Q1 time in aggregation, JIT was not able to speed
up Q1 significantly (several times).
The experiment with VOPS shows that the aggregation algorithm itself is
not the bottleneck.
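
As a purely illustrative sketch of the execution-model difference I am
comparing: this is not Postgres executor or VOPS code, it ignores tuple
deforming, hashing and grouping, and all names in it are mine; it just
contrasts row-at-a-time dispatch with tile-at-a-time accumulation:

#include <stddef.h>
#include <stdio.h>

typedef struct { double sum; long count; } avg_state;

/* Row-at-a-time model: even when the qual and tuple deforming are
 * JIT-compiled, each row still pays for an (indirect) transition
 * function call per aggregate plus per-row state bookkeeping. */
typedef void (*transition_fn)(avg_state *, double);

static void avg_transition(avg_state *st, double x)
{
    st->sum += x;
    st->count++;
}

static void row_at_a_time(const double *col, size_t nrows,
                          transition_fn fn, avg_state *st)
{
    for (size_t i = 0; i < nrows; i++)
        fn(st, col[i]);                 /* one dispatch per row */
}

/* Tile-at-a-time model (roughly the VOPS approach): one call processes
 * a whole chunk of values, so dispatch and state-access costs are
 * amortized and the inner loop is easy for the compiler to vectorize. */
static void tile_at_a_time(const double *col, size_t nrows, avg_state *st)
{
    double sum = 0.0;

    for (size_t i = 0; i < nrows; i++)
        sum += col[i];
    st->sum   += sum;
    st->count += (long) nrows;
}

int main(void)
{
    double    col[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    avg_state a = {0.0, 0}, b = {0.0, 0};

    row_at_a_time(col, 8, avg_transition, &a);
    tile_at_a_time(col, 8, &b);
    printf("row avg = %f, tile avg = %f\n",
           a.sum / a.count, b.sum / b.count);
    return 0;
}

The row-at-a-time loop pays the dispatch and bookkeeping cost for every
tuple, while the tile version amortizes it over a whole chunk, which at
least matches vops_avg_state_accumulate dominating the VOPS profile
above.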
The profiles also give no answer to this question.
Any ideas?



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



