Re: Why JIT speed improvement is so modest? - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Why JIT speed improvement is so modest?
Date:
Msg-id: 20191127160504.uxw7tzbd2gwzwsov@development
In response to: Re: Why JIT speed improvement is so modest? (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: Why JIT speed improvement is so modest?
List: pgsql-hackers
On Wed, Nov 27, 2019 at 06:38:45PM +0300, Konstantin Knizhnik wrote:
>
>On 25.11.2019 18:24, Merlin Moncure wrote:
>>On Mon, Nov 25, 2019 at 9:09 AM Konstantin Knizhnik
>><k.knizhnik@postgrespro.ru> wrote:
>>>Why was JIT not able to significantly (several times) increase speed
>>>on the Q1 query? The experiment with VOPS shows that the aggregation
>>>algorithm itself is not the bottleneck. The profile also gives no
>>>answer to this question. Any ideas?
>>Well, in the VOPS variant around 2/3 of the time is spent in routines
>>that are obviously aggregation. In the JIT version, it's around 20%.
>>So this suggests that the replacement execution engine is more
>>invasive. I would also guess (!) that the VOPS engine optimizes fewer
>>classes of query plan. ExecScan, for example, looks to be completely
>>optimized out in VOPS but is still utilized in the JIT engine.
>
>The difference in the fraction of time spent in aggregate calculation is
>not so large (2 times vs. 10 times).
>I suspected that a lot of time is spent in relation traversal code,
>tuple unpacking and visibility checks.
>To check this hypothesis I have implemented an in-memory table access
>method which stores tuples in unpacked form and
>doesn't perform any visibility checks at all.
>The results were not so exciting. I had to disable parallel execution
>(because it is not possible for tuples stored in backend-private
>memory).
>The results are the following:
>
>lineitem:       13736 msec
>inmem_lineitem: 10044 msec
>vops_lineitem:   1945 msec
>
>The profile of inmem_lineitem is the following:
>
>  16.79%  postgres  postgres       [.] float4_accum
>  12.86%  postgres  postgres       [.] float8_accum
>   5.83%  postgres  postgres       [.] TupleHashTableHash.isra.8
>   4.44%  postgres  postgres       [.] lookup_hash_entries
>   3.37%  postgres  postgres       [.] check_float8_array
>   3.11%  postgres  postgres       [.] tuplehash_insert
>   2.91%  postgres  postgres       [.] hash_uint32
>   2.83%  postgres  postgres       [.] ExecScan
>   2.56%  postgres  postgres       [.] inmem_getnextslot
>   2.22%  postgres  postgres       [.] FunctionCall1Coll
>   2.14%  postgres  postgres       [.] LookupTupleHashEntry
>   1.95%  postgres  postgres       [.] TupleHashTableMatch.isra.9
>   1.76%  postgres  postgres       [.] pg_detoast_datum
>   1.58%  postgres  postgres       [.] AggCheckCallContext
>   1.57%  postgres  postgres       [.] tts_minimal_clear
>   1.35%  postgres  perf-3054.map  [.] 0x00007f558db60010
>   1.23%  postgres  postgres       [.] fetch_input_tuple
>   1.15%  postgres  postgres       [.] SeqNext
>   1.06%  postgres  postgres       [.] ExecAgg
>   1.00%  postgres  postgres       [.] tts_minimal_store_tuple
>
>So now the fraction of time spent in aggregation has increased to 30% (vs.
>20% for lineitem and 42% for vops_lineitem).
>It looks like the main bottleneck now is hashagg. VOPS accesses the hash
>table about 10 times less often (because it accumulates values for the
>whole tile), and that explains the still-large difference between
>vops_lineitem and inmem_lineitem.
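To make the hash-table traffic difference concrete, here is a minimal C
sketch (AggState, hash_lookup, agg_per_row and agg_per_tile are hypothetical
names, not the actual executor or VOPS code): the row-at-a-time path probes
the aggregate hash table once per input row, while a tile-at-a-time path can
probe once per tile and accumulate a whole tile of values between probes.

typedef struct AggState
{
    double      sum;
    long        count;
} AggState;

/* assumed helper: returns (or creates) the aggregate state for a group key */
extern AggState *hash_lookup(int group_key);

/* row-at-a-time hash aggregation: one hash probe per input row */
static void
agg_per_row(const int *keys, const double *vals, long nrows)
{
    for (long i = 0; i < nrows; i++)
    {
        AggState   *st = hash_lookup(keys[i]);  /* probe on every row */

        st->sum += vals[i];
        st->count++;
    }
}

/* tile-at-a-time: one hash probe per tile, assuming the whole tile
 * belongs to the same group */
static void
agg_per_tile(int tile_key, const double *tile_vals, int tile_len)
{
    AggState   *st = hash_lookup(tile_key);     /* single probe per tile */

    for (int i = 0; i < tile_len; i++)
        st->sum += tile_vals[i];
    st->count += tile_len;
}

For the same number of input rows the second variant issues one probe per
tile instead of one per row, which lines up with the roughly 10x reduction
in hash accesses described above.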
>If we remove the grouping and rewrite the Q1 query as:
>
>select
>    avg(l_quantity) as sum_qty,
>    avg(l_extendedprice) as sum_base_price,
>    avg(l_extendedprice*(1-l_discount)) as sum_disc_price,
>    avg(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
>    avg(l_quantity) as avg_qty,
>    avg(l_extendedprice) as avg_price,
>    avg(l_discount) as avg_disc,
>    count(*) as count_order
>from
>    inmem_lineitem
>where
>    l_shipdate <= '1998-12-01';
>
>then the results are the following:
>
>lineitem:       9805 msec
>inmem_lineitem: 6257 msec
>vops_lineitem:  1865 msec
>
>and now the profile of inmem_lineitem is:
>
>  25.27%  postgres  postgres       [.] float4_accum
>  21.86%  postgres  postgres       [.] float8_accum
>   5.49%  postgres  postgres       [.] check_float8_array
>   4.57%  postgres  postgres       [.] ExecScan
>   2.61%  postgres  postgres       [.] AggCheckCallContext
>   2.30%  postgres  postgres       [.] pg_detoast_datum
>   2.10%  postgres  postgres       [.] inmem_getnextslot
>   1.81%  postgres  postgres       [.] SeqNext
>   1.73%  postgres  postgres       [.] fetch_input_tuple
>   1.61%  postgres  postgres       [.] ExecAgg
>   1.23%  postgres  postgres       [.] MemoryContextReset
>
>But there is still more than a 3x difference with VOPS!
>Something is wrong here...
>

I have no idea what VOPS does, but IIRC one of the bottlenecks compared
to various column stores is our iterative execution model, which makes it
difficult/impossible to vectorize operations. That's likely why the accum
functions are so high in the CPU profile.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
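As a rough illustration of that last point, here is a minimal C sketch
(AvgState, avg_accum_row and avg_accum_batch are hypothetical names, not the
actual float4_accum/float8_accum code) of the difference between the
tuple-at-a-time transition calls and what a batched engine can do:

typedef struct AvgState
{
    double      N;      /* number of input values accumulated */
    double      sum;    /* running sum */
} AvgState;

/* tuple-at-a-time: the executor invokes this once per input row, so the
 * call and state-checking overhead is paid for every single value */
static void
avg_accum_row(AvgState *state, double newval)
{
    state->N += 1.0;
    state->sum += newval;
}

/* batch-at-a-time: a vectorized engine can pass a whole column chunk,
 * amortizing that overhead and leaving a tight loop the compiler can
 * auto-vectorize */
static void
avg_accum_batch(AvgState *state, const double *vals, int nvals)
{
    double      sum = 0.0;

    for (int i = 0; i < nvals; i++)
        sum += vals[i];

    state->N += nvals;
    state->sum += sum;
}

In the iterative model every input row also goes through the generic
function-call machinery and per-call checks (the FunctionCall1Coll,
check_float8_array and pg_detoast_datum entries in the profiles above),
while a batched loop would pay that cost once per chunk.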