On 06.12.2019 19:52, Konstantin Knizhnik wrote:
>
>
> On 06.12.2019 18:53, Robert Haas wrote:
>> On Thu, Nov 28, 2019 at 2:08 AM Konstantin Knizhnik
>> <k.knizhnik@postgrespro.ru> wrote:
>>> calls float4_accum for each row of T, the same query in VOPS will call
>>> vops_float4_avg_accumulate for each tile which contains 64 elements.
>>> So vops_float4_avg_accumulate is called 64 times less than
>>> float4_accum.
>>> And inside it contains straightforward loop:
>>>
>>> for (i = 0; i < TILE_SIZE; i++) {
>>> sum += opd->payload[i];
>>> }
>>>
>>> which can be optimized by compiler (loop unrolling, use of SIMD
>>> instructions,...).
>> Part of the reason why the compiler can optimize that so well is
>> probably related to the fact that it includes no overflow checks.
>
> May it makes sense to use in aggregate transformation function which
> is not checking for overflow and perform this check only in final
> function?
> NaN and Inf values will be preserved in any case...
>
I tried commenting out check_float8_val in float4_pl/float8_pl and saw
no difference in performance at all.
But if I replace the query
    select
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
        sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
        sum(l_quantity) as avg_qty,
        sum(l_extendedprice) as avg_price,
        sum(l_discount) as avg_disc,
        count(*) as count_order
    from lineitem_inmem;
with
    select sum(l_quantity + l_extendedprice + l_discount + l_tax)
    from lineitem_inmem;
then the execution time drops from 3686 to 1748 msec.
So at least half of the time is spent in expression evaluation and
aggregate accumulation.
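
To make the deferred-check idea quoted above concrete, here is a minimal
sketch in C. It is not code from PostgreSQL or VOPS: the function names
accum_tile_unchecked and final_sum_ok are hypothetical, and TILE_SIZE is
assumed to be 64 as in the tile loop quoted earlier. The transition step
adds a whole tile with no per-element check, and a single check runs in
the final step.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

#define TILE_SIZE 64    /* assumed tile width, as in the quoted loop */

/*
 * Hypothetical transition step: add one tile to the running sum.
 * No overflow check inside the loop, so the compiler is free to
 * unroll or vectorize it.
 */
static double
accum_tile_unchecked(double sum, const float *payload)
{
    assert(payload != NULL);
    for (size_t i = 0; i < TILE_SIZE; i++)
        sum += payload[i];
    return sum;
}

/*
 * Hypothetical final step: one check for the whole aggregate.
 * Returns false if the sum is infinite; NaN passes through.
 * Simplification: this alone cannot tell an overflow from an Inf
 * that was already present in the input.
 */
static bool
final_sum_ok(double sum)
{
    return !isinf(sum);
}
```

A caller would invoke accum_tile_unchecked once per tile and
final_sum_ok once at the end. To report an error only on a real
overflow, the state would probably also need a flag recording whether
any input value was infinite.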
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company