On Thu, Nov 28, 2019 at 2:08 AM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> calls float4_accum for each row of T, the same query in VOPS will call
> vops_float4_avg_accumulate for each tile which contains 64 elements.
> So vops_float4_avg_accumulate is called 64 times less often than
> float4_accum. Inside, it contains a straightforward loop:
>
> for (i = 0; i < TILE_SIZE; i++) {
>     sum += opd->payload[i];
> }
>
> which can be optimized by the compiler (loop unrolling, use of SIMD
> instructions, ...).

Part of the reason why the compiler can optimize that so well is
probably related to the fact that it includes no overflow checks.
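
To make the difference concrete, here is a rough sketch. It is not the
actual VOPS or PostgreSQL code: the function names tile_sum_unchecked and
tile_sum_checked are hypothetical, and the check shown is just one
plausible way to detect overflow. The unchecked loop body is a single add,
while the checked one adds a test and an early exit per element, which
gives the compiler much less room to unroll or vectorize.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

#define TILE_SIZE 64    /* tile width taken from the quoted description */

/* Unchecked: one add per element, no branches inside the loop body. */
static float
tile_sum_unchecked(const float *payload)
{
    float       sum = 0.0f;

    for (size_t i = 0; i < TILE_SIZE; i++)
        sum += payload[i];
    return sum;
}

/*
 * Checked (hypothetical): returns false if the running sum becomes
 * infinite even though both operands were finite. The data-dependent
 * early return has to be evaluated after every single add.
 */
static bool
tile_sum_checked(const float *payload, float *result)
{
    float       sum = 0.0f;

    for (size_t i = 0; i < TILE_SIZE; i++)
    {
        float       next = sum + payload[i];

        if (isinf(next) && !isinf(sum) && !isinf(payload[i]))
            return false;       /* overflow detected */
        sum = next;
    }
    *result = sum;
    return true;
}
```

Note that even the unchecked version is not automatically turned into a
SIMD reduction: floating-point addition is not associative, so compilers
such as GCC and Clang generally need to be told that reordering is
acceptable (for example with -ffast-math) before they will vectorize it.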
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company