When sum(int4) or sum(int2) is executed, many cycles are spent in
AllocSetReset, because the per-tuple memory context is used to allocate
the initial transition value of each group.
The attached patch uses AggState->aggcontext instead of the per-tuple
context to allocate this value. As a result, nothing is allocated in the
per-tuple context, and the time spent in AllocSetReset is reduced.
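To make the allocation change concrete, here is a toy model in plain C. It is not PostgreSQL code: ToyContext, toy_alloc, toy_reset and run_sum are hypothetical names, the "contexts" only count allocations, and NGROUPS = 5 mirrors the five bid values that "pgbench -i -s 5" produces (numbered 0..4 here). It shows only which context receives each group's first value; it does not model the real cost of AllocSetReset, and it ignores the fact that the real executor must keep the value alive past the per-tuple reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context.  It does no real
 * allocation; it only records how often it was used. */
typedef struct {
    size_t nallocs;     /* allocations ever made in this context */
    size_t live_bytes;  /* bytes allocated since the last reset */
} ToyContext;

static void toy_alloc(ToyContext *cxt, size_t size)
{
    cxt->nallocs++;
    cxt->live_bytes += size;
}

static void toy_reset(ToyContext *cxt)
{
    cxt->live_bytes = 0;  /* everything allocated here is released */
}

#define NGROUPS 5  /* like the five bid values of pgbench -s 5 */

/* Grouped sum: row i belongs to group groups[i] (0..NGROUPS-1).
 * The first row of a group creates a new running sum of
 * sizeof(long long) bytes; later rows update it in place without
 * allocating.  use_aggcontext == 0 puts that first allocation in the
 * per-tuple context (original behavior); use_aggcontext != 0 puts it
 * in the aggregate context (patched behavior).  The per-tuple context
 * is reset after each row.  Entries of sums[] for groups with no rows
 * are left untouched. */
static void run_sum(const int *groups, const int *values, size_t nrows,
                    int use_aggcontext, ToyContext *per_tuple,
                    ToyContext *agg, long long sums[NGROUPS])
{
    int seen[NGROUPS] = {0};

    for (size_t i = 0; i < nrows; i++) {
        int g = groups[i];
        assert(g >= 0 && g < NGROUPS);

        if (!seen[g]) {
            toy_alloc(use_aggcontext ? agg : per_tuple, sizeof(long long));
            sums[g] = values[i];
            seen[g] = 1;
        } else {
            sums[g] += values[i];  /* in-place update, no allocation */
        }
        toy_reset(per_tuple);
    }
}
```

With use_aggcontext == 0 the per-tuple context receives one allocation per group; with use_aggcontext == 1 it receives none, which is the situation the patch aims for.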
test data:
pgbench -i -s 5
SQL:
select a.bid, sum(a.abalance)
from accounts a
group by a.bid;
execution time (compiled with "-O2"):
original: 1.530s
patched: 1.441s
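For reference, the two measurements correspond to a relative reduction in execution time of (1.530 - 1.441) / 1.530, roughly 5.8%. A trivial helper for that calculation (the name relative_reduction is illustrative only):

```c
#include <assert.h>

/* Fraction by which execution time dropped: (before - after) / before. */
static double relative_reduction(double before, double after)
{
    assert(before > 0.0);  /* a zero baseline would make the ratio meaningless */
    return (before - after) / before;
}
```

relative_reduction(1.530, 1.441) evaluates to about 0.058.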
profile of the original code (compiled with "-g -pg"):
----------------------------------------------------------------------------
     % cumulative     self              self    total
  time    seconds  seconds    calls   s/call   s/call  name
 15.64       0.35     0.35  1500000     0.00     0.00  slot_deform_tuple
 11.67       0.62     0.27  1000002     0.00     0.00  AllocSetReset
  6.61       0.77     0.15  1999995     0.00     0.00  slot_getattr
  5.29       0.89     0.12   500002     0.00     0.00  heapgettup
  3.52       0.97     0.08   524420     0.00     0.00  hash_search
profile of the patched code (compiled with "-g -pg"):
----------------------------------------------------------------------------
     % cumulative     self              self    total
  time    seconds  seconds    calls   s/call   s/call  name
 17.39       0.32     0.32  1500000     0.00     0.00  slot_deform_tuple
  6.52       0.44     0.12   500002     0.00     0.00  heapgettup
  6.25       0.56     0.12  1999995     0.00     0.00  slot_getattr
  4.35       0.64     0.08   524420     0.00     0.00  hash_search
  4.35       0.71     0.08   499995     0.00     0.00  execTuplesMatch
(skip ...)
  0.54       1.67     0.01  1000002     0.00     0.00  AllocSetReset
regards,
--- Atsushi Ogawa