Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> Tom, do you recall measuring the performance difference on aggregate
> functions between int4 and numeric for small-value cases? We probably
> don't want to take order-of-magnitude performance hits to get this more
> correct behavior, but I'm not sure what the performance actually is.
I have not tried to measure it --- I was sort of assuming that for
realistic-size problems, disk I/O would swamp any increase in CPU time
anyway. Does anyone want to check the time for sum() or avg() on an
int4 column over a large table, under both 7.0.* and current?
> The problem is that numeric is extremely slow compared to an int4
> calculation, and I'd like us to consider doing the calculation in int4
> (downside: silent overflow when dealing with non-trivial data), int8
> (downside: no support on a few platforms), or float8 (downside: silent
> truncation on non-trivial data).
Actually, using a float8 accumulator would work pretty well; assuming
IEEE float8, you'd only start to get roundoff error when the running
sum exceeds 2^52 or so. However the SQL92 spec is insistent that sum()
deliver an exact-numeric result when applied to exact-numeric data,
and with a float accumulator we'd be at the mercy of the quality of the
local implementation of floating point.
I could see offering variant aggregates, say "sumf" and "avgf", that
use float8 accumulation. Right now the user can get the same result
by writing "sum(foo::float8)" but it might be wise to formalize the
idea ...
regards, tom lane