Re: Weird function behavior from Sept 11 snapshot - Mailing list pgsql-hackers

From Thomas Lockhart
Subject Re: Weird function behavior from Sept 11 snapshot
Date
Msg-id 39BE4484.C453B792@alumni.caltech.edu
Whole thread Raw
In response to Weird function behavior from Sept 11 snapshot  (Mike Mascari <mascarm@mascari.com>)
Responses Re: Weird function behavior from Sept 11 snapshot
List pgsql-hackers
> ... Does anyone want to check the time for sum() or avg() on an
> int4 column over a large table, under both 7.0.* and current?

For 262144 rows on the current tree, I get the following:

sum(int4):               12.0 seconds
sum(float8):              5.2 seconds
sum(cast(int4 as float8): 5.7 seconds

This includes startup costs, etc, and are the minimum times from several
runs (there is pretty wide variability, presumably due to disk caching,
swapping, etc on my laptop). It is a safe bet that the original int4
implementation was as fast or faster than the float8 result above (int4
does not require palloc() calls).

> Actually, using a float8 accumulator would work pretty well; assuming
> IEEE float8, you'd only start to get roundoff error when the running
> sum exceeds 2^52 or so.  However the SQL92 spec is insistent that sum()
> deliver an exact-numeric result when applied to exact-numeric data,
> and with a float accumulator we'd be at the mercy of the quality of the
> local implementation of floating point.

A problem with float8 is that it is possible to reach a point in the
accumulation where subsequent input values are ignored in the sum. This
is different than just roundoff error, since it degrades ungracefully
from that point on.

> I could see offering variant aggregates, say "sumf" and "avgf", that
> use float8 accumulation.  Right now the user can get the same result
> by writing "sum(foo::float8)" but it might be wise to formalize the
> idea ...

How about using int8 for the accumulator (on machines which support it
of course)? Falling back to float8 or numeric on other machines? Or
perhaps we could have an option (runtime??) to switch accumulator modes.

I like the idea of something like "sumf" to get alternative algorithms,
but it would be nice if basic sum() could be a bit more optimized than
currently.
                      - Thomas


pgsql-hackers by date:

Previous
From: devik@cdi.cz
Date:
Subject: Re: Performance improvement hints
Next
From: Tom Lane
Date:
Subject: Re: Weird function behavior from Sept 11 snapshot