Thread: 6.5.0 - Overflow bug in AVG( )
% psql test1 Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1] type \? for help on slash commands type \q to quit type \g or terminate with semicolon to execute queryYou are currentlyconnected to the database: test1 test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";count| max| min| avg ------+-------+-----+---- 677719|3075717|61854|-251 (1 row) Overflow, perhaps? Gene Sokolov.
> [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1] > test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars"; > count| max| min| avg > ------+-------+-----+---- > 677719|3075717|61854|-251 > (1 row) > Overflow, perhaps? Of course. These are integer fields? I've been considering changing all accumulators (and results) for integer aggregate functions to float8, but have not done so yet. I was sort of waiting for a v7.0 release, but am not sure why... - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
> > [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1] > > test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars"; > > count| max| min| avg > > ------+-------+-----+---- > > 677719|3075717|61854|-251 > > (1 row) > > Overflow, perhaps? > > Of course. These are integer fields? I've been considering changing Yes, the fields are int4 > all accumulators (and results) for integer aggregate functions to > float8, but have not done so yet. I was sort of waiting for a v7.0 > release, but am not sure why... Float8 accumulator seems to be a good solution if AVG is limited to int/float types. I wonder if it could produce system dependency in AVG due to rounding errors. Some broader solution should be considered though if you want AVG to work on numeric/decimal as well. Gene Sokolov.
> Float8 accumulator seems to be a good solution if AVG is limited to > int/float types. I wonder if it could produce system dependency in AVG due > to rounding errors. Some broader solution should be considered though if you > want AVG to work on numeric/decimal as well. The implementation can be specified for each datatype individually, so that's not a problem. afaik the way numeric/decimal work it would be fine to use those types as their own accumulators. It's mostly the int2/int4/int8 types which are the problem, since they silently overflow (on most machines?). - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart wrote: > > > [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1] > > test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars"; > > count| max| min| avg > > ------+-------+-----+---- > > 677719|3075717|61854|-251 > > (1 row) > > Overflow, perhaps? > > Of course. These are integer fields? I've been considering changing > all accumulators (and results) for integer aggregate functions to > float8, but have not done so yet. I was sort of waiting for a v7.0 > release, but am not sure why... Wouldn't it be better to use NUMERIC for the avg(int) state values? It will never loose any significant digit. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
> > Of course. These are integer fields? I've been considering changing > > all accumulators (and results) for integer aggregate functions to > > float8, but have not done so yet. I was sort of waiting for a v7.0 > > release, but am not sure why... > > Wouldn't it be better to use NUMERIC for the avg(int) state > values? It will never loose any significant digit. Sure. It would be fast, right? avg(int) is likely to be used a lot, and should be as fast as possible. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
"Gene Sokolov" <hook@aktrad.ru> writes: > test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars"; > count| max| min| avg > ------+-------+-----+---- > 677719|3075717|61854|-251 > Overflow, perhaps? sum() and avg() for int fields use int accumulators. You might want to use avg(float8(field)) to get a less-likely-to-overflow result. Someday it'd be a good idea to revise the sum() and avg() aggregates to use float or numeric accumulators in all cases. This'd require inventing a few more cross-data-type operators... regards, tom lane
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> Some broader solution should be considered though if you >> want AVG to work on numeric/decimal as well. > The implementation can be specified for each datatype individually, In the current implementation, each datatype does use its own type as the accumulator --- and also as the counter. float8 and numeric are fine, float4 is sort of OK (a float8 accumulator would be better for accuracy reasons), int4 loses, int2 loses *bad*. To fix it we'd need to invent operators that do the appropriate cross- data-type operations. For example, int4 avg using float8 accumulator would need "float8 + int4 yielding float8" and "float8 / int4 yielding int4", neither of which are to be found in pg_proc at the moment. But it's a straightforward thing to do. int8 is the only integer type that I wouldn't want to use a float8 accumulator for. Maybe numeric would be the appropriate thing here, slow though it be. Note that switching over to float accumulation would *not* be real palatable until we have fixed the memory-leak issue. avg() on int4 doesn't leak memory currently, but it would with a float accumulator... regards, tom lane
Thomas Lockhart wrote: > > > > Of course. These are integer fields? I've been considering changing > > > all accumulators (and results) for integer aggregate functions to > > > float8, but have not done so yet. I was sort of waiting for a v7.0 > > > release, but am not sure why... > > > > Wouldn't it be better to use NUMERIC for the avg(int) state > > values? It will never loose any significant digit. > > Sure. It would be fast, right? avg(int) is likely to be used a lot, > and should be as fast as possible. I think it would be fast enough, even if I have things in mind how to speed it up. But that would result in a total rewrite of NUMERIC from scratch. The only math function of NUMERIC which is time critical for AVG() is ADD. And even for int8 the number of digits it has to perform is relatively small. I expect the time spent on that is negligible compared to the heap scanning required to get all the values. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #