Thread: 6.5.0 - Overflow bug in AVG( )

6.5.0 - Overflow bug in AVG( )

From
"Gene Sokolov"
Date:
% psql test1
Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
  type \? for help on slash commands  type \q to quit  type \g or terminate with semicolon to execute queryYou are
currentlyconnected to the database: test1
 

test1=>  select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";count|    max|  min| avg
------+-------+-----+----
677719|3075717|61854|-251
(1 row)

Overflow, perhaps?

Gene Sokolov.




Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From
Thomas Lockhart
Date:
> [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
> test1=>  select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
>  count|    max|  min| avg
> ------+-------+-----+----
> 677719|3075717|61854|-251
> (1 row)
> Overflow, perhaps?

Of course. These are integer fields? I've been considering changing
all accumulators (and results) for integer aggregate functions to
float8, but have not done so yet. I was sort of waiting for a v7.0
release, but am not sure why...
                 - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From
"Gene Sokolov"
Date:
> > [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
> > test1=>  select count(*), max("ID"), min("ID"), avg("ID") from
"ItemsBars";
> >  count|    max|  min| avg
> > ------+-------+-----+----
> > 677719|3075717|61854|-251
> > (1 row)
> > Overflow, perhaps?
>
> Of course. These are integer fields? I've been considering changing

Yes, the fields are int4

> all accumulators (and results) for integer aggregate functions to
> float8, but have not done so yet. I was sort of waiting for a v7.0
> release, but am not sure why...

Float8 accumulator seems to be a good solution if AVG is limited to
int/float types. I wonder if it could produce system dependency in AVG due
to rounding errors. Some broader solution should be considered though if you
want AVG to work on numeric/decimal as well.

Gene Sokolov.




Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From
Thomas Lockhart
Date:
> Float8 accumulator seems to be a good solution if AVG is limited to
> int/float types. I wonder if it could produce system dependency in AVG due
> to rounding errors. Some broader solution should be considered though if you
> want AVG to work on numeric/decimal as well.

The implementation can be specified for each datatype individually, so
that's not a problem. afaik the way numeric/decimal work it would be
fine to use those types as their own accumulators. It's mostly the
int2/int4/int8 types which are the problem, since they silently
overflow (on most machines?).
                    - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From
wieck@debis.com (Jan Wieck)
Date:
Thomas Lockhart wrote:

>
> > [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
> > test1=>  select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
> >  count|    max|  min| avg
> > ------+-------+-----+----
> > 677719|3075717|61854|-251
> > (1 row)
> > Overflow, perhaps?
>
> Of course. These are integer fields? I've been considering changing
> all accumulators (and results) for integer aggregate functions to
> float8, but have not done so yet. I was sort of waiting for a v7.0
> release, but am not sure why...

    Wouldn't  it  be better to use NUMERIC for the avg(int) state
    values?  It will never loose any significant digit.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From
Thomas Lockhart
Date:
> > Of course. These are integer fields? I've been considering changing
> > all accumulators (and results) for integer aggregate functions to
> > float8, but have not done so yet. I was sort of waiting for a v7.0
> > release, but am not sure why...
> 
>     Wouldn't  it  be better to use NUMERIC for the avg(int) state
>     values?  It will never loose any significant digit.

Sure. It would be fast, right? avg(int) is likely to be used a lot,
and should be as fast as possible.
                  - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From
Tom Lane
Date:
"Gene Sokolov" <hook@aktrad.ru> writes:
> test1=>  select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
>  count|    max|  min| avg
> ------+-------+-----+----
> 677719|3075717|61854|-251

> Overflow, perhaps?

sum() and avg() for int fields use int accumulators.  You might want
to use avg(float8(field)) to get a less-likely-to-overflow result.

Someday it'd be a good idea to revise the sum() and avg() aggregates
to use float or numeric accumulators in all cases.  This'd require
inventing a few more cross-data-type operators...
        regards, tom lane


Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Some broader solution should be considered though if you
>> want AVG to work on numeric/decimal as well.

> The implementation can be specified for each datatype individually,

In the current implementation, each datatype does use its own type as
the accumulator --- and also as the counter.  float8 and numeric are
fine, float4 is sort of OK (a float8 accumulator would be better for
accuracy reasons), int4 loses, int2 loses *bad*.

To fix it we'd need to invent operators that do the appropriate cross-
data-type operations.  For example, int4 avg using float8 accumulator
would need "float8 + int4 yielding float8" and "float8 / int4 yielding
int4", neither of which are to be found in pg_proc at the moment.  But
it's a straightforward thing to do.

int8 is the only integer type that I wouldn't want to use a float8
accumulator for.  Maybe numeric would be the appropriate thing here,
slow though it be.

Note that switching over to float accumulation would *not* be real
palatable until we have fixed the memory-leak issue.  avg() on int4
doesn't leak memory currently, but it would with a float accumulator...
        regards, tom lane


Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From
wieck@debis.com (Jan Wieck)
Date:
Thomas Lockhart wrote:

>
> > > Of course. These are integer fields? I've been considering changing
> > > all accumulators (and results) for integer aggregate functions to
> > > float8, but have not done so yet. I was sort of waiting for a v7.0
> > > release, but am not sure why...
> >
> >     Wouldn't  it  be better to use NUMERIC for the avg(int) state
> >     values?  It will never loose any significant digit.
>
> Sure. It would be fast, right? avg(int) is likely to be used a lot,
> and should be as fast as possible.

    I  think  it  would  be fast enough, even if I have things in
    mind how to speed it up. But that would  result  in  a  total
    rewrite of NUMERIC from scratch.

    The  only math function of NUMERIC which is time critical for
    AVG() is ADD. And even for int8 the number of digits  it  has
    to  perform  is  relatively small. I expect the time spent on
    that is negligible compared to the heap scanning required  to
    get all the values.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #