Thread: RE: [HACKERS] 6.5.0 - Overflow bug in AVG( )

RE: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From

"Jackson, DeJuan"

Date:

16 June 1999, 11:22:01

What does the spec have to say?  It bothers me somewhat that an AVG is
expected to return an integer result at all.  Isn't the Average of 1 and 2,
1.5 not 1?

just my $0.02,-DEJ

> -----Original Message-----
> From:    Tom Lane [SMTP:tgl@sss.pgh.pa.us]
> Sent:    Wednesday, June 16, 1999 9:52 AM
> To:    Thomas Lockhart
> Cc:    Gene Sokolov; pgsql-hackers@postgreSQL.org
> Subject:    Re: [HACKERS] 6.5.0 - Overflow bug in AVG( ) 
> 
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> >> Some broader solution should be considered though if you
> >> want AVG to work on numeric/decimal as well.
> 
> > The implementation can be specified for each datatype individually,
> 
> In the current implementation, each datatype does use its own type as
> the accumulator --- and also as the counter.  float8 and numeric are
> fine, float4 is sort of OK (a float8 accumulator would be better for
> accuracy reasons), int4 loses, int2 loses *bad*.
> 
> To fix it we'd need to invent operators that do the appropriate cross-
> data-type operations.  For example, int4 avg using float8 accumulator
> would need "float8 + int4 yielding float8" and "float8 / int4 yielding
> int4", neither of which are to be found in pg_proc at the moment.  But
> it's a straightforward thing to do.
> 
> int8 is the only integer type that I wouldn't want to use a float8
> accumulator for.  Maybe numeric would be the appropriate thing here,
> slow though it be.
> 
> Note that switching over to float accumulation would *not* be real
> palatable until we have fixed the memory-leak issue.  avg() on int4
> doesn't leak memory currently, but it would with a float accumulator...
> 
>             regards, tom lane

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From

Thomas Lockhart

Date:

16 June 1999, 11:28:27

> What does the spec have to say?  It bothers me somewhat that an AVG is
> expected to return an integer result at all.  Isn't the Average of 1 and 2,
> 1.5 not 1?

Yeah, well, it's a holdover from the original Postgres code. We just
haven't made an effort to change it yet, but it seems a good candidate
for a makeover, no?

I'm pretty sure that the spec would suggest a float8 return value for
avg(int), but I haven't looked recently to refresh my memory.
               - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From

Tom Lane

Date:

16 June 1999, 11:32:46

"Jackson, DeJuan" <djackson@cpsgroup.com> writes:
> What does the spec have to say?  It bothers me somewhat that an AVG is
> expected to return an integer result at all.  Isn't the Average of 1
> and 2, 1.5 not 1?

That bothered me too.  The draft spec that I have sez:
 b) If SUM is specified and DT is exact numeric with scale    S, then the data type of the result is exact numeric with
  implementation-defined precision and scale S.

  c) If AVG is specified and DT is exact numeric, then the data    type of the result is exact numeric with
implementation-   defined precision not less than the precision of DT and    implementation-defined scale not less than
thescale of DT.

  d) If DT is approximate numeric, then the data type of the    result is approximate numeric with
implementation-defined   precision not less than the precision of DT.

 65)Subclause 6.5, "<set function specification>": The precision of    the value derived from application of the SUM
functionto a data    type of exact numeric is implementation-defined.

 66)Subclause 6.5, "<set function specification>": The precision and    scale of the value derived from application of
theAVG function    to a data type of exact numeric is implementation-defined.

 67)Subclause 6.5, "<set function specification>": The preci-    sion of the value derived from application of the SUM
func-   tion or AVG function to a data type of approximate numeric is    implementation-defined.

This would seem to give license for the result of AVG() on an int4 field
to be NUMERIC with a fraction part, but not FLOAT.  But I suspect we
could get away with making it be FLOAT anyway.  Anyone know what other
databases do?
        regards, tom lane

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From

Thomas Lockhart

Date:

16 June 1999, 11:45:22

> This would seem to give license for the result of AVG() on an int4 field
> to be NUMERIC with a fraction part, but not FLOAT.  But I suspect we
> could get away with making it be FLOAT anyway.

Sure, that can't be worse in practice than what we do now. But it is
interesting that we are currently SQL92 conforming (except for that
nasty overflow business; they probably don't mention that ;).

For int2/int4, we could bump the accumulator to int8 (certainly faster
than our numeric implementation?), but there are a very few platforms
which don't support int8 and we shouldn't break the aggregates for
them. We could get around that by defining explicit routines to be
used in the aggregates, and then having some #ifdef alternate code if
int8 is not available...

Tom, do you think that a hack in the aggregate support code which
compares the pointer returned to the pointer input, then pfree'ing the
input area if they differ, would fix the major leakage? We could even
have a backend global variable which enables/disables the feature to
allow performance tuning.
                - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From

Tom Lane

Date:

16 June 1999, 12:09:48

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> For int2/int4, we could bump the accumulator to int8 (certainly faster
> than our numeric implementation?), but there are a very few platforms
> which don't support int8 and we shouldn't break the aggregates for
> them.

Right, that's why I preferred the idea of using float8.

Note that any reasonable floating-point implementation will deliver an
exact result for the sum of integer inputs, up to the point at which the
sum exceeds the number of mantissa bits in a float (2^52 or so in IEEE
float8).  After that you start to lose accuracy.  Using int8 would give
an exact sum up to 2^63, but if we want to start delivering a fractional
average then float still looks like a better deal...

> Tom, do you think that a hack in the aggregate support code which
> compares the pointer returned to the pointer input, then pfree'ing the
> input area if they differ, would fix the major leakage?

Yeah, that would probably work OK, although you'd have to be careful of
the initial condition --- is the initial value always safely pfreeable?

> We could even have a backend global variable which enables/disables
> the feature to allow performance tuning.

Seems unnecessary.
        regards, tom lane

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From

José Soares

Date:

17 June 1999, 02:49:02

<tt>PostgreSQL:</tt><br /><tt>^^^^^^^^^^^</tt><br /><tt>prova=> select min(a), max(a), avg(a) from aa;</tt><br
/><tt>min|max|avg</tt><br/><tt>---+---+---</tt><br /><tt>  1|  2|  1</tt><br /><tt>(1 row)</tt><br
/><tt></tt> <tt></tt><p><tt>informix:-----------hygea@hygea ------------ Press CTRL-W for Help --------</tt><br
/><tt>^^^^^^^^^</tt><br/><tt>      (min)       (max)            (avg)</tt><tt></tt><p><tt>          1          
2                1.50000000000000</tt><tt></tt><p><tt>oracle:</tt><br /><tt>^^^^^^^</tt><br /><tt>SQL> select
min(a),max(a), avg(a) from aa;</tt><tt></tt><p><tt>    MIN(A)        MAX(A)       AVG(A)</tt><br /><tt>----------   
----------   ----------</tt><br /><tt>         1             2           1.5</tt><br /><tt></tt>  <br /><tt></tt>  <br
/> <p>Tom Lane ha scritto: <blockquote type="CITE">"Jackson, DeJuan" <djackson@cpsgroup.com> writes: <br />>
Whatdoes the spec have to say?  It bothers me somewhat that an AVG is <br />> expected to return an integer result
atall.  Isn't the Average of 1 <br />> and 2, 1.5 not 1? <p>That bothered me too.  The draft spec that I have sez:
<p> b) If SUM is specified and DT is exact numeric with scale <br />     S, then the data type of the result is exact
numericwith <br />     implementation-defined precision and scale S. <p>   c) If AVG is specified and DT is exact
numeric,then the data <br />     type of the result is exact numeric with implementation- <br />     defined precision
notless than the precision of DT and <br />     implementation-defined scale not less than the scale of DT. <p>   d) If
DTis approximate numeric, then the data type of the <br />     result is approximate numeric with
implementation-defined<br />     precision not less than the precision of DT. <p>  65)Subclause 6.5, "<set function
specification>":The precision of <br />     the value derived from application of the SUM function to a data <br
/>    type of exact numeric is implementation-defined. <p>  66)Subclause 6.5, "<set function specification>": The
precisionand <br />     scale of the value derived from application of the AVG function <br />     to a data type of
exactnumeric is implementation-defined. <p>  67)Subclause 6.5, "<set function specification>": The preci- <br
/>    sion of the value derived from application of the SUM func- <br />     tion or AVG function to a data type of
approximatenumeric is <br />     implementation-defined. <p>This would seem to give license for the result of AVG() on
anint4 field <br />to be NUMERIC with a fraction part, but not FLOAT.  But I suspect we <br />could get away with
makingit be FLOAT anyway.  Anyone know what other <br />databases do? <p>                        regards, tom
lane</blockquote><p><br/>______________________________________________________________ <br />PostgreSQL 6.5.0 on
i586-pc-linux-gnu,compiled by gcc 2.7.2.3 <br />^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <br
/>Jose'<br />

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

From

Vince Vielhaber

Date:

17 June 1999, 06:47:12

On Wed, 16 Jun 1999, [iso-8859-1] Jos� Soares wrote:

> PostgreSQL:
> ^^^^^^^^^^^
> prova=> select min(a), max(a), avg(a) from aa;
> min|max|avg
> ---+---+---
>   1|  2|  1
> (1 row)
> 
> 

Sybase - I'm guessing/ass-u-me ing it's around version 4.9

1> select min(a), max(a), avg(a) from aa
2> go                                    ----------- ----------- -----------           1           2           1 

(1 row affected)
1> 

> > This would seem to give license for the result of AVG() on an int4 field
> > to be NUMERIC with a fraction part, but not FLOAT.  But I suspect we
> > could get away with making it be FLOAT anyway.  Anyone know what other
> > databases do?


Vince.
-- 
==========================================================================
Vince Vielhaber -- KA8CSH   email: vev@michvhf.com   flame-mail: /dev/null      # include <std/disclaimers.h>
       TEAM-OS2       Online Campground Directory    http://www.camping-usa.com      Online Giftshop Superstore
http://www.cloudninegifts.com
==========================================================================