Thread: RE: [HACKERS] 6.5.0 - Overflow bug in AVG( )
What does the spec have to say? It bothers me somewhat that an AVG is expected to return an integer result at all. Isn't the Average of 1 and 2, 1.5 not 1? just my $0.02,-DEJ > -----Original Message----- > From: Tom Lane [SMTP:tgl@sss.pgh.pa.us] > Sent: Wednesday, June 16, 1999 9:52 AM > To: Thomas Lockhart > Cc: Gene Sokolov; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] 6.5.0 - Overflow bug in AVG( ) > > Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > >> Some broader solution should be considered though if you > >> want AVG to work on numeric/decimal as well. > > > The implementation can be specified for each datatype individually, > > In the current implementation, each datatype does use its own type as > the accumulator --- and also as the counter. float8 and numeric are > fine, float4 is sort of OK (a float8 accumulator would be better for > accuracy reasons), int4 loses, int2 loses *bad*. > > To fix it we'd need to invent operators that do the appropriate cross- > data-type operations. For example, int4 avg using float8 accumulator > would need "float8 + int4 yielding float8" and "float8 / int4 yielding > int4", neither of which are to be found in pg_proc at the moment. But > it's a straightforward thing to do. > > int8 is the only integer type that I wouldn't want to use a float8 > accumulator for. Maybe numeric would be the appropriate thing here, > slow though it be. > > Note that switching over to float accumulation would *not* be real > palatable until we have fixed the memory-leak issue. avg() on int4 > doesn't leak memory currently, but it would with a float accumulator... > > regards, tom lane
> What does the spec have to say? It bothers me somewhat that an AVG is > expected to return an integer result at all. Isn't the Average of 1 and 2, > 1.5 not 1? Yeah, well, it's a holdover from the original Postgres code. We just haven't made an effort to change it yet, but it seems a good candidate for a makeover, no? I'm pretty sure that the spec would suggest a float8 return value for avg(int), but I haven't looked recently to refresh my memory. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
"Jackson, DeJuan" <djackson@cpsgroup.com> writes: > What does the spec have to say? It bothers me somewhat that an AVG is > expected to return an integer result at all. Isn't the Average of 1 > and 2, 1.5 not 1? That bothered me too. The draft spec that I have sez: b) If SUM is specified and DT is exact numeric with scale S, then the data type of the result is exact numeric with implementation-defined precision and scale S. c) If AVG is specified and DT is exact numeric, then the data type of the result is exact numeric with implementation- defined precision not less than the precision of DT and implementation-defined scale not less than thescale of DT. d) If DT is approximate numeric, then the data type of the result is approximate numeric with implementation-defined precision not less than the precision of DT. 65)Subclause 6.5, "<set function specification>": The precision of the value derived from application of the SUM functionto a data type of exact numeric is implementation-defined. 66)Subclause 6.5, "<set function specification>": The precision and scale of the value derived from application of theAVG function to a data type of exact numeric is implementation-defined. 67)Subclause 6.5, "<set function specification>": The preci- sion of the value derived from application of the SUM func- tion or AVG function to a data type of approximate numeric is implementation-defined. This would seem to give license for the result of AVG() on an int4 field to be NUMERIC with a fraction part, but not FLOAT. But I suspect we could get away with making it be FLOAT anyway. Anyone know what other databases do? regards, tom lane
> This would seem to give license for the result of AVG() on an int4 field > to be NUMERIC with a fraction part, but not FLOAT. But I suspect we > could get away with making it be FLOAT anyway. Sure, that can't be worse in practice than what we do now. But it is interesting that we are currently SQL92 conforming (except for that nasty overflow business; they probably don't mention that ;). For int2/int4, we could bump the accumulator to int8 (certainly faster than our numeric implementation?), but there are a very few platforms which don't support int8 and we shouldn't break the aggregates for them. We could get around that by defining explicit routines to be used in the aggregates, and then having some #ifdef alternate code if int8 is not available... Tom, do you think that a hack in the aggregate support code which compares the pointer returned to the pointer input, then pfree'ing the input area if they differ, would fix the major leakage? We could even have a backend global variable which enables/disables the feature to allow performance tuning. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > For int2/int4, we could bump the accumulator to int8 (certainly faster > than our numeric implementation?), but there are a very few platforms > which don't support int8 and we shouldn't break the aggregates for > them. Right, that's why I preferred the idea of using float8. Note that any reasonable floating-point implementation will deliver an exact result for the sum of integer inputs, up to the point at which the sum exceeds the number of mantissa bits in a float (2^52 or so in IEEE float8). After that you start to lose accuracy. Using int8 would give an exact sum up to 2^63, but if we want to start delivering a fractional average then float still looks like a better deal... > Tom, do you think that a hack in the aggregate support code which > compares the pointer returned to the pointer input, then pfree'ing the > input area if they differ, would fix the major leakage? Yeah, that would probably work OK, although you'd have to be careful of the initial condition --- is the initial value always safely pfreeable? > We could even have a backend global variable which enables/disables > the feature to allow performance tuning. Seems unnecessary. regards, tom lane
<tt>PostgreSQL:</tt><br /><tt>^^^^^^^^^^^</tt><br /><tt>prova=> select min(a), max(a), avg(a) from aa;</tt><br /><tt>min|max|avg</tt><br/><tt>---+---+---</tt><br /><tt> 1| 2| 1</tt><br /><tt>(1 row)</tt><br /><tt></tt> <tt></tt><p><tt>informix:-----------hygea@hygea ------------ Press CTRL-W for Help --------</tt><br /><tt>^^^^^^^^^</tt><br/><tt> (min) (max) (avg)</tt><tt></tt><p><tt> 1 2 1.50000000000000</tt><tt></tt><p><tt>oracle:</tt><br /><tt>^^^^^^^</tt><br /><tt>SQL> select min(a),max(a), avg(a) from aa;</tt><tt></tt><p><tt> MIN(A) MAX(A) AVG(A)</tt><br /><tt>---------- ---------- ----------</tt><br /><tt> 1 2 1.5</tt><br /><tt></tt> <br /><tt></tt> <br /> <p>Tom Lane ha scritto: <blockquote type="CITE">"Jackson, DeJuan" <djackson@cpsgroup.com> writes: <br />> Whatdoes the spec have to say? It bothers me somewhat that an AVG is <br />> expected to return an integer result atall. Isn't the Average of 1 <br />> and 2, 1.5 not 1? <p>That bothered me too. The draft spec that I have sez: <p> b) If SUM is specified and DT is exact numeric with scale <br /> S, then the data type of the result is exact numericwith <br /> implementation-defined precision and scale S. <p> c) If AVG is specified and DT is exact numeric,then the data <br /> type of the result is exact numeric with implementation- <br /> defined precision notless than the precision of DT and <br /> implementation-defined scale not less than the scale of DT. <p> d) If DTis approximate numeric, then the data type of the <br /> result is approximate numeric with implementation-defined<br /> precision not less than the precision of DT. <p> 65)Subclause 6.5, "<set function specification>":The precision of <br /> the value derived from application of the SUM function to a data <br /> type of exact numeric is implementation-defined. <p> 66)Subclause 6.5, "<set function specification>": The precisionand <br /> scale of the value derived from application of the AVG function <br /> to a data type of exactnumeric is implementation-defined. <p> 67)Subclause 6.5, "<set function specification>": The preci- <br /> sion of the value derived from application of the SUM func- <br /> tion or AVG function to a data type of approximatenumeric is <br /> implementation-defined. <p>This would seem to give license for the result of AVG() on anint4 field <br />to be NUMERIC with a fraction part, but not FLOAT. But I suspect we <br />could get away with makingit be FLOAT anyway. Anyone know what other <br />databases do? <p> regards, tom lane</blockquote><p><br/>______________________________________________________________ <br />PostgreSQL 6.5.0 on i586-pc-linux-gnu,compiled by gcc 2.7.2.3 <br />^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <br />Jose'<br />
On Wed, 16 Jun 1999, [iso-8859-1] Jos� Soares wrote: > PostgreSQL: > ^^^^^^^^^^^ > prova=> select min(a), max(a), avg(a) from aa; > min|max|avg > ---+---+--- > 1| 2| 1 > (1 row) > > Sybase - I'm guessing/ass-u-me ing it's around version 4.9 1> select min(a), max(a), avg(a) from aa 2> go ----------- ----------- ----------- 1 2 1 (1 row affected) 1> > > This would seem to give license for the result of AVG() on an int4 field > > to be NUMERIC with a fraction part, but not FLOAT. But I suspect we > > could get away with making it be FLOAT anyway. Anyone know what other > > databases do? Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com flame-mail: /dev/null # include <std/disclaimers.h> TEAM-OS2 Online Campground Directory http://www.camping-usa.com Online Giftshop Superstore http://www.cloudninegifts.com ==========================================================================