Thread: Use int8 for int4/int2 aggregate accumulators?

Use int8 for int4/int2 aggregate accumulators?

From
Tom Lane
Date:
This was discussed on pgsql-general a little bit on 21-July, but the
discussion died off without reaching a conclusion.  I'd like to
put out a concrete proposal and see if anyone has objections.

1. SUM() and AVG() for int2 and int4 inputs should accumulate the
running sum as an INT8, not a NUMERIC, for speed reasons.  INT8 seems
large enough to avoid overflow in practical situations.  The final
output datatype of AVG() will still be NUMERIC, but the final output
of SUM() will become INT8 for these two input types.

2. STDDEV() and VARIANCE() for int2 and int4 inputs will continue to
use NUMERIC for accuracy and overflow reasons (accumulating sum(x^2)
is much more prone to overflow than sum(x)).  So will all these
aggregates for INT8.

3. As a separate proposal, we could change COUNT()'s running counter
and output datatype from INT4 to INT8.  This would make it a little
slower but effectively overflow-proof.


All of these changes are within the latitude that the SQL92 spec
affords (it just says that the output values are exact numeric with
implementation-defined precision and scale).  Issues to consider are:

* On machines with no 8-byte-int C datatype, the accumulator would
effectively be int4.  This would make the behavior no worse than
currently for COUNT(), and no worse than it was in 7.0 for SUM() and
AVG(), so that doesn't bother me a whole lot.  But it would be a
new source of cross-platform behavioral differences.

* Changing the output datatype of these operations --- especially COUNT
--- might affect or even break applications.  We got a few complaints,
not many, about changing SUM() and AVG() from integer to NUMERIC output
in 7.1.  Changing SUM() to INT8 isn't likely to hurt anyone who survived
that transition.  But COUNT() is much more widely used and is more
likely to affect people.  Should we keep it at INT4 output to avoid
compatibility problems?
        regards, tom lane


Re: Use int8 for int4/int2 aggregate accumulators?

From
Tom Lane
Date:
I wrote:
> 3. As a separate proposal, we could change COUNT()'s running counter
> and output datatype from INT4 to INT8.  This would make it a little
> slower but effectively overflow-proof.

> * Changing the output datatype of these operations --- especially COUNT
> --- might affect or even break applications.  We got a few complaints,
> not many, about changing SUM() and AVG() from integer to NUMERIC output
> in 7.1.  Changing SUM() to INT8 isn't likely to hurt anyone who survived
> that transition.  But COUNT() is much more widely used and is more
> likely to affect people.  Should we keep it at INT4 output to avoid
> compatibility problems?

I started working on this, and immediately got a pile of regression test
failures arising from:

  create function rtest_viewfunc1(int4) returns int4 as
        'select count(*) from rtest_view2 where a = $1'
        language 'sql';
+ ERROR:  return type mismatch in function: declared to return integer, returns bigint

While it'd be easy enough to change this regression test, this does
highlight my concern about changing the output type of COUNT().

I'm currently thinking that leaving the output type of COUNT() alone
might be the better part of valor.  Possibly we could invent a separate
aggregate COUNT8() that returns int8, for use by them that need it.

Comments anyone?  There wasn't a lot of discussion before...

            regards, tom lane

Re: Re: Use int8 for int4/int2 aggregate accumulators?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
>> create function rtest_viewfunc1(int4) returns int4 as
>> 'select count(*) from rtest_view2 where a = $1'
>> language 'sql';
>> + ERROR:  return type mismatch in function: declared to return integer, returns bigint

> Maybe instead of testing for strict equality of the types, test for
> compatibility.

We could try to force-convert the result of an SQL function to the right
thing, I suppose, but I'm worried that that might mask programmer errors
more than it helps.

On the other hand, the equivalent forced conversion happens already in
plpgsql functions; it's only SQL-language functions that are so picky.
Maybe your idea is good.  Anyone else have an opinion?
        regards, tom lane


Re: Re: Use int8 for int4/int2 aggregate accumulators?

From
Peter Eisentraut
Date:
Tom Lane writes:

> I started working on this, and immediately got a pile of regression test
> failures arising from:
>
>   create function rtest_viewfunc1(int4) returns int4 as
>         'select count(*) from rtest_view2 where a = $1'
>         language 'sql';
> + ERROR:  return type mismatch in function: declared to return integer, returns bigint

Maybe instead of testing for strict equality of the types, test for
compatibility.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Re: Use int8 for int4/int2 aggregate accumulators?

From
Bruce Momjian
Date:
> Peter Eisentraut <peter_e@gmx.net> writes:
> >> create function rtest_viewfunc1(int4) returns int4 as
> >> 'select count(*) from rtest_view2 where a = $1'
> >> language 'sql';
> >> + ERROR:  return type mismatch in function: declared to return integer, returns bigint
> 
> > Maybe instead of testing for strict equality of the types, test for
> > compatibility.
> 
> We could try to force-convert the result of an SQL function to the right
> thing, I suppose, but I'm worried that that might mask programmer errors
> more than it helps.
> 
> On the other hand, the equivalent forced conversion happens already in
> plpgsql functions; it's only SQL-language functions that are so picky.
> Maybe your idea is good.  Anyone else have an opinion?

I don't know.  Doing a force for SQL functions and not for others seems
kind of confusing.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Re: Use int8 for int4/int2 aggregate accumulators?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> What I had in mind was to allow type conversion between the same
> TypeCategory().  The SQL function analyzer is extraordinarily picky:

I just finished looking at that.  It'd be possible (and probably
reasonable) to accept a binary-compatible datatype rather than requiring
exact equality; this would fix your varchar-vs-text example.  However,
inserting any runtime type conversion would require a significant amount
of code added.  We couldn't do it just by inserting the conversion
function call into the SELECT querytree, because that'd alter the SELECT
semantics, if not actively crash it --- an example of a likely crash is
create function mymax() returns int4 as '    select int8col from tab order by int8col desc limit 1'language sql;

Here, the prepared parsetree is already set up to apply int8 sorting to
the first column of its result.  If we try to insert a cast-to-int4,
we will end up sorting int4 data with int8 operators -- instant
coredump.

So the conversion function application would have to be done at runtime
in the SQL function manager, which is more code than I care to take on
at the moment.

Note also that there is code in there to figure out whether a targetlist
satisfies a tuple return datatype; should we also apply automatic type
conversion to elements of such a list?  It's getting to be more of a
stretch to say that this is being helpful rather than masking programmer
error.

But binary compatibility is easy.  Shall we do that?
        regards, tom lane


Re: Re: Use int8 for int4/int2 aggregate accumulators?

From
Peter Eisentraut
Date:
Tom Lane writes:

> We could try to force-convert the result of an SQL function to the right
> thing, I suppose, but I'm worried that that might mask programmer errors
> more than it helps.

What I had in mind was to allow type conversion between the same
TypeCategory().  The SQL function analyzer is extraordinarily picky:

create function test(int) returns varchar as '   select substring(''PostgreSQL'' from $1);
' language sql;
ERROR:  return type mismatch in function: declared to return character
varying, returns text

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Re: Use int8 for int4/int2 aggregate accumulators?

From
Bruce Momjian
Date:
> Note also that there is code in there to figure out whether a targetlist
> satisfies a tuple return datatype; should we also apply automatic type
> conversion to elements of such a list?  It's getting to be more of a
> stretch to say that this is being helpful rather than masking programmer
> error.
> 
> But binary compatibility is easy.  Shall we do that?

If we don't do binary compatible already, we certainly should.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026