Thread: Use int8 for int4/int2 aggregate accumulators?
This was discussed on pgsql-general a little bit on 21-July, but the discussion died off without reaching a conclusion. I'd like to put out a concrete proposal and see if anyone has objections. 1. SUM() and AVG() for int2 and int4 inputs should accumulate the running sum as an INT8, not a NUMERIC, for speed reasons. INT8 seems large enough to avoid overflow in practical situations. The final output datatype of AVG() will still be NUMERIC, but the final output of SUM() will become INT8 for these two input types. 2. STDDEV() and VARIANCE() for int2 and int4 inputs will continue to use NUMERIC for accuracy and overflow reasons (accumulating sum(x^2) is much more prone to overflow than sum(x)). So will all these aggregates for INT8. 3. As a separate proposal, we could change COUNT()'s running counter and output datatype from INT4 to INT8. This would make it a little slower but effectively overflow-proof. All of these changes are within the latitude that the SQL92 spec affords (it just says that the output values are exact numeric with implementation-defined precision and scale). Issues to consider are: * On machines with no 8-byte-int C datatype, the accumulator would effectively be int4. This would make the behavior no worse than currently for COUNT(), and no worse than it was in 7.0 for SUM() and AVG(), so that doesn't bother me a whole lot. But it would be a new source of cross-platform behavioral differences. * Changing the output datatype of these operations --- especially COUNT --- might affect or even break applications. We got a few complaints, not many, about changing SUM() and AVG() from integer to NUMERIC output in 7.1. Changing SUM() to INT8 isn't likely to hurt anyone who survived that transition. But COUNT() is much more widely used and is more likely to affect people. Should we keep it at INT4 output to avoid compatibility problems? regards, tom lane
I wrote: > 3. As a separate proposal, we could change COUNT()'s running counter > and output datatype from INT4 to INT8. This would make it a little > slower but effectively overflow-proof. > * Changing the output datatype of these operations --- especially COUNT > --- might affect or even break applications. We got a few complaints, > not many, about changing SUM() and AVG() from integer to NUMERIC output > in 7.1. Changing SUM() to INT8 isn't likely to hurt anyone who survived > that transition. But COUNT() is much more widely used and is more > likely to affect people. Should we keep it at INT4 output to avoid > compatibility problems? I started working on this, and immediately got a pile of regression test failures arising from: create function rtest_viewfunc1(int4) returns int4 as 'select count(*) from rtest_view2 where a = $1' language 'sql'; + ERROR: return type mismatch in function: declared to return integer, returns bigint While it'd be easy enough to change this regression test, this does highlight my concern about changing the output type of COUNT(). I'm currently thinking that leaving the output type of COUNT() alone might be the better part of valor. Possibly we could invent a separate aggregate COUNT8() that returns int8, for use by them that need it. Comments anyone? There wasn't a lot of discussion before... regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: >> create function rtest_viewfunc1(int4) returns int4 as >> 'select count(*) from rtest_view2 where a = $1' >> language 'sql'; >> + ERROR: return type mismatch in function: declared to return integer, returns bigint > Maybe instead of testing for strict equality of the types, test for > compatibility. We could try to force-convert the result of an SQL function to the right thing, I suppose, but I'm worried that that might mask programmer errors more than it helps. On the other hand, the equivalent forced conversion happens already in plpgsql functions; it's only SQL-language functions that are so picky. Maybe your idea is good. Anyone else have an opinion? regards, tom lane
Tom Lane writes: > I started working on this, and immediately got a pile of regression test > failures arising from: > > create function rtest_viewfunc1(int4) returns int4 as > 'select count(*) from rtest_view2 where a = $1' > language 'sql'; > + ERROR: return type mismatch in function: declared to return integer, returns bigint Maybe instead of testing for strict equality of the types, test for compatibility. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
> Peter Eisentraut <peter_e@gmx.net> writes: > >> create function rtest_viewfunc1(int4) returns int4 as > >> 'select count(*) from rtest_view2 where a = $1' > >> language 'sql'; > >> + ERROR: return type mismatch in function: declared to return integer, returns bigint > > > Maybe instead of testing for strict equality of the types, test for > > compatibility. > > We could try to force-convert the result of an SQL function to the right > thing, I suppose, but I'm worried that that might mask programmer errors > more than it helps. > > On the other hand, the equivalent forced conversion happens already in > plpgsql functions; it's only SQL-language functions that are so picky. > Maybe your idea is good. Anyone else have an opinion? I don't know. Doing a force for SQL functions and not for others seems kind of confusing. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Peter Eisentraut <peter_e@gmx.net> writes: > What I had in mind was to allow type conversion between the same > TypeCategory(). The SQL function analyzer is extraordinarily picky: I just finished looking at that. It'd be possible (and probably reasonable) to accept a binary-compatible datatype rather than requiring exact equality; this would fix your varchar-vs-text example. However, inserting any runtime type conversion would require a significant amount of code added. We couldn't do it just by inserting the conversion function call into the SELECT querytree, because that'd alter the SELECT semantics, if not actively crash it --- an example of a likely crash is create function mymax() returns int4 as ' select int8col from tab order by int8col desc limit 1'language sql; Here, the prepared parsetree is already set up to apply int8 sorting to the first column of its result. If we try to insert a cast-to-int4, we will end up sorting int4 data with int8 operators -- instant coredump. So the conversion function application would have to be done at runtime in the SQL function manager, which is more code than I care to take on at the moment. Note also that there is code in there to figure out whether a targetlist satisfies a tuple return datatype; should we also apply automatic type conversion to elements of such a list? It's getting to be more of a stretch to say that this is being helpful rather than masking programmer error. But binary compatibility is easy. Shall we do that? regards, tom lane
Tom Lane writes: > We could try to force-convert the result of an SQL function to the right > thing, I suppose, but I'm worried that that might mask programmer errors > more than it helps. What I had in mind was to allow type conversion between the same TypeCategory(). The SQL function analyzer is extraordinarily picky: create function test(int) returns varchar as ' select substring(''PostgreSQL'' from $1); ' language sql; ERROR: return type mismatch in function: declared to return character varying, returns text -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
> Note also that there is code in there to figure out whether a targetlist > satisfies a tuple return datatype; should we also apply automatic type > conversion to elements of such a list? It's getting to be more of a > stretch to say that this is being helpful rather than masking programmer > error. > > But binary compatibility is easy. Shall we do that? If we don't do binary compatible already, we certainly should. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026