Thread: Re: Proposal for fixing numeric type-resolution issues

Re: Proposal for fixing numeric type-resolution issues

From
Bruce Momjian
Date:
Again, anything to add to the TODO here?

> We've got a collection of problems that are related to the parser's
> inability to make good type-resolution choices for numeric constants.
> In some cases you get a hard error; for example "NumericVar + 4.4"
> yields
> ERROR:  Unable to identify an operator '+' for types 'numeric' and 'float8'
>         You will have to retype this query using an explicit cast
> because "4.4" is initially typed as float8 and the system can't figure
> out whether to use numeric or float8 addition.  A more subtle problem
> is that a query like "... WHERE Int2Var < 42" is unable to make use of
> an index on the int2 column: 42 is resolved as int4, so the operator
> is int24lt, which works but is not in the opclass of an int2 index.
> 
> Here is a proposal for fixing these problems.  I think we could get this
> done for 7.1 if people like it.
> 
> The basic problem is that there's not enough smarts in the type resolver
> about the interrelationships of the numeric datatypes.  All it has is
> a concept of a most-preferred type within the category of numeric types.
> (We are abusing the most-preferred-type mechanism, BTW, because both
> FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
> category!  This is in fact why the resolver can't make a choice for
> "numeric+float8".)  We need more intelligence than that.
> 
> I propose that we set up a strictly-ordered hierarchy of numeric
> datatypes, running from least preferred to most preferred:
>     int2, int4, int8, numeric, float4, float8.
> Rather than simply considering coercions to the most-preferred type,
> the type resolver should use the following rules:
> 
> 1. No value will be down-converted (eg int4 to int2) except by an
> explicit conversion.
> 
> 2. If there is not an exact matching operator, numeric values will be
> up-converted to the highest numeric datatype present among the operator
> or function's arguments.  For example, given "int2 + int8" we'd up-
> convert the int2 to int8 and apply int8 addition.
> 
> The final piece of the puzzle is that the type initially assigned to
> an undecorated numeric constant should be NUMERIC if it contains a
> decimal point or exponent, and otherwise the smallest of int2, int4,
> int8, NUMERIC that will represent it.  This is a considerable change
> from the current lexer behavior, where you get either int4 or float8.
> 
> For example, given "NumericVar + 4.4", the constant 4.4 will initially
> be assigned type NUMERIC, we will resolve the operator as numeric plus,
> and everything's fine.  Given "Float8Var + 4.4", the constant is still
> initially numeric, but will be up-converted to float8 so that float8
> addition can be used.  The end result is the same as in traditional
> Postgres: you get float8 addition.  Given "Int2Var < 42", the constant
> is initially typed as int2, since it fits, and we end up selecting
> int2lt, thereby allowing use of an int2 index.  (On the other hand,
> given "Int2Var < 100000", we'd end up using int4lt, which is correct
> to avoid overflow.)
> 
> A couple of crucial subtleties here:
> 
> 1. We are assuming that the parser or optimizer will constant-fold
> any conversion functions that are introduced.  Thus, in the
> "Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
> time execution begins, so there's no performance loss.
> 
> 2. We cannot lose precision by initially representing a constant as
> numeric and later converting it to float.  Nor can we exceed NUMERIC's
> range (the default 1000-digit limit is more than the range of IEEE
> float8 data).  It would not work as well to start out by representing
> a constant as float and then converting it to numeric.
> 
> Presently, the pg_proc and pg_operator tables contain a pretty fair
> collection of cross-datatype numeric operators, such as int24lt,
> float48pl, etc.  We could perhaps leave these in, but I believe that
> it is better to remove them.  For example, if int42lt is left in place,
> then it would capture cases like "Int4Var < 42", whereas we need that
> to be translated to int4lt so that an int4 index can be used.  Removing
> these operators will eliminate some code bloat and system-catalog bloat
> to boot.
> 
> As far as I can tell, this proposal is almost compatible with the rules
> given in SQL92: in particular, SQL92 specifies that an operator having
> both "approximate numeric" (float) and "exact numeric" (int or numeric)
> inputs should deliver an approximate-numeric result.  I propose
> deviating from SQL92 in a single respect: SQL92 specifies that a
> constant containing an exponent (eg 1.2E34) is approximate numeric,
> which implies that the result of an operator using it is approximate
> even if the other operand is exact.  I believe it's better to treat
> such a constant as exact (ie, type NUMERIC) and only convert it to
> float if the other operand is float.  Without doing that, an assignment
> like
>     UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
> will not work as desired because the constant will be prematurely
> coerced to float, causing precision loss.
> 
> Comments?
> 
>             regards, tom lane
> 


--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Proposal for fixing numeric type-resolution issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Again, anything to add to the TODO here?

IIRC, there was some unhappiness with the proposal you quote, so I'm
not sure we've quite agreed what to do... but clearly something must
be done.
        regards, tom lane


>> We've got a collection of problems that are related to the parser's
>> inability to make good type-resolution choices for numeric constants.
>> In some cases you get a hard error; for example "NumericVar + 4.4"
>> yields
>> ERROR:  Unable to identify an operator '+' for types 'numeric' and 'float8'
>> You will have to retype this query using an explicit cast
>> because "4.4" is initially typed as float8 and the system can't figure
>> out whether to use numeric or float8 addition.  A more subtle problem
>> is that a query like "... WHERE Int2Var < 42" is unable to make use of
>> an index on the int2 column: 42 is resolved as int4, so the operator
>> is int24lt, which works but is not in the opclass of an int2 index.
>> 
>> Here is a proposal for fixing these problems.  I think we could get this
>> done for 7.1 if people like it.
>> 
>> The basic problem is that there's not enough smarts in the type resolver
>> about the interrelationships of the numeric datatypes.  All it has is
>> a concept of a most-preferred type within the category of numeric types.
>> (We are abusing the most-preferred-type mechanism, BTW, because both
>> FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
>> category!  This is in fact why the resolver can't make a choice for
>> "numeric+float8".)  We need more intelligence than that.
>> 
>> I propose that we set up a strictly-ordered hierarchy of numeric
>> datatypes, running from least preferred to most preferred:
>> int2, int4, int8, numeric, float4, float8.
>> Rather than simply considering coercions to the most-preferred type,
>> the type resolver should use the following rules:
>> 
>> 1. No value will be down-converted (eg int4 to int2) except by an
>> explicit conversion.
>> 
>> 2. If there is not an exact matching operator, numeric values will be
>> up-converted to the highest numeric datatype present among the operator
>> or function's arguments.  For example, given "int2 + int8" we'd up-
>> convert the int2 to int8 and apply int8 addition.
>> 
>> The final piece of the puzzle is that the type initially assigned to
>> an undecorated numeric constant should be NUMERIC if it contains a
>> decimal point or exponent, and otherwise the smallest of int2, int4,
>> int8, NUMERIC that will represent it.  This is a considerable change
>> from the current lexer behavior, where you get either int4 or float8.
>> 
>> For example, given "NumericVar + 4.4", the constant 4.4 will initially
>> be assigned type NUMERIC, we will resolve the operator as numeric plus,
>> and everything's fine.  Given "Float8Var + 4.4", the constant is still
>> initially numeric, but will be up-converted to float8 so that float8
>> addition can be used.  The end result is the same as in traditional
>> Postgres: you get float8 addition.  Given "Int2Var < 42", the constant
>> is initially typed as int2, since it fits, and we end up selecting
>> int2lt, thereby allowing use of an int2 index.  (On the other hand,
>> given "Int2Var < 100000", we'd end up using int4lt, which is correct
>> to avoid overflow.)
>> 
>> A couple of crucial subtleties here:
>> 
>> 1. We are assuming that the parser or optimizer will constant-fold
>> any conversion functions that are introduced.  Thus, in the
>> "Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
>> time execution begins, so there's no performance loss.
>> 
>> 2. We cannot lose precision by initially representing a constant as
>> numeric and later converting it to float.  Nor can we exceed NUMERIC's
>> range (the default 1000-digit limit is more than the range of IEEE
>> float8 data).  It would not work as well to start out by representing
>> a constant as float and then converting it to numeric.
>> 
>> Presently, the pg_proc and pg_operator tables contain a pretty fair
>> collection of cross-datatype numeric operators, such as int24lt,
>> float48pl, etc.  We could perhaps leave these in, but I believe that
>> it is better to remove them.  For example, if int42lt is left in place,
>> then it would capture cases like "Int4Var < 42", whereas we need that
>> to be translated to int4lt so that an int4 index can be used.  Removing
>> these operators will eliminate some code bloat and system-catalog bloat
>> to boot.
>> 
>> As far as I can tell, this proposal is almost compatible with the rules
>> given in SQL92: in particular, SQL92 specifies that an operator having
>> both "approximate numeric" (float) and "exact numeric" (int or numeric)
>> inputs should deliver an approximate-numeric result.  I propose
>> deviating from SQL92 in a single respect: SQL92 specifies that a
>> constant containing an exponent (eg 1.2E34) is approximate numeric,
>> which implies that the result of an operator using it is approximate
>> even if the other operand is exact.  I believe it's better to treat
>> such a constant as exact (ie, type NUMERIC) and only convert it to
>> float if the other operand is float.  Without doing that, an assignment
>> like
>> UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
>> will not work as desired because the constant will be prematurely
>> coerced to float, causing precision loss.
>> 
>> Comments?
>> 
>> regards, tom lane
>> 


> -- 
>   Bruce Momjian                        |  http://www.op.net/~candle
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Proposal for fixing numeric type-resolution issues

From
Peter Eisentraut
Date:
On Tue, 13 Jun 2000, Tom Lane wrote:

> IIRC, there was some unhappiness with the proposal you quote, so I'm
> not sure we've quite agreed what to do... but clearly something must
> be done.

You might want to look at SQL99, too. It contains a type system and has
something to say on these issues. At least I would hate to see something
flat-out incompatible drawn up.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: Proposal for fixing numeric type-resolution issues

From
Bruce Momjian
Date:
Sorry to be asking again, but any status on this?

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Again, anything to add to the TODO here?
> 
> IIRC, there was some unhappiness with the proposal you quote, so I'm
> not sure we've quite agreed what to do... but clearly something must
> be done.
> 
>             regards, tom lane
> 
> 
> >> We've got a collection of problems that are related to the parser's
> >> inability to make good type-resolution choices for numeric constants.
> >> In some cases you get a hard error; for example "NumericVar + 4.4"
> >> yields
> >> ERROR:  Unable to identify an operator '+' for types 'numeric' and 'float8'
> >> You will have to retype this query using an explicit cast
> >> because "4.4" is initially typed as float8 and the system can't figure
> >> out whether to use numeric or float8 addition.  A more subtle problem
> >> is that a query like "... WHERE Int2Var < 42" is unable to make use of
> >> an index on the int2 column: 42 is resolved as int4, so the operator
> >> is int24lt, which works but is not in the opclass of an int2 index.
> >> 
> >> Here is a proposal for fixing these problems.  I think we could get this
> >> done for 7.1 if people like it.
> >> 
> >> The basic problem is that there's not enough smarts in the type resolver
> >> about the interrelationships of the numeric datatypes.  All it has is
> >> a concept of a most-preferred type within the category of numeric types.
> >> (We are abusing the most-preferred-type mechanism, BTW, because both
> >> FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
> >> category!  This is in fact why the resolver can't make a choice for
> >> "numeric+float8".)  We need more intelligence than that.
> >> 
> >> I propose that we set up a strictly-ordered hierarchy of numeric
> >> datatypes, running from least preferred to most preferred:
> >> int2, int4, int8, numeric, float4, float8.
> >> Rather than simply considering coercions to the most-preferred type,
> >> the type resolver should use the following rules:
> >> 
> >> 1. No value will be down-converted (eg int4 to int2) except by an
> >> explicit conversion.
> >> 
> >> 2. If there is not an exact matching operator, numeric values will be
> >> up-converted to the highest numeric datatype present among the operator
> >> or function's arguments.  For example, given "int2 + int8" we'd up-
> >> convert the int2 to int8 and apply int8 addition.
> >> 
> >> The final piece of the puzzle is that the type initially assigned to
> >> an undecorated numeric constant should be NUMERIC if it contains a
> >> decimal point or exponent, and otherwise the smallest of int2, int4,
> >> int8, NUMERIC that will represent it.  This is a considerable change
> >> from the current lexer behavior, where you get either int4 or float8.
> >> 
> >> For example, given "NumericVar + 4.4", the constant 4.4 will initially
> >> be assigned type NUMERIC, we will resolve the operator as numeric plus,
> >> and everything's fine.  Given "Float8Var + 4.4", the constant is still
> >> initially numeric, but will be up-converted to float8 so that float8
> >> addition can be used.  The end result is the same as in traditional
> >> Postgres: you get float8 addition.  Given "Int2Var < 42", the constant
> >> is initially typed as int2, since it fits, and we end up selecting
> >> int2lt, thereby allowing use of an int2 index.  (On the other hand,
> >> given "Int2Var < 100000", we'd end up using int4lt, which is correct
> >> to avoid overflow.)
> >> 
> >> A couple of crucial subtleties here:
> >> 
> >> 1. We are assuming that the parser or optimizer will constant-fold
> >> any conversion functions that are introduced.  Thus, in the
> >> "Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
> >> time execution begins, so there's no performance loss.
> >> 
> >> 2. We cannot lose precision by initially representing a constant as
> >> numeric and later converting it to float.  Nor can we exceed NUMERIC's
> >> range (the default 1000-digit limit is more than the range of IEEE
> >> float8 data).  It would not work as well to start out by representing
> >> a constant as float and then converting it to numeric.
> >> 
> >> Presently, the pg_proc and pg_operator tables contain a pretty fair
> >> collection of cross-datatype numeric operators, such as int24lt,
> >> float48pl, etc.  We could perhaps leave these in, but I believe that
> >> it is better to remove them.  For example, if int42lt is left in place,
> >> then it would capture cases like "Int4Var < 42", whereas we need that
> >> to be translated to int4lt so that an int4 index can be used.  Removing
> >> these operators will eliminate some code bloat and system-catalog bloat
> >> to boot.
> >> 
> >> As far as I can tell, this proposal is almost compatible with the rules
> >> given in SQL92: in particular, SQL92 specifies that an operator having
> >> both "approximate numeric" (float) and "exact numeric" (int or numeric)
> >> inputs should deliver an approximate-numeric result.  I propose
> >> deviating from SQL92 in a single respect: SQL92 specifies that a
> >> constant containing an exponent (eg 1.2E34) is approximate numeric,
> >> which implies that the result of an operator using it is approximate
> >> even if the other operand is exact.  I believe it's better to treat
> >> such a constant as exact (ie, type NUMERIC) and only convert it to
> >> float if the other operand is float.  Without doing that, an assignment
> >> like
> >> UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
> >> will not work as desired because the constant will be prematurely
> >> coerced to float, causing precision loss.
> >> 
> >> Comments?
> >> 
> >> regards, tom lane
> >> 
> 
> 
> > -- 
> >   Bruce Momjian                        |  http://www.op.net/~candle
> >   pgman@candle.pha.pa.us               |  (610) 853-3000
> >   +  If your life is a hard drive,     |  830 Blythe Avenue
> >   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Proposal for fixing numeric type-resolution issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Sorry to be asking again, but any status on this?

At this point I think I can safely say nothing's going to be done for
7.1.  It is still an issue that needs to be addressed though.
        regards, tom lane