Re: Proposal for fixing numeric type-resolution issues - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Proposal for fixing numeric type-resolution issues
Date
Msg-id 200010091957.PAA21868@candle.pha.pa.us
Whole thread Raw
In response to Re: Proposal for fixing numeric type-resolution issues  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Proposal for fixing numeric type-resolution issues
List pgsql-hackers
Sorry to be asking again, but any status on this?

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Again, anything to add to the TODO here?
> 
> IIRC, there was some unhappiness with the proposal you quote, so I'm
> not sure we've quite agreed what to do... but clearly something must
> be done.
> 
>             regards, tom lane
> 
> 
> >> We've got a collection of problems that are related to the parser's
> >> inability to make good type-resolution choices for numeric constants.
> >> In some cases you get a hard error; for example "NumericVar + 4.4"
> >> yields
> >> ERROR:  Unable to identify an operator '+' for types 'numeric' and 'float8'
> >> You will have to retype this query using an explicit cast
> >> because "4.4" is initially typed as float8 and the system can't figure
> >> out whether to use numeric or float8 addition.  A more subtle problem
> >> is that a query like "... WHERE Int2Var < 42" is unable to make use of
> >> an index on the int2 column: 42 is resolved as int4, so the operator
> >> is int24lt, which works but is not in the opclass of an int2 index.
> >> 
> >> Here is a proposal for fixing these problems.  I think we could get this
> >> done for 7.1 if people like it.
> >> 
> >> The basic problem is that there's not enough smarts in the type resolver
> >> about the interrelationships of the numeric datatypes.  All it has is
> >> a concept of a most-preferred type within the category of numeric types.
> >> (We are abusing the most-preferred-type mechanism, BTW, because both
> >> FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
> >> category!  This is in fact why the resolver can't make a choice for
> >> "numeric+float8".)  We need more intelligence than that.
> >> 
> >> I propose that we set up a strictly-ordered hierarchy of numeric
> >> datatypes, running from least preferred to most preferred:
> >> int2, int4, int8, numeric, float4, float8.
> >> Rather than simply considering coercions to the most-preferred type,
> >> the type resolver should use the following rules:
> >> 
> >> 1. No value will be down-converted (eg int4 to int2) except by an
> >> explicit conversion.
> >> 
> >> 2. If there is not an exact matching operator, numeric values will be
> >> up-converted to the highest numeric datatype present among the operator
> >> or function's arguments.  For example, given "int2 + int8" we'd up-
> >> convert the int2 to int8 and apply int8 addition.
> >> 
> >> The final piece of the puzzle is that the type initially assigned to
> >> an undecorated numeric constant should be NUMERIC if it contains a
> >> decimal point or exponent, and otherwise the smallest of int2, int4,
> >> int8, NUMERIC that will represent it.  This is a considerable change
> >> from the current lexer behavior, where you get either int4 or float8.
> >> 
> >> For example, given "NumericVar + 4.4", the constant 4.4 will initially
> >> be assigned type NUMERIC, we will resolve the operator as numeric plus,
> >> and everything's fine.  Given "Float8Var + 4.4", the constant is still
> >> initially numeric, but will be up-converted to float8 so that float8
> >> addition can be used.  The end result is the same as in traditional
> >> Postgres: you get float8 addition.  Given "Int2Var < 42", the constant
> >> is initially typed as int2, since it fits, and we end up selecting
> >> int2lt, thereby allowing use of an int2 index.  (On the other hand,
> >> given "Int2Var < 100000", we'd end up using int4lt, which is correct
> >> to avoid overflow.)
> >> 
> >> A couple of crucial subtleties here:
> >> 
> >> 1. We are assuming that the parser or optimizer will constant-fold
> >> any conversion functions that are introduced.  Thus, in the
> >> "Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
> >> time execution begins, so there's no performance loss.
> >> 
> >> 2. We cannot lose precision by initially representing a constant as
> >> numeric and later converting it to float.  Nor can we exceed NUMERIC's
> >> range (the default 1000-digit limit is more than the range of IEEE
> >> float8 data).  It would not work as well to start out by representing
> >> a constant as float and then converting it to numeric.
> >> 
> >> Presently, the pg_proc and pg_operator tables contain a pretty fair
> >> collection of cross-datatype numeric operators, such as int24lt,
> >> float48pl, etc.  We could perhaps leave these in, but I believe that
> >> it is better to remove them.  For example, if int42lt is left in place,
> >> then it would capture cases like "Int4Var < 42", whereas we need that
> >> to be translated to int4lt so that an int4 index can be used.  Removing
> >> these operators will eliminate some code bloat and system-catalog bloat
> >> to boot.
> >> 
> >> As far as I can tell, this proposal is almost compatible with the rules
> >> given in SQL92: in particular, SQL92 specifies that an operator having
> >> both "approximate numeric" (float) and "exact numeric" (int or numeric)
> >> inputs should deliver an approximate-numeric result.  I propose
> >> deviating from SQL92 in a single respect: SQL92 specifies that a
> >> constant containing an exponent (eg 1.2E34) is approximate numeric,
> >> which implies that the result of an operator using it is approximate
> >> even if the other operand is exact.  I believe it's better to treat
> >> such a constant as exact (ie, type NUMERIC) and only convert it to
> >> float if the other operand is float.  Without doing that, an assignment
> >> like
> >> UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
> >> will not work as desired because the constant will be prematurely
> >> coerced to float, causing precision loss.
> >> 
> >> Comments?
> >> 
> >> regards, tom lane
> >> 
> 
> 
> > -- 
> >   Bruce Momjian                        |  http://www.op.net/~candle
> >   pgman@candle.pha.pa.us               |  (610) 853-3000
> >   +  If your life is a hard drive,     |  830 Blythe Avenue
> >   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Syslog and pg_options (for RPMs)
Next
From: Bruce Momjian
Date:
Subject: Re: [GENERAL] rules on INSERT can't UPDATE new instance?