Thread: numeric data type on 6.5
I thought numeric data type on 6.5 allows a very large precision. Am I missing something? -- Tatsuo Ishii test=> create table t1(n numeric(100,0)); CREATE test=> \d t1; Table = t1 +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ | n | numeric | var | +----------------------------------+----------------------------------+-------+ test=> insert into t1 values(100000000000000000000000000000); NOTICE: Integer input '100000000000000000000000000000' is out of range; promoted to float INSERT 149033 1 test=> select * from t1; n - 1 (1 row)
Thus spake Tatsuo Ishii > I thought numeric data type on 6.5 allows a very large precision. Am I > missing something? [...] > test=> insert into t1 values(100000000000000000000000000000); > NOTICE: Integer input '100000000000000000000000000000' is out of range; promoted to float Try this. insert into t1 values('100000000000000000000000000000'::numeric); -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner.
>> I thought numeric data type on 6.5 allows a very large precision. Am I >> missing something? >[...] >> test=> insert into t1 values(100000000000000000000000000000); >> NOTICE: Integer input '100000000000000000000000000000' is out of range; promoted to float > >Try this. >insert into t1 values('100000000000000000000000000000'::numeric); Thanks. It definitely works! -- Tatsuo Ishii
> > >> I thought numeric data type on 6.5 allows a very large precision. Am I > >> missing something? > >[...] > >> test=> insert into t1 values(100000000000000000000000000000); > >> NOTICE: Integer input '100000000000000000000000000000' is out of range; promoted to float > > > >Try this. > >insert into t1 values('100000000000000000000000000000'::numeric); > > Thanks. It definitely works! insert into t1 values('100000000000000000000000000000'); That one too. The problem is that the yacc parser already tries to convert it into an integer or float if you omit the quotes. I'll try to implement a NUMERIC fallback for this case for 6.6 and then have all the auto conversion functionality so NUMERIC, INTEGER and FLOAT can be used mixed. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
> The problem is that the yacc parser already tries to convert > it into an integer or float if you omit the quotes. I'll try > to implement a NUMERIC fallback for this case for 6.6 and > then have all the auto conversion functionality so NUMERIC, > INTEGER and FLOAT can be used mixed. I'm looking at this right now. I had coded in a fallback to FLOAT8 for the integer types because at the time that was the only other useful numeric type. However, I'm going to try changing the code to leave a failed INTx token as a string of unspecified type, which would be typed and converted later using the automatic coersion mechanism. istm that this would be a no-brainer for v6.5, since it is just replacing one heuristic with a more general and more correct one. And I had implemented both, so we know who to blame :) Will let y'all know how it goes... - Tom -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > I'm looking at this right now. I had coded in a fallback to FLOAT8 for > the integer types because at the time that was the only other useful > numeric type. However, I'm going to try changing the code to leave a > failed INTx token as a string of unspecified type, which would be > typed and converted later using the automatic coersion mechanism. That would be good as far as it goes, but what about cases with a decimal point in 'em? Converting to float and then to numeric will lose precision. I'm inclined to think you should prevent the parser from converting *any* numeric constant out of string form until it knows the target data type. (IIRC, INT8 has problems similar to NUMERIC's...) regards, tom lane
> > The problem is that the yacc parser already tries to convert > > it into an integer or float if you omit the quotes. > ... I'm going to try changing the code to leave a > failed INTx token as a string of unspecified type, which would be > typed and converted later using the automatic coersion mechanism. OK, this seems to work: postgres=> create table t1 (n numeric(20,0)); CREATE postgres=> insert into t1 values ('10000000000000000000'); INSERT 18552 1 postgres=> insert into t1 values (20000000000000000000); INSERT 18553 1 postgres=> select * from t1; n -------------------- 10000000000000000000 20000000000000000000 postgres=> select n * 5000000000000000000000000000000 from t1; ---------------------------------------------------50000000000000000000000000000000000000000000000000 100000000000000000000000000000000000000000000000000 But, there are some cases which aren't transparent: postgres=> select 10000000000000000000000000*2; ERROR: pg_atoi: error reading "10000000000000000000000000": Numerical result out of range postgres=> select 10000000000000000000000000*2::numeric; -------------------------- 20000000000000000000000000 And, if a long numeric string is entered, it actually stays a string all the way through (never being converted to anything internal): postgres=> select 400000000000000000000000000000000000000000000000000; --------------------------------------------------- 400000000000000000000000000000000000000000000000000 Comments? btw, I've got some float8->numeric conversion troubles on my i686/Linux box: postgres=> insert into t1 values ('30000000000000000000'::float8); INSERT 18541 1 postgres=> select * from t1; n ------------------ 3 Any ideas on this last one? I'm running from this morning's development tree... - Tom -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
> btw, I've got some float8->numeric conversion troubles on my > i686/Linux box: > postgres=> insert into t1 values ('30000000000000000000'::float8); > INSERT 18541 1 > postgres=> select * from t1; > n > ------------------ > 3 OK, so the problem is that large floats are printed using exponential notation, and the float8->numeric conversion routine uses the float8out() routine to convert to a string in preparation for ingestion as a numeric type. I've modified my copy of float8_numeric() to instead print directly into a (large!) buffer using the "%f" specifier, to ensure that the string is always compatible with the numeric reader: postgres=> create table t1 (f float8, n numeric(20,2), d decimal(20,2)); CREATE postgres=> insert into t1 values (300.1); INSERT 18641 1 postgres=> insert into t1 values (300000000000000000); INSERT 18642 1 postgres=> update t1 set n = f, d = f; UPDATE 2 postgres=> select * from t1; f | n| d -----+---------------------+--------------------- 300.1| 300.10| 300.10 3e+17|300000000000000000.00|300000000000000000.00 (2 rows) The float8_numeric() code already had checked for NULL and NaN, so I think this does not lose functionality. What do you think Jan? Should I make the change? Or is there another way?? - Tom -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart wrote: > The float8_numeric() code already had checked for NULL and NaN, so I > think this does not lose functionality. What do you think Jan? Should > I make the change? Or is there another way?? Think it's O.K. - commit the changes. The other way would be to enhance the NUMERIC input function to read exponential notation. But I wouldn't do this now because I've planned to someday implement NUMERIC again from scratch. The current implementation has a packed storage format and the arithmetic operations are based on a character format (each digit is stored in one byte). After thinking about it I discovered, that storing the value internally in short int's (16 bit) and base 10000 would have some advantages. 1. No need to pack/unpack storage format for computations. 2. One arithmetic operation in the innermost loops (only add/subtract are really implemented) mucks with 4 digits at a time. The disadvantages are small. Base 10000 to base 10 (decimal) conversion is easily to parse/print. Only rounding functions will be a little tricky. I think the speedup gained from adding/subtracting 4 digits per loop iteration will be worth the efford. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
> > I'm looking at this right now. I had coded in a fallback to FLOAT8 for > > the integer types because at the time that was the only other useful > > numeric type. However, I'm going to try changing the code to leave a > > failed INTx token as a string of unspecified type, which would be > > typed and converted later using the automatic coersion mechanism. > That would be good as far as it goes, but what about cases with a > decimal point in 'em? Converting to float and then to numeric will > lose precision. > I'm inclined to think you should prevent the parser from converting > *any* numeric constant out of string form until it knows the target data > type. > (IIRC, INT8 has problems similar to NUMERIC's...) Right. Here is a patch which tries to do something right for most cases. For the "integer" token (numbers w/o a decimal point) it keeps the token as a string if the conversion to int4 fails. I split the "real" token into "decimal" (w/o exponent) and "real"; at the moment "decimal" is forced to become a float8 if there are fewer than 18 characters in the string, but there may be a more robust strategy to be had. When a numeric token is kept as a string, the parser requires some typing context to handle the string later, otherwise it will complain. But that is probably better than silently swallowing numeric data and possibly mishandling it. Seems to do OK with numeric tokens of unspecified type which will become int8 and numeric in the parser. There may be some edge-effect cases (e.g. decimal data with 17 characters) which aren't quite right. Comments? - Tom -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California*** ../src/backend/parser/scan.l.orig Tue Mar 30 15:08:02 1999 --- ../src/backend/parser/scan.l Tue Apr 27 16:29:21 1999 *************** *** 152,161 **** xmstop - integer [\-]?{digit}+ /* - real [\-]?{digit}+\.{digit}+([Ee][-+]?{digit}+)? - */ real [\-]?(((({digit}*\.{digit}+)|({digit}+\.{digit}*))([Ee][-+]?{digit}+)?)|({digit}+[Ee][-+]?{digit}+)) param \${integer} --- 152,162 ---- xmstop - integer [\-]?{digit}+ + decimal [\-]?(({digit}*\.{digit}+)|({digit}+\.{digit}*)) + real [\-]?((({digit}*\.{digit}+)|({digit}+\.{digit}*)|({digit}+))([Ee][-+]?{digit}+)) /* real [\-]?(((({digit}*\.{digit}+)|({digit}+\.{digit}*))([Ee][-+]?{digit}+)?)|({digit}+[Ee][-+]?{digit}+)) + */ param \${integer} *************** *** 334,348 **** --- 335,369 ---- if (*endptr != '\0' || errno == ERANGE) { errno = 0; + #if 0 yylval.dval = strtod(((char *)yytext),&endptr); if (*endptr != '\0' || errno == ERANGE) elog(ERROR,"Bad integer input '%s'",yytext); CheckFloat8Val(yylval.dval); elog(NOTICE,"Integer input '%s' is out of range; promoted to float", yytext); return FCONST; + #endif + yylval.str = pstrdup((char*)yytext); + return SCONST; } return ICONST; } + {decimal}/{space}*-{number} { + char* endptr; + + BEGIN(xm); + if (strlen((char *)yytext) <= 17) + { + errno = 0; + yylval.dval = strtod(((char *)yytext),&endptr); + if (*endptr != '\0' || errno == ERANGE) + elog(ERROR,"Bad float8 input '%s'",yytext); + CheckFloat8Val(yylval.dval); + return FCONST; + } + yylval.str = pstrdup((char*)yytext); + return SCONST; + } {real}/{space}*-{number} { char* endptr; *************** *** 362,375 **** --- 383,415 ---- if (*endptr != '\0' || errno == ERANGE) { errno = 0; + #if 0 yylval.dval = strtod(((char *)yytext),&endptr); if (*endptr != '\0' || errno == ERANGE) elog(ERROR,"Bad integer input '%s'",yytext); CheckFloat8Val(yylval.dval); elog(NOTICE,"Integer input '%s' is out of range; promoted to float", yytext); return FCONST; + #endif + yylval.str = pstrdup((char*)yytext); + return SCONST; } return ICONST; + } + {decimal} { + char* endptr; + + if (strlen((char *)yytext) <= 17) + { + errno = 0; + yylval.dval = strtod((char *)yytext,&endptr); + if (*endptr != '\0' || errno == ERANGE) + elog(ERROR,"Bad float input '%s'",yytext); + CheckFloat8Val(yylval.dval); + return FCONST; + } + yylval.str = pstrdup((char*)yytext); + return SCONST; } {real} { char* endptr;
> > > I'm looking at this right now. I had coded in a fallback to FLOAT8 for > > > the integer types because at the time that was the only other useful > > > numeric type. However, I'm going to try changing the code to leave a > > > failed INTx token as a string of unspecified type, which would be > > > typed and converted later using the automatic coersion mechanism. > > That would be good as far as it goes, but what about cases with a > > decimal point in 'em? Converting to float and then to numeric will > > lose precision. > > I'm inclined to think you should prevent the parser from converting > > *any* numeric constant out of string form until it knows the target data > > type. > > (IIRC, INT8 has problems similar to NUMERIC's...) > > Right. Here is a patch which tries to do something right for most > cases. For the "integer" token (numbers w/o a decimal point) it keeps > the token as a string if the conversion to int4 fails. I split the > "real" token into "decimal" (w/o exponent) and "real"; at the moment > "decimal" is forced to become a float8 if there are fewer than 18 > characters in the string, but there may be a more robust strategy to > be had. This seems like a perfect approach. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > Seems to do OK with numeric tokens of unspecified type which will > become int8 and numeric in the parser. There may be some edge-effect > cases (e.g. decimal data with 17 characters) which aren't quite right. > Comments? I'd suggest backing off one more place on the length of string you will try to convert to a float8. Since the test is strlen() <= 17, you actually can have at most 16 digits (there must be a decimal point in there too). But IEEE float is only good to 16-and-change digits; I'm not sure I'd want to assume that the 16th digit will always be reproduced exactly. 15 digits would be safer. It could still break if the C library's float<=>string conversion routines are sloppy :-(. I suppose you're interested in preserving the info that "this constant looks numeric-ish" to assist in type resolution heuristics? Otherwise the value could be left in string form till later. Is there any value in marking the constant as a numeric token, yet leaving its specific value as a string until after type resolution is done? regards, tom lane
> > Seems to do OK with numeric tokens of unspecified type which will > > become int8 and numeric in the parser. There may be some edge-effect > > cases (e.g. decimal data with 17 characters) which aren't quite right. > > Comments? > I'd suggest backing off one more place on the length of string you will > try to convert to a float8. Since the test is strlen() <= 17, you > actually can have at most 16 digits (there must be a decimal point in > there too). But IEEE float is only good to 16-and-change digits; I'm > not sure I'd want to assume that the 16th digit will always be > reproduced exactly. 15 digits would be safer. Yeah. I'd chosen 17 to get sign+decimal+15digits... > It could still break if the C library's float<=>string conversion > routines are sloppy :-(. I suppose you're interested in preserving > the info that "this constant looks numeric-ish" to assist in type > resolution heuristics? Otherwise the value could be left in string > form till later. > Is there any value in marking the constant as a numeric token, yet > leaving its specific value as a string until after type resolution > is done? Possibly. I didn't think too hard about it, but had assumed that doing much more than I did would propagate back into the parser, which I didn't want to tackle this close to release. - Tom -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California