Thread: Integer parsing bug?

Integer parsing bug?

From: Steve Atkins

Section 8.1 of the manual gives the range of an integer
as -2147483648 to +2147483647.


template1=# select '-2147483648'::int;
    int4
-------------
 -2147483648
(1 row)

template1=# select -2147483648::int;
ERROR:  integer out of range

Oops.

template1=# select version();
                           version
-------------------------------------------------------------
 PostgreSQL 7.4.1 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

Completely vanilla build - no options other than --prefix to
configure. Clean installation, this is immediately after an initdb.

I see the same bug on Solaris, built with Forte C in 64-bit mode.

Cheers,
  Steve

Re: Integer parsing bug?

From: Bruce Momjian
Date: Wed, Mar 03, 2004 at 12:31:47PM -0500

Steve Atkins wrote:
> Section 8.1 of the manual gives the range of an integer
> as -2147483648 to +2147483647.
>
>
> template1=# select '-2147483648'::int;
>     int4
> -------------
>  -2147483648
> (1 row)
>
> template1=# select -2147483648::int;
> ERROR:  integer out of range
>
> Oops.
>
> template1=# select version();
>                            version
> -------------------------------------------------------------
>  PostgreSQL 7.4.1 on i686-pc-linux-gnu, compiled by GCC 2.96
> (1 row)
>
> Completely vanilla build - no options other than --prefix to
> configure. Clean installation, this is immediately after an initdb.
>
> I see the same bug on Solaris, built with Forte C in 64 bit mode.

Yep, it definitely looks weird:

    test=> select '-2147483648'::int;
        int4
    -------------
     -2147483648
    (1 row)

    test=> select -2147483648::int;
    ERROR:  integer out of range
    test=> select -2147483647::int;
      ?column?
    -------------
     -2147483647
    (1 row)

    test=> select '-2147483649'::int;
    ERROR:  value "-2147483649" is out of range for type integer

The unquoted form works only down to *47, the quoted form works for *48,
and both fail for *49.

I looked at libc's strtol(), and that works fine, as do our existing
parser checks.  The error is coming from int84, the int8-to-int4
conversion function called from the executor.  Here is a test program:

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        long long l = -2147483648;
        int i = l;

        if (i != l)
            printf("not equal\n");
        else
            printf("equal\n");
        return 0;
    }

A compile generates the following warning:

    tst1.c:6: warning: decimal constant is so large that it is unsigned

and reports "not equal".

I see in the freebsd machine/limits.h file:

 * According to ANSI (section 2.2.4.2), the values below must be usable by
 * #if preprocessing directives.  Additionally, the expression must have the
 * same type as would an expression that is an object of the corresponding
 * type converted according to the integral promotions.  The subtraction for
 * INT_MIN, etc., is so the value is not unsigned; e.g., 0x80000000 is an
 * unsigned int for 32-bit two's complement ANSI compilers (section 3.1.3.2).
 * These numbers are for the default configuration of gcc.  They work for
 * some other compilers as well, but this should not be depended on.

 #define INT_MAX         0x7fffffff      /* max value for an int */
 #define INT_MIN         (-0x7fffffff - 1)       /* min value for an int */

Basically, what is happening is that the special value -INT_MAX-1 is
being converted to an int value, and the compiler is treating the
constant as unsigned.  This seems to be a known C issue, and I can't see
a good fix for it except perhaps checking for INT_MIN in the int84
function, but I ran some tests and that didn't work either.


Re: Integer parsing bug?

From: Steve Atkins

On Wed, Mar 03, 2004 at 12:31:47PM -0500, Bruce Momjian wrote:

> Yep, it definitely looks weird:
>
>     test=> select '-2147483648'::int;
>         int4
>     -------------
>      -2147483648
>     (1 row)
>
>     test=> select -2147483648::int;
>     ERROR:  integer out of range
>     test=> select -2147483647::int;
>       ?column?
>     -------------
>      -2147483647
>     (1 row)
>
>     test=> select '-2147483649'::int;
>     ERROR:  value "-2147483649" is out of range for type integer
>
> The non-quoting works only for *47, and the quoting works for *48, but
> both fail for *49.
>
> I looked at libc's strtol(), and that works fine, as does our existing
> parser checks.  The error is coming from int84, a comparison function
> called from the executor.  Here is a test program:

I traced through that far and managed to convince myself that the
problem was that it was considering a -...48 to be an int8, rather
than an int4, so was hitting int84() when it shouldn't have been - and
the input values for int84() looked very, very broken.

Specifically, a breakpoint on int84() fires on -..48 and -..49, but
not on -..47, suggesting that the problem is somewhere in the parsing
before it reaches int84().

I'm happy to take a look at it, but got very lost in the maze of twisty
parse routines, all alike, when I tried to track back further. Is there
any overview documentation on that end of the code?

> I see in the freebsd machine/limits.h file:
>
>  * According to ANSI (section 2.2.4.2), the values below must be usable by
>  * #if preprocessing directives.  Additionally, the expression must have the
>  * same type as would an expression that is an object of the corresponding
>  * type converted according to the integral promotions.  The subtraction for
>  * INT_MIN, etc., is so the value is not unsigned; e.g., 0x80000000 is an
>  * unsigned int for 32-bit two's complement ANSI compilers (section 3.1.3.2).
>  * These numbers are for the default configuration of gcc.  They work for
>  * some other compilers as well, but this should not be depended on.
>
>  #define INT_MAX         0x7fffffff      /* max value for an int */
>  #define INT_MIN         (-0x7fffffff - 1)       /* min value for an int */
>
> Basically, what is happening is that the special value -INT_MAX-1 is
> being converted to an int value, and the compiler is casting it to an
> unsigned.  Seems this is a known C issue and I can't see a good fix for
> it except perhaps check for INT_MIN in the int84 function, but I ran
> some tests and that didn't work either.

I don't read it that way. INT_MIN is correctly read as a signed int,
but it can't be defined as -0x80000000, as that would be parsed as
-(0x80000000) and the constant 0x80000000 is unsigned.

Cheers,
  Steve

Re: Integer parsing bug?

From: Tom Lane
Date: Wed, Mar 03, 2004 at 06:27:07PM -0500

Steve Atkins <steve@blighty.com> writes:
>> test=> select -2147483648::int;
>> ERROR:  integer out of range

There is no bug here.  You are mistakenly assuming that the above
represents
    select (-2147483648)::int;
But actually the :: operator binds more tightly than unary minus,
so Postgres reads it as
    select -(2147483648::int);
and quite rightly fails to convert the int8 literal to int.

If you write it with the correct parenthesization it works:

regression=# select -2147483648::int;
ERROR:  integer out of range
regression=# select (-2147483648)::int;
    int4
-------------
 -2147483648
(1 row)


            regards, tom lane

Re: Integer parsing bug?

From: Steve Atkins

On Wed, Mar 03, 2004 at 06:27:07PM -0500, Tom Lane wrote:
> Steve Atkins <steve@blighty.com> writes:
> >> test=> select -2147483648::int;
> >> ERROR:  integer out of range
>
> There is no bug here.  You are mistakenly assuming that the above
> represents
>     select (-2147483648)::int;
> But actually the :: operator binds more tightly than unary minus,
> so Postgres reads it as
>     select -(2147483648::int);
> and quite rightly fails to convert the int8 literal to int.
>
> If you write it with the correct parenthesization it works:
>
> regression=# select -2147483648::int;
> ERROR:  integer out of range
> regression=# select (-2147483648)::int;

OK... That makes sense if the parser has no support for negative
constants, but it doesn't seem like intuitive behaviour.


BTW, the original issue that led to this was:

db=> CREATE FUNCTION t(integer) RETURNS integer AS '
BEGIN
  return 0;
END;
' LANGUAGE 'plpgsql';

db=> select t(-2147483648);
ERROR:  function t(bigint) does not exist

Which again makes sense considering the way the parser works, but
still seems to violate the principle of least surprise.

Cheers,
  Steve