Thread: BUG #15519: Casting float4 into int4 gets the wrong sign instead of"integer out of range" error
BUG #15519: Casting float4 into int4 gets the wrong sign instead of"integer out of range" error
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 15519 Logged by: Victor Petrovykh Email address: victor@magic.io PostgreSQL version: 10.5 Operating system: Gentoo Linux 4.14.20 Description: Offending examples: SELECT ((2147483647::float4) - 1.0::float4)::int4; SELECT ((2147483590::float4) - 1.0::float4)::int4; SELECT ((2147483647::float4) + 1.0::float4)::int4; They all produce the same result: -2147483648 I understand that a float4 cannot represent large integers with the same precision as int4, that's OK. What surprised me is that instead of getting an "overflow error" or "integer out of range" I simply got a negative result for a value that is actually close to maximum int4. To contrast this, the query: SELECT ((2147483647::float4) + 200.0::float4)::int4; The above produces the expected "ERROR: integer out of range"
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Andrew Gierth
Date:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes: PG> The following bug has been logged on the website: PG> Bug reference: 15519 PG> Logged by: Victor Petrovykh PG> Email address: victor@magic.io PG> PostgreSQL version: 10.5 PG> Operating system: Gentoo Linux 4.14.20 PG> Description: PG> Offending examples: PG> SELECT ((2147483647::float4) - 1.0::float4)::int4; PG> SELECT ((2147483590::float4) - 1.0::float4)::int4; PG> SELECT ((2147483647::float4) + 1.0::float4)::int4; PG> They all produce the same result: -2147483648 The bug here is that the ftoi4 conversion function thinks it can do this: if (num < INT_MIN || num > INT_MAX || isnan(num)) but this causes INT_MIN and INT_MAX to be implicitly converted to floats (not doubles), which can't represent their values exactly and ends up with the conditions being false in the edge cases. (Specifically, (float)INT_MAX is a bit larger than INT_MAX.) Probably this should be changed to use ((float8) INT_MAX) etc. instead? PG> I understand that a float4 cannot represent large integers with the PG> same precision as int4, that's OK. What surprised me is that PG> instead of getting an "overflow error" or "integer out of range" I PG> simply got a negative result for a value that is actually close to PG> maximum int4. To contrast this, the query: PG> SELECT ((2147483647::float4) + 200.0::float4)::int4; PG> The above produces the expected "ERROR: integer out of range" Likewise you also get that error if you cast the float4 to a float8 before casting to integer. -- Andrew (irc:RhodiumToad)
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Tom Lane
Date:
=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes: > Offending examples: > SELECT ((2147483647::float4) - 1.0::float4)::int4; > SELECT ((2147483590::float4) - 1.0::float4)::int4; > SELECT ((2147483647::float4) + 1.0::float4)::int4; > They all produce the same result: -2147483648 Huh, interesting. The code underlying this looks sane enough at first glance: float4 num = PG_GETARG_FLOAT4(0); if (num < INT_MIN || num > INT_MAX || isnan(num)) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("integer out of range"))); PG_RETURN_INT32((int32) rint(num)); I think what is happening is that the compiler is interpreting "num > INT_MAX" as something to be done in float4 arithmetic, so it computes (float4) INT_MAX which rounds off to be the equivalent of exactly 2147483648. Then the problematic input values, which all also round to 2147483648, get past the if-test and result in undetected overflow in the cast to int32. Perhaps we could fix this by writing if (num < (float8) INT_MIN || num > (float8) INT_MAX || isnan(num)) but I don't have a huge amount of confidence in that either, especially not for the similar coding in ftoi8 and dtoi8, where the comparison values are large enough that they'd not be exactly represented by float8 either. Maybe we need an explicit check for the integer result being of the wrong sign, to catch just-barely-overflowing cases like these. regards, tom lane
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "PG" == PG Bug reporting form <noreply@postgresql.org> writes: > PG> Offending examples: > PG> SELECT ((2147483647::float4) - 1.0::float4)::int4; > PG> SELECT ((2147483590::float4) - 1.0::float4)::int4; > PG> SELECT ((2147483647::float4) + 1.0::float4)::int4; > PG> They all produce the same result: -2147483648 > The bug here is that the ftoi4 conversion function thinks it can do > this: > if (num < INT_MIN || num > INT_MAX || isnan(num)) > Probably this should be changed to use ((float8) INT_MAX) etc. instead? After thinking about this for awhile, there's another class of undesirable behavior here, which can be illustrated by regression=# select '32766.1'::float4::int2; int2 ------- 32766 (1 row) regression=# select '32767.1'::float4; float4 --------- 32767.1 (1 row) regression=# select '32767.1'::float4::int2; ERROR: smallint out of range It doesn't seem very nice that that fails rather than producing 32767. I think we could fix that by doing the rint() first, before the comparisons, which should be safe enough: as the Linux man page for it points out, it really can't ever fail. I'm inclined to write ftoi4 as /* * Get rid of any fractional part in the input. This is so we don't * fail on just-out-of-range values that would round into range. */ num = rint(num); /* * Range check. We must be careful here that the boundary values are * expressed exactly in the appropriate float domain; we assume float4 * is going to round off INT_MAX to a power of 2. Also note assumption * that rint() will pass through a NaN or Inf unchanged. */ if (unlikely(num < (float4) INT_MIN || num >= (float4) INT_MAX || isnan(num))) ereport(...); PG_RETURN_INT32((int32) num); We could use the same pattern for the other five functions at issue, but "num >=" would be "num >" in cases where we expect the comparison value to be represented without roundoff. Obviously, some regression test cases to verify all these assumptions would be a good idea. regards, tom lane
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> /* Tom> * Range check. We must be careful here that the boundary values are Tom> * expressed exactly in the appropriate float domain; we assume float4 Tom> * is going to round off INT_MAX to a power of 2. Also note assumption Tom> * that rint() will pass through a NaN or Inf unchanged. Tom> */ Tom> if (unlikely(num < (float4) INT_MIN || num >= (float4) INT_MAX || isnan(num))) Tom> ereport(...); Looking around, another approach I've seen (which I like better) is to use if (num < (float4)INT_MIN || num >= -(float4)INT_MIN || ... which doesn't rely on assumptions about how the compiler is going to round INT_MAX. (It does rely on two's complement representation of ints, but that assumption is known to be safe.) -- Andrew (irc:RhodiumToad)
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> if (unlikely(num < (float4) INT_MIN || num >= (float4) INT_MAX || isnan(num))) > if (num < (float4)INT_MIN || num >= -(float4)INT_MIN || ... Meh. Seems to me that's relying on pretty much the same assumptions and throwing in an extra dollop of obscurantism on top. regards, tom lane
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Tom Lane
Date:
I wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >> if (num < (float4)INT_MIN || num >= -(float4)INT_MIN || ... > Meh. Seems to me that's relying on pretty much the same assumptions > and throwing in an extra dollop of obscurantism on top. mmm ... but on the other hand, it means we don't need to write the test differently depending on whether we think INTxx_MAX will get rounded. So that's worth something. regards, tom lane
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> if (unlikely(num < (float4) INT_MIN || num >= (float4) INT_MAX || isnan(num))) >> if (num < (float4)INT_MIN || num >= -(float4)INT_MIN || ... Tom> Meh. Seems to me that's relying on pretty much the same Tom> assumptions No, because we know that INT_MIN is always exactly representable as a float (whereas INT_MAX is not), and therefore the cast result will not depend on any rounding choices whether at compile or run time. Nor does it depend on knowing that float4 can't represent INT_MAX exactly. -- Andrew (irc:RhodiumToad)
Re: BUG #15519: Casting float4 into int4 gets the wrong sign insteadof "integer out of range" error
From
Victor Petrovykh
Date:
I'm sorry I'm kinda new here.
Am I missing something in thinking that the cast of a float to int should produce an error if the sign of the original value doesn't match the sign of the cast result? Presumably you have access to both values at some point and checking sign bits for sameness should be simple.
Also, I would say that doing some float4 arithmetic and then casting into int4 or int8 should be consistent, so if
SELECT (54610::float4 * 39324::float4)::int8;
gets me 2147483648, then
SELECT (54610::float4 * 39324::float4)::int4;
should be an error rather than the unintuitive -2147483648.
The scenario that I'm pointing at is basically first doing some float4 arithmetic, casting the result into an int4 and expecting that the result (if it was without error) is actually about as close to the original float value as float4 precision would normally allow.
On Fri, Nov 23, 2018 at 7:45 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> if (unlikely(num < (float4) INT_MIN || num >= (float4) INT_MAX || isnan(num)))
>> if (num < (float4)INT_MIN || num >= -(float4)INT_MIN || ...
Tom> Meh. Seems to me that's relying on pretty much the same
Tom> assumptions
No, because we know that INT_MIN is always exactly representable as a
float (whereas INT_MAX is not), and therefore the cast result will not
depend on any rounding choices whether at compile or run time. Nor does
it depend on knowing that float4 can't represent INT_MAX exactly.
--
Andrew (irc:RhodiumToad)
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Tom Lane
Date:
Victor Petrovykh <victor@magic.io> writes: > Am I missing something in thinking that the cast of a float to int should > produce an error if the sign of the original value doesn't match the sign > of the cast result? Well, yeah, our traditional way of coding this overflow check would probably have compared the sign of the result to the sign of the input, but (a) we'd still have needed some ad-hoc range check to avoid getting fooled by inputs large enough to have wrapped around an even number of times, and (b) this approach depends on the compiler not taking any liberties based on an assumption that the program doesn't provoke integer overflow. (b) gets more worrisome with each year that goes by, because the compiler guys keep finding ever-more-creative ways to break your code if it violates C-spec semantics. So we really want to write a test that will fail exactly when the integer coercion would overflow, not do the coercion and then check to see if it overflowed. regards, tom lane
Re: BUG #15519: Casting float4 into int4 gets the wrong sign insteadof "integer out of range" error
From
Victor Petrovykh
Date:
Thanks for the clarification.
I don't think I make any unsafe architecture assumptions in the above. As far as I can tell the only such assumption is about likely values of X and Y being 8, 16, 32, 64, 128, etc. and that the mantissa of a float is using more than half of its bits. I'm assuming that casting function knows the 2 actual types so it can make a deterministic decision about the comparison it must use.
I was thinking about the assumption whether or not INT_MAX is representable perfectly as a float. Consider any combination of types int_X and float_Y where X and Y denote the number of bits used for representation:
- for any type int_X it is always true that its maximum value cannot be represented by a float_Y if X >= Y because at least one bit of the float_Y must be used to represent the exponent part of the float (else the float_Y is indistinguishable from int_Y)
- for any type int_X it is always possible to represent the maximum value (and by extension any other value) exactly as a float_Y if X <= M(Y), meaning that if the mantissa of the float_Y has at least as many bits as X. In practice X and Y will be some form of 8, 16, 32, 64, etc. so 2X <= Y and unless the mantissa is less than half of the significant bits of the float we will have x <= M(Y) for any practical X < Y
- for any int_X and float_Y there exists a specific value FLOATY, such that for all possible float_Y values y < FLOATY it is also true that y <= MAX_INT_X and for all y >= FLOATY it is also true that y > MAX_INT (basically I can always pick a float threshold above which all numbers would be above MAX_INT and therefore out of range and below it would be guaranteed to be representable as ints). I conjecture that this special value FLOATY = (float_Y)MAX_INT_X for any X >= Y. I think I can formally prove this conjecture, basically it has to do with sparseness of float values when compared to MAX_INT_X + 1 and MAX_INT_X - 1.
So it seems to me that the rule for casting would depend on whether the float has same or fewer bits than the int or not:
- when the int has same or more bits (e.g. float4 -> int4 or float4 -> int8)
if (num < INT_MIN || num >= INT_MAX || isnan(num))
- when the int has fewer bits (e.g. float4 -> int2 or float8 -> int4)
if (num < INT_MIN || num > INT_MAX || isnan(num))
I have one more related question: is a fix for this likely to appear in the next Postgres release?
On Fri, Nov 23, 2018 at 8:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Victor Petrovykh <victor@magic.io> writes:
> Am I missing something in thinking that the cast of a float to int should
> produce an error if the sign of the original value doesn't match the sign
> of the cast result?
Well, yeah, our traditional way of coding this overflow check would
probably have compared the sign of the result to the sign of the input,
but (a) we'd still have needed some ad-hoc range check to avoid getting
fooled by inputs large enough to have wrapped around an even number of
times, and (b) this approach depends on the compiler not taking any
liberties based on an assumption that the program doesn't provoke
integer overflow. (b) gets more worrisome with each year that goes by,
because the compiler guys keep finding ever-more-creative ways to break
your code if it violates C-spec semantics. So we really want to write
a test that will fail exactly when the integer coercion would overflow,
not do the coercion and then check to see if it overflowed.
regards, tom lane
Re: BUG #15519: Casting float4 into int4 gets the wrong sign instead of "integer out of range" error
From
Tom Lane
Date:
Victor Petrovykh <victor@magic.io> writes: > I have one more related question: is a fix for this likely to appear in the > next Postgres release? Yes, it was pushed last week. https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=e473e1f2b regards, tom lane