Thread: BUG #6015: to_hex and negative integer

BUG #6015: to_hex and negative integer

From
"Marco Spiga"
Date:
The following bug has been logged online:

Bug reference:      6015
Logged by:          Marco Spiga
Email address:      ctxspi@gmail.com
PostgreSQL version: 8.3.12
Operating system:   Ubuntu Linux on kernel version 2.6.30.5
Description:        to_hex and negative integer
Details:

Hi!
First, thank for your excellent software.
My question is:
Why when I try: SELECT to_hex(-1)
it give me a strange result: ffffffff


Thanks Again

Re: BUG #6015: to_hex and negative integer

From
Craig Ringer
Date:
On 09/05/11 02:07, Marco Spiga wrote:
>
> The following bug has been logged online:
>
> Bug reference:      6015
> Logged by:          Marco Spiga
> Email address:      ctxspi@gmail.com
> PostgreSQL version: 8.3.12
> Operating system:   Ubuntu Linux on kernel version 2.6.30.5
> Description:        to_hex and negative integer
> Details:
>
> Hi!
> First, thank for your excellent software.
> My question is:
> Why when I try: SELECT to_hex(-1)
> it give me a strange result: ffffffff

That's not a bug, it's exactly what I'd expect to happen.

What did you expect to get instead? -0x01 ? 0xffff? 0xffffffffffffffff?

Don't understand why the result is the expected one by most people? Read
this: http://en.wikipedia.org/wiki/Two's_complement

... so you understand how signed numbers are represented internally by
most binary computer systems.

As it happens, the real problem with to_hex() on a negative number is
that the result doesn't make much sense unless you know the size of the
representation. For example, ffff can mean 65535 if interpreted as an
unsigned 16-bit interger, or can mean -1 if interpreted as a signed
16-bit int. ffffffff can mean 4294967295 if interpreted as an unsigned
32 bit int or as a signed 64-bit integer, but is -1 if interpreted as a
signed 32-bit integer.

Essentially, to_hex takes the _size_ of the input datum as an implicit
input. This means that you should NOT USE IT in any context where you do
not know the data type of the input, because its results vary in meaning
depending on what you feed into it.

classads=> select to_hex(-1 :: integer);
  to_hex
----------
 ffffffff
(1 row)

classads=> select to_hex(-1 :: bigint);
      to_hex
------------------
 ffffffffffffffff
(1 row)


--
Craig Ringer

Re: BUG #6015: to_hex and negative integer

From
Craig Ringer
Date:
On 05/26/2011 11:21 PM, Marco Spiga wrote:
> Thanks Craig for your reply.
>
> I have read wikipedia link and what you have tell me is TRUE.
>
> But if I give these select:
> megaradius=# SELECT to_hex(-42);
>    to_hex
> ----------
>   ffffffd6
> (1 riga)
>
> megaradius=# SELECT to_hex(4294967254);
>    to_hex
> ----------
>   ffffffd6
> (1 riga)

Ha! Great example. What's happening is that 4294967254 cannot be
represented as a 32-bit signed integer, so it's being extended to a
64-bit integer. As a result, you're comparing a signed int32 to a signed
int64 using a bitwise comparison without extending the int32 to int64,
which is incorrect.

select pg_typeof(-42);
  pg_typeof
-----------
  integer
(1 row)

select pg_typeof(4294967254);
  pg_typeof
-----------
  bigint
(1 row)


When converting 4294967254 to a hex representation, to_hex omits leading
zeroes; it's really 0x00000000ffffffd6 . This is fine so long as you
retain information about the format the original number was in - in
particular whether it was a 32-bit or 64-bit value. It's also fine if
you're only ever dealing with unsigned or positive values. Alas, to_hex
_does_ deal with negative values, and by discarding leading zeroes it
discards information about the format of the original number.

In this particular case, 4294967254 doesn't fit into a signed 32 bit
int, but it DOES fit into an unsigned 32-bit int, so it's only eight hex
digits. The result is *really* (int64)0x00000000ffffffd6, but the
omission of leading zeroes is creating confusion.

Personally, I think it's incorrect for to_hex to drop leading zeroes if
it also supports negative numbers. It should do one or the other, or it
should extend all values up to 64-bits before conversion so there's no
confusion with negative numbers. Unfortunately I doubt any of those
options can be implemented for backward compat reasons, but adding
to_hex_32 and to_hex_64 that take only one width would help without
breaking BC.

You can work around this yourself by ensuring that you NEVER call to_hex
with a 32-bit integer. Always extend to a 64-bit value to eliminate
confusion.

SELECT to_hex('4294967254'::bigint);
   to_hex
----------
  ffffffd6
(1 row)

SELECT to_hex('-42'::bigint);
       to_hex
------------------
  ffffffffffffffd6

You can create a simple wrapper to do this:


CREATE OR REPLACE FUNCTION to_hex_64 (smallint) RETURNS text AS $$
SELECT to_hex($1::bigint);
$$ LANGUAGE 'sql';

CREATE OR REPLACE FUNCTION to_hex_64 (integer) RETURNS text AS $$
SELECT to_hex($1::bigint);
$$ LANGUAGE 'sql';

CREATE OR REPLACE FUNCTION to_hex_64 (bigint) RETURNS text AS $$
SELECT to_hex($1::bigint);
$$ LANGUAGE 'sql';

Alternately, you can implement a to_hex variant that replaces the
dropped leading zeroes by using pg_typeof(...) to decide whether the
argument is a 32-bit or 64-bit integer and padding the result of to_hex
appropriately. That way there's no confusion.

> SELECT 'true' AS test WHERE (SELECT to_hex(-42)) = (SELECT to_hex(4294967254));
>   test

SELECT (SELECT to_hex(-42)) = (SELECT to_hex(4294967254));

?column?
----------
  t

SELECT
(SELECT to_hex('-42'::bigint)) = (SELECT to_hex('4294967254'::bigint));

?column?
--------
    f


--
Craig Ringer