Thread: to_hex() for negative inputs
I only recently realised that to_hex() converts its input to unsigned before converting it to hex (something that's not mentioned in the docs): to_hex(-1) -> ffffffff I think that's something that some users might find surprising, especially if they were expecting to be able to use it to output values that could be read back in, now that we support non-decimal integer input. So I think the docs should be a little more explicit about this. I think the following should suffice: --- to_hex ( integer ) -> text to_hex ( bigint ) -> text Converts the number to its equivalent two's complement hexadecimal representation. to_hex(1234) -> 4d2 to_hex(-1234) -> fffffb2e --- instead of the existing example with 2147483647, which doesn't add much. I also think it might be useful for it to gain a couple of boolean options: 1). An option to output a signed value (defaulting to false, to preserve the current two's complement output). 2). An option to output the base prefix "0x" (which comes after the sign, making it inconvenient for the user to add themselves). I've also been idly wondering about whether we should have a numeric variant (for integers only, since non-integers won't necessarily terminate in hex, aren't accepted as inputs, and don't seem particularly useful anyway). I don't think two's complement output makes much sense for numeric, so perhaps it should only have the prefix option, and always output signed hex strings. Thoughts? Regards, Dean
Hi Dean, > I only recently realised that to_hex() converts its input to unsigned > before converting it to hex (something that's not mentioned in the > docs): Technically the documentation is accurate [1]: """ Converts the number to its equivalent hexadecimal representation. """ But I agree that adding an example with negative numbers will not hurt. Would you like to submit a patch? > I also think it might be useful for it to gain a couple of boolean options: Adding extra arguments for something the user can implement (him/her)self doesn't seem to be a great idea. With this approach we may end up with hundreds of arguments one day. > 1). An option to output a signed value (defaulting to false, to > preserve the current two's complement output). This in particular can be done like this: ``` =# select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x)) from ( values (123), (-123) ) as s(x); ?column? ---------- 7b -7b (2 rows) ``` > 2). An option to output the base prefix "0x" (which comes after the > sign, making it inconvenient for the user to add themselves). Ditto: ``` =# select '0x' || to_hex(x) from ( values (123), (-123) ) as s(x); ?column? ------------ 0x7b 0xffffff85 ``` [1]: https://www.postgresql.org/docs/current/functions-string.html -- Best regards, Aleksander Alekseev
On Tue, 24 Jan 2023 at 13:43, Aleksander Alekseev <aleksander@timescale.com> wrote: > > Adding extra arguments for something the user can implement > (him/her)self doesn't seem to be a great idea. With this approach we > may end up with hundreds of arguments one day. > I don't see how a couple of extra arguments will expand to hundreds. > =# select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x)) > from ( values (123), (-123) ) as s(x); > Which is broken for INT_MIN: select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x)) from ( values (-2147483648) ) as s(x); ERROR: integer out of range Part of the reason for implementing this in core would be to save users from such easy-to-overlook bugs. Regards, Dean
Hi Dean, > I don't see how a couple of extra arguments will expand to hundreds. Maybe I was exaggerating, but the point is that adding extra flags for every possible scenario is a disadvantageous approach in general. There is no need to increase the code base, the amount of test cases and the amount of documentation every time someone has an idea "in rare cases I also may want to do A or B, let's add a flag for this". > Which is broken for INT_MIN: > > select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x)) > from ( values (-2147483648) ) as s(x); > > ERROR: integer out of range I'm afraid the behavior of something like to_hex(X, with_sign => true) is going to be exactly the same. There is no safe and consistent way to calculate abs(INT_MIN). -- Best regards, Aleksander Alekseev
On Wed, 25 Jan 2023 at 09:02, Aleksander Alekseev <aleksander@timescale.com> wrote: > > > I don't see how a couple of extra arguments will expand to hundreds. > > Maybe I was exaggerating, but the point is that adding extra flags for > every possible scenario is a disadvantageous approach in general. > There is no need to increase the code base, the amount of test cases > and the amount of documentation every time someone has an idea "in > rare cases I also may want to do A or B, let's add a flag for this". > OK, but the point was that we've just added support for accepting hex inputs in a particular format, so I think it would be useful if to_hex() could produce outputs compatible with that. > > Which is broken for INT_MIN: > > > > select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x)) > > from ( values (-2147483648) ) as s(x); > > > > ERROR: integer out of range > > I'm afraid the behavior of something like to_hex(X, with_sign => true) > is going to be exactly the same. There is no safe and consistent way > to calculate abs(INT_MIN). > Of course there is. This is easy to code in C using unsigned ints, without resorting to abs() (yes, I'm aware that abs() is undefined for INT_MIN). Regards, Dean
Hi Dean, > Of course there is. This is easy to code in C using unsigned ints, > without resorting to abs() (yes, I'm aware that abs() is undefined for > INT_MIN). So in your opinion what is the expected result of to_hex(INT_MIN, with_sign => true)? -- Best regards, Aleksander Alekseev
On Wed, 25 Jan 2023 at 10:57, Aleksander Alekseev <aleksander@timescale.com> wrote: > > > Of course there is. This is easy to code in C using unsigned ints, > > without resorting to abs() (yes, I'm aware that abs() is undefined for > > INT_MIN). > > So in your opinion what is the expected result of to_hex(INT_MIN, > with_sign => true)? > "-80000000" or "-0x80000000", depending on whether the prefix is requested. The latter is legal input for an integer, which is the point (to allow round-tripping): SELECT int '-0x80000000'; int4 ------------- -2147483648 (1 row) SELECT pg_typeof(-0x80000000); pg_typeof ----------- integer (1 row) Regards, Dean
Hi Dean, > > So in your opinion what is the expected result of to_hex(INT_MIN, > > with_sign => true)? > > > > "-80000000" or "-0x80000000", depending on whether the prefix is > requested. Whether this is the right result is very debatable. 0x80000000 is a binary representation of -2147483648: ``` (gdb) ptype cur_timeout type = int (gdb) p cur_timeout = 0x80000000 $1 = -2147483648 (gdb) p/x cur_timeout $2 = 0x80000000 ``` So what you propose to return is -(-2147483648). For some users this may be a wanted result, for some it may be not. Personally I would prefer to get an ERROR in this case. And this is exactly how you end up with even more flags. I believe it would be better to let the user write the exact query depending on what he/she wants. -- Best regards, Aleksander Alekseev
On 24.01.23 14:10, Dean Rasheed wrote: > I also think it might be useful for it to gain a couple of boolean options: > > 1). An option to output a signed value (defaulting to false, to > preserve the current two's complement output). I find the existing behavior so strange, I would rather give up and invent a whole new function family with correct behavior, which could then also support octal and binary, and perhaps any base. This could be something generally named, like "convert" or "format". > 2). An option to output the base prefix "0x" (which comes after the > sign, making it inconvenient for the user to add themselves). This could also be handled with a "format"-like function.
On Wed, 25 Jan 2023 at 21:43, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 24.01.23 14:10, Dean Rasheed wrote: > > I also think it might be useful for it to gain a couple of boolean options: > > > > 1). An option to output a signed value (defaulting to false, to > > preserve the current two's complement output). > > I find the existing behavior so strange, I would rather give up and > invent a whole new function family with correct behavior, which could > then also support octal and binary, and perhaps any base. This could be > something generally named, like "convert" or "format". > > > 2). An option to output the base prefix "0x" (which comes after the > > sign, making it inconvenient for the user to add themselves). > > This could also be handled with a "format"-like function. > Yeah, making a break from the existing to_hex() functions might not be a bad idea. My only concern with something general like convert() or format() is that it'll end up being hard to use, with lots of different formatting options, like to_char(). In fact Oracle's to_char() has an 'X' formatting option to output hexadecimal, though it's pretty limited (e.g., it doesn't support negative inputs). MySQL has conv() that'll convert between any two bases, but it does bizarre things for negative inputs or inputs larger than 2^64. TBH, I'm not that interested in bases other than hex (I mean who really uses octal anymore?), and I don't particularly care about formats other than the format we accept as input. For that, just adding a numeric_to_hex() function would suffice. That works fine for int4 and int8 inputs too, and doesn't preclude adding a more general base-conversion / formatting function later, if there's demand. Regards, Dean