Thread: BUG #1578: ::bit(n) behaves "differently" if applied to bit strings than integers.
BUG #1578: ::bit(n) behaves "differently" if applied to bit strings than integers.
From
"Celia McInnis"
Date:
The following bug has been logged online: Bug reference: 1578 Logged by: Celia McInnis Email address: celia@drmath.ca PostgreSQL version: 8.0.1 Operating system: Linux Description: ::bit(n) behaves "differently" if applied to bit strings than integers. Details: It's probably not good (at least for mathematicians!) to have the following give different results: select B'1110110101'::bit(6); select B'1110110101'::integer::bit(6); The first gives 110101 (the 6 least significant bits). The second gives 111011 (the 6 most significant bis).
Celia McInnis wrote: > > The following bug has been logged online: > > Bug reference: 1578 > Logged by: Celia McInnis > Email address: celia@drmath.ca > PostgreSQL version: 8.0.1 > Operating system: Linux > Description: ::bit(n) behaves "differently" if applied to bit strings > than integers. > Details: > > It's probably not good (at least for mathematicians!) to have the following > give different results: > > select B'1110110101'::bit(6); > select B'1110110101'::integer::bit(6); > > The first gives 110101 (the 6 least significant bits). > The second gives 111011 (the 6 most significant bis). I ran some tests on your example: test=> select B'1'::bit(6); bit -------- 100000 (1 row) test=> select B'1'::integer::bit(6); bit -------- 000001 (1 row) test=> select B'100000'::bit(6); bit -------- 100000 (1 row) test=> select B'100000'::integer::bit(6); bit -------- 100000 (1 row) From this, it seems the issue is how ::bit should pad a string if the number of bits supplied in the string is less than the length specified. I think it is behaving correctly to pad with zeros on the end because it is not a number but a string of bits. What happens with the ::integer cast is that the string is expanded to 32 bits, and then cast to only six. An argument could be made that it should then take the left 6 bits and they should all be 0's. In fact, that's what it does if you supply just a string with zero padding: test=> select B'0000000000100000'::bit(6); bit -------- 000000 (1 row) test=> select B'0000000000100000'::integer::bit(6); bit -------- 100000 (1 row) Looking at the code, backend/utils/adt/varbit.c::bitfromint4 is a special function just for converting to bit from int4, and this piece of the code seems most significant: /* drop any input bits that don't fit */ srcbitsleft = Min(srcbitsleft, destbitsleft); And, I think it is done this way because int4 has a fixed length, so it is pretty hard to set the high 3 bits in an int32 value, so the way the code works is it assumes you want the lower X bits during the conversion. Though this is slightly inconsistent in how it works with a bit string, it does seem the most useful approach. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > ... Though this is slightly inconsistent in how it works with a > bit string, it does seem the most useful approach. Also, this is a behavior that we changed in 8.0 in response to previous complaints. Before changing it again, I'd want to see a pretty detailed analysis of either (1) why the previous complainers were idiots, or (2) why this change is still compatible with what they asked for ;-) Where relevant, comments on SQL spec compliance would help the argument too. I don't think the spec defines any bit<->integer conversions, but it definitely addresses the issue of converting from bit(m) to bit(n). regards, tom lane
"Celia McInnis" <celia@drmath.ca> writes: > select 949::bit(10) gives 1110110101 (as expected). > select 949::bit(10)::bit(3) gives 111 (the 3 most significant bits) > select 949::bit(3) gives 101 (the 3 least significant bits). > As a mathematician, I'd certainly at least want the last two selects to give > the same results! Unfortunately, the behavior of casting between bit(m) and bit(n) is not open to negotiation --- it's given by the SQL standard. I think the only way to make this stuff truly mathematically consistent would be to go over to a "little endian" approach in which casting from integer to bit puts the LSB of the integer into the first, not the last, bit of the bit string. But that is probably too big a compatibility hit to consider ... and I for one would find it less useful not more. regards, tom lane
Thanks very much Bruce and Tom for your responses and explanations. The current mode of operation is mathematically disturbing so I'm hoping that you can figure out some sort of a fix! Perhaps this example is a little more convincing than my last one at showing the bug/feature: 949 base 10 = 1110110101 base two. select 949::bit(10) gives 1110110101 (as expected). select 949::bit(10)::bit(3) gives 111 (the 3 most significant bits) select 949::bit(3) gives 101 (the 3 least significant bits). As a mathematician, I'd certainly at least want the last two selects to give the same results! Whether the least or most significant bits are given is negotiable, though in most things I do with bitstrings it would be more convenient to get the least significant bits (as I guess was the case before 8.0, and which is still what 8.0 does in casting from integer to bit(n).). If my example above does not convince you, can you please point me to the arguments which the previous complainers used to convince the postgresql developers to change the bit() casting behaviour in 8.0? It would certainly seem to me that the old way was better mathematically! Perhaps their desires could have been handled with some new bit handling function rather than through a change in how things are casted? Thanks for your help, Celia McInnis On Wed, 11 May 2005 01:36:51 -0400, Tom Lane wrote > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > ... Though this is slightly inconsistent in how it works with a > > bit string, it does seem the most useful approach. > > Also, this is a behavior that we changed in 8.0 in response to previous > complaints. Before changing it again, I'd want to see a pretty detailed > analysis of either (1) why the previous complainers were idiots, or > (2) why this change is still compatible with what they asked for ;-) > > Where relevant, comments on SQL spec compliance would help the argument > too. I don't think the spec defines any bit<->integer conversions, > but it definitely addresses the issue of converting from bit(m) to > bit(n). > > regards, tom lane -- Open WebMail Project (http://openwebmail.org)
On Thu, May 12, 2005 at 10:40:34 -0500, Celia McInnis <celia@drmath.ca> wrote: > Thanks very much Bruce and Tom for your responses and explanations. The > current mode of operation is mathematically disturbing so I'm hoping that you > can figure out some sort of a fix! Perhaps this example is a little more > convincing than my last one at showing the bug/feature: > > 949 base 10 = 1110110101 base two. > > select 949::bit(10) gives 1110110101 (as expected). Why is that expected? Based on your reasoning for the last case I would expect '0000000000', since the first 10 bits of that integer are 0. > select 949::bit(10)::bit(3) gives 111 (the 3 most significant bits) > select 949::bit(3) gives 101 (the 3 least significant bits). > > As a mathematician, I'd certainly at least want the last two selects to give > the same results!
> Celia McInnis <celia@drmath.ca> wrote: >>949 base 10 = 1110110101 base two. >> >>select 949::bit(10) gives 1110110101 (as expected). Bruno Wolff III wrote: > Why is that expected? Based on your reasoning for the last case I would > expect '0000000000', since the first 10 bits of that integer are 0. Good - you see the inconsistency which I was pointing out - that casting to bit(n)'s is done "differently" for integers than it is for bit(m)'s! In terms of "as expected", I meant "as expected by a mathematician" - ie., that I'd expect that casting an integer (in the mathematical sense rather than in some limited-storage redefinition made by computer scientists) to a bit string would give the base two representation of the integer provided the bit string is long enough (as it was in my example). It's actually too bad that the SQL standard took the MSB route rather than the LSB route for bit(n)->bit(m) casting, since it kind of requires the inconsistency given above in dealing with integers in order to be functional. Tom Lane explained this very well. Anyway, I'm happy that 949::bit(10) gives 1110110101 which is the base 2 representation of 949 (and I'm sure that nobody would be impressed if they had to start worrying about the effect of changing to machines with different (computer-science) integer sizes), so, unless somebody changes the SQL standard for bit(n), I appreciate that the postgresql developers have made a sensible choice concerning casting from integers to bit(n). Thanks very much for helping me to understand, Celia McInnis (who'll just have to "get over" her discomfort for the below :-) ) >>select 949::bit(10)::bit(3) gives 111 (the 3 most significant bits) >>select 949::bit(3) gives 101 (the 3 least significant bits). >> >>As a mathematician, I'd certainly at least want the last two selects to give >>the same results!