Thread: BUG #1578: ::bit(n) behaves "differently" if applied to bit strings than integers.

The following bug has been logged online:

Bug reference:      1578
Logged by:          Celia McInnis
Email address:      celia@drmath.ca
PostgreSQL version: 8.0.1
Operating system:   Linux
Description:        ::bit(n) behaves "differently" if applied to bit strings
than integers.
Details:

It's probably not good (at least for mathematicians!) to have the following
give different results:

select B'1110110101'::bit(6);
select B'1110110101'::integer::bit(6);

The first gives 110101 (the 6 least significant bits).
The second gives 111011 (the 6 most significant bis).

Re: BUG #1578: ::bit(n) behaves "differently" if applied to bit

From
Bruce Momjian
Date:
Celia McInnis wrote:
>
> The following bug has been logged online:
>
> Bug reference:      1578
> Logged by:          Celia McInnis
> Email address:      celia@drmath.ca
> PostgreSQL version: 8.0.1
> Operating system:   Linux
> Description:        ::bit(n) behaves "differently" if applied to bit strings
> than integers.
> Details:
>
> It's probably not good (at least for mathematicians!) to have the following
> give different results:
>
> select B'1110110101'::bit(6);
> select B'1110110101'::integer::bit(6);
>
> The first gives 110101 (the 6 least significant bits).
> The second gives 111011 (the 6 most significant bis).

I ran some tests on your example:

    test=> select B'1'::bit(6);
      bit
    --------
     100000
    (1 row)

    test=> select B'1'::integer::bit(6);
      bit
    --------
     000001
    (1 row)

    test=> select B'100000'::bit(6);
      bit
    --------
     100000
    (1 row)

    test=> select B'100000'::integer::bit(6);
      bit
    --------
     100000
    (1 row)

From this, it seems the issue is how ::bit should pad a string if the
number of bits supplied in the string is less than the length specified.
I think it is behaving correctly to pad with zeros on the end because it
is not a number but a string of bits.

What happens with the ::integer cast is that the string is expanded to
32 bits, and then cast to only six.  An argument could be made that it
should then take the left 6 bits and they should all be 0's.  In fact,
that's what it does if you supply just a string with zero padding:

    test=> select B'0000000000100000'::bit(6);
      bit
    --------
     000000
    (1 row)

    test=> select B'0000000000100000'::integer::bit(6);
      bit
    --------
     100000
    (1 row)

Looking at the code, backend/utils/adt/varbit.c::bitfromint4 is a
special function just for converting to bit from int4, and this piece of
the code seems most significant:

    /* drop any input bits that don't fit */
    srcbitsleft = Min(srcbitsleft, destbitsleft);

And, I think it is done this way because int4 has a fixed length, so it
is pretty hard to set the high 3 bits in an int32 value, so the way the
code works is it assumes you want the lower X bits during the
conversion.  Though this is slightly inconsistent in how it works with a
bit string, it does seem the most useful approach.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: BUG #1578: ::bit(n) behaves "differently" if applied to bit

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> ... Though this is slightly inconsistent in how it works with a
> bit string, it does seem the most useful approach.

Also, this is a behavior that we changed in 8.0 in response to previous
complaints.  Before changing it again, I'd want to see a pretty detailed
analysis of either (1) why the previous complainers were idiots, or
(2) why this change is still compatible with what they asked for ;-)

Where relevant, comments on SQL spec compliance would help the argument
too.  I don't think the spec defines any bit<->integer conversions,
but it definitely addresses the issue of converting from bit(m) to bit(n).

            regards, tom lane

Re: BUG #1578: ::bit(n) behaves "differently" if applied to bit

From
Tom Lane
Date:
"Celia McInnis" <celia@drmath.ca> writes:
> select 949::bit(10) gives 1110110101 (as expected).
> select 949::bit(10)::bit(3) gives 111 (the 3 most significant bits)
> select 949::bit(3) gives 101 (the 3 least significant bits).

> As a mathematician, I'd certainly at least want the last two selects to give
> the same results!

Unfortunately, the behavior of casting between bit(m) and bit(n) is not
open to negotiation --- it's given by the SQL standard.

I think the only way to make this stuff truly mathematically consistent
would be to go over to a "little endian" approach in which casting from
integer to bit puts the LSB of the integer into the first, not the last,
bit of the bit string.  But that is probably too big a compatibility
hit to consider ... and I for one would find it less useful not more.

            regards, tom lane

Re: BUG #1578: ::bit(n) behaves "differently" if applied to bit

From
"Celia McInnis"
Date:
Thanks very much Bruce and Tom for your responses and explanations. The
current mode of operation is mathematically disturbing so I'm hoping that you
can figure out some sort of a fix! Perhaps this example is a little more
convincing than my last one at showing the bug/feature:

949 base 10 = 1110110101 base two.

select 949::bit(10) gives 1110110101 (as expected).
select 949::bit(10)::bit(3) gives 111 (the 3 most significant bits)
select 949::bit(3) gives 101 (the 3 least significant bits).

As a mathematician, I'd certainly at least want the last two selects to give
the same results!

Whether the least or most significant bits are given is negotiable, though in
most things I do with bitstrings it would be more convenient to get the least
significant bits (as I guess was the case before 8.0, and which is still what
8.0 does in casting from integer to bit(n).).

If my example above does not convince you, can you please point me to the
arguments which the previous complainers used to convince the postgresql
developers to change the bit() casting behaviour in 8.0? It would certainly
seem to me that the old way was better mathematically! Perhaps their desires
could have been handled with some new bit handling function rather than
through a change in how things are casted?

Thanks for your help,
Celia McInnis

On Wed, 11 May 2005 01:36:51 -0400, Tom Lane wrote
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > ... Though this is slightly inconsistent in how it works with a
> > bit string, it does seem the most useful approach.
>
> Also, this is a behavior that we changed in 8.0 in response to previous
> complaints.  Before changing it again, I'd want to see a pretty detailed
> analysis of either (1) why the previous complainers were idiots, or
> (2) why this change is still compatible with what they asked for ;-)
>
> Where relevant, comments on SQL spec compliance would help the argument
> too.  I don't think the spec defines any bit<->integer conversions,
> but it definitely addresses the issue of converting from bit(m) to
> bit(n).
>
>             regards, tom lane


--
Open WebMail Project (http://openwebmail.org)

Re: BUG #1578: ::bit(n) behaves "differently" if applied to bit

From
Bruno Wolff III
Date:
On Thu, May 12, 2005 at 10:40:34 -0500,
  Celia McInnis <celia@drmath.ca> wrote:
> Thanks very much Bruce and Tom for your responses and explanations. The
> current mode of operation is mathematically disturbing so I'm hoping that you
> can figure out some sort of a fix! Perhaps this example is a little more
> convincing than my last one at showing the bug/feature:
>
> 949 base 10 = 1110110101 base two.
>
> select 949::bit(10) gives 1110110101 (as expected).

Why is that expected? Based on your reasoning for the last case I would
expect '0000000000', since the first 10 bits of that integer are 0.

> select 949::bit(10)::bit(3) gives 111 (the 3 most significant bits)
> select 949::bit(3) gives 101 (the 3 least significant bits).
>
> As a mathematician, I'd certainly at least want the last two selects to give
> the same results!

Re: BUG #1578: ::bit(n) behaves "differently" if applied to bit

From
Celia McInnis
Date:
>   Celia McInnis <celia@drmath.ca> wrote:
>>949 base 10 = 1110110101 base two.
>>
>>select 949::bit(10) gives 1110110101 (as expected).

Bruno Wolff III wrote:
> Why is that expected? Based on your reasoning for the last case I would
> expect '0000000000', since the first 10 bits of that integer are 0.

Good - you see the inconsistency which I was pointing out - that casting
to bit(n)'s is done "differently" for integers than it is for bit(m)'s!

In terms of "as expected", I meant "as expected by a mathematician" -
ie., that I'd expect that casting an integer (in the mathematical sense
rather than in some limited-storage redefinition made by computer
scientists) to a bit string would give the base two representation of
the integer provided the bit string is long enough (as it was in my
example).

It's actually too bad that the SQL standard took the MSB route rather
than the LSB route for bit(n)->bit(m) casting, since it kind of requires
  the inconsistency given above in dealing with integers in order to be
functional. Tom Lane explained this very well. Anyway, I'm happy that
949::bit(10) gives 1110110101 which is the base 2 representation of 949
(and I'm sure that nobody would be impressed if they had to start
worrying about the effect of changing to machines with different
(computer-science) integer sizes), so, unless somebody changes the SQL
standard for bit(n), I appreciate that the postgresql developers have
made a sensible choice concerning casting from integers to bit(n).

Thanks very much for helping me to understand,
Celia McInnis
(who'll just have to "get over" her discomfort for the below :-) )

>>select 949::bit(10)::bit(3) gives 111 (the 3 most significant bits)
>>select 949::bit(3) gives 101 (the 3 least significant bits).
>>
>>As a mathematician, I'd certainly at least want the last two selects to give
>>the same results!