Thread: Extending a bit string

Extending a bit string

From
Evan Carroll
Date:
Currently the behavior of bit-string extensions is pretty insane.

SELECT b'01'::bit(2)::bit(4),
  b'01'::bit(2)::int::bit(4);
 bit  | bit  
------+------
 0100 | 0001
(1 row)

I'd like to propose we standardize this a bit. Compatibility was already broken once, in version 8. From the Version 8 release notes (thanks to Rhodium Toad for the research),

> Casting an integer to BIT(N) selects the rightmost N bits of the integer, not the leftmost N bits as before.

Everything should select the rightmost bits and extend on the left. From the docs:

> Casting an integer to a bit string width wider than the integer itself will sign-extend on the left.

That makes sense to me. Integers sign-extend on the left, and the behavior is currently undefined for bit->bit extensions. What say you?

--
Evan Carroll - me@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011

Re: Extending a bit string

From
Tom Lane
Date:
Evan Carroll <me@evancarroll.com> writes:
> Currently the behavior of bit-string extensions is pretty insane.

You've provided no support for this assertion, much less any defense
of why your proposed semantics change is any less insane.  Also, if
I understood you correctly, you want to change the semantics of
casting a bitstring to a bitstring of a different length, which is
an operation that's defined by the SQL standard.  You will get zero
traction on that unless you convince people that we've misread the
standard.  Which is possible, but the text seems clear to me that
casting bit(2) to bit(4) requires addition of zeroes on the right:

        11) If TD is fixed-length bit string, then let LTD be the length in
            bits of TD. Let BLSV be the result of BIT_LENGTH(SV).
            ...
            c) If BLSV is smaller than LTD, then TV is SV expressed as a
              bit string extended on the right with LTD-BLSV bits whose
              values are all 0 (zero) and a completion condition is raised:
              warning - implicit zero-bit padding.

That's SQL:99 6.22 <cast specification> general rule 11) c).
(SV and TD are the source value and the target datatype for a cast.)
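
As an illustration of rule 11) c), here is a minimal sketch of how an explicit bit-to-bit cast behaves in PostgreSQL. The column aliases are illustrative only, and the results in the comments follow the documented rule that such casts truncate or zero-pad on the right.

```sql
-- Explicit casts between bit-string lengths work on the right-hand end.
SELECT B'01'::bit(4)   AS padded,     -- 0100: two zero bits appended on the right
       B'0110'::bit(2) AS truncated;  -- 01: the rightmost bits are dropped
```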

In hindsight, it would likely be more consistent with this if we'd
considered bitstrings to be LSB first when coercing them to/from integer,
but whoever stuck that behavior in didn't think about it.  Too late to
change that now I'm afraid, though perhaps we could provide non-cast
conversion functions that act that way.

            regards, tom lane


Re: Extending a bit string

From
Evan Carroll
Date:
> That's SQL:99 6.22 <cast specification> general rule 11) c).
> (SV and TD are the source value and the target datatype for a cast.)
>
> In hindsight, it would likely be more consistent with this if we'd
> considered bitstrings to be LSB first when coercing them to/from integer,
> but whoever stuck that behavior in didn't think about it.  Too late to
> change that now I'm afraid, though perhaps we could provide non-cast
> conversion functions that act that way.

Apologies, I was under the impression that casts were not in the spec. I withdraw my request. The 2016 draft reads,

> If the length in octets M of SV is smaller than LTD, then TV is SV extended on the right by LTD–M X'00's.

That's how I read it too, and whether I feel like it's insane doesn't matter much. And yet, the idea of

    5::bit(8)::bit(32)::int

not being 5 is terrifying, so you won't find any objections to the current behavior from me.
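
To make the round trip concrete, here is a small sketch of what the casts produce under the current rules (integer casts fill from the right, bit-to-bit casts pad on the right). The column aliases are illustrative only.

```sql
SELECT 5::bit(8)               AS eight_bits,  -- 00000101
       5::bit(8)::bit(32)::int AS round_trip,  -- 83886080, i.e. 5 * 2^24
       5::bit(32)::int         AS direct;      -- 5
```

The 8-bit value is padded with 24 zero bits on the right, so the original bits end up in the most significant byte when the result is read back as an integer.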


Re: Extending a bit string

From
Evan Carroll
Date:
> In hindsight, it would likely be more consistent with this if we'd
> considered bitstrings to be LSB first when coercing them to/from integer,
> but whoever stuck that behavior in didn't think about it.  Too late to
> change that now I'm afraid, though perhaps we could provide non-cast
> conversion functions that act that way.

I've been thinking about that, and it actually makes sense; I'd prefer to revert to the pre-8.0 behavior. I just wanted to speak up to retract my earlier response on that point too. In reality, I am used to the integer display as it currently is. The current behavior of the coercion to/from int reinforces the bias that I have. It led me to believe that PostgreSQL would act like that consistently, because that's what I am used to.

SELECT 5::int::bit(8);
   bit    
----------
 00000101

As compared to 10100000, which is what an LSB-first coercion would give. But fundamentally SQL and the current helper functions don't operate the way the integer cast does, so it's bizarre. Moreover, the difference between the two makes it very error-prone. For example, this doesn't make sense,

    SELECT get_bit(1::bit(1), 0),     get_bit(1::bit(2), 1);

But this does,

    SELECT get_bit(B'1'::bit(1), 0),     get_bit(B'1'::bit(2), 1);
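
For reference, a sketch of what the two-bit cases evaluate to. get_bit numbers bit-string positions from the left, starting at 0, so where the single 1 bit ends up depends on whether the value came from an integer or from a bit string. The column aliases are illustrative only.

```sql
SELECT 1::bit(2)                AS from_int,         -- 01: the integer lands on the right
       B'1'::bit(2)             AS from_bitstring,   -- 10: the bit string is padded on the right
       get_bit(1::bit(2), 1)    AS int_bit_1,        -- 1
       get_bit(B'1'::bit(2), 1) AS bitstring_bit_1;  -- 0
```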

I'm sure it would have been substantially less confusing if integers displayed their LSB on the left after casting. I think I would have figured out what was going on *much* faster. You were right on everything in your initial response (as I've come to expect).
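
A rough sketch of the kind of non-cast conversion function suggested earlier in the thread, for anyone who wants to experiment. The name int_to_bits_lsb_first and its signature are hypothetical and not part of PostgreSQL; it assumes n is at most 32 and simply reverses the textual form of the 32-bit cast.

```sql
-- Hypothetical helper, not part of PostgreSQL: return the low n bits of an
-- integer with the least significant bit first (leftmost). Assumes n <= 32.
CREATE FUNCTION int_to_bits_lsb_first(v integer, n integer)
RETURNS varbit
LANGUAGE sql IMMUTABLE AS $$
    SELECT left(reverse(v::bit(32)::text), n)::varbit
$$;

SELECT int_to_bits_lsb_first(5, 8);           -- 10100000
SELECT int_to_bits_lsb_first(5, 8)::bit(16);  -- 1010000000000000
```

With the bits stored LSB first, the standard's pad-on-the-right rule only adds high-order zeros, so widening the bit string no longer changes the integer it represents.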
