Thread: bit type external representation

bit type external representation

From
Thomas Lockhart
Date:
SQL9x defines bit string constants with a format like
 B'101010'
and X'ABCD'

for binary and hexadecimal representations. But at the moment we don't
explicitly handle both of these cases as bit strings; the hex version is
converted to decimal in the scanner (*really* early in the parsing
stage) and then handled as an integer.

It looks like our current bit string type support looks for a "B" or "X"
embedded in the actual input string, rather than outside the quote as in
the standard.

I'd like to have more support for the SQL9x syntax, which requires a
little more invasive modification of at least the scanner and parser. I
have a couple of questions:

1) the SQL standard says what hex values should be translated to in
binary, which implies that all values may be *output* in binary format.
Should we do this, or should we preserve the info on what units were
used for input in the internal storage? Anyone interpret the standard
differently from this??

2) we would need to be able to determine the format style when a string
is output to be able to reconstruct the SQL shorthand representation (if
preserving binary or hex is to be done). So a column or value should
have a corresponding is_hex() function call. Any other suggestions??
                   - Thomas


Re: bit type external representation

From
Thomas Lockhart
Date:
> for binary and hexadecimal representations. But at the moment we don't
> explicitly handle both of these cases as bit strings; the hex version is
> converted to decimal in the scanner (*really* early in the parsing
> stage) and then handled as an integer.
> It looks like our current bit string type support looks for a "B" or "X"
> embedded in the actual input string, rather than outside the quote as in
> the standard.

I should point out that this is probably for historical reasons; I'd
implemented the hex to decimal conversion way before we had bit string
support, and we didn't consolidate those features when bit strings came
along.
                  - Thomas


Re: bit type external representation

From
"Christopher Kings-Lynne"
Date:
> for binary and hexadecimal representations. But at the moment we don't
> explicitly handle both of these cases as bit strings; the hex version is
> converted to decimal in the scanner (*really* early in the parsing
> stage) and then handled as an integer.
>
> It looks like our current bit string type support looks for a "B" or "X"
> embedded in the actual input string, rather than outside the quote as in
> the standard.

Postgres supports both:

test=# create table test (a bit(3));
CREATE
test=# insert into test values (B'101');
INSERT 3431020 1
test=# insert into test values (b'101');
INSERT 3431021 1
test=# insert into test values ('B101');
INSERT 3431022 1
test=# insert into test values ('b101');
INSERT 3431023 1
test=# select * from test; a
-----101101101101
(4 rows)

In fact, some of our apps actually _rely_ on the old 'b101' syntax...
Although these could be changed with some effort...

Chris



Re: bit type external representation

From
Thomas Lockhart
Date:
> > It looks like our current bit string type support looks for a "B" or "X"
> > embedded in the actual input string, rather than outside the quote as in
> > the standard.
> Postgres supports both:
> test=# insert into test values (B'101');
> test=# insert into test values ('B101');

Right. But internally the first example has the "B" stripped out, and
the bit string input routine assumes a binary bit string if there is no
embedded leading [bBxX]. However, if we were to just pass something
without a [xX] as an explicit prefix on the string then it will always
be interpreted as a binary bit string (remember that currently X'ABCD'
is converted to decimal, not to a bit string).

> In fact, some of our apps actually _rely_ on the old 'b101' syntax...
> Although these could be changed with some effort...

Yes, it seems that applications reading from something like libpq would
need to detect that this is a bit string and then figure out how to
represent it on input or output.

"Cheating" with a leading "B" or "X" helps with this a lot. Note that
simply a leading "B" is not sufficient to distinguish between a binary
value and some hex values, if we were to allow output in hex...
                    - Thomas


Re: bit type external representation

From
Joe Conway
Date:
Thomas Lockhart wrote:
> 1) the SQL standard says what hex values should be translated to in
> binary, which implies that all values may be *output* in binary format.
> Should we do this, or should we preserve the info on what units were
> used for input in the internal storage? Anyone interpret the standard
> differently from this??

SQL99, Section "5.3 <literal>":
11) The declared type of a <bit string literal> is fixed-length    bit string. The length of a <bit string literal> is
thenumber    of bits that it contains.
 
12) The declared type of a <hex string literal> is fixed-length bit    string. Each <hexit> appearing in the literal is
equivalentto    a quartet of bits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E,    and F are interpreted as 0000, 0001,
0010,0011, 0100, 0101,    0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, and 1111,    respectively. The <hexit>s
a,b, c, d, e, and f have respectively    the same values as the <hexit>s A, B, C, D, E, and F.
 

So the standard says both represent a fixed-length bit string data type. 
ISTM that we should not try to preserve any information on the units 
used for input, and that both should be output in the same way.

> 
> 2) we would need to be able to determine the format style when a string
> is output to be able to reconstruct the SQL shorthand representation (if
> preserving binary or hex is to be done). So a column or value should
> have a corresponding is_hex() function call. Any other suggestions??
> 

Based on above comment, I'd say no. We might want to be able to specify 
that the output format should be hex using a formatting function though. 
But I guess hex output format would have to be reserved for bit strings 
that are integer multiples of 4 bits in length.

Joe



Re: bit type external representation

From
Thomas Lockhart
Date:
> > 1) the SQL standard says what hex values should be translated to in
> > binary, which implies that all values may be *output* in binary format.
> So the standard says both represent a fixed-length bit string data type.
> ISTM that we should not try to preserve any information on the units
> used for input, and that both should be output in the same way.

OK, that's how I read it too. I'll work on making it behave with hex
input and not worry about the output, which takes care of itself.

I also notice that the following has trouble:
 thomas=# select bit '1010'; ERROR:  bit string length does not match type bit(1)

which is similar to the very recent problem with fixed-length character
strings. I've got patches to fix this one too.
                     - Thomas


Re: bit type external representation

From
Peter Eisentraut
Date:
Thomas Lockhart writes:

> SQL9x defines bit string constants with a format like
>
>   B'101010'
> and
>   X'ABCD'
>
> for binary and hexadecimal representations. But at the moment we don't
> explicitly handle both of these cases as bit strings; the hex version is
> converted to decimal in the scanner (*really* early in the parsing
> stage) and then handled as an integer.

The sole reason that this is still unresolved is that the SQL standard is
ambiguous about whether a literal of the form X'something' is of type bit
(<hex string literal>) or of type blob (<binary string literal>).  If I
had to choose one, I'd actually lean toward blob (or bytea in our case).
Two ideas:  1. make an "unknownhex" type and resolve it late, like the
"unknown" type.  2. allow an implicit cast from bytea to bit.

> It looks like our current bit string type support looks for a "B" or "X"
> embedded in the actual input string, rather than outside the quote as in
> the standard.

This was a stopgap measure before the B'' syntax was implemented.  I guess
it's grown to be a feature now. :-/

> 1) the SQL standard says what hex values should be translated to in
> binary, which implies that all values may be *output* in binary format.
> Should we do this, or should we preserve the info on what units were
> used for input in the internal storage? Anyone interpret the standard
> differently from this??

I believe you are caught in the confusion I was referring to above: hex
values are possibly not even of type bit at all.

-- 
Peter Eisentraut   peter_e@gmx.net