Thread: bit type external representation
SQL9x defines bit string constants with a format like B'101010' and X'ABCD' for binary and hexadecimal representations. But at the moment we don't explicitly handle both of these cases as bit strings; the hex version is converted to decimal in the scanner (*really* early in the parsing stage) and then handled as an integer. It looks like our current bit string type support looks for a "B" or "X" embedded in the actual input string, rather than outside the quote as in the standard. I'd like to have more support for the SQL9x syntax, which requires a little more invasive modification of at least the scanner and parser. I have a couple of questions: 1) the SQL standard says what hex values should be translated to in binary, which implies that all values may be *output* in binary format. Should we do this, or should we preserve the info on what units were used for input in the internal storage? Anyone interpret the standard differently from this?? 2) we would need to be able to determine the format style when a string is output to be able to reconstruct the SQL shorthand representation (if preserving binary or hex is to be done). So a column or value should have a corresponding is_hex() function call. Any other suggestions?? - Thomas
> for binary and hexadecimal representations. But at the moment we don't > explicitly handle both of these cases as bit strings; the hex version is > converted to decimal in the scanner (*really* early in the parsing > stage) and then handled as an integer. > It looks like our current bit string type support looks for a "B" or "X" > embedded in the actual input string, rather than outside the quote as in > the standard. I should point out that this is probably for historical reasons; I'd implemented the hex to decimal conversion way before we had bit string support, and we didn't consolidate those features when bit strings came along. - Thomas
> for binary and hexadecimal representations. But at the moment we don't > explicitly handle both of these cases as bit strings; the hex version is > converted to decimal in the scanner (*really* early in the parsing > stage) and then handled as an integer. > > It looks like our current bit string type support looks for a "B" or "X" > embedded in the actual input string, rather than outside the quote as in > the standard. Postgres supports both: test=# create table test (a bit(3)); CREATE test=# insert into test values (B'101'); INSERT 3431020 1 test=# insert into test values (b'101'); INSERT 3431021 1 test=# insert into test values ('B101'); INSERT 3431022 1 test=# insert into test values ('b101'); INSERT 3431023 1 test=# select * from test; a -----101101101101 (4 rows) In fact, some of our apps actually _rely_ on the old 'b101' syntax... Although these could be changed with some effort... Chris
> > It looks like our current bit string type support looks for a "B" or "X" > > embedded in the actual input string, rather than outside the quote as in > > the standard. > Postgres supports both: > test=# insert into test values (B'101'); > test=# insert into test values ('B101'); Right. But internally the first example has the "B" stripped out, and the bit string input routine assumes a binary bit string if there is no embedded leading [bBxX]. However, if we were to just pass something without a [xX] as an explicit prefix on the string then it will always be interpreted as a binary bit string (remember that currently X'ABCD' is converted to decimal, not to a bit string). > In fact, some of our apps actually _rely_ on the old 'b101' syntax... > Although these could be changed with some effort... Yes, it seems that applications reading from something like libpq would need to detect that this is a bit string and then figure out how to represent it on input or output. "Cheating" with a leading "B" or "X" helps with this a lot. Note that simply a leading "B" is not sufficient to distinguish between a binary value and some hex values, if we were to allow output in hex... - Thomas
Thomas Lockhart wrote: > 1) the SQL standard says what hex values should be translated to in > binary, which implies that all values may be *output* in binary format. > Should we do this, or should we preserve the info on what units were > used for input in the internal storage? Anyone interpret the standard > differently from this?? SQL99, Section "5.3 <literal>": 11) The declared type of a <bit string literal> is fixed-length bit string. The length of a <bit string literal> is thenumber of bits that it contains. 12) The declared type of a <hex string literal> is fixed-length bit string. Each <hexit> appearing in the literal is equivalentto a quartet of bits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F are interpreted as 0000, 0001, 0010,0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, and 1111, respectively. The <hexit>s a,b, c, d, e, and f have respectively the same values as the <hexit>s A, B, C, D, E, and F. So the standard says both represent a fixed-length bit string data type. ISTM that we should not try to preserve any information on the units used for input, and that both should be output in the same way. > > 2) we would need to be able to determine the format style when a string > is output to be able to reconstruct the SQL shorthand representation (if > preserving binary or hex is to be done). So a column or value should > have a corresponding is_hex() function call. Any other suggestions?? > Based on above comment, I'd say no. We might want to be able to specify that the output format should be hex using a formatting function though. But I guess hex output format would have to be reserved for bit strings that are integer multiples of 4 bits in length. Joe
> > 1) the SQL standard says what hex values should be translated to in > > binary, which implies that all values may be *output* in binary format. > So the standard says both represent a fixed-length bit string data type. > ISTM that we should not try to preserve any information on the units > used for input, and that both should be output in the same way. OK, that's how I read it too. I'll work on making it behave with hex input and not worry about the output, which takes care of itself. I also notice that the following has trouble: thomas=# select bit '1010'; ERROR: bit string length does not match type bit(1) which is similar to the very recent problem with fixed-length character strings. I've got patches to fix this one too. - Thomas
Thomas Lockhart writes: > SQL9x defines bit string constants with a format like > > B'101010' > and > X'ABCD' > > for binary and hexadecimal representations. But at the moment we don't > explicitly handle both of these cases as bit strings; the hex version is > converted to decimal in the scanner (*really* early in the parsing > stage) and then handled as an integer. The sole reason that this is still unresolved is that the SQL standard is ambiguous about whether a literal of the form X'something' is of type bit (<hex string literal>) or of type blob (<binary string literal>). If I had to choose one, I'd actually lean toward blob (or bytea in our case). Two ideas: 1. make an "unknownhex" type and resolve it late, like the "unknown" type. 2. allow an implicit cast from bytea to bit. > It looks like our current bit string type support looks for a "B" or "X" > embedded in the actual input string, rather than outside the quote as in > the standard. This was a stopgap measure before the B'' syntax was implemented. I guess it's grown to be a feature now. :-/ > 1) the SQL standard says what hex values should be translated to in > binary, which implies that all values may be *output* in binary format. > Should we do this, or should we preserve the info on what units were > used for input in the internal storage? Anyone interpret the standard > differently from this?? I believe you are caught in the confusion I was referring to above: hex values are possibly not even of type bit at all. -- Peter Eisentraut peter_e@gmx.net