Thread: BIT/BIT VARYING status
I have made a first cut at completing integration of Adriaan Joubert's BIT code into the backend. There are a couple little things left to do (for example, scalarltsel doesn't know what to do with BIT values) as well as some not-so-little things: 1. SQL92 mentions a bitwise position function, which we do not have. 2. We don't handle <bit string> and <hex string> literals correctly; the scanner converts them into integers which seems quite at variance with the spec's semantics. We could solve #2 fairly easily if we don't mind breaking backwards compatibility with existing apps that expect B'101' or X'5' to be equivalent to 5. I'm not sure how to handle it without breaking that compatibility. Thoughts? regards, tom lane
> We could solve #2 fairly easily if we don't mind breaking backwards > compatibility with existing apps that expect B'101' or X'5' to be > equivalent to 5. I'm not sure how to handle it without breaking that > compatibility. Thoughts? Break "compatibility". I implemented the syntax in the lexer so that we could deal with it somehow (rather than just dying); but we should always be willing to implement something the right way when we can. In this case (and probably many others coming up ;) there is no great glory in the original implementation... - Thomas
Tom Lane wrote: > > I have made a first cut at completing integration of Adriaan Joubert's > BIT code into the backend. There are a couple little things left to > do (for example, scalarltsel doesn't know what to do with BIT values) > as well as some not-so-little things: > > 1. SQL92 mentions a bitwise position function, which we do not have. Sorry, I have been very busy, so only got down to implementing a position function last night. It's a bit messy (lots of masks and bit-twiddling), but I feel fairly happy now that it is doing the right thing. I tested it with my own loadable types, as the integration into postgres proper stumped my somewhat. The next oid up for a bit function is in use already. Anyway, the patches are attached, and I'm hoping that some friendly sole will integrate the new position function into postgres proper. > 2. We don't handle <bit string> and <hex string> literals correctly; > the scanner converts them into integers which seems quite at variance > with the spec's semantics. This is still a problem that needs to be fixed. Also, it the parser did not seem to be too happy about the 'position' syntax, but I may have it wrong of course. I don;t know how to attach the position function to a piece of syntax such as (position <substr> in <field>) either, so I'm hoping that somebody can pick this up. Also, i have started putting together a file for regression testing. I noticed that the substring syntax does not seem to work: SELECT SUBSTRING(b FROM 2 FOR 4) FROM ZPBIT_TABLE; gives: ERROR: Function 'substr(bit, int4, int4)' does not exist Unable to identify a function that satisfies the given argument types You may need to add explicit typecasts and similar for a varying bit argument. If somebody with better knowledge of postgres could do the integration, please, I will finish off a regression test. Thanks! Adriaan*** src/backend/utils/adt/varbit.c.old Sun Oct 29 11:05:11 2000 --- src/backend/utils/adt/varbit.c Mon Oct 30 04:58:35 2000 *************** *** 1053,1060 **** /* Negative shift is a shift to the left */ if (shft < 0) PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft, ! VarBitPGetDatum(arg), ! Int32GetDatum(-shft))); result = (VarBit *) palloc(VARSIZE(arg)); VARATT_SIZEP(result) = VARSIZE(arg); --- 1053,1060 ---- /* Negative shift is a shift to the left */ if (shft < 0) PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft, ! VarBitPGetDatum(arg), ! Int32GetDatum(-shft))); result = (VarBit *) palloc(VARSIZE(arg)); VARATT_SIZEP(result)= VARSIZE(arg); *************** *** 1145,1148 **** --- 1145,1242 ---- result >>= VARBITPAD(arg); PG_RETURN_INT32(result); + } + + /* Determines the position of S1 in the bitstring S2 (1-based string). + * If S1 does not appear in S2 this function returns 0. + * If S1 is of length 0 this function returns 1. + */ + Datum + bitposition(PG_FUNCTION_ARGS) + { + VarBit *substr = PG_GETARG_VARBIT_P(0); + VarBit *arg = PG_GETARG_VARBIT_P(1); + int substr_length, + arg_length, + i, + is; + bits8 *s, /* pointer into substring */ + *p; /* pointer into arg */ + bits8 cmp, /* shifted substring byte to compare */ + mask1, /* mask for substring byte shifted right */ + mask2, /* mask for substring byte shifted left */ + end_mask, /* pad mask for last substring byte */ + arg_mask; /* pad mask for last argument byte */ + bool is_match; + + /* Get the substring length */ + substr_length = VARBITLEN(substr); + arg_length = VARBITLEN(arg); + + /* Argument has 0 length or substring longer than argument, return 0 */ + if (arg_length == 0 || substr_length > arg_length) + PG_RETURN_INT32(0); + + /* 0-length means return 1 */ + if (substr_length == 0) + PG_RETURN_INT32(1); + + /* Initialise the padding masks */ + end_mask = BITMASK << VARBITPAD(substr); + arg_mask = BITMASK << VARBITPAD(arg); + for (i = 0; i < VARBITBYTES(arg) - VARBITBYTES(substr) + 1; i++) + { + for (is = 0; is < BITS_PER_BYTE; is++) { + is_match = true; + p = VARBITS(arg) + i; + mask1 = BITMASK >> is; + mask2 = ~mask1; + for (s = VARBITS(substr); + is_match && s < VARBITEND(substr); s++) + { + cmp = *s >> is; + if (s == VARBITEND(substr) - 1) + { + mask1 &= end_mask >> is; + if (p == VARBITEND(arg) - 1) { + /* Check that there is enough of arg left */ + if (mask1 & ~arg_mask) { + is_match = false; + break; + } + mask1 &= arg_mask; + } + } + is_match = ((cmp ^ *p) & mask1) == 0; + if (!is_match) + break; + // Move on to the next byte + p++; + if (p == VARBITEND(arg)) { + mask2 = end_mask << (BITS_PER_BYTE - is); + is_match = mask2 == 0; + elog(NOTICE,"S. %d %d em=%2x sm=%2x r=%d", + i,is,end_mask,mask2,is_match); + break; + } + cmp = *s << (BITS_PER_BYTE - is); + if (s == VARBITEND(substr) - 1) + { + mask2 &= end_mask << (BITS_PER_BYTE - is); + if (p == VARBITEND(arg) - 1) { + if (mask2 & ~arg_mask) { + is_match = false; + break; + } + mask2 &= arg_mask; + } + } + is_match = ((cmp ^ *p) & mask2) == 0; + } + /* Have we found a match */ + if (is_match) + PG_RETURN_INT32(i*BITS_PER_BYTE + is + 1); + } + } + PG_RETURN_INT32(0); } *** src/include/utils/varbit.h.old Sun Oct 29 11:04:58 2000 --- src/include/utils/varbit.h Sun Oct 29 11:05:58 2000 *************** *** 87,91 **** --- 87,92 ---- extern Datum bitoctetlength(PG_FUNCTION_ARGS); extern Datum bitfromint4(PG_FUNCTION_ARGS); extern Datum bittoint4(PG_FUNCTION_ARGS); + extern Datum bitposition(PG_FUNCTION_ARGS); #endif
Adriaan Joubert writes: > > 2. We don't handle <bit string> and <hex string> literals correctly; > > the scanner converts them into integers which seems quite at variance > > with the spec's semantics. > > This is still a problem that needs to be fixed. I have gotten the B'1001'-style syntax to work, but the zpbit_in function rejects the input. You need to change the *_in functions to accept input in the form of a string of only 1's and 0's. Also, the output functions should print 1's and 0's. I'm somewhat confused about the <hex string>s; according to the standard they might also be a BLOB literal. I'd say we get the binary version working first, and then wonder about this. > Also, it the parser did not seem to be too happy about the 'position' > syntax, The parser converted 'position(a in b)' into 'strpos(b, a)'. I changed it so it converts it into 'position(b, a)' and aliased the other functions appropriately. I changed the order of your arguments for that. > I noticed that the substring syntax does not seem to work: Similar issue as above. Should work now. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut wrote: > > Adriaan Joubert writes: > > > > 2. We don't handle <bit string> and <hex string> literals correctly; > > > the scanner converts them into integers which seems quite at variance > > > with the spec's semantics. > > > > This is still a problem that needs to be fixed. > > I have gotten the B'1001'-style syntax to work, but the zpbit_in function > rejects the input. You need to change the *_in functions to accept input > in the form of a string of only 1's and 0's. Also, the output functions > should print 1's and 0's. > > I'm somewhat confused about the <hex string>s; according to the standard > they might also be a BLOB literal. I'd say we get the binary version > working first, and then wonder about this. Peter, I think it is a problem if the B or X are dropped from the input, as that is the only way to determine whether it is a binary or hex string. Isn't it possible to just remove the quotes, or even do nothing? The current code expects a string of the form Bxxxxx or Xyyyyy. If the quotes are left in, I can easily modify the code, but guessing whether the string 1001 is hex or binary is an issue, and I seem to recall that the SQL standard requires both to be valid input. Also, on output, shouldn't we poduce B'xxxx' and X'yyyyy' to conform with the input strings? Adriaan
Adriaan Joubert writes: > Peter, I think it is a problem if the B or X are dropped from the input, > as that is the only way to determine whether it is a binary or hex > string. Well, you just assume it's a binary string, because it's unclear as of yet whether you're going to get to handle hex strings at all. However, I changed the scanner to include a leading 'b', so now it works: peter=# select B'1001';?column? ----------X9 (1 row) peter=# select B'1001' | b'11';?column? ----------XC (1 row) The output definitely ought to be in binary though ("b1001"). You also might want to make the leading 'b' optional because this seems confusing: peter=# select cast ('1001' as bit); ERROR: zpbit_in: 1001 is not a valid bitstring > Also, on output, shouldn't we poduce B'xxxx' and X'yyyyy' to conform > with the input strings? If you did that, then your input function has to be prepared for values like "B'1001'". (Think copy out/copy in.) I think the above plan should work okay. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Thanks Peter. I will download tomorrow when the new snapshot is available. So how do we find out whether hex needs to be supported? I see what you mean with ('1001' as bit), but shouldn't that be (B'1001' as bit)? Certainly if hex values are allowed the first version is ambiguous. I would have to make the error message a bit more sensible though. Adriaan > > > Peter, I think it is a problem if the B or X are dropped from the input, > > as that is the only way to determine whether it is a binary or hex > > string. > > Well, you just assume it's a binary string, because it's unclear as of yet > whether you're going to get to handle hex strings at all. However, I > changed the scanner to include a leading 'b', so now it works: > > peter=# select B'1001'; > ?column? > ---------- > X9 > (1 row) > > peter=# select B'1001' | b'11'; > ?column? > ---------- > XC > (1 row) > > The output definitely ought to be in binary though ("b1001"). > > You also might want to make the leading 'b' optional because this seems > confusing: > > peter=# select cast ('1001' as bit); > ERROR: zpbit_in: 1001 is not a valid bitstring > > > Also, on output, shouldn't we poduce B'xxxx' and X'yyyyy' to conform > > with the input strings? > > If you did that, then your input function has to be prepared for values > like "B'1001'". (Think copy out/copy in.) I think the above plan should > work okay. > > -- > Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Can someone tell me if this patch should be applied? Seems like it was just for testing, right? > Tom Lane wrote: > > > > I have made a first cut at completing integration of Adriaan Joubert's > > BIT code into the backend. There are a couple little things left to > > do (for example, scalarltsel doesn't know what to do with BIT values) > > as well as some not-so-little things: > > > > 1. SQL92 mentions a bitwise position function, which we do not have. > > Sorry, I have been very busy, so only got down to implementing a > position function last night. It's a bit messy (lots of masks and > bit-twiddling), but I feel fairly happy now that it is doing the right > thing. I tested it with my own loadable types, as the integration into > postgres proper stumped my somewhat. The next oid up for a bit function > is in use already. Anyway, the patches are attached, and I'm hoping that > some friendly sole will integrate the new position function into > postgres proper. > > > 2. We don't handle <bit string> and <hex string> literals correctly; > > the scanner converts them into integers which seems quite at variance > > with the spec's semantics. > > This is still a problem that needs to be fixed. Also, it the parser did > not seem to be too happy about the 'position' syntax, but I may have it > wrong of course. I don;t know how to attach the position function to a > piece of syntax such as (position <substr> in <field>) either, so I'm > hoping that somebody can pick this up. > > Also, i have started putting together a file for regression testing. I > noticed that the substring syntax does not seem to work: > > SELECT SUBSTRING(b FROM 2 FOR 4) > FROM ZPBIT_TABLE; > > gives: > > ERROR: Function 'substr(bit, int4, int4)' does not exist > Unable to identify a function that satisfies the given argument > types > You may need to add explicit typecasts > > and similar for a varying bit argument. > > If somebody with better knowledge of postgres could do the integration, > please, I will finish off a regression test. > > Thanks! > > Adriaan > *** src/backend/utils/adt/varbit.c.old Sun Oct 29 11:05:11 2000 > --- src/backend/utils/adt/varbit.c Mon Oct 30 04:58:35 2000 > *************** > *** 1053,1060 **** > /* Negative shift is a shift to the left */ > if (shft < 0) > PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft, > ! VarBitPGetDatum(arg), > ! Int32GetDatum(-shft))); > > result = (VarBit *) palloc(VARSIZE(arg)); > VARATT_SIZEP(result) = VARSIZE(arg); > --- 1053,1060 ---- > /* Negative shift is a shift to the left */ > if (shft < 0) > PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft, > ! VarBitPGetDatum(arg), > ! Int32GetDatum(-shft))); > > result = (VarBit *) palloc(VARSIZE(arg)); > VARATT_SIZEP(result) = VARSIZE(arg); > *************** > *** 1145,1148 **** > --- 1145,1242 ---- > result >>= VARBITPAD(arg); > > PG_RETURN_INT32(result); > + } > + > + /* Determines the position of S1 in the bitstring S2 (1-based string). > + * If S1 does not appear in S2 this function returns 0. > + * If S1 is of length 0 this function returns 1. > + */ > + Datum > + bitposition(PG_FUNCTION_ARGS) > + { > + VarBit *substr = PG_GETARG_VARBIT_P(0); > + VarBit *arg = PG_GETARG_VARBIT_P(1); > + int substr_length, > + arg_length, > + i, > + is; > + bits8 *s, /* pointer into substring */ > + *p; /* pointer into arg */ > + bits8 cmp, /* shifted substring byte to compare */ > + mask1, /* mask for substring byte shifted right */ > + mask2, /* mask for substring byte shifted left */ > + end_mask, /* pad mask for last substring byte */ > + arg_mask; /* pad mask for last argument byte */ > + bool is_match; > + > + /* Get the substring length */ > + substr_length = VARBITLEN(substr); > + arg_length = VARBITLEN(arg); > + > + /* Argument has 0 length or substring longer than argument, return 0 */ > + if (arg_length == 0 || substr_length > arg_length) > + PG_RETURN_INT32(0); > + > + /* 0-length means return 1 */ > + if (substr_length == 0) > + PG_RETURN_INT32(1); > + > + /* Initialise the padding masks */ > + end_mask = BITMASK << VARBITPAD(substr); > + arg_mask = BITMASK << VARBITPAD(arg); > + for (i = 0; i < VARBITBYTES(arg) - VARBITBYTES(substr) + 1; i++) > + { > + for (is = 0; is < BITS_PER_BYTE; is++) { > + is_match = true; > + p = VARBITS(arg) + i; > + mask1 = BITMASK >> is; > + mask2 = ~mask1; > + for (s = VARBITS(substr); > + is_match && s < VARBITEND(substr); s++) > + { > + cmp = *s >> is; > + if (s == VARBITEND(substr) - 1) > + { > + mask1 &= end_mask >> is; > + if (p == VARBITEND(arg) - 1) { > + /* Check that there is enough of arg left */ > + if (mask1 & ~arg_mask) { > + is_match = false; > + break; > + } > + mask1 &= arg_mask; > + } > + } > + is_match = ((cmp ^ *p) & mask1) == 0; > + if (!is_match) > + break; > + // Move on to the next byte > + p++; > + if (p == VARBITEND(arg)) { > + mask2 = end_mask << (BITS_PER_BYTE - is); > + is_match = mask2 == 0; > + elog(NOTICE,"S. %d %d em=%2x sm=%2x r=%d", > + i,is,end_mask,mask2,is_match); > + break; > + } > + cmp = *s << (BITS_PER_BYTE - is); > + if (s == VARBITEND(substr) - 1) > + { > + mask2 &= end_mask << (BITS_PER_BYTE - is); > + if (p == VARBITEND(arg) - 1) { > + if (mask2 & ~arg_mask) { > + is_match = false; > + break; > + } > + mask2 &= arg_mask; > + } > + } > + is_match = ((cmp ^ *p) & mask2) == 0; > + } > + /* Have we found a match */ > + if (is_match) > + PG_RETURN_INT32(i*BITS_PER_BYTE + is + 1); > + } > + } > + PG_RETURN_INT32(0); > } > *** src/include/utils/varbit.h.old Sun Oct 29 11:04:58 2000 > --- src/include/utils/varbit.h Sun Oct 29 11:05:58 2000 > *************** > *** 87,91 **** > --- 87,92 ---- > extern Datum bitoctetlength(PG_FUNCTION_ARGS); > extern Datum bitfromint4(PG_FUNCTION_ARGS); > extern Datum bittoint4(PG_FUNCTION_ARGS); > + extern Datum bitposition(PG_FUNCTION_ARGS); > > #endif -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Peter, I've looked at the current implementation of the bit types and still have some doubts concerning the following issues: 1. Constants. The current behaviour just seems somewhat strange, and I have no idea where to fix it. test=# select B'1001'; ?column? ---------- X9 (1 row) test=# select B'1001'::bit; ERROR: Cannot cast this expression to type 'bit' test=# select B'1001'::varbit; ERROR: Cannot cast this expression to type 'varbit' test=# select 'B1001'::varbit; ?column? ---------- B1001 (1 row) test=# select 'B1001'::bit; ?column? ---------- X9 (1 row) test=# select X'1001'::varbit; ERROR: varbit_in: The bit string 4097 must start with B or X test=# select 'X1001'::varbit; ?column? ------------------- B0001000000000001 (1 row) test=# select 'X1001'::bit; ?column? ---------- X1001 (1 row) test=# select X'1001'::bit; ERROR: zpbit_in: The bit string 4097 must start with B or X Also, I have two output routines, that have been renames to zpbit_out and varbit_out. In fact, both will work just fine for bot bit and varbit, but the first prints as hex and the second as a bit string. Printing as hex is more compact, so good for long strings, but printing as a bit string is much more intuitive. One solution would be to make them both print to a bit string by default and define a function to generate a hex string. Another would be to have this under control of a variable. Most people who contacted me about bit strings seemed to want to use them for flags, so I guess the default should be to print them as a bit string. More for my information, if a user does not know about varbit, how does he cast to bit varying? 2. This is not a problem, more a question. There is no default way to compare bit to varbit, as in test=# select 'b10'::bit='b10'::varbit; ERROR: Unable to identify an operator '=' for types 'bit' and 'varbit' You will have to retype this query using an explicit cast This may be a good thing, as the comparison does depend on the lenght of the bit strings. 3. The ^ operator seems to attempt to coerce the arguments to float8? select 'B110011'::bit ^ 'B011101'::bit; ERROR: Function 'float8(bit)' does not exist Unable to identify a function that satisfies the given argument types You may need to add explicit typecasts 4. This is a policy question. When I use the bit shift operator, this always shifts within the current string only. So if I do select ('B010'::bit(6) >> 2)::varbit; ?column? ----------- B000100 I get what I would expect. But if I have a bit varying(6) field (in a table, this is just an example), I only get select ('B010'::varbit >> 2)::varbit; ?column? ----------- B000 which I find counter-intuitive. I have thus added 'zpshiftright' and 'varbitshiftright' functions. The second extends the bitstring to the right, while the first is the old bitshiftright function. I find this more intuitive at least. Question is what a shift left function should do? Should I shorten the string in the case of a shift left, to keep it symmetrical to shift right? This seems a pure policy decision, as there are arguments for both behaviours, although I am a great fan of symmetry. Let me know and I can implement a separate function. I have made a start on a file for regression tests, which I append with the diffs for the varbit files. Please let me know what else is needed and where I can help. Thanks! Adriaan-- -- BIT types -- -- -- Build tables for testing -- CREATE TABLE ZPBIT_TABLE(b BIT(11)); INSERT INTO ZPBIT_TABLE VALUES ('B'); INSERT INTO ZPBIT_TABLE VALUES ('B0'); INSERT INTO ZPBIT_TABLE VALUES ('B010101'); INSERT INTO ZPBIT_TABLE VALUES ('B01010101010'); INSERT INTO ZPBIT_TABLE VALUES ('B010101010101'); INSERT INTO ZPBIT_TABLE VALUES ('X554'); INSERT INTO ZPBIT_TABLE VALUES ('X555'); SELECT * FROM ZPBIT_TABLE; CREATE TABLE VARBIT_TABLE(v BIT VARYING(11)); INSERT INTO VARBIT_TABLE VALUES ('B'); INSERT INTO VARBIT_TABLE VALUES ('B0'); INSERT INTO VARBIT_TABLE VALUES ('B010101'); INSERT INTO VARBIT_TABLE VALUES ('B01010101010'); INSERT INTO VARBIT_TABLE VALUES ('B010101010101'); INSERT INTO VARBIT_TABLE VALUES ('X554'); INSERT INTO VARBIT_TABLE VALUES ('X555'); SELECT * FROM VARBIT_TABLE; -- Delete from tables DROP TABLE ZPBIT_TABLE; CREATE TABLE ZPBIT_TABLE(b BIT(16)); INSERT INTO ZPBIT_TABLE VALUES ('B11011'); INSERT INTO ZPBIT_TABLE SELECT b>>1 FROM ZPBIT_TABLE; INSERT INTO ZPBIT_TABLE SELECT b>>2 FROM ZPBIT_TABLE; INSERT INTO ZPBIT_TABLE SELECT b>>4 FROM ZPBIT_TABLE; INSERT INTO ZPBIT_TABLE SELECT b>>8 FROM ZPBIT_TABLE; SELECT POSITION('B1101'::bit IN b) as pos, POSITION('B11011'::bit IN b) as pos, b FROM ZPBIT_TABLE ; DROP TABLE VARBIT_TABLE; CREATE TABLE VARBIT_TABLE(v BIT VARYING(19)); INSERT INTO VARBIT_TABLE VALUES ('B11011'); INSERT INTO VARBIT_TABLE SELECT v>>1 FROM VARBIT_TABLE; INSERT INTO VARBIT_TABLE SELECT v>>2 FROM VARBIT_TABLE; INSERT INTO VARBIT_TABLE SELECT v>>4 FROM VARBIT_TABLE; INSERT INTO VARBIT_TABLE SELECT v>>8 FROM VARBIT_TABLE; SELECT POSITION('B1101'::bit IN v) as pos, POSITION('B11011'::bit IN v) as pos, v FROM VARBIT_TABLE ; -- Concatenation SELECT v, b::varbit AS B, (v || b)::varbit AS C FROM ZPBIT_TABLE, VARBIT_TABLE WHERE v::bit(16)=b ORDER BY C; -- Length SELECT b, length(b) AS lb FROM ZPBIT_TABLE; SELECT v, length(v) AS lv FROM VARBIT_TABLE; -- Substring SELECT b::varbit, SUBSTRING(b FROM 2 FOR 4)::varbit AS sub1, SUBSTRING(b FROM 7 FOR 13)::varbit AS sub2, SUBSTRING(b FROM 6)::varbit AS sub3 FROM ZPBIT_TABLE; SELECT v, SUBSTRING(v FROM 2 FOR 4)::varbit AS sub1, SUBSTRING(v FROM 7 FOR 13)::varbit AS sub2, SUBSTRING(v FROM 6)::varbit AS sub3 FROM VARBIT_TABLE; -- Drop the tables DROP TABLE ZPBIT_TABLE; DROP TABLE VARBIT_TABLE; --- Bit operations CREATE TABLE varbit_table (a BIT VARYING(16), b BIT VARYING(16)); COPY varbit_table FROM stdin; X0F X10 X1F X11 X2F X12 X3F X13 X8F X04 X000F X0010 X0123 XFFFF X2468 X2468 XFA50 X05AF X1234 XFFF5 \. SELECT a,b,~a AS "~ a",a & b AS "a & b", a|b AS "a | b", a^b AS "a ^ b" FROM varbit_table; SELECT a,b,a<b AS "a<b",a<=b AS "a<=b",a=b AS "a=b", a>=b AS "a>=b",a>b AS "a>b",a<=>b AS "a<=>b" FROM varbit_table; SELECT a,a<<4 AS "a<<4",b,b>>2 AS "b>>2" FROM varbit_table; DROP TABLE varbit_table; --- Bit operations CREATE TABLE zpbit (a BIT(16), b BIT(16)); COPY zpbit FROM stdin; X0F X10 X1F X11 X2F X12 X3F X13 X8F X04 X000F X0010 X0123 XFFFF X2468 X2468 XFA50 X05AF X1234 XFFF5 \. SELECT a,b,~a AS "~ a",a & b AS "a & b", a|b AS "a | b", a^b AS "a ^ b" FROM zpbit; SELECT a,b,a<b AS "a<b",a<=b AS "a<=b",a=b AS "a=b", a>=b AS "a>=b",a>b AS "a>b",a<>b AS "a<>b" FROM zpbit; SELECT a,a<<4 AS "a<<4",b,b>>2 AS "b>>2" FROM zpbit; DROP TABLE zpbit; -- The following should fail select 'X123'::bit & 'X12'::bit; select 'B0111'::bit | 'B011'::bit; select 'X023'::bit ^ 'B011101'::bit; -- More position tests, checking all the boundary cases SELECT POSITION('B1010'::bit IN 'B0000101'::bit); -- 0 SELECT POSITION('B1010'::bit IN 'B00001010'::bit); -- 5 SELECT POSITION('B1010'::bit IN 'B00000101'::bit); -- 0 SELECT POSITION('B1010'::bit IN 'B000001010'::bit); -- 6 SELECT POSITION('B'::bit IN 'B00001010'::bit); -- 1 SELECT POSITION('B0'::bit IN 'B'::bit); -- 0 SELECT POSITION('B'::bit IN 'B'::bit); -- 0 SELECT POSITION('B101101'::bit IN 'B001011011011011000'::bit); -- 3 SELECT POSITION('B10110110'::bit IN 'B001011011011010'::bit); -- 3 SELECT POSITION('B1011011011011'::bit IN 'B001011011011011'::bit); -- 3 SELECT POSITION('B1011011011011'::bit IN 'B00001011011011011'::bit); -- 5 SELECT POSITION('B11101011'::bit IN 'B11101011'::bit); -- 1 SELECT POSITION('B11101011'::bit IN 'B011101011'::bit); -- 2 SELECT POSITION('B11101011'::bit IN 'B00011101011'::bit); -- 4 SELECT POSITION('B11101011'::bit IN 'B0000011101011'::bit); -- 6 SELECT POSITION('B111010110'::bit IN 'B111010110'::bit); -- 1 SELECT POSITION('B111010110'::bit IN 'B0111010110'::bit); -- 2 SELECT POSITION('B111010110'::bit IN 'B000111010110'::bit); -- 4 SELECT POSITION('B111010110'::bit IN 'B00000111010110'::bit); -- 6 SELECT POSITION('B111010110'::bit IN 'B11101011'::bit); -- 0 SELECT POSITION('B111010110'::bit IN 'B011101011'::bit); -- 0 SELECT POSITION('B111010110'::bit IN 'B00011101011'::bit); -- 0 SELECT POSITION('B111010110'::bit IN 'B0000011101011'::bit); -- 0 SELECT POSITION('B111010110'::bit IN 'B111010110'::bit); -- 1 SELECT POSITION('B111010110'::bit IN 'B0111010110'::bit); -- 2 SELECT POSITION('B111010110'::bit IN 'B000111010110'::bit); -- 4 SELECT POSITION('B111010110'::bit IN 'B00000111010110'::bit); -- 6 SELECT POSITION('B111010110'::bit IN 'B000001110101111101011'::bit); -- 0 SELECT POSITION('B111010110'::bit IN 'B0000001110101111101011'::bit); -- 0 SELECT POSITION('B111010110'::bit IN 'B000000001110101111101011'::bit); -- 0 SELECT POSITION('B111010110'::bit IN 'B00000000001110101111101011'::bit); -- 0 SELECT POSITION('B111010110'::bit IN 'B0000011101011111010110'::bit); -- 14 SELECT POSITION('B111010110'::bit IN 'B00000011101011111010110'::bit); -- 15 SELECT POSITION('B111010110'::bit IN 'B0000000011101011111010110'::bit); -- 17 SELECT POSITION('B111010110'::bit IN 'B000000000011101011111010110'::bit); -- 19 SELECT POSITION('B000000000011101011111010110'::bit IN 'B000000000011101011111010110'::bit); -- 1 SELECT POSITION('B00000000011101011111010110'::bit IN 'B000000000011101011111010110'::bit); -- 2 SELECT POSITION('B0000000000011101011111010110'::bit IN 'B000000000011101011111010110'::bit); -- 0 *** src/backend/utils/adt/varbit.c.old Sun Nov 5 12:03:04 2000 --- src/backend/utils/adt/varbit.c Sun Nov 5 19:12:10 2000 *************** *** 73,79 **** bit_not_hex = false; else { ! elog(ERROR, "zpbit_in: %s is not a valid bitstring", s); bit_not_hex = false; /* keep compiler quiet */ } --- 73,79 ---- bit_not_hex = false; else { ! elog(ERROR, "zpbit_in: The bit string %s must start with B or X", s); bit_not_hex = false; /* keep compiler quiet */ } *************** *** 299,305 **** bit_not_hex = false; else { ! elog(ERROR, "varbit_in: %s is not a valid bitstring", s); bit_not_hex = false; /* keep compiler quiet */ } --- 299,305 ---- bit_not_hex = false; else { ! elog(ERROR, "varbit_in: The bit string %s must start with B or X", s); bit_not_hex = false; /* keep compiler quiet */ } *************** *** 744,749 **** --- 744,752 ---- *ps; bitlen = VARBITLEN(arg); + /* Do we have an upper bound? */ + if (l==-1) + l = bitlen; e = s + l; s1 = Max(s, 1); e1 = Min(e, bitlen + 1); *************** *** 1039,1045 **** * do a right shift (i.e. towards the end of the string) */ Datum ! bitshiftright(PG_FUNCTION_ARGS) { VarBit *arg = PG_GETARG_VARBIT_P(0); int32 shft = PG_GETARG_INT32(1); --- 1042,1048 ---- * do a right shift (i.e. towards the end of the string) */ Datum ! zpbitshiftright(PG_FUNCTION_ARGS) { VarBit *arg = PG_GETARG_VARBIT_P(0); int32 shft = PG_GETARG_INT32(1); *************** *** 1097,1102 **** --- 1100,1166 ---- PG_RETURN_VARBIT_P(result); } + /* bitshiftright + * do a right shift (i.e. towards the end of the string) + */ + Datum + /*varbitshiftright(PG_FUNCTION_ARGS)*/ + bitshiftright(PG_FUNCTION_ARGS) + { + VarBit *arg = PG_GETARG_VARBIT_P(0); + int32 shft = PG_GETARG_INT32(1); + VarBit *result; + int byte_shift, + byte_len, + ishift, + len; + bits8 *p, + *r; + + /* Negative shift is a shift to the left */ + if (shft < 0) + PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft, + VarBitPGetDatum(arg), + Int32GetDatum(-shft))); + + /* When we have a varying bit string, the string may get longer */ + len = VARBITLEN(arg) + shft; + byte_len = VARBITTOTALLEN(len); + + result = (VarBit *) palloc(byte_len); + VARATT_SIZEP(result) = byte_len; + VARBITLEN(result) = len; + r = VARBITS(result); + + byte_shift = shft / BITS_PER_BYTE; + ishift = shft % BITS_PER_BYTE; + p = VARBITS(arg); + + /* Set the first part of the result to 0 */ + memset(r, 0, byte_shift); + r += byte_shift; + + if (ishift == 0) + { + /* Special case: we can do a memcpy */ + len = VARBITBYTES(arg); + memcpy(r, p, len); + } + else + { + if (r < VARBITEND(result)) + *r = 0; /* initialize first byte */ + for (; r < VARBITEND(result); p++) + { + *r |= *p >> ishift; + if ((++r) < VARBITEND(result)) + *r = (*p << (BITS_PER_BYTE - ishift)) & BITMASK; + } + } + + PG_RETURN_VARBIT_P(result); + } + /* This is not defined in any standard. We retain the natural ordering of * bits here, as it just seems more intuitive. */ *************** *** 1216,1224 **** p++; if (p == VARBITEND(arg)) { mask2 = end_mask << (BITS_PER_BYTE - is); ! is_match = mask2 == 0; ! elog(NOTICE,"S. %d %d em=%2x sm=%2x r=%d", ! i,is,end_mask,mask2,is_match); break; } cmp = *s << (BITS_PER_BYTE - is); --- 1280,1286 ---- p++; if (p == VARBITEND(arg)) { mask2 = end_mask << (BITS_PER_BYTE - is); ! is_match = (mask2 == 0); break; } cmp = *s << (BITS_PER_BYTE - is); *** src/include/utils/varbit.h.old Sun Nov 5 20:20:40 2000 --- src/include/utils/varbit.h Sun Nov 5 19:04:16 2000 *************** *** 80,86 **** --- 80,88 ---- extern Datum bitxor(PG_FUNCTION_ARGS); extern Datum bitnot(PG_FUNCTION_ARGS); extern Datum bitshiftleft(PG_FUNCTION_ARGS); + extern Datum zpbitshiftright(PG_FUNCTION_ARGS); extern Datum bitshiftright(PG_FUNCTION_ARGS); + /*extern Datum varbitshiftright(PG_FUNCTION_ARGS);*/ extern Datum bitcat(PG_FUNCTION_ARGS); extern Datum bitsubstr(PG_FUNCTION_ARGS); extern Datum bitlength(PG_FUNCTION_ARGS);
Adriaan Joubert writes: > 1. Constants. The current behaviour just seems somewhat strange, and I > have no idea where to fix it. > > test=# select B'1001'; > ?column? > ---------- > X9 > (1 row) That's because the zpbit output function chooses to represent values that way. Whether or not it's good to do that is another question, but it's legal as it stands. Then again, I do think that a binary output format would be preferred. > test=# select B'1001'::bit; > ERROR: Cannot cast this expression to type 'bit' > test=# select B'1001'::varbit; > ERROR: Cannot cast this expression to type 'varbit' I notice now that casting of constants is handled separately, but it'll get fixed. > test=# select X'1001'::varbit; > ERROR: varbit_in: The bit string 4097 must start with B or X That's because X'1001'-style constants get converted to integer currently. > Also, I have two output routines, that have been renames to zpbit_out > and varbit_out. In fact, both will work just fine for bot bit and > varbit, but the first prints as hex and the second as a bit string. That's obviously not good. > More for my information, if a user does not know about varbit, how does > he cast to bit varying? CAST(value to BIT VARYING) > test=# select 'b10'::bit='b10'::varbit; > ERROR: Unable to identify an operator '=' for types 'bit' and 'varbit' > You will have to retype this query using an explicit cast Ouch. I thought these types where binary equivalent, but evidently that doesn't help here. > 3. The ^ operator seems to attempt to coerce the arguments to float8? The ^ operator is exponentiation. We changed the xor operator to '#', because we now also have bit-wise operators on integers, and making 'int ^ int' to do xor would have been very confusing. > 4. This is a policy question. When I use the bit shift operator, this > always shifts within the current string only. So if I do > > select ('B010'::bit(6) >> 2)::varbit; > ?column? > ----------- > B000100 > > I get what I would expect. Really? I would expect 'b010'::bit(6) to be B'000010', thus shifting two to the right gives 0. > I have made a start on a file for regression tests, which I append with > the diffs for the varbit files. Please let me know what else is needed > and where I can help. A few notes here: Do not use 'B10001' as bit input, use B'10001'. The former is a text constant (sort of); it's only through the goodness of heart that the system casts anything to just about anything else if there's a way to get there, but I personally hope that we can stop it from doing that sometime. The latter on the other hand is guaranteed to be a bit string only (zpbit, to be precise). That also means you do not have to put casts everwhere. You'd only need a cast to varbit, but in that case write it as CAST(B'1001' AS BIT VARYING(x)). Better yet, also be sure to have a few cases where bit and bit varying are read from a table; that way you're really sure what type you're dealing with. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Adriaan Joubert writes: > 1. Constants. The current behaviour just seems somewhat strange, and I > have no idea where to fix it. > > test=# select B'1001'; > ?column? > ---------- > X9 > (1 row) Fixed. (Prints '1001'.) > test=# select B'1001'::bit; > ERROR: Cannot cast this expression to type 'bit' > test=# select B'1001'::varbit; > ERROR: Cannot cast this expression to type 'varbit' Works now. > test=# select X'1001'::varbit; > ERROR: varbit_in: The bit string 4097 must start with B or X Not sure what we'll do with this. X'1001' is currently an integer. According to SQL it may be a bit string or a binary object, but I don't have a clue how they decide it. Maybe we should look at other implementations. > Also, I have two output routines, that have been renames to zpbit_out > and varbit_out. In fact, both will work just fine for bot bit and > varbit, but the first prints as hex and the second as a bit string. Both print binary now. > Printing as hex is more compact, so good for long strings, but printing > as a bit string is much more intuitive. One solution would be to make > them both print to a bit string by default and define a function to > generate a hex string. Sounds okay. > test=# select 'b10'::bit='b10'::varbit; > ERROR: Unable to identify an operator '=' for types 'bit' and 'varbit' > You will have to retype this query using an explicit cast Works now. > select ('B010'::varbit >> 2)::varbit; > ?column? > ----------- > B000 > Question is what a shift left function should do? I'd say that having shift left and shift right be symmetrical (although they're obviously not strict inverses) is definitely important. I could live both with shortening the string on shift left or with keeping the string length fixed in shift right (as above). Your call. (I haven't installed your patch for this yet. Please submit one that implements whatever you think it should do completely both ways.) > I have made a start on a file for regression tests, I've put a modified version at http://www.postgresql.org/~petere/bit.sql (and bit.out) for you to work with. The substring tests are commented out because they cause the backend to crash. bitsubstr() needs to be modified to handle -1 as its third argument, meaning "the rest of the string". -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Are there any open items related to the BIT type? > Tom Lane wrote: > > > > I have made a first cut at completing integration of Adriaan Joubert's > > BIT code into the backend. There are a couple little things left to > > do (for example, scalarltsel doesn't know what to do with BIT values) > > as well as some not-so-little things: > > > > 1. SQL92 mentions a bitwise position function, which we do not have. > > Sorry, I have been very busy, so only got down to implementing a > position function last night. It's a bit messy (lots of masks and > bit-twiddling), but I feel fairly happy now that it is doing the right > thing. I tested it with my own loadable types, as the integration into > postgres proper stumped my somewhat. The next oid up for a bit function > is in use already. Anyway, the patches are attached, and I'm hoping that > some friendly sole will integrate the new position function into > postgres proper. > > > 2. We don't handle <bit string> and <hex string> literals correctly; > > the scanner converts them into integers which seems quite at variance > > with the spec's semantics. > > This is still a problem that needs to be fixed. Also, it the parser did > not seem to be too happy about the 'position' syntax, but I may have it > wrong of course. I don;t know how to attach the position function to a > piece of syntax such as (position <substr> in <field>) either, so I'm > hoping that somebody can pick this up. > > Also, i have started putting together a file for regression testing. I > noticed that the substring syntax does not seem to work: > > SELECT SUBSTRING(b FROM 2 FOR 4) > FROM ZPBIT_TABLE; > > gives: > > ERROR: Function 'substr(bit, int4, int4)' does not exist > Unable to identify a function that satisfies the given argument > types > You may need to add explicit typecasts > > and similar for a varying bit argument. > > If somebody with better knowledge of postgres could do the integration, > please, I will finish off a regression test. > > Thanks! > > Adriaan > *** src/backend/utils/adt/varbit.c.old Sun Oct 29 11:05:11 2000 > --- src/backend/utils/adt/varbit.c Mon Oct 30 04:58:35 2000 > *************** > *** 1053,1060 **** > /* Negative shift is a shift to the left */ > if (shft < 0) > PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft, > ! VarBitPGetDatum(arg), > ! Int32GetDatum(-shft))); > > result = (VarBit *) palloc(VARSIZE(arg)); > VARATT_SIZEP(result) = VARSIZE(arg); > --- 1053,1060 ---- > /* Negative shift is a shift to the left */ > if (shft < 0) > PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft, > ! VarBitPGetDatum(arg), > ! Int32GetDatum(-shft))); > > result = (VarBit *) palloc(VARSIZE(arg)); > VARATT_SIZEP(result) = VARSIZE(arg); > *************** > *** 1145,1148 **** > --- 1145,1242 ---- > result >>= VARBITPAD(arg); > > PG_RETURN_INT32(result); > + } > + > + /* Determines the position of S1 in the bitstring S2 (1-based string). > + * If S1 does not appear in S2 this function returns 0. > + * If S1 is of length 0 this function returns 1. > + */ > + Datum > + bitposition(PG_FUNCTION_ARGS) > + { > + VarBit *substr = PG_GETARG_VARBIT_P(0); > + VarBit *arg = PG_GETARG_VARBIT_P(1); > + int substr_length, > + arg_length, > + i, > + is; > + bits8 *s, /* pointer into substring */ > + *p; /* pointer into arg */ > + bits8 cmp, /* shifted substring byte to compare */ > + mask1, /* mask for substring byte shifted right */ > + mask2, /* mask for substring byte shifted left */ > + end_mask, /* pad mask for last substring byte */ > + arg_mask; /* pad mask for last argument byte */ > + bool is_match; > + > + /* Get the substring length */ > + substr_length = VARBITLEN(substr); > + arg_length = VARBITLEN(arg); > + > + /* Argument has 0 length or substring longer than argument, return 0 */ > + if (arg_length == 0 || substr_length > arg_length) > + PG_RETURN_INT32(0); > + > + /* 0-length means return 1 */ > + if (substr_length == 0) > + PG_RETURN_INT32(1); > + > + /* Initialise the padding masks */ > + end_mask = BITMASK << VARBITPAD(substr); > + arg_mask = BITMASK << VARBITPAD(arg); > + for (i = 0; i < VARBITBYTES(arg) - VARBITBYTES(substr) + 1; i++) > + { > + for (is = 0; is < BITS_PER_BYTE; is++) { > + is_match = true; > + p = VARBITS(arg) + i; > + mask1 = BITMASK >> is; > + mask2 = ~mask1; > + for (s = VARBITS(substr); > + is_match && s < VARBITEND(substr); s++) > + { > + cmp = *s >> is; > + if (s == VARBITEND(substr) - 1) > + { > + mask1 &= end_mask >> is; > + if (p == VARBITEND(arg) - 1) { > + /* Check that there is enough of arg left */ > + if (mask1 & ~arg_mask) { > + is_match = false; > + break; > + } > + mask1 &= arg_mask; > + } > + } > + is_match = ((cmp ^ *p) & mask1) == 0; > + if (!is_match) > + break; > + // Move on to the next byte > + p++; > + if (p == VARBITEND(arg)) { > + mask2 = end_mask << (BITS_PER_BYTE - is); > + is_match = mask2 == 0; > + elog(NOTICE,"S. %d %d em=%2x sm=%2x r=%d", > + i,is,end_mask,mask2,is_match); > + break; > + } > + cmp = *s << (BITS_PER_BYTE - is); > + if (s == VARBITEND(substr) - 1) > + { > + mask2 &= end_mask << (BITS_PER_BYTE - is); > + if (p == VARBITEND(arg) - 1) { > + if (mask2 & ~arg_mask) { > + is_match = false; > + break; > + } > + mask2 &= arg_mask; > + } > + } > + is_match = ((cmp ^ *p) & mask2) == 0; > + } > + /* Have we found a match */ > + if (is_match) > + PG_RETURN_INT32(i*BITS_PER_BYTE + is + 1); > + } > + } > + PG_RETURN_INT32(0); > } > *** src/include/utils/varbit.h.old Sun Oct 29 11:04:58 2000 > --- src/include/utils/varbit.h Sun Oct 29 11:05:58 2000 > *************** > *** 87,91 **** > --- 87,92 ---- > extern Datum bitoctetlength(PG_FUNCTION_ARGS); > extern Datum bitfromint4(PG_FUNCTION_ARGS); > extern Datum bittoint4(PG_FUNCTION_ARGS); > + extern Datum bitposition(PG_FUNCTION_ARGS); > > #endif -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Main open item is the handling of hex strings: they are still converted to integers by default. They could be BLOBs or bit strings, and the SQL standard gives no hint, so it is not clear what the solution should be. The only other complaint has been on the output of bit strings. I believe the current policy is that they are always output in binary, but I have had a request for output in hex. We may have to add a function that allows the user to use the hex output routine (a SET variable or some type of conversion function). I agree with leaving the default binary as most people seem to want to use it for bit masks anyway. Adriaan
> Main open item is the handling of hex strings: they are still converted > to integers by default. They could be BLOBs or bit strings, and the SQL > standard gives no hint, so it is not clear what the solution should be. Should I add this to the TODO list? > > The only other complaint has been on the output of bit strings. I > believe the current policy is that they are always output in binary, but > I have had a request for output in hex. We may have to add a function > that allows the user to use the hex output routine (a SET variable or > some type of conversion function). I agree with leaving the default > binary as most people seem to want to use it for bit masks anyway. > > Adriaan > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026