Re: Re: BIT/BIT VARYING status - Mailing list pgsql-hackers

From Adriaan Joubert
Subject Re: Re: BIT/BIT VARYING status
Date
Msg-id 3A05AC69.5066EE5D@albourne.com
Whole thread Raw
In response to Re: Re: BIT/BIT VARYING status  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: Re: BIT/BIT VARYING status  (Peter Eisentraut <peter_e@gmx.net>)
Re: Re: BIT/BIT VARYING status  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Peter,

    I've looked at the current implementation of the bit types and still
have some doubts concerning the following issues:

1. Constants. The current behaviour just seems somewhat strange, and I
have no idea where to fix it.

test=# select B'1001';
 ?column?
----------
 X9
(1 row)

test=# select B'1001'::bit;
ERROR:  Cannot cast this expression to type 'bit'
test=# select B'1001'::varbit;
ERROR:  Cannot cast this expression to type 'varbit'
test=# select 'B1001'::varbit;
 ?column?
----------
 B1001
(1 row)

test=# select 'B1001'::bit;
 ?column?
----------
 X9
(1 row)

test=# select X'1001'::varbit;
ERROR:  varbit_in: The bit string 4097 must start with B or X
test=# select 'X1001'::varbit;
     ?column?
-------------------
 B0001000000000001
(1 row)

test=# select 'X1001'::bit;
 ?column?
----------
 X1001
(1 row)

test=# select X'1001'::bit;
ERROR:  zpbit_in: The bit string 4097 must start with B or X

Also, I have two output routines, that have been renames to zpbit_out
and varbit_out. In fact, both will work just fine for bot bit and
varbit, but the first prints as hex and the second as a bit string.
Printing as hex is more compact, so good for long strings, but printing
as a bit string is much more intuitive. One solution would be to make
them both print to a bit string by default and define a function to
generate a hex string. Another would be to have this under control of a
variable. Most people who contacted me about bit strings seemed to want
to use them for flags, so I guess the default should be to print them as
a bit string.

More for my information, if a user does not know about varbit, how does
he cast to bit varying?

2. This is not a problem, more a question. There is no default way to
compare bit to varbit, as in

test=# select 'b10'::bit='b10'::varbit;
ERROR:  Unable to identify an operator '=' for types 'bit' and 'varbit'
        You will have to retype this query using an explicit cast

This may be a good thing, as the comparison does depend on the lenght of
the bit strings.

3. The ^ operator seems to attempt to coerce the arguments to float8?

select 'B110011'::bit ^ 'B011101'::bit;
ERROR:  Function 'float8(bit)' does not exist
        Unable to identify a function that satisfies the given argument
types
        You may need to add explicit typecasts

4. This is a policy question. When I use the bit shift operator, this
always shifts within the current string only. So if I do

select ('B010'::bit(6) >> 2)::varbit;
 ?column?
-----------
 B000100

I get what I would expect. But if I have a bit varying(6) field (in a
table, this is just an example), I only get

select ('B010'::varbit >> 2)::varbit;
 ?column?
-----------
 B000

which I find counter-intuitive. I have thus added 'zpshiftright' and
'varbitshiftright' functions. The second extends the bitstring to the
right, while the first is the old bitshiftright function. I find this
more intuitive at least.

Question is what a shift left function should do? Should I shorten the
string in the case of a shift left, to keep it symmetrical to shift
right? This seems a pure policy decision, as there are arguments for
both behaviours, although I am a great fan of symmetry. Let me know and
I can implement a separate function.


I have made a start on a file for regression tests, which I append with
the diffs for the varbit files. Please let me know what else is needed
and where I can help.


Thanks!

Adriaan--
-- BIT types
--

--
-- Build tables for testing
--

CREATE TABLE ZPBIT_TABLE(b BIT(11));

INSERT INTO ZPBIT_TABLE VALUES ('B');

INSERT INTO ZPBIT_TABLE VALUES ('B0');

INSERT INTO ZPBIT_TABLE VALUES ('B010101');

INSERT INTO ZPBIT_TABLE VALUES ('B01010101010');

INSERT INTO ZPBIT_TABLE VALUES ('B010101010101');

INSERT INTO ZPBIT_TABLE VALUES ('X554');

INSERT INTO ZPBIT_TABLE VALUES ('X555');

SELECT * FROM ZPBIT_TABLE;

CREATE TABLE VARBIT_TABLE(v BIT VARYING(11));

INSERT INTO VARBIT_TABLE VALUES ('B');

INSERT INTO VARBIT_TABLE VALUES ('B0');

INSERT INTO VARBIT_TABLE VALUES ('B010101');

INSERT INTO VARBIT_TABLE VALUES ('B01010101010');

INSERT INTO VARBIT_TABLE VALUES ('B010101010101');

INSERT INTO VARBIT_TABLE VALUES ('X554');

INSERT INTO VARBIT_TABLE VALUES ('X555');

SELECT * FROM VARBIT_TABLE;

-- Delete from tables
DROP TABLE ZPBIT_TABLE;
CREATE TABLE ZPBIT_TABLE(b BIT(16));

INSERT INTO ZPBIT_TABLE VALUES ('B11011');
INSERT INTO ZPBIT_TABLE SELECT b>>1 FROM ZPBIT_TABLE;
INSERT INTO ZPBIT_TABLE SELECT b>>2 FROM ZPBIT_TABLE;
INSERT INTO ZPBIT_TABLE SELECT b>>4 FROM ZPBIT_TABLE;
INSERT INTO ZPBIT_TABLE SELECT b>>8 FROM ZPBIT_TABLE;
SELECT POSITION('B1101'::bit IN b) as pos,
       POSITION('B11011'::bit IN b) as pos,
       b
       FROM ZPBIT_TABLE ;

DROP TABLE VARBIT_TABLE;
CREATE TABLE VARBIT_TABLE(v BIT VARYING(19));
INSERT INTO VARBIT_TABLE VALUES ('B11011');
INSERT INTO VARBIT_TABLE SELECT v>>1 FROM VARBIT_TABLE;
INSERT INTO VARBIT_TABLE SELECT v>>2 FROM VARBIT_TABLE;
INSERT INTO VARBIT_TABLE SELECT v>>4 FROM VARBIT_TABLE;
INSERT INTO VARBIT_TABLE SELECT v>>8 FROM VARBIT_TABLE;
SELECT POSITION('B1101'::bit IN v) as pos,
       POSITION('B11011'::bit IN v) as pos,
       v
       FROM VARBIT_TABLE ;

-- Concatenation
SELECT v, b::varbit AS B, (v || b)::varbit AS C
       FROM ZPBIT_TABLE, VARBIT_TABLE
       WHERE v::bit(16)=b
       ORDER BY C;

-- Length
SELECT b, length(b) AS lb
       FROM ZPBIT_TABLE;
SELECT v, length(v) AS lv
       FROM VARBIT_TABLE;

-- Substring
SELECT b::varbit,
       SUBSTRING(b FROM 2 FOR 4)::varbit AS sub1,
       SUBSTRING(b FROM 7 FOR 13)::varbit AS sub2,
       SUBSTRING(b FROM 6)::varbit AS sub3
       FROM ZPBIT_TABLE;
SELECT v,
       SUBSTRING(v FROM 2 FOR 4)::varbit AS sub1,
       SUBSTRING(v FROM 7 FOR 13)::varbit AS sub2,
       SUBSTRING(v FROM 6)::varbit AS sub3
       FROM VARBIT_TABLE;

-- Drop the tables
DROP TABLE ZPBIT_TABLE;
DROP TABLE VARBIT_TABLE;

--- Bit operations
CREATE TABLE varbit_table (a BIT VARYING(16), b BIT VARYING(16));
COPY varbit_table FROM stdin;
X0F    X10
X1F    X11
X2F    X12
X3F    X13
X8F    X04
X000F    X0010
X0123    XFFFF
X2468    X2468
XFA50    X05AF
X1234    XFFF5
\.

SELECT a,b,~a AS "~ a",a & b AS "a & b",
    a|b AS "a | b", a^b AS "a ^ b" FROM varbit_table;
SELECT a,b,a<b AS "a<b",a<=b AS "a<=b",a=b AS "a=b",
        a>=b AS "a>=b",a>b AS "a>b",a<=>b AS "a<=>b" FROM varbit_table;
SELECT a,a<<4 AS "a<<4",b,b>>2 AS "b>>2" FROM varbit_table;

DROP TABLE varbit_table;

--- Bit operations
CREATE TABLE zpbit (a BIT(16), b BIT(16));
COPY zpbit FROM stdin;
X0F    X10
X1F    X11
X2F    X12
X3F    X13
X8F    X04
X000F    X0010
X0123    XFFFF
X2468    X2468
XFA50    X05AF
X1234    XFFF5
\.

SELECT a,b,~a AS "~ a",a & b AS "a & b",
    a|b AS "a | b", a^b AS "a ^ b" FROM zpbit;
SELECT a,b,a<b AS "a<b",a<=b AS "a<=b",a=b AS "a=b",
        a>=b AS "a>=b",a>b AS "a>b",a<>b AS "a<>b" FROM zpbit;
SELECT a,a<<4 AS "a<<4",b,b>>2 AS "b>>2" FROM zpbit;

DROP TABLE zpbit;


-- The following should fail
select 'X123'::bit & 'X12'::bit;
select 'B0111'::bit | 'B011'::bit;
select 'X023'::bit ^ 'B011101'::bit;

-- More position tests, checking all the boundary cases
SELECT POSITION('B1010'::bit IN 'B0000101'::bit);   -- 0
SELECT POSITION('B1010'::bit IN 'B00001010'::bit);  -- 5
SELECT POSITION('B1010'::bit IN 'B00000101'::bit);  -- 0
SELECT POSITION('B1010'::bit IN 'B000001010'::bit);  -- 6

SELECT POSITION('B'::bit IN 'B00001010'::bit);  -- 1
SELECT POSITION('B0'::bit IN 'B'::bit);  -- 0
SELECT POSITION('B'::bit IN 'B'::bit);  -- 0
SELECT POSITION('B101101'::bit IN 'B001011011011011000'::bit);  -- 3
SELECT POSITION('B10110110'::bit IN 'B001011011011010'::bit);  -- 3
SELECT POSITION('B1011011011011'::bit IN 'B001011011011011'::bit);  -- 3
SELECT POSITION('B1011011011011'::bit IN 'B00001011011011011'::bit);  -- 5

SELECT POSITION('B11101011'::bit IN 'B11101011'::bit); -- 1
SELECT POSITION('B11101011'::bit IN 'B011101011'::bit); -- 2
SELECT POSITION('B11101011'::bit IN 'B00011101011'::bit); -- 4
SELECT POSITION('B11101011'::bit IN 'B0000011101011'::bit); -- 6

SELECT POSITION('B111010110'::bit IN 'B111010110'::bit); -- 1
SELECT POSITION('B111010110'::bit IN 'B0111010110'::bit); -- 2
SELECT POSITION('B111010110'::bit IN 'B000111010110'::bit); -- 4
SELECT POSITION('B111010110'::bit IN 'B00000111010110'::bit); -- 6

SELECT POSITION('B111010110'::bit IN 'B11101011'::bit); -- 0
SELECT POSITION('B111010110'::bit IN 'B011101011'::bit); -- 0
SELECT POSITION('B111010110'::bit IN 'B00011101011'::bit); -- 0
SELECT POSITION('B111010110'::bit IN 'B0000011101011'::bit); -- 0

SELECT POSITION('B111010110'::bit IN 'B111010110'::bit); -- 1
SELECT POSITION('B111010110'::bit IN 'B0111010110'::bit); -- 2
SELECT POSITION('B111010110'::bit IN 'B000111010110'::bit); -- 4
SELECT POSITION('B111010110'::bit IN 'B00000111010110'::bit); -- 6

SELECT POSITION('B111010110'::bit IN 'B000001110101111101011'::bit); -- 0
SELECT POSITION('B111010110'::bit IN 'B0000001110101111101011'::bit); -- 0
SELECT POSITION('B111010110'::bit IN 'B000000001110101111101011'::bit); -- 0
SELECT POSITION('B111010110'::bit IN 'B00000000001110101111101011'::bit); -- 0

SELECT POSITION('B111010110'::bit IN 'B0000011101011111010110'::bit); -- 14
SELECT POSITION('B111010110'::bit IN 'B00000011101011111010110'::bit); -- 15
SELECT POSITION('B111010110'::bit IN 'B0000000011101011111010110'::bit); -- 17
SELECT POSITION('B111010110'::bit IN 'B000000000011101011111010110'::bit); -- 19

SELECT POSITION('B000000000011101011111010110'::bit IN 'B000000000011101011111010110'::bit); -- 1
SELECT POSITION('B00000000011101011111010110'::bit IN 'B000000000011101011111010110'::bit); -- 2
SELECT POSITION('B0000000000011101011111010110'::bit IN 'B000000000011101011111010110'::bit); -- 0

*** src/backend/utils/adt/varbit.c.old    Sun Nov  5 12:03:04 2000
--- src/backend/utils/adt/varbit.c    Sun Nov  5 19:12:10 2000
***************
*** 73,79 ****
          bit_not_hex = false;
      else
      {
!         elog(ERROR, "zpbit_in: %s is not a valid bitstring", s);
          bit_not_hex = false;    /* keep compiler quiet */
      }

--- 73,79 ----
          bit_not_hex = false;
      else
      {
!         elog(ERROR, "zpbit_in: The bit string %s must start with B or X", s);
          bit_not_hex = false;    /* keep compiler quiet */
      }

***************
*** 299,305 ****
          bit_not_hex = false;
      else
      {
!         elog(ERROR, "varbit_in: %s is not a valid bitstring", s);
          bit_not_hex = false;    /* keep compiler quiet */
      }

--- 299,305 ----
          bit_not_hex = false;
      else
      {
!         elog(ERROR, "varbit_in: The bit string %s must start with B or X", s);
          bit_not_hex = false;    /* keep compiler quiet */
      }

***************
*** 744,749 ****
--- 744,752 ----
                 *ps;

      bitlen = VARBITLEN(arg);
+     /* Do we have an upper bound? */
+     if (l==-1)
+         l = bitlen;
      e = s + l;
      s1 = Max(s, 1);
      e1 = Min(e, bitlen + 1);
***************
*** 1039,1045 ****
   * do a right shift (i.e. towards the end of the string)
   */
  Datum
! bitshiftright(PG_FUNCTION_ARGS)
  {
      VarBit       *arg = PG_GETARG_VARBIT_P(0);
      int32        shft = PG_GETARG_INT32(1);
--- 1042,1048 ----
   * do a right shift (i.e. towards the end of the string)
   */
  Datum
! zpbitshiftright(PG_FUNCTION_ARGS)
  {
      VarBit       *arg = PG_GETARG_VARBIT_P(0);
      int32        shft = PG_GETARG_INT32(1);
***************
*** 1097,1102 ****
--- 1100,1166 ----
      PG_RETURN_VARBIT_P(result);
  }

+ /* bitshiftright
+  * do a right shift (i.e. towards the end of the string)
+  */
+ Datum
+ /*varbitshiftright(PG_FUNCTION_ARGS)*/
+ bitshiftright(PG_FUNCTION_ARGS)
+ {
+     VarBit       *arg = PG_GETARG_VARBIT_P(0);
+     int32        shft = PG_GETARG_INT32(1);
+     VarBit       *result;
+     int            byte_shift,
+                 byte_len,
+                 ishift,
+                 len;
+     bits8       *p,
+                *r;
+
+     /* Negative shift is a shift to the left */
+     if (shft < 0)
+         PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft,
+                             VarBitPGetDatum(arg),
+                             Int32GetDatum(-shft)));
+
+     /* When we have a varying bit string, the string may get longer */
+     len = VARBITLEN(arg) + shft;
+     byte_len = VARBITTOTALLEN(len);
+
+     result = (VarBit *) palloc(byte_len);
+     VARATT_SIZEP(result) = byte_len;
+     VARBITLEN(result) = len;
+     r = VARBITS(result);
+
+     byte_shift = shft / BITS_PER_BYTE;
+     ishift = shft % BITS_PER_BYTE;
+     p = VARBITS(arg);
+
+     /* Set the first part of the result to 0 */
+     memset(r, 0, byte_shift);
+     r += byte_shift;
+
+     if (ishift == 0)
+     {
+         /* Special case: we can do a memcpy */
+         len = VARBITBYTES(arg);
+         memcpy(r, p, len);
+     }
+     else
+     {
+         if (r < VARBITEND(result))
+             *r = 0;                /* initialize first byte */
+         for (; r < VARBITEND(result); p++)
+         {
+             *r |= *p >> ishift;
+             if ((++r) < VARBITEND(result))
+                 *r = (*p << (BITS_PER_BYTE - ishift)) & BITMASK;
+         }
+     }
+
+     PG_RETURN_VARBIT_P(result);
+ }
+
  /* This is not defined in any standard. We retain the natural ordering of
   * bits here, as it just seems more intuitive.
   */
***************
*** 1216,1224 ****
                  p++;
                  if (p == VARBITEND(arg)) {
                      mask2 = end_mask << (BITS_PER_BYTE - is);
!                     is_match = mask2 == 0;
!                     elog(NOTICE,"S. %d %d em=%2x sm=%2x r=%d",
!                          i,is,end_mask,mask2,is_match);
                      break;
                  }
                  cmp = *s << (BITS_PER_BYTE - is);
--- 1280,1286 ----
                  p++;
                  if (p == VARBITEND(arg)) {
                      mask2 = end_mask << (BITS_PER_BYTE - is);
!                     is_match = (mask2 == 0);
                      break;
                  }
                  cmp = *s << (BITS_PER_BYTE - is);
*** src/include/utils/varbit.h.old    Sun Nov  5 20:20:40 2000
--- src/include/utils/varbit.h    Sun Nov  5 19:04:16 2000
***************
*** 80,86 ****
--- 80,88 ----
  extern Datum bitxor(PG_FUNCTION_ARGS);
  extern Datum bitnot(PG_FUNCTION_ARGS);
  extern Datum bitshiftleft(PG_FUNCTION_ARGS);
+ extern Datum zpbitshiftright(PG_FUNCTION_ARGS);
  extern Datum bitshiftright(PG_FUNCTION_ARGS);
+ /*extern Datum varbitshiftright(PG_FUNCTION_ARGS);*/
  extern Datum bitcat(PG_FUNCTION_ARGS);
  extern Datum bitsubstr(PG_FUNCTION_ARGS);
  extern Datum bitlength(PG_FUNCTION_ARGS);

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Transaction ID wraparound: problem and proposed solution
Next
From: Peter Eisentraut
Date:
Subject: Re: Unicode conversion (Re: [COMMITTERS] pgsql (configure.in))