Re: BIT/BIT VARYING status - Mailing list pgsql-hackers

From Adriaan Joubert
Subject Re: BIT/BIT VARYING status
Msg-id 39FD3235.1B25DEBC@albourne.com
In response to BIT/BIT VARYING status  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Re: BIT/BIT VARYING status
List pgsql-hackers
Tom Lane wrote:
> 
> I have made a first cut at completing integration of Adriaan Joubert's
> BIT code into the backend.  There are a couple little things left to
> do (for example, scalarltsel doesn't know what to do with BIT values)
> as well as some not-so-little things:
> 
> 1. SQL92 mentions a bitwise position function, which we do not have.

Sorry, I have been very busy, so I only got down to implementing a
position function last night. It's a bit messy (lots of masks and
bit-twiddling), but I feel fairly happy now that it is doing the right
thing. I tested it with my own loadable types, as the integration into
postgres proper stumped me somewhat. The next oid up for a bit function
is in use already. Anyway, the patches are attached, and I'm hoping that
some friendly soul will integrate the new position function into
postgres proper.
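
For anyone who wants to cross-check the mask-based version, a naive
bit-at-a-time comparison is much easier to reason about. The following
is only a minimal sketch and is not part of the attached patch: it
works on plain byte arrays with explicit bit lengths rather than on
VarBit, and the names get_bit and naive_bitposition are hypothetical.
It assumes bits are stored most significant bit first, and it mirrors
the return conventions of the patch (1-based result, 0 when not found
or when the argument is empty, 1 for an empty substring).

```c
#include <stddef.h>

/* Hypothetical helper: read bit number pos (0-based, most significant
 * bit of byte 0 first) from a packed bit array. */
static int
get_bit(const unsigned char *bits, size_t pos)
{
    return (bits[pos / 8] >> (7 - pos % 8)) & 1;
}

/* Naive reference: 1-based position of sub in arg, or 0 if sub does not
 * occur. Lengths are given in bits. */
static int
naive_bitposition(const unsigned char *sub, size_t sub_len,
                  const unsigned char *arg, size_t arg_len)
{
    size_t      start,
                k;

    /* Same early exits, in the same order, as the patch */
    if (arg_len == 0 || sub_len > arg_len)
        return 0;
    if (sub_len == 0)
        return 1;

    /* Try every start offset at which sub still fits inside arg */
    for (start = 0; start + sub_len <= arg_len; start++)
    {
        for (k = 0; k < sub_len; k++)
            if (get_bit(sub, k) != get_bit(arg, start + k))
                break;
        if (k == sub_len)
            return (int) start + 1;
    }
    return 0;
}
```

This runs in time proportional to the product of the two lengths, so it
is only meant as a test oracle to compare against the byte-wise
function, not as a replacement for it.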
> 2. We don't handle <bit string> and <hex string> literals correctly;
> the scanner converts them into integers which seems quite at variance
> with the spec's semantics.

This is still a problem that needs to be fixed. Also, the parser did
not seem to be too happy about the 'position' syntax, but I may have it
wrong of course. I don't know how to attach the position function to a
piece of syntax such as (position <substr> in <field>) either, so I'm
hoping that somebody can pick this up.

Also, I have started putting together a file for regression testing. I
noticed that the substring syntax does not seem to work:

SELECT SUBSTRING(b FROM 2 FOR 4)      FROM ZPBIT_TABLE;

gives:

ERROR:  Function 'substr(bit, int4, int4)' does not exist
        Unable to identify a function that satisfies the given argument types
        You may need to add explicit typecasts

and similar for a varying bit argument.

If somebody with better knowledge of postgres could do the integration,
please, I will finish off a regression test.

Thanks!

Adriaan

*** src/backend/utils/adt/varbit.c.old    Sun Oct 29 11:05:11 2000
--- src/backend/utils/adt/varbit.c    Mon Oct 30 04:58:35 2000
***************
*** 1053,1060 ****
      /* Negative shift is a shift to the left */
      if (shft < 0)
          PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft,
!                                             VarBitPGetDatum(arg),
!                                             Int32GetDatum(-shft)));
  
      result = (VarBit *) palloc(VARSIZE(arg));
      VARATT_SIZEP(result) = VARSIZE(arg);
  
--- 1053,1060 ----
      /* Negative shift is a shift to the left */
      if (shft < 0)
          PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft,
!                             VarBitPGetDatum(arg),
!                             Int32GetDatum(-shft)));
  
      result = (VarBit *) palloc(VARSIZE(arg));
      VARATT_SIZEP(result)= VARSIZE(arg);
  
***************
*** 1145,1148 ****
--- 1145,1242 ----
      result >>= VARBITPAD(arg);
  
      PG_RETURN_INT32(result);
+ }
+ 
+ /* Determines the position of S1 in the bitstring S2 (1-based string).
+  * If S1 does not appear in S2 this function returns 0.
+  * If S1 is of length 0 this function returns 1.
+  */
+ Datum
+ bitposition(PG_FUNCTION_ARGS)
+ {
+     VarBit        *substr = PG_GETARG_VARBIT_P(0);
+     VarBit        *arg = PG_GETARG_VARBIT_P(1);
+     int            substr_length, 
+                 arg_length,
+                 i,
+                 is;
+     bits8        *s,                /* pointer into substring */
+                 *p;                /* pointer into arg */
+     bits8        cmp,            /* shifted substring byte to compare */ 
+                 mask1,          /* mask for substring byte shifted right */
+                 mask2,          /* mask for substring byte shifted left */
+                 end_mask,       /* pad mask for last substring byte */
+                 arg_mask;        /* pad mask for last argument byte */
+     bool        is_match;
+ 
+     /* Get the substring length */
+     substr_length = VARBITLEN(substr);
+     arg_length = VARBITLEN(arg);
+ 
+     /* Argument has 0 length or substring longer than argument, return 0 */
+     if (arg_length == 0 || substr_length > arg_length)
+         PG_RETURN_INT32(0);    
+     
+     /* 0-length means return 1 */
+     if (substr_length == 0)
+         PG_RETURN_INT32(1);
+ 
+     /* Initialise the padding masks */
+     end_mask = BITMASK << VARBITPAD(substr);
+     arg_mask = BITMASK << VARBITPAD(arg);
+     for (i = 0; i < VARBITBYTES(arg) - VARBITBYTES(substr) + 1; i++) 
+     {
+         for (is = 0; is < BITS_PER_BYTE; is++) {
+             is_match = true;
+             p = VARBITS(arg) + i;
+             mask1 = BITMASK >> is;
+             mask2 = ~mask1;
+             for (s = VARBITS(substr); 
+                  is_match && s < VARBITEND(substr); s++) 
+             {
+                 cmp = *s >> is;
+                 if (s == VARBITEND(substr) - 1) 
+                 {
+                     mask1 &= end_mask >> is;
+                     if (p == VARBITEND(arg) - 1) {
+                         /* Check that there is enough of arg left */
+                         if (mask1 & ~arg_mask) {
+                             is_match = false;
+                             break;
+                         }
+                         mask1 &= arg_mask;
+                     }
+                 }
+                 is_match = ((cmp ^ *p) & mask1) == 0;
+                 if (!is_match)
+                     break;
+                 // Move on to the next byte
+                 p++;
+                 if (p == VARBITEND(arg)) {
+                     mask2 = end_mask << (BITS_PER_BYTE - is);
+                     is_match = mask2 == 0;
+                     elog(NOTICE,"S. %d %d em=%2x sm=%2x r=%d",
+                          i,is,end_mask,mask2,is_match);
+                     break;
+                 }
+                 cmp = *s << (BITS_PER_BYTE - is);
+                 if (s == VARBITEND(substr) - 1) 
+                 {
+                     mask2 &= end_mask << (BITS_PER_BYTE - is);
+                     if (p == VARBITEND(arg) - 1) {
+                         if (mask2 & ~arg_mask) {
+                             is_match = false;
+                             break;
+                         }
+                         mask2 &= arg_mask;
+                     }
+                 }
+                 is_match = ((cmp ^ *p) & mask2) == 0;
+             }
+             /* Have we found a match */
+             if (is_match)
+                 PG_RETURN_INT32(i*BITS_PER_BYTE + is + 1);
+         }
+     }
+     PG_RETURN_INT32(0);
  }
*** src/include/utils/varbit.h.old    Sun Oct 29 11:04:58 2000
--- src/include/utils/varbit.h    Sun Oct 29 11:05:58 2000
***************
*** 87,91 ****
--- 87,92 ----
  extern Datum bitoctetlength(PG_FUNCTION_ARGS);
  extern Datum bitfromint4(PG_FUNCTION_ARGS);
  extern Datum bittoint4(PG_FUNCTION_ARGS);
+ extern Datum bitposition(PG_FUNCTION_ARGS);
  
  #endif
