Re: Index used incorrectly with regular expressions on 7.4.6 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Index used incorrectly with regular expressions on 7.4.6
Date
Msg-id 1590.1101955668@sss.pgh.pa.us
Whole thread Raw
In response to Index used incorrectly with regular expressions on 7.4.6  (Antti Salmela <asalmela@iki.fi>)
List pgsql-hackers
Antti Salmela <asalmela@iki.fi> writes:
> Index is used incorrectly if constant part of the string ends with \d,

Yeah, you're right --- that code predates our use of the new regexp
engine, and it didn't know that escapes aren't simply quoted characters.

Now that I look at it, it's got a multibyte problem too :-(

If you need a patch right away, here's what I applied to 7.4 branch.
        regards, tom lane

Index: selfuncs.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.147.2.3
diff -c -r1.147.2.3 selfuncs.c
*** selfuncs.c    27 Feb 2004 21:44:44 -0000    1.147.2.3
--- selfuncs.c    2 Dec 2004 02:35:48 -0000
***************
*** 3218,3223 ****
--- 3218,3225 ----     char       *match;     int            pos,                 match_pos,
+                 prev_pos,
+                 prev_match_pos,                 paren_depth;     char       *patt;     char       *rest;
***************
*** 3278,3288 ****      /* OK, allocate space for pattern */     match = palloc(strlen(patt) + 1);
!     match_pos = 0;      /* note start at pos 1 to skip leading ^ */
!     for (pos = 1; patt[pos]; pos++)     {         /*          * Check for characters that indicate multiple possible
matches         * here. XXX I suspect isalpha() is not an adequately
 
--- 3280,3292 ----      /* OK, allocate space for pattern */     match = palloc(strlen(patt) + 1);
!     prev_match_pos = match_pos = 0;      /* note start at pos 1 to skip leading ^ */
!     for (prev_pos = pos = 1; patt[pos]; )     {
+         int        len;
+          /*          * Check for characters that indicate multiple possible matches          * here. XXX I suspect
isalpha()is not an adequately
 
***************
*** 3297,3302 ****
--- 3301,3314 ----             break;          /*
+          * In AREs, backslash followed by alphanumeric is an escape, not
+          * a quoted character.  Must treat it as having multiple possible
+          * matches.
+          */
+         if (patt[pos] == '\\' && isalnum((unsigned char) patt[pos + 1]))
+             break;
+ 
+         /*          * Check for quantifiers.  Except for +, this means the preceding          * character is
optional,so we must remove it from the prefix          * too!
 
***************
*** 3305,3318 ****             patt[pos] == '?' ||             patt[pos] == '{')         {
!             if (match_pos > 0)
!                 match_pos--;
!             pos--;             break;         }         if (patt[pos] == '+')         {
!             pos--;             break;         }         if (patt[pos] == '\\')
--- 3317,3329 ----             patt[pos] == '?' ||             patt[pos] == '{')         {
!             match_pos = prev_match_pos;
!             pos = prev_pos;             break;         }         if (patt[pos] == '+')         {
!             pos = prev_pos;             break;         }         if (patt[pos] == '\\')
***************
*** 3322,3328 ****             if (patt[pos] == '\0')                 break;         }
!         match[match_pos++] = patt[pos];     }      match[match_pos] = '\0';
--- 3333,3346 ----             if (patt[pos] == '\0')                 break;         }
!         /* save position in case we need to back up on next loop cycle */
!         prev_match_pos = match_pos;
!         prev_pos = pos;
!         /* must use encoding-aware processing here */
!         len = pg_mblen(&patt[pos]);
!         memcpy(&match[match_pos], &patt[pos], len);
!         match_pos += len;
!         pos += len;     }      match[match_pos] = '\0';


pgsql-hackers by date:

Previous
From: "Andrew Dunstan"
Date:
Subject: Re: Please release (was Re: nodeAgg perf tweak)
Next
From: Bruce Momjian
Date:
Subject: Re: lwlocks and starvation