Thread: Regular Expression Matching problem...
I have a table containing regular expresion patterns for matching phone numbers with the mobile operators. For instance, my phone number is '+385911234567', and the regexp for this mobile operator is: "^\+38591\d{7}$". Now, when I do a regexp match in a single select, it behaves as expected: octopussy2=# select '+385911234567' ~ '^\\+38591\\d{7}$';?column? ----------t (1 row) Now, as I've mentioned, I have a table with operators and their patterns for phone numbers: octopussy2=# select * from operators;operator_id | operator_name | operator_phonenumber_pattern -------------+---------------+------------------------------ 1 | FreeSMSC | ^\\+38590\\d{6,7}$ 2 |VipNet | ^\\+38591\\d{7}$ 3 | T-Mobile | ^\\+3859[9|8]\\d{6,7}$ 4 | Tele2 | ^\\+38595\\d{7}$ (4 rows) Now, if I construct my query like this: octopussy2=# select '+385911234567', operator_phonenumber_pattern, '+385911234567' ~ operator_phonenumber_pattern from operators; ?column? | operator_phonenumber_pattern | ?column? ---------------+------------------------------+----------+385911234567 | ^\\+38590\\d{6,7}$ | f+385911234567 |^\\+38591\\d{7}$ | f+385911234567 | ^\\+3859[9|8]\\d{6,7}$ | f+385911234567 | ^\\+38595\\d{7}$ | f (4 rows) Why do I get all the "f"'s? I tought that the operator_id 2 should return "t", esp. when I wrote the first query it seems that the regex match was ok. Or I can't do regexp matching from the table columns? Mike -- Mario Splivalo Mob-Art mario.splivalo@mobart.hr "I can do it quick, I can do it cheap, I can do it well. Pick any two."
am 04.01.2006, um 15:08:45 +0100 mailte Mario Splivalo folgendes: > I have a table containing regular expresion patterns for matching phone > numbers with the mobile operators. > > For instance, my phone number is '+385911234567', and the regexp for > this mobile operator is: "^\+38591\d{7}$". > > Now, when I do a regexp match in a single select, it behaves as > expected: > > octopussy2=# select '+385911234567' ~ '^\\+38591\\d{7}$'; > ?column? > ---------- > t > (1 row) > > Now, as I've mentioned, I have a table with operators and their patterns > for phone numbers: > > octopussy2=# select * from operators; > operator_id | operator_name | operator_phonenumber_pattern > -------------+---------------+------------------------------ > 1 | FreeSMSC | ^\\+38590\\d{6,7}$ > 2 | VipNet | ^\\+38591\\d{7}$ > 3 | T-Mobile | ^\\+3859[9|8]\\d{6,7}$ > 4 | Tele2 | ^\\+38595\\d{7}$ > (4 rows) > > > Now, if I construct my query like this: > > octopussy2=# select '+385911234567', operator_phonenumber_pattern, > '+385911234567' ~ operator_phonenumber_pattern from operators; > > ?column? | operator_phonenumber_pattern | ?column? > ---------------+------------------------------+---------- > +385911234567 | ^\\+38590\\d{6,7}$ | f > +385911234567 | ^\\+38591\\d{7}$ | f > +385911234567 | ^\\+3859[9|8]\\d{6,7}$ | f > +385911234567 | ^\\+38595\\d{7}$ | f > (4 rows) > > > Why do I get all the "f"'s? I tought that the operator_id 2 should > return "t", esp. when I wrote the first query it seems that the regex > match was ok. Try this: test=# select '+385911234567', operator_phonenumber_pattern, '+385911234567' ~ replace(operator_phonenumber_pattern,'\\\\','\\')from operators; ?column? | operator_phonenumber_pattern | ?column? ---------------+------------------------------+----------+385911234567 | ^\\+38590\\d{6,7}$ | f+385911234567 |^\\+38591\\d{7}$ | t+385911234567 | ^\\+3859[9|8]\\d{6,7}$ | f+385911234567 | ^\\+38595\\d{7}$ | f (4 rows) HTH, Andreas -- Andreas Kretschmer (Kontakt: siehe Header) Heynitz: 035242/47212, D1: 0160/7141639 GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net=== Schollglas Unternehmensgruppe ===
On Wed, Jan 04, 2006 at 03:50:31PM +0100, A. Kretschmer wrote: > Try this: > > test=# select '+385911234567', operator_phonenumber_pattern, '+385911234567' ~ replace(operator_phonenumber_pattern,'\\\\','\\')from operators; > ?column? | operator_phonenumber_pattern | ?column? > ---------------+------------------------------+---------- > +385911234567 | ^\\+38590\\d{6,7}$ | f > +385911234567 | ^\\+38591\\d{7}$ | t > +385911234567 | ^\\+3859[9|8]\\d{6,7}$ | f > +385911234567 | ^\\+38595\\d{7}$ | f > (4 rows) What Andreas is saying is that the patterns in the table have too many backslashes. The original query was test=> select '+385911234567' ~ '^\\+38591\\d{7}$';?column? ----------t (1 row) but if you select just the pattern you get test=> select '^\\+38591\\d{7}$'; ?column? ----------------^\+38591\d{7}$ (1 row) which isn't what's in the table; somehow the patterns in the table are over-escaped. You might want to read up on how backslashes are handled in quoted strings, especially when those strings are used as regular expressions. If you're using 8.0 or later then also read about dollar-quoted strings. http://www.postgresql.org/docs/8.1/interactive/sql-syntax.html#SQL-SYNTAX-STRINGS http://www.postgresql.org/docs/8.1/interactive/functions-matching.html#FUNCTIONS-POSIX-REGEXP (Use the documentation for whatever version of PostgreSQL you're running; the above links have some comments that apply only to 8.1 and future versions). -- Michael Fuhr
On Wed, 2006-01-04 at 10:00 -0700, Michael Fuhr wrote: > What Andreas is saying is that the patterns in the table have too > many backslashes. The original query was > > test=> select '+385911234567' ~ '^\\+38591\\d{7}$'; > ?column? > ---------- > t > (1 row) > > but if you select just the pattern you get > > test=> select '^\\+38591\\d{7}$'; > ?column? > ---------------- > ^\+38591\d{7}$ > (1 row) I was a bit silly here. Dunno why I tought I need to escape the backslashes when entering regexp as data in table. Thank you for pointing that out. Mike