Thread: Regular Expression Matching problem...

Regular Expression Matching problem...

From
Mario Splivalo
Date:
I have a table containing regular expresion patterns for matching phone
numbers with the mobile operators.

For instance, my phone number is '+385911234567', and the regexp for
this mobile operator is: "^\+38591\d{7}$".

Now, when I do a regexp match in a single select, it behaves as
expected:

octopussy2=# select '+385911234567' ~ '^\\+38591\\d{7}$';?column?
----------t
(1 row)

Now, as I've mentioned, I have a table with operators and their patterns
for phone numbers:

octopussy2=# select * from operators;operator_id | operator_name | operator_phonenumber_pattern
-------------+---------------+------------------------------          1 | FreeSMSC      | ^\\+38590\\d{6,7}$          2
|VipNet        | ^\\+38591\\d{7}$          3 | T-Mobile      | ^\\+3859[9|8]\\d{6,7}$          4 | Tele2         |
^\\+38595\\d{7}$
(4 rows)


Now, if I construct my query like this:

octopussy2=# select '+385911234567', operator_phonenumber_pattern,
'+385911234567' ~ operator_phonenumber_pattern from operators;
  ?column?    | operator_phonenumber_pattern | ?column?
---------------+------------------------------+----------+385911234567 | ^\\+38590\\d{6,7}$           | f+385911234567
|^\\+38591\\d{7}$             | f+385911234567 | ^\\+3859[9|8]\\d{6,7}$       | f+385911234567 | ^\\+38595\\d{7}$
     | f
 
(4 rows)


Why do I get all the "f"'s? I tought that the operator_id 2 should
return "t", esp. when I wrote the first query it seems that the regex
match was ok.

Or I can't do regexp matching from the table columns?
Mike
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."




Re: Regular Expression Matching problem...

From
"A. Kretschmer"
Date:
am  04.01.2006, um 15:08:45 +0100 mailte Mario Splivalo folgendes:
> I have a table containing regular expresion patterns for matching phone
> numbers with the mobile operators.
> 
> For instance, my phone number is '+385911234567', and the regexp for
> this mobile operator is: "^\+38591\d{7}$".
> 
> Now, when I do a regexp match in a single select, it behaves as
> expected:
> 
> octopussy2=# select '+385911234567' ~ '^\\+38591\\d{7}$';
>  ?column?
> ----------
>  t
> (1 row)
> 
> Now, as I've mentioned, I have a table with operators and their patterns
> for phone numbers:
> 
> octopussy2=# select * from operators;
>  operator_id | operator_name | operator_phonenumber_pattern
> -------------+---------------+------------------------------
>            1 | FreeSMSC      | ^\\+38590\\d{6,7}$
>            2 | VipNet        | ^\\+38591\\d{7}$
>            3 | T-Mobile      | ^\\+3859[9|8]\\d{6,7}$
>            4 | Tele2         | ^\\+38595\\d{7}$
> (4 rows)
> 
> 
> Now, if I construct my query like this:
> 
> octopussy2=# select '+385911234567', operator_phonenumber_pattern,
> '+385911234567' ~ operator_phonenumber_pattern from operators;
> 
>    ?column?    | operator_phonenumber_pattern | ?column?
> ---------------+------------------------------+----------
>  +385911234567 | ^\\+38590\\d{6,7}$           | f
>  +385911234567 | ^\\+38591\\d{7}$             | f
>  +385911234567 | ^\\+3859[9|8]\\d{6,7}$       | f
>  +385911234567 | ^\\+38595\\d{7}$             | f
> (4 rows)
> 
> 
> Why do I get all the "f"'s? I tought that the operator_id 2 should
> return "t", esp. when I wrote the first query it seems that the regex
> match was ok.

Try this:

test=# select '+385911234567', operator_phonenumber_pattern, '+385911234567' ~
replace(operator_phonenumber_pattern,'\\\\','\\')from operators;  ?column?    | operator_phonenumber_pattern |
?column?
---------------+------------------------------+----------+385911234567 | ^\\+38590\\d{6,7}$           | f+385911234567
|^\\+38591\\d{7}$             | t+385911234567 | ^\\+3859[9|8]\\d{6,7}$       | f+385911234567 | ^\\+38595\\d{7}$
     | f
 
(4 rows)


HTH, Andreas
-- 
Andreas Kretschmer    (Kontakt: siehe Header)
Heynitz:  035242/47212,      D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net===    Schollglas Unternehmensgruppe    === 


Re: Regular Expression Matching problem...

From
Michael Fuhr
Date:
On Wed, Jan 04, 2006 at 03:50:31PM +0100, A. Kretschmer wrote:
> Try this:
> 
> test=# select '+385911234567', operator_phonenumber_pattern, '+385911234567' ~
replace(operator_phonenumber_pattern,'\\\\','\\')from operators;
 
>    ?column?    | operator_phonenumber_pattern | ?column?
> ---------------+------------------------------+----------
>  +385911234567 | ^\\+38590\\d{6,7}$           | f
>  +385911234567 | ^\\+38591\\d{7}$             | t
>  +385911234567 | ^\\+3859[9|8]\\d{6,7}$       | f
>  +385911234567 | ^\\+38595\\d{7}$             | f
> (4 rows)

What Andreas is saying is that the patterns in the table have too
many backslashes.  The original query was

test=> select '+385911234567' ~ '^\\+38591\\d{7}$';?column? 
----------t
(1 row)

but if you select just the pattern you get

test=> select '^\\+38591\\d{7}$';   ?column?    
----------------^\+38591\d{7}$
(1 row)

which isn't what's in the table; somehow the patterns in the table
are over-escaped.  You might want to read up on how backslashes are
handled in quoted strings, especially when those strings are used
as regular expressions.  If you're using 8.0 or later then also
read about dollar-quoted strings.

http://www.postgresql.org/docs/8.1/interactive/sql-syntax.html#SQL-SYNTAX-STRINGS
http://www.postgresql.org/docs/8.1/interactive/functions-matching.html#FUNCTIONS-POSIX-REGEXP

(Use the documentation for whatever version of PostgreSQL you're
running; the above links have some comments that apply only to 8.1
and future versions).

-- 
Michael Fuhr


Re: Regular Expression Matching problem...

From
Mario Splivalo
Date:
On Wed, 2006-01-04 at 10:00 -0700, Michael Fuhr wrote:

> What Andreas is saying is that the patterns in the table have too
> many backslashes.  The original query was
> 
> test=> select '+385911234567' ~ '^\\+38591\\d{7}$';
>  ?column? 
> ----------
>  t
> (1 row)
> 
> but if you select just the pattern you get
> 
> test=> select '^\\+38591\\d{7}$';
>     ?column?    
> ----------------
>  ^\+38591\d{7}$
> (1 row)

I was a bit silly here. Dunno why I tought I need to escape the
backslashes when entering regexp as data in table. Thank you for
pointing that out.
Mike