Thread: Postgres regexp matching failure?

Postgres regexp matching failure?

From
Mario Splivalo
Date:
I have an regular expression, wich works fine in Java or python, but I
don't seem to be able to make it work in postgres. Regular expression
needs to match everything begining with '+' (plus sign) or letters
'STOP', 'stop', 'StoP', or any other combination pronounced 'stop'.

Here is the python example:

>>> import re
>>> p = re.compile(r'^\s*(?:[\+|-]|(?:[sS][tT][oO][pP]\b)).*$')
>>> p.match('+mario').group(0)
'+mario'
>>> p.match('+mario works').group(0)
'+mario works'
>>> p.match('mario works').group(0)
Traceback (most recent call last): File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'
>>> p.match('stop works').group(0)
'stop works'
>>>

Now, here is what happens if I try this in postgres:

pulitzer2=# select '+mario' ~ '^\s*(?:[\+|-]|(?:[sS][tT][oO][pP]\b)).*
$';?column?
----------t
(1 row)

This one is ok.



pulitzer2=# select '+mario works' ~ '^\s*(?:[\
+|-]|(?:[sS][tT][oO][pP]\b)).*$';?column?
----------t
(1 row)



This one is also ok.

pulitzer2=# select 'mario works' ~ '^\s*(?:[\
+|-]|(?:[sS][tT][oO][pP]\b)).*$';?column?
----------f
(1 row)

Same as this one, also ok.


pulitzer2=# select 'stop works' ~ '^\s*(?:[\
+|-]|(?:[sS][tT][oO][pP]\b)).*$';?column?
----------f
(1 row)

Here, postgres should return true, but it gives me false. 

I'd appreciate any hints on what is wrong here.

Thank you in advance,
Mario
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."




Re: Postgres regexp matching failure?

From
"Aaron Bono"
Date:
On 9/5/06, Mario Splivalo <mario.splivalo@mobart.hr> wrote:

pulitzer2=# select 'stop works' ~ '^\s*(?:[\
+|-]|(?:[sS][tT][oO][pP]\b)).*$';
?column?
----------
f
(1 row)

Here, postgres should return true, but it gives me false.

 
\b is a back-space - is that what you are wanting there?  If I remove it I get true.

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

Re: Postgres regexp matching failure?

From
Tom Lane
Date:
Mario Splivalo <mario.splivalo@mobart.hr> writes:
> Now, here is what happens if I try this in postgres:

> pulitzer2=# select '+mario' ~ '^\s*(?:[\+|-]|(?:[sS][tT][oO][pP]\b)).*$';

I'm thinking you've forgotten to double your backslashes.
        regards, tom lane


Re: Postgres regexp matching failure?

From
Mario Splivalo
Date:
On Tue, 2006-09-05 at 10:11 -0400, Tom Lane wrote:
> Mario Splivalo <mario.splivalo@mobart.hr> writes:
> > Now, here is what happens if I try this in postgres:
> 
> > pulitzer2=# select '+mario' ~ '^\s*(?:[\+|-]|(?:[sS][tT][oO][pP]\b)).*$';
> 
> I'm thinking you've forgotten to double your backslashes.
> 

That was the first thing I tried :)

What confused me was the difference I get when using python, or perl, or
Java, or any other regexp 'matcher', therefore I asked for help here.
Mario
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."




Re: Postgres regexp matching failure?

From
Mario Splivalo
Date:
On Tue, 2006-09-05 at 08:42 -0500, Aaron Bono wrote:
> On 9/5/06, Mario Splivalo <mario.splivalo@mobart.hr> wrote:
>         
>         pulitzer2=# select 'stop works' ~ '^\s*(?:[\
>         +|-]|(?:[sS][tT][oO][pP]\b)).*$';
>         ?column?
>         ----------
>         f
>         (1 row)
>         
>         Here, postgres should return true, but it gives me false.
> 
>  
> \b is a back-space - is that what you are wanting there?  If I remove
> it I get true.

Actually, I'm not sure :) As I've mentioned, python/java/perl do as I
expected, postgres on the other hand doesn't. If \b was the backspace,
then I'd have trouble with '+mario test', and that one seems to be OK. 

I also tried doublebackslashing, all the same.

So, I guess it's obvious that postgres doesn't treat regular expressions
the same way as java/perl/pyton/php/awk/sed do...
Mario
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."




Re: Postgres regexp matching failure?

From
Alvaro Herrera
Date:
Mario Splivalo wrote:
> On Tue, 2006-09-05 at 08:42 -0500, Aaron Bono wrote:
> > On 9/5/06, Mario Splivalo <mario.splivalo@mobart.hr> wrote:
> >         
> >         pulitzer2=# select 'stop works' ~ '^\s*(?:[\
> >         +|-]|(?:[sS][tT][oO][pP]\b)).*$';
> >         ?column?
> >         ----------
> >         f
> >         (1 row)
> >         
> >         Here, postgres should return true, but it gives me false.
> > 
> >  
> > \b is a back-space - is that what you are wanting there?  If I remove
> > it I get true.
> 
> Actually, I'm not sure :) As I've mentioned, python/java/perl do as I
> expected, postgres on the other hand doesn't. If \b was the backspace,
> then I'd have trouble with '+mario test', and that one seems to be OK. 

No, because the \b is inside the "stop" arm of the |.  You need to do
*both*, double backslashes and get rid of \b (or at least understand
what you actually mean with it ...)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Postgres regexp matching failure?

From
Mario Splivalo
Date:
On Tue, 2006-09-05 at 10:21 -0400, Alvaro Herrera wrote:
> Mario Splivalo wrote:
> > On Tue, 2006-09-05 at 08:42 -0500, Aaron Bono wrote:
> > > On 9/5/06, Mario Splivalo <mario.splivalo@mobart.hr> wrote:
> > >         
> > >         pulitzer2=# select 'stop works' ~ '^\s*(?:[\
> > >         +|-]|(?:[sS][tT][oO][pP]\b)).*$';
> > >         ?column?
> > >         ----------
> > >         f
> > >         (1 row)
> > >         
> > >         Here, postgres should return true, but it gives me false.
> > > 
> > >  
> > > \b is a back-space - is that what you are wanting there?  If I remove
> > > it I get true.
> > 
> > Actually, I'm not sure :) As I've mentioned, python/java/perl do as I
> > expected, postgres on the other hand doesn't. If \b was the backspace,
> > then I'd have trouble with '+mario test', and that one seems to be OK. 
> 
> No, because the \b is inside the "stop" arm of the |.  You need to do
> *both*, double backslashes and get rid of \b (or at least understand
> what you actually mean with it ...)
> 

I know this might not be the right place for this question, but, how
come (or better: why?) is above regexp macthed ok (ok as in 'the way I
expected') when employed from java/perl/python?
Mario



Re: Postgres regexp matching failure?

From
Alvaro Herrera
Date:
Mario Splivalo wrote:
> On Tue, 2006-09-05 at 10:21 -0400, Alvaro Herrera wrote:
> > Mario Splivalo wrote:
> > > On Tue, 2006-09-05 at 08:42 -0500, Aaron Bono wrote:
> > > > On 9/5/06, Mario Splivalo <mario.splivalo@mobart.hr> wrote:
> > > >         
> > > >         pulitzer2=# select 'stop works' ~ '^\s*(?:[\
> > > >         +|-]|(?:[sS][tT][oO][pP]\b)).*$';
> > > >         ?column?
> > > >         ----------
> > > >         f
> > > >         (1 row)
> > > >         
> > > >         Here, postgres should return true, but it gives me false.
> > > > 
> > > >  
> > > > \b is a back-space - is that what you are wanting there?  If I remove
> > > > it I get true.
> > > 
> > > Actually, I'm not sure :) As I've mentioned, python/java/perl do as I
> > > expected, postgres on the other hand doesn't. If \b was the backspace,
> > > then I'd have trouble with '+mario test', and that one seems to be OK. 
> > 
> > No, because the \b is inside the "stop" arm of the |.  You need to do
> > *both*, double backslashes and get rid of \b (or at least understand
> > what you actually mean with it ...)
> > 
> 
> I know this might not be the right place for this question, but, how
> come (or better: why?) is above regexp macthed ok (ok as in 'the way I
> expected') when employed from java/perl/python?

In Perl at least, \b is a word boundary.  In PostgreSQL (and probably
Tcl as well) it's a backslash AFAICT.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Postgres regexp matching failure?

From
Tom Lane
Date:
Mario Splivalo <mario.splivalo@mobart.hr> writes:
> So, I guess it's obvious that postgres doesn't treat regular expressions
> the same way as java/perl/pyton/php/awk/sed do...

When you get into stuff as arcane as word-boundary constraints, you'll
find that regexes are not NEARLY as well standardized as that flippant
complaint suggests.  For penance, actually try it in all six of those
languages and report back.

Postgres' regexp code is the same as Tcl's (it's Henry Spencer's package)
and if you check TFM you will find out that \y, or possibly \M, is what
you want.
        regards, tom lane


Re: Postgres regexp matching failure?

From
"Aaron Bono"
Date:
On 9/5/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:

In Perl at least, \b is a word boundary.  In PostgreSQL (and probably
Tcl as well) it's a backslash AFAICT.

More specifically, Perl, Java and Python interpret \b as a backspace in the Character class only (got that from the Regular Expression pocket reference) but otherwise are word boundries.  I am not entirely sure what the Character class context is referring to but it does show that \b is interpreted differently under different contexts.  PostgreSQL may not have a way (or at least a very good way) to make this distinction.

Any other thoughts?

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

Re: Postgres regexp matching failure?

From
Mario Splivalo
Date:
On Tue, 2006-09-05 at 11:22 -0400, Tom Lane wrote:
> Mario Splivalo <mario.splivalo@mobart.hr> writes:
> > So, I guess it's obvious that postgres doesn't treat regular expressions
> > the same way as java/perl/pyton/php/awk/sed do...
> 
> When you get into stuff as arcane as word-boundary constraints, you'll
> find that regexes are not NEARLY as well standardized as that flippant
> complaint suggests.  For penance, actually try it in all six of those
> languages and report back.

So I did, and I was mistaken the first time. Java, perl, python and php
treat regexpes differently than awk/sed and differently than tcl. I can
post source snippets and results here if anyone is interested in those.

> Postgres' regexp code is the same as Tcl's (it's Henry Spencer's package)
> and if you check TFM you will find out that \y, or possibly \M, is what
> you want.

Thnx, I missed that part that 'postgres has same regexp code as Tcl'.
Mike
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."