Thread: Finding broken regex'es

Finding broken regex'es

From
Enrico Weigelt
Date:
Hi folks,


I'm looking for some way to find broken regex'es in some column
to kick them off. For now I'm regularily fetching all regexes
from an PHP script, try an preg_match() and so find the broken
ones to later remove them.

Is there any way to do this directly within the db ? 


thx
-- 
---------------------------------------------------------------------Enrico Weigelt    ==   metux IT service -
http://www.metux.de/
---------------------------------------------------------------------Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_TaskforcePatches/ Fixes for a lot dozens of packages in dozens of
versions:http://patches.metux.de/
---------------------------------------------------------------------


Re: Finding broken regex'es

From
Erik Jones
Date:
On Oct 2, 2007, at 3:23 PM, Enrico Weigelt wrote:

>
> Hi folks,
>
>
> I'm looking for some way to find broken regex'es in some column
> to kick them off. For now I'm regularily fetching all regexes
> from an PHP script, try an preg_match() and so find the broken
> ones to later remove them.
>
> Is there any way to do this directly within the db ?

IIRC, if they're PERL compatible which it would seem from the php
function you're using, no.  Postgres supports POSIX regexes but not
(right now anyway) PERL regexes.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: Finding broken regex'es

From
Tom Lane
Date:
Erik Jones <erik@myemma.com> writes:
> IIRC, if they're PERL compatible which it would seem from the php
> function you're using, no.  Postgres supports POSIX regexes but not
> (right now anyway) PERL regexes.

Actually what we support are Tcl (Henry Spencer) regexes, which are
about as powerful as Perl's but have a few minor incompatibilities.
Tcl and Perl regexes are both supersets of the POSIX requirements.
        regards, tom lane


Re: Finding broken regex'es

From
Erik Jones
Date:
On Oct 2, 2007, at 10:48 PM, Tom Lane wrote:

> Erik Jones <erik@myemma.com> writes:
>> IIRC, if they're PERL compatible which it would seem from the php
>> function you're using, no.  Postgres supports POSIX regexes but not
>> (right now anyway) PERL regexes.
>
> Actually what we support are Tcl (Henry Spencer) regexes, which are
> about as powerful as Perl's but have a few minor incompatibilities.
> Tcl and Perl regexes are both supersets of the POSIX requirements.
>

Excellent point of information.  I was really just trying to focus on
why the Postgres regex engine wouldn't be equivalent to the matching
functionality of php's preg_* functions.  One example would be look
behind assertions.  Btw, am I crazy or do I remember someone
mentioning that support for Perl regexes possibly being added to
Postgres in the future.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: Finding broken regex'es

From
Tom Lane
Date:
Erik Jones <erik@myemma.com> writes:
> Btw, am I crazy or do I remember someone  
> mentioning that support for Perl regexes possibly being added to  
> Postgres in the future.

You can already get exact-Perl-behavior regexes by means of a plperl
wrapper function.  I can't really see any plausible argument for us
supporting both behaviors natively --- IMHO the three regex flavors
supported by Spencer's library are already two too many.
        regards, tom lane


Re: Finding broken regex'es

From
"Dawid Kuroczko"
Date:
On 10/2/07, Enrico Weigelt <weigelt@metux.de> wrote:
>
> Hi folks,
>
>
> I'm looking for some way to find broken regex'es in some column
> to kick them off. For now I'm regularily fetching all regexes
> from an PHP script, try an preg_match() and so find the broken
> ones to later remove them.
>
> Is there any way to do this directly within the db ?

Of course.  Exceptions is what You need!

CREATE FUNCTION regex_is_broken(r text) RETURNS boolean AS $$   BEGIN       PERFORM '' ~ r;       RETURN 'f';
EXCEPTION      WHEN INVALID_REGULAR_EXPRESSION THEN           RETURN 't';   END;
 
$$ LANGUAGE PLpgSQL STRICT IMMUTABLE;

...and then you could do something like:

DELETE FROM table WHERE regex_is_broken(rx_col);


You don't need PLpgSQL to prevent such invalid regexes in the
first place.  You could use CHECK constraint for it:

CREATE TABLE rx_check (   rx text CHECK ('' ~ rx IN ('t','f'))
);

postgres=> INSERT INTO rx_check (rx) VALUES ('.*');
INSERT 0 1
Time: 13.660 ms
postgres=> INSERT INTO rx_check (rx) VALUES ('234234');
INSERT 0 1
Time: 2.282 ms
postgres=> INSERT INTO rx_check (rx) VALUES ('par).*');
ERROR:  invalid regular expression: parentheses () not balanced
  Regards,    Dawid


Re: Finding broken regex'es

From
"Filip Rembiałkowski"
Date:
2007/10/3, Dawid Kuroczko <qnex42@gmail.com>:

> CREATE TABLE rx_check (
>     rx text CHECK ('' ~ rx IN ('t','f'))
> );

wow. This is beautiful :)


Re: Finding broken regex'es

From
"Dawid Kuroczko"
Date:
On 10/3/07, Filip Rembiałkowski <plk.zuber@gmail.com> wrote:
> 2007/10/3, Dawid Kuroczko <qnex42@gmail.com>:
>
> > CREATE TABLE rx_check (
> >     rx text CHECK ('' ~ rx IN ('t','f'))
> > );
>
> wow. This is beautiful :)

Personally I would wrap it around DOMAIN, i.e.:

CREATE DOMAIN regex AS   text CHECK ('' ~ VALUE IN (TRUE,FALSE));

And then use 'regex' type instead of 'text'.  For a nice look&feel:

CREATE TABLE rx_test (   rx regex
);

qnex=>insert into rx_test values ('.*');
INSERT 0 1
qnex=>insert into rx_test values ('qwe');
INSERT 0 1
qnex=>insert into rx_test values ('aaa(aaa');
ERROR:  invalid regular expression: parentheses () not balanced

Accessing field of OLD in trigger

From
Daniel Drotos
Date:
Hi,

I'm working on a row level plpgsql trigger running after delete, using 
a 8.0.3 server. It gets a parameter which is a field name of the OLD 
record. How can that field be accessed?

I'd like to do something like:

for recvar in 'select OLD.'||quote_ident(TG_ARGV[0])...

Daniel