Thread: Regular expression question with Postgres

Regular expression question with Postgres

From
Mike Christensen
Date:
I'm curious why this query returns 0:

SELECT 'AAA' ~ '^A{,4}$'

Yet, this query returns 1:

SELECT 'AAA' ~ '^A{0,4}$'

Is this a bug with the regular expression engine?  

Re: Regular expression question with Postgres

From
David G Johnston
Date:
Mike Christensen-2 wrote
> I'm curious why this query returns 0:
>
> SELECT 'AAA' ~ '^A{,4}$'
>
> Yet, this query returns 1:
>
> SELECT 'AAA' ~ '^A{0,4}$'
>
> Is this a bug with the regular expression engine?

Apparently since "{,#}" is not a valid regexp expression the engine simply
interprets it as a literal and says 'AAA' != 'A{,4}'

http://www.postgresql.org/docs/9.3/interactive/functions-matching.html#FUNCTIONS-POSIX-REGEXP

Table 9-13. Regular Expression Quantifiers

Note that all of the { } expressions have a lower bound (whether explicit or
implied).

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Regular-expression-question-with-Postgres-tp5812777p5812778.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Regular expression question with Postgres

From
Mike Christensen
Date:
Yeah, that seems right.  I was testing the expression on Rubular (which uses the Ruby parser) and it worked.  I guess Ruby allows this non-standard form with the lower bound missing.  Every reference I could find, though, agrees that only the upper bound is optional.
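
For patterns written against an engine that accepts the short form, one workaround is to add the lower bound before handing the pattern to Postgres. The following is only a sketch: the helper name add_lower_bound is hypothetical, and the rewrite is purely textual, so it would also touch a "{,n}" that sits inside a bracket expression or after an escaped brace.

```python
import re

# Matches an interval with the lower bound omitted, e.g. "{,4}".
_MISSING_LOWER = re.compile(r"\{,(\d+)\}")

def add_lower_bound(pattern):
    """Rewrite every "{,n}" in pattern as "{0,n}" (naive, text-only rewrite)."""
    return _MISSING_LOWER.sub(r"{0,\1}", pattern)

print(add_lower_bound("^A{,4}$"))  # prints ^A{0,4}$
```

Patterns that already use {m}, {m,} or {m,n} pass through unchanged.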


On Thu, Jul 24, 2014 at 1:42 PM, David G Johnston <david.g.johnston@gmail.com> wrote:
Apparently since "{,#}" is not a valid regexp expression the engine simply
interprets it as a literal and says 'AAA' != 'A{,4}'


Re: Regular expression question with Postgres

From
Tom Lane
Date:
Mike Christensen <mike@kitchenpc.com> writes:
> I'm curious why this query returns 0:
> SELECT 'AAA' ~ '^A{,4}$'

> Yet, this query returns 1:

> SELECT 'AAA' ~ '^A{0,4}$'

> Is this a bug with the regular expression engine?

Our regex documentation lists the following variants of bounds syntax:
    {m}
    {m,}
    {m,n}
Nothing about {,n}.  I rather imagine that the engine is deciding that
that's just literal text and not a bounds constraint ...

regression=# SELECT 'A{,4}' ~ '^A{,4}$';
 ?column?
----------
 t
(1 row)

... yup, apparently so.

A look at the POSIX standard says that it has the same idea of what
is a valid bounds constraint:

    When an ERE matching a single character or an ERE enclosed in
    parentheses is followed by an interval expression of the format
    "{m}", "{m,}", or "{m,n}", together with that interval expression
    it shall match what repeated consecutive occurrences of the ERE
    would match. The values of m and n are decimal integers in the
    range 0 <= m <= n <= {RE_DUP_MAX}, where m specifies the exact or
    minimum number of occurrences and n specifies the maximum number
    of occurrences. The expression "{m}" matches exactly m occurrences
    of the preceding ERE, "{m,}" matches at least m occurrences, and
    "{m,n}" matches any number of occurrences between m and n,
    inclusive.

            regards, tom lane
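
The rule quoted above can be sketched as a small check that accepts only the three interval forms. This is an illustration of the POSIX wording, not of how the Postgres engine is implemented: parse_bound is a hypothetical helper, and the value 255 for RE_DUP_MAX is an assumption (POSIX guarantees at least that much, and an engine may use a different limit).

```python
import re

# Assumed limit; POSIX only guarantees that RE_DUP_MAX is at least 255.
RE_DUP_MAX = 255

# The three forms named in the POSIX text: {m}, {m,} and {m,n}.
_BOUND = re.compile(r"\{(\d+)(,(\d*))?\}")

def parse_bound(text):
    """Return (m, n) for a valid interval; n is None when there is no maximum.

    Returns None when text is not a valid interval under the quoted rule.
    """
    match = _BOUND.fullmatch(text)
    if match is None:
        return None  # e.g. "{,4}": m is missing, so none of the forms fit
    m = int(match.group(1))
    if match.group(2) is None:
        n = m                      # {m}: exactly m occurrences
    elif match.group(3) == "":
        n = None                   # {m,}: at least m occurrences
    else:
        n = int(match.group(3))    # {m,n}: between m and n, inclusive
    upper = m if n is None else n
    if not (0 <= m <= upper <= RE_DUP_MAX):
        return None                # violates 0 <= m <= n <= RE_DUP_MAX
    return (m, n)

print(parse_bound("{0,4}"))  # prints (0, 4)
print(parse_bound("{,4}"))   # prints None
```

What an engine does with text that fails this check (treat it as literal characters, as seen above for {,4}, or raise an error) is up to the engine; the sketch only reports that it is not a valid interval.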


Re: Regular expression question with Postgres

From
Mike Christensen
Date:
Yeah, looks like Postgres has it right, well, per the POSIX standard anyway.  JavaScript and .NET also treat it that way.  Ruby is just weird, although Python's re module accepts {,n} as well.
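
Since engines differ on this point, a quick test per engine is worth running before relying on the short form. The sketch below covers only Python's re module, whose documentation states that omitting m specifies a lower bound of zero, so {,4} behaves like {0,4} there rather than as literal text. The helper name regex_matches is hypothetical.

```python
import re

def regex_matches(pattern, text):
    """Return True when pattern matches text under Python's re module."""
    return re.search(pattern, text) is not None

omitted = regex_matches(r"^A{,4}$", "AAA")     # lower bound left out
explicit = regex_matches(r"^A{0,4}$", "AAA")   # lower bound written as 0
literal = regex_matches(r"^A{,4}$", "A{,4}")   # would be True if {,4} were literal

print(omitted, explicit, literal)  # prints True True False
```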


On Thu, Jul 24, 2014 at 1:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Our regex documentation lists the following variants of bounds syntax:
    {m}
    {m,}
    {m,n}
Nothing about {,n}.  I rather imagine that the engine is deciding that
that's just literal text and not a bounds constraint ...
