Re: Pathological regexp match - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Pathological regexp match
Date
Msg-id 20100129042142.GF1793@alvh.no-ip.org
In response to Re: Pathological regexp match  (Michael Glaesemann <michael.glaesemann@myyearbook.com>)
Responses Re: Pathological regexp match  (Michael Glaesemann <michael.glaesemann@myyearbook.com>)
List pgsql-hackers
Michael Glaesemann wrote:

> However, as you point out, Postgres doesn't appear to take this into
> account:
> 
> postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*A.*(\2))$r$, $s$X$s$);
>  regexp_replace
> ----------------
>  oooXooo
> (1 row)
> 
> postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*A.*?(\2))$r$, $s$X$s$);
>  regexp_replace
> ----------------
>  oooXooo
> (1 row)

I think the reason for this is that the first * is greedy and thus the
entire expression is considered greedy.  The fact that you've made the
second * non-greedy does not ungreedify the RE ... Note the docs say:

    The above rules associate greediness attributes not only with
    individual quantified atoms, but with branches and entire REs that
    contain quantified atoms. What that means is that the matching is
    done in such a way that the branch, or whole RE, matches the longest
    or shortest possible substring as a whole.
 

It's late here so I'm not sure if this is what you're looking for:

alvherre=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*?A.*(\2))$r$, $s$X$s$);
 regexp_replace 
----------------
 oooXooQooQooo
(1 row)

(Obviously the non-greediness has moved somewhere else) :-(
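
For comparison only, here is a small sketch of how the two patterns from
the quoted message behave in an engine that applies greediness per
quantifier rather than per RE. It uses Python's re module, which is not
PostgreSQL's regex engine; the helper name replace_first is made up for
illustration, and count=1 is my stand-in for regexp_replace without the
'g' flag.

```python
import re

SUBJECT = "oooZQoooAoooQooQooQooo"

# The two patterns from the quoted message; only the quantifier after A differs.
GREEDY = r"(Z(Q)[^Q]*A.*(\2))"
LAZY = r"(Z(Q)[^Q]*A.*?(\2))"


def replace_first(pattern: str, subject: str = SUBJECT) -> str:
    """Replace only the first match with X (hypothetical helper)."""
    return re.sub(pattern, "X", subject, count=1)


if __name__ == "__main__":
    # Greedy .* runs to the last Q, so only the trailing "ooo" survives.
    print(replace_first(GREEDY))  # oooXooo
    # Lazy .*? stops at the first Q after the A, independent of [^Q]*.
    print(replace_first(LAZY))    # oooXooQooQooo
```

In that engine the lazy variant gives oooXooQooQooo, the same result I
got above, but there it comes from marking the second quantifier lazy
instead of the first one.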

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

