Thread: regexp_replace behavior

regexp_replace behavior

From
Marcin Krawczyk
Date:
Hi list,

I'm trying to use regexp_replace to get rid of all occurrences of certain sub strings from my string.
What I'm doing is:

SELECT regexp_replace('F0301 305-149-101-0 F0302 {x1} 12W47 0635H {tt}{POL23423423}', E'\{.+\}', '', 'g')

so get rid of whatever is between { } along with these,

but it results in:
'F0301 305-149-101-0 F0302 '

how do I get it to be:
'F0301 305-149-101-0 F0302 12W47 0635H'

??

as I understood the docs, the g flag "specifies replacement of each matching substring rather than only the first one" 
what am I missing ?

regards
mk

Re: regexp_replace behavior

From
Alvaro Herrera
Date:
Marcin Krawczyk escribió:
> Hi list,
>
> I'm trying to use regexp_replace to get rid of all occurrences of
> certain sub strings from my string.
> What I'm doing is:
>
> SELECT regexp_replace('F0301 305-149-101-0 F0302 {x1} 12W47 0635H
> {tt}{POL23423423}', E'\{.+\}', '', 'g')
>
> so get rid of whatever is between { } along with these,
>
> but it results in:
> 'F0301 305-149-101-0 F0302 '
>
> how do I get it to be:
> 'F0301 305-149-101-0 F0302 12W47 0635H'
>
> ??
>
> as I understood the docs, the g flag "specifies replacement of each
> matching substring rather than only the first one"

The first \{.+\} match starts at the first { and ends at the last },
eating the {s and }s in the middle.  So there's only one match and that's
what's removed.

> what am I missing ?

You need a non-greedy quantifier.  Try
SELECT regexp_replace('F0301 305-149-101-0 F0302 {x1} 12W47 0635H {tt}{POL23423423}', E'\{.+?\}', '', 'g')

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: regexp_replace behavior

From
Marcin Krawczyk
Date:
Yes that's exactly what I needed. Thanks a lot.

pozdrowienia
mk


2012/11/20 Alvaro Herrera <alvherre@2ndquadrant.com>
Marcin Krawczyk escribió:
> Hi list,
>
> I'm trying to use regexp_replace to get rid of all occurrences of
> certain sub strings from my string.
> What I'm doing is:
>
> SELECT regexp_replace('F0301 305-149-101-0 F0302 {x1} 12W47 0635H
> {tt}{POL23423423}', E'\{.+\}', '', 'g')
>
> so get rid of whatever is between { } along with these,
>
> but it results in:
> 'F0301 305-149-101-0 F0302 '
>
> how do I get it to be:
> 'F0301 305-149-101-0 F0302 12W47 0635H'
>
> ??
>
> as I understood the docs, the g flag "specifies replacement of each
> matching substring rather than only the first one"

The first \{.+\} match starts at the first { and ends at the last },
eating the {s and }s in the middle.  So there's only one match and that's
what's removed.

> what am I missing ?

You need a non-greedy quantifier.  Try

 SELECT regexp_replace('F0301 305-149-101-0 F0302 {x1} 12W47 0635H {tt}{POL23423423}', E'\{.+?\}', '', 'g')

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services