Thread: replace inside regexp_replace

replace inside regexp_replace

From
Oliver Kohll
Date:
Hi,

I have some text

'here is [[my text]] to replace and [[some more]]'

which I want to transform to

'here is my_text to replace and some_more'

i.e. wherever there are double square brackets, remove them and replace spaces in the contents with underscores.

My attempt to do that is the regex

select regexp_replace(
'here is [[my text]] to replace and [[some more]]',
E'\\[\\[(.*?)\\]\\]',
replace(E'\\1', ' ', '_'),
'g'
);

which results in

'here is my text to replace and some more'

It half works, i.e. it removes the brackets but doesn't seem to process the inner replace. It's as if the select were just

select regexp_replace(
'here is [[my text]] to replace and [[some more]]',
E'\\[\\[(.*?)\\]\\]',
E'\\1',
'g'
);

I've a feeling I'm missing something fundamental, any idea what?

Thanks
Oliver

Re: replace inside regexp_replace

From
Francisco Olarte
Date:
Oliver:

On Mon, Jun 21, 2021 at 3:27 PM Oliver Kohll <oliver@agilechilli.com> wrote:
...
> My attempt to do that is the regex
> select regexp_replace(
> 'here is [[my text]] to replace and [[some more]]',
> E'\\[\\[(.*?)\\]\\]',
> replace(E'\\1', ' ', '_'),
> 'g'
> );
> which results in
> 'here is my text to replace and some more'

> It half works, i.e. it removes the brackets but doesn't seem to process the inner replace. It's as if the select were
just
> select regexp_replace(
> 'here is [[my text]] to replace and [[some more]]',
> E'\\[\\[(.*?)\\]\\]',
> E'\\1',
> 'g'
> );

> I've a feeling I'm missing something fundamental, any idea what?

You are assuming replace will magically work in a way it does not. The
inner replace is evaluated first:

> select replace(E'\\1', ' ', '_');
 replace
---------
 \1

and it's result is passed as 3rd argument to the outer replace, so
both select are equivalent.

What you want to do can be done in some languages passing a closure,
or a function, to their replace function, or with special forms ( like
the e modifier in perl s/// ), but I'm not sure it can be done.

On languages with basic regex support, like I think SQL is, you
normally have to either split the string in match/no match or do a
multiple match ( match something like (.*?)\[\[(.*?)\]\]  with two
captures ) and loop in the result aplying your second replacement (
which is what perl does behind the scenes, and other languages do )

In perl you can do it with something like:

$ perl -pe 's{\[\[(.*?)\]\]}{ $1=~s/ /_/gr}eg'
here is [[my text]] to replace and [[some more]]',
here is my_text to replace and some_more',

But note the magic e there.

In python you can use the function form:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing ......repl can be a string or
a function; if it is a string,....
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.

An so on on other languages, but in sql

regexp_replace ( string text, pattern text, replacement text [, flags
text ] ) → text

The replacement is a plain text ( and AFAIK you cannot use functions
as values in sql ).

You could probably define your function doing that if you have any PL
installed in your DB.

Francisco Olarte.



Re: replace inside regexp_replace

From
hubert depesz lubaczewski
Date:
On Mon, Jun 21, 2021 at 02:27:22PM +0100, Oliver Kohll wrote:
> It half works, i.e. it removes the brackets but doesn't seem to process the
> inner replace. It's as if the select were just
> select regexp_replace(
> 'here is [[my text]] to replace and [[some more]]',
> E'\\[\\[(.*?)\\]\\]',
> E'\\1',
> 'g'
> );
> I've a feeling I'm missing something fundamental, any idea what?

\1 works only if it's argument to regexp_replace.
And regexp_replace can't call any other functions.

What you could do is:
$ select string_agg(x[1] || replace(x[2], ' ', '_') || x[3], '') from regexp_matches( 'here is [[my text]] to replace
and[[some more]] and maybe [[a bit longer]] too', '(.*?)\[\[(.*?)\]\](.*?)', 'g') x;
 
                           string_agg                            
─────────────────────────────────────────────────────────────────
 here is my_text to replace and some_more and maybe a_bit_longer
(1 row)

Or just use plperl, pl/python, or anything like this.

Best regards,

depesz




Re: replace inside regexp_replace

From
"David G. Johnston"
Date:
On Monday, June 21, 2021, Oliver Kohll <oliver@agilechilli.com> wrote:

select regexp_replace(
'here is [[my text]] to replace and [[some more]]',
E'\\[\\[(.*?)\\]\\]',
replace(E'\\1', ' ', '_'),
'g'
);

Side note, you seldom want to use “E” (escape) string literals with regexes (or in general really) using just the simple literal syntax removes the need for double-backslashing.

David J. 

Re: replace inside regexp_replace

From
Oliver Kohll
Date:
On Mon, 21 Jun 2021 at 15:09, Francisco Olarte <folarte@peoplecall.com> wrote:
Oliver:

On Mon, Jun 21, 2021 at 3:27 PM Oliver Kohll <oliver@agilechilli.com> wrote:
...
> My attempt to do that is the regex
> select regexp_replace(
> 'here is [[my text]] to replace and [[some more]]',
> E'\\[\\[(.*?)\\]\\]',
> replace(E'\\1', ' ', '_'),
> 'g'
> );
> which results in
> 'here is my text to replace and some more'

> It half works, i.e. it removes the brackets but doesn't seem to process the inner replace. It's as if the select were just
> select regexp_replace(
> 'here is [[my text]] to replace and [[some more]]',
> E'\\[\\[(.*?)\\]\\]',
> E'\\1',
> 'g'
> );

> I've a feeling I'm missing something fundamental, any idea what?

You are assuming replace will magically work in a way it does not. The
inner replace is evaluated first:

> select replace(E'\\1', ' ', '_');
 replace
---------
 \1

and it's result is passed as 3rd argument to the outer replace, so
both select are equivalent.

What you want to do can be done in some languages passing a closure,
or a function, to their replace function, or with special forms ( like
the e modifier in perl s/// ), but I'm not sure it can be done.

On languages with basic regex support, like I think SQL is, you
normally have to either split the string in match/no match or do a
multiple match ( match something like (.*?)\[\[(.*?)\]\]  with two
captures ) and loop in the result aplying your second replacement (
which is what perl does behind the scenes, and other languages do )

In perl you can do it with something like:

$ perl -pe 's{\[\[(.*?)\]\]}{ $1=~s/ /_/gr}eg'
here is [[my text]] to replace and [[some more]]',
here is my_text to replace and some_more',

But note the magic e there.

In python you can use the function form:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing ......repl can be a string or
a function; if it is a string,....
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.

An so on on other languages, but in sql

regexp_replace ( string text, pattern text, replacement text [, flags
text ] ) → text

The replacement is a plain text ( and AFAIK you cannot use functions
as values in sql ).

You could probably define your function doing that if you have any PL
installed in your DB.

Francisco Olarte.

Right, thanks, I have a better understanding now. The calling app is written in Java so I will write a routine there to do it instead.

Cheers
Oliver