Thread: replace inside regexp_replace
Hi,
I have some text
'here is [[my text]] to replace and [[some more]]'
which I want to transform to
'here is my_text to replace and some_more'
i.e. wherever there are double square brackets, remove them and replace spaces in the contents with underscores.
My attempt to do that is the regex
select regexp_replace(
'here is [[my text]] to replace and [[some more]]',
E'\\[\\[(.*?)\\]\\]',
replace(E'\\1', ' ', '_'),
'g'
);
E'\\[\\[(.*?)\\]\\]',
replace(E'\\1', ' ', '_'),
'g'
);
which results in
'here is my text to replace and some more'
It half works, i.e. it removes the brackets but doesn't seem to process the inner replace. It's as if the select were just
select regexp_replace(
'here is [[my text]] to replace and [[some more]]',
E'\\[\\[(.*?)\\]\\]',
E'\\1',
'g'
);
E'\\[\\[(.*?)\\]\\]',
E'\\1',
'g'
);
I've a feeling I'm missing something fundamental, any idea what?
Thanks
Oliver
Oliver: On Mon, Jun 21, 2021 at 3:27 PM Oliver Kohll <oliver@agilechilli.com> wrote: ... > My attempt to do that is the regex > select regexp_replace( > 'here is [[my text]] to replace and [[some more]]', > E'\\[\\[(.*?)\\]\\]', > replace(E'\\1', ' ', '_'), > 'g' > ); > which results in > 'here is my text to replace and some more' > It half works, i.e. it removes the brackets but doesn't seem to process the inner replace. It's as if the select were just > select regexp_replace( > 'here is [[my text]] to replace and [[some more]]', > E'\\[\\[(.*?)\\]\\]', > E'\\1', > 'g' > ); > I've a feeling I'm missing something fundamental, any idea what? You are assuming replace will magically work in a way it does not. The inner replace is evaluated first: > select replace(E'\\1', ' ', '_'); replace --------- \1 and it's result is passed as 3rd argument to the outer replace, so both select are equivalent. What you want to do can be done in some languages passing a closure, or a function, to their replace function, or with special forms ( like the e modifier in perl s/// ), but I'm not sure it can be done. On languages with basic regex support, like I think SQL is, you normally have to either split the string in match/no match or do a multiple match ( match something like (.*?)\[\[(.*?)\]\] with two captures ) and loop in the result aplying your second replacement ( which is what perl does behind the scenes, and other languages do ) In perl you can do it with something like: $ perl -pe 's{\[\[(.*?)\]\]}{ $1=~s/ /_/gr}eg' here is [[my text]] to replace and [[some more]]', here is my_text to replace and some_more', But note the magic e there. In python you can use the function form: re.sub(pattern, repl, string, count=0, flags=0) Return the string obtained by replacing ......repl can be a string or a function; if it is a string,.... If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. An so on on other languages, but in sql regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text The replacement is a plain text ( and AFAIK you cannot use functions as values in sql ). You could probably define your function doing that if you have any PL installed in your DB. Francisco Olarte.
On Mon, Jun 21, 2021 at 02:27:22PM +0100, Oliver Kohll wrote: > It half works, i.e. it removes the brackets but doesn't seem to process the > inner replace. It's as if the select were just > select regexp_replace( > 'here is [[my text]] to replace and [[some more]]', > E'\\[\\[(.*?)\\]\\]', > E'\\1', > 'g' > ); > I've a feeling I'm missing something fundamental, any idea what? \1 works only if it's argument to regexp_replace. And regexp_replace can't call any other functions. What you could do is: $ select string_agg(x[1] || replace(x[2], ' ', '_') || x[3], '') from regexp_matches( 'here is [[my text]] to replace and[[some more]] and maybe [[a bit longer]] too', '(.*?)\[\[(.*?)\]\](.*?)', 'g') x; string_agg ───────────────────────────────────────────────────────────────── here is my_text to replace and some_more and maybe a_bit_longer (1 row) Or just use plperl, pl/python, or anything like this. Best regards, depesz
On Monday, June 21, 2021, Oliver Kohll <oliver@agilechilli.com> wrote:
select regexp_replace('here is [[my text]] to replace and [[some more]]',
E'\\[\\[(.*?)\\]\\]',
replace(E'\\1', ' ', '_'),
'g'
);
Side note, you seldom want to use “E” (escape) string literals with regexes (or in general really) using just the simple literal syntax removes the need for double-backslashing.
David J.
On Mon, 21 Jun 2021 at 15:09, Francisco Olarte <folarte@peoplecall.com> wrote:
Oliver:
On Mon, Jun 21, 2021 at 3:27 PM Oliver Kohll <oliver@agilechilli.com> wrote:
...
> My attempt to do that is the regex
> select regexp_replace(
> 'here is [[my text]] to replace and [[some more]]',
> E'\\[\\[(.*?)\\]\\]',
> replace(E'\\1', ' ', '_'),
> 'g'
> );
> which results in
> 'here is my text to replace and some more'
> It half works, i.e. it removes the brackets but doesn't seem to process the inner replace. It's as if the select were just
> select regexp_replace(
> 'here is [[my text]] to replace and [[some more]]',
> E'\\[\\[(.*?)\\]\\]',
> E'\\1',
> 'g'
> );
> I've a feeling I'm missing something fundamental, any idea what?
You are assuming replace will magically work in a way it does not. The
inner replace is evaluated first:
> select replace(E'\\1', ' ', '_');
replace
---------
\1
and it's result is passed as 3rd argument to the outer replace, so
both select are equivalent.
What you want to do can be done in some languages passing a closure,
or a function, to their replace function, or with special forms ( like
the e modifier in perl s/// ), but I'm not sure it can be done.
On languages with basic regex support, like I think SQL is, you
normally have to either split the string in match/no match or do a
multiple match ( match something like (.*?)\[\[(.*?)\]\] with two
captures ) and loop in the result aplying your second replacement (
which is what perl does behind the scenes, and other languages do )
In perl you can do it with something like:
$ perl -pe 's{\[\[(.*?)\]\]}{ $1=~s/ /_/gr}eg'
here is [[my text]] to replace and [[some more]]',
here is my_text to replace and some_more',
But note the magic e there.
In python you can use the function form:
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing ......repl can be a string or
a function; if it is a string,....
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.
An so on on other languages, but in sql
regexp_replace ( string text, pattern text, replacement text [, flags
text ] ) → text
The replacement is a plain text ( and AFAIK you cannot use functions
as values in sql ).
You could probably define your function doing that if you have any PL
installed in your DB.
Francisco Olarte.
Right, thanks, I have a better understanding now. The calling app is written in Java so I will write a routine there to do it instead.
Cheers
Oliver