Thread: Emulating flexible regex replace

Emulating flexible regex replace

From
twoflower
Date:
Hello,

my scenario is this: I have a *SEGMENT* table with two text fields, *source*
and *target*. From the user, I get the following input:

/source pattern/
/target pattern/

Where both patterns are regexes and moreover the target pattern contains
references to the source in the following way:

Supposing *source* matches the /source pattern/, the $/n/ expressions inside
the /target pattern/ correspond to the captured groups inside *source*.

Example:

Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target text

This yields a successful match since $1 in the /target pattern/ is replaced
by "123" from the first captured group in *source* and the resulting string,
"123 target text", matches the /target pattern/.

I would like to execute a query which for a given /source pattern/ and
/target pattern/ returns all rows from the *SEGMENT* table where *source*
matches the /source pattern/ and *target* matches the /target pattern/ after
it has its references replaced with the actual captured groups.

I believe this is not possible since *regexp_replace* expects a string as
its /replacement/ argument which is not enough in this case. This kind of
stuff is easy in e.g. C# where for regex replace you can provide a function
which receives the (in this case) reference index as its argument and you
can build the replacement string using external knowledge.

However, as I am no pro in Postgres, I may be missing something and
therefore I ask: is it possible to somehow mimic the behavior of
hypothetical *regexp_replace* which would accept a function of the
to-be-replaced value and would return the replacement string based on that.

And as I am thinking about it, even that would not suffice since that
function would need to access not only the to-be-replaced value but also the
corresponding source pattern match.

Still, isn't there some super clever way to do that?








--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Emulating flexible regex replace

From
Tom Lane
Date:
twoflower <standa.kurik@gmail.com> writes:
> Supposing *source* matches the /source pattern/, the $/n/ expressions inside
> the /target pattern/ correspond to the captured groups inside *source*.

> Example:

> Source: 123 source text
> Target: 123 target text
> Source pattern: ([0-9]+) source text
> Target pattern: $1 target text

> This yields a successful match since $1 in the /target pattern/ is replaced
> by "123" from the first captured group in *source* and the resulting string,
> "123 target text", matches the /target pattern/.

> I would like to execute a query which for a given /source pattern/ and
> /target pattern/ returns all rows from the *SEGMENT* table where *source*
> matches the /source pattern/ and *target* matches the /target pattern/ after
> it has its references replaced with the actual captured groups.

> I believe this is not possible since *regexp_replace* expects a string as
> its /replacement/ argument which is not enough in this case.

Well, you could pull out the source text captures with regexp_matches,
escape them somehow (don't think there's a built-in function for that),
insert them into the target pattern with regexp_replace, and then apply
the target pattern with a simple regexp match operator.  Kinda tedious,
but hardly "not possible".

A lot of people feel that this sort of text-mashing requirement is best
handled in plperl, but if you don't want to use that for some reason,
it's surely possible in plpgsql.

            regards, tom lane


Re: Emulating flexible regex replace

From
David G Johnston
Date:
twoflower wrote
> Source: 123 source text
> Target: 123 target text
> Source pattern: ([0-9]+) source text
> Target pattern: $1 target text
>
> Still, isn't there some super clever way to do that?

You use "\1" instead of "$1"

SELECT regexp_replace('123 abc','(\d+)\s(\w+)','\1 def'); --output: '123
def'

http://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP

9.7.3 - paragraph beginning "The regexp_replace function provides
substitution..."

David J.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824046.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Emulating flexible regex replace

From
David G Johnston
Date:
David G Johnston wrote
>
> twoflower wrote
>> Source: 123 source text
>> Target: 123 target text
>> Source pattern: ([0-9]+) source text
>> Target pattern: $1 target text
>>
>> Still, isn't there some super clever way to do that?
> You use "\1" instead of "$1"
>
> SELECT regexp_replace('123 abc','(\d+)\s(\w+)','\1 def'); --output: '123
> def'
>
> http://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
>
> 9.7.3 - paragraph beginning "The regexp_replace function provides
> substitution..."
>
> David J.

<reading this a few more times>

Is it possible to express the WHERE clause as:

regexp_replace(source, source_pattern, target_pattern) = target

maybe with a substring check instead of equals?

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824065.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Emulating flexible regex replace

From
blackbee045
Date:
This yields a successful match since $1 in the /target pattern/ is replaced
> by "123" from the first captured group in *source* and the resulting
> string,
> "123 target text", matches the /target pattern/.

> I would like to execute a query which for a given /source pattern/ and
> /target pattern/ returns all rows from the *SEGMENT* table where *source*
> matches the /source pattern/ and *target* matches the /target pattern/
> after
> it has its references replaced with the actual captured groups.

> I believe this is not possible since *regexp_replace* expects a string as
> its /replacement/ argument which is not enough in this case.



-----
GuL
--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824107.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.