Thread: Emulating flexible regex replace
Hello, my scenario is this: I have a *SEGMENT* table with two text fields, *source* and *target*. From the user, I get the following input: /source pattern/ /target pattern/ Where both patterns are regexes and moreover the target pattern contains references to the source in the following way: Supposing *source* matches the /source pattern/, the $/n/ expressions inside the /target pattern/ correspond to the captured groups inside *source*. Example: Source: 123 source text Target: 123 target text Source pattern: ([0-9]+) source text Target pattern: $1 target text This yields a successful match since $1 in the /target pattern/ is replaced by "123" from the first captured group in *source* and the resulting string, "123 target text", matches the /target pattern/. I would like to execute a query which for a given /source pattern/ and /target pattern/ returns all rows from the *SEGMENT* table where *source* matches the /source pattern/ and *target* matches the /target pattern/ after it has its references replaced with the actual captured groups. I believe this is not possible since *regexp_replace* expects a string as its /replacement/ argument which is not enough in this case. This kind of stuff is easy in e.g. C# where for regex replace you can provide a function which receives the (in this case) reference index as its argument and you can build the replacement string using external knowledge. However, as I am no pro in Postgres, I may be missing something and therefore I ask: is it possible to somehow mimic the behavior of hypothetical *regexp_replace* which would accept a function of the to-be-replaced value and would return the replacement string based on that. And as I am thinking about it, even that would not suffice since that function would need to access not only the to-be-replaced value but also the corresponding source pattern match. Still, isn't there some super clever way to do that? -- View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
twoflower <standa.kurik@gmail.com> writes: > Supposing *source* matches the /source pattern/, the $/n/ expressions inside > the /target pattern/ correspond to the captured groups inside *source*. > Example: > Source: 123 source text > Target: 123 target text > Source pattern: ([0-9]+) source text > Target pattern: $1 target text > This yields a successful match since $1 in the /target pattern/ is replaced > by "123" from the first captured group in *source* and the resulting string, > "123 target text", matches the /target pattern/. > I would like to execute a query which for a given /source pattern/ and > /target pattern/ returns all rows from the *SEGMENT* table where *source* > matches the /source pattern/ and *target* matches the /target pattern/ after > it has its references replaced with the actual captured groups. > I believe this is not possible since *regexp_replace* expects a string as > its /replacement/ argument which is not enough in this case. Well, you could pull out the source text captures with regexp_matches, escape them somehow (don't think there's a built-in function for that), insert them into the target pattern with regexp_replace, and then apply the target pattern with a simple regexp match operator. Kinda tedious, but hardly "not possible". A lot of people feel that this sort of text-mashing requirement is best handled in plperl, but if you don't want to use that for some reason, it's surely possible in plpgsql. regards, tom lane
twoflower wrote > Source: 123 source text > Target: 123 target text > Source pattern: ([0-9]+) source text > Target pattern: $1 target text > > Still, isn't there some super clever way to do that? You use "\1" instead of "$1" SELECT regexp_replace('123 abc','(\d+)\s(\w+)','\1 def'); --output: '123 def' http://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP 9.7.3 - paragraph beginning "The regexp_replace function provides substitution..." David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824046.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
David G Johnston wrote > > twoflower wrote >> Source: 123 source text >> Target: 123 target text >> Source pattern: ([0-9]+) source text >> Target pattern: $1 target text >> >> Still, isn't there some super clever way to do that? > You use "\1" instead of "$1" > > SELECT regexp_replace('123 abc','(\d+)\s(\w+)','\1 def'); --output: '123 > def' > > http://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP > > 9.7.3 - paragraph beginning "The regexp_replace function provides > substitution..." > > David J. <reading this a few more times> Is it possible to express the WHERE clause as: regexp_replace(source, source_pattern, target_pattern) = target maybe with a substring check instead of equals? David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824065.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
This yields a successful match since $1 in the /target pattern/ is replaced > by "123" from the first captured group in *source* and the resulting > string, > "123 target text", matches the /target pattern/. > I would like to execute a query which for a given /source pattern/ and > /target pattern/ returns all rows from the *SEGMENT* table where *source* > matches the /source pattern/ and *target* matches the /target pattern/ > after > it has its references replaced with the actual captured groups. > I believe this is not possible since *regexp_replace* expects a string as > its /replacement/ argument which is not enough in this case. ----- GuL -- View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824107.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.