Thread: replace text occurrences loaded from table

replace text occurrences loaded from table

From
jan zimmek
Date:
hello,

i am actually trying to replace all occurences in a text column with some value, but the occurrences to replace are
definedin a table. this is a simplified version of my schema: 

create temporary table tmp_vars as select var from (values('ABC'),('XYZ'),('VAR123')) entries (var);
create temporary table tmp_messages as select message from (values('my ABC is XYZ'),('the XYZ is very VAR123'))
messages(message); 

select * from tmp_messages;

my ABC is XYZ -- row 1
the XYZ is very VAR123 -- row 2

now i need to somehow update the rows in tmp_messages, so that after the update i get the following:

select * from tmp_messages;

my XXX is XXX -- row 1
the XXX is very XXX -- row 2

i have implemented a solution in plpgsql by doing a nested for-loop over tmp_vars and tmp_messages, but i would like to
knowif there is a more efficient way to solve this problem ? 


best regards
jan


Re: replace text occurrences loaded from table

From
"David Johnston"
Date:
> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-
> owner@postgresql.org] On Behalf Of jan zimmek
> Sent: Tuesday, October 30, 2012 7:45 AM
> To: pgsql-sql@postgresql.org
> Subject: [SQL] replace text occurrences loaded from table
> 
> hello,
> 
> i am actually trying to replace all occurences in a text column with some
> value, but the occurrences to replace are defined in a table. this is a
> simplified version of my schema:
> 
> create temporary table tmp_vars as select var from
> (values('ABC'),('XYZ'),('VAR123')) entries (var); create temporary table
> tmp_messages as select message from (values('my ABC is XYZ'),('the XYZ is
> very VAR123')) messages (message);
> 
> select * from tmp_messages;
> 
> my ABC is XYZ -- row 1
> the XYZ is very VAR123 -- row 2
> 
> now i need to somehow update the rows in tmp_messages, so that after the
> update i get the following:
> 
> select * from tmp_messages;
> 
> my XXX is XXX -- row 1
> the XXX is very XXX -- row 2
> 
> i have implemented a solution in plpgsql by doing a nested for-loop over
> tmp_vars and tmp_messages, but i would like to know if there is a more
> efficient way to solve this problem ?
> 

You may want to consider creating an alternating regular expression and
using "regexp_replace(...)" one time per message instead of "replace(...)"
three times

Not Tested: regexp_replace(message, 'ABC|XYZ|VAR123', 'XXX', 'g')

This should at least reduce the amount of overhead checking each expression
against each message would incur.

If you need even better performance you would need to find some way to
"index" the message contents so that for each expression the index can be
used to quickly identify the subset of messages that are going to be
altered.  The full-text-search capabilities of PostgreSQL will probably help
here though I am not familiar with them personally.

Since you have not shared the true context of your request no alternatives
can be suggested.  Also, your ability to implement certain algorithms is
influenced by the version of PostgreSQL that you are running and which you
have also not provided.

David J.