Thread: workaround for lack of REPLACE() function

workaround for lack of REPLACE() function

From
"Agent155 Support"
Date:
Hi,
 
I looked around for a definitive answer to this question but to no avail.  It appears that PostgreSQL does not implement the SQL function REPLACE() , which, in MySQL works like this:
 
REPLACE(str,from_str,to_str)
Returns the string str with all all occurrences of the string from_str replaced by the string to_str:
 
What do folks usually do when they have to do a global search/replace on a big table?
 
Thanks,
 
Matt

Re: workaround for lack of REPLACE() function

From
Tom Lane
Date:
"Agent155 Support" <matt@planetnet.com> writes:
> What do folks usually do when they have to do a global search/replace on a =
> big table?

You can code pretty much any text transformation you'd like in plperl or
pltcl, both of which languages are very strong on string manipulations.
So there's not been a lot of concern about the lack of a SQL-level
substitution operator.

IIRC, SQL99 does specify some sort of substring replacement function,
and Thomas recently implemented it for 7.3.  But it's not very bright
and I suspect people will keep falling back on plperl or pltcl to do
anything nontrivial.

            regards, tom lane

Re: workaround for lack of REPLACE() function

From
Joe Conway
Date:
Tom Lane wrote:
> "Agent155 Support" <matt@planetnet.com> writes:
>
>>What do folks usually do when they have to do a global search/replace on a =
>>big table?
>
>
> You can code pretty much any text transformation you'd like in plperl or
> pltcl, both of which languages are very strong on string manipulations.
> So there's not been a lot of concern about the lack of a SQL-level
> substitution operator.
>
> IIRC, SQL99 does specify some sort of substring replacement function,
> and Thomas recently implemented it for 7.3.  But it's not very bright
> and I suspect people will keep falling back on plperl or pltcl to do
> anything nontrivial.
>

I think Thomas did just recently commit the SQL99 OVERLAY function, but
similar to Tom's comment, I don't like the way SQL99 defines it. I've
written a replace() C function (along with a couple of other string
manipulation functions) for my own use. If you'd like a copy let me know
and I'll gladly send it to you.

I have thought about sending it in as a contrib, but wasn't sure if
there was enough interest to warrant it.

Joe


Re: workaround for lack of REPLACE() function

From
Thomas Lockhart
Date:
...
> I think Thomas did just recently commit the SQL99 OVERLAY function, but
> similar to Tom's comment, I don't like the way SQL99 defines it. I've
> written a replace() C function (along with a couple of other string
> manipulation functions) for my own use. If you'd like a copy let me know
> and I'll gladly send it to you.

OK, what don't you like about it? If you can define some functionality
that we *should* have, then it is likely to go into the main distro.
Either as an extension to existing functions or as a separate function.

"Style" counts for not-much, but "can't do it with what we have" counts
for a lot.

                   - Thomas

Re: workaround for lack of REPLACE() function

From
Joe Conway
Date:
Thomas Lockhart wrote:
>>I think Thomas did just recently commit the SQL99 OVERLAY function, but
>>similar to Tom's comment, I don't like the way SQL99 defines it. I've
>>written a replace() C function (along with a couple of other string
>>manipulation functions) for my own use. If you'd like a copy let me know
>>and I'll gladly send it to you.
>
> OK, what don't you like about it? If you can define some functionality
> that we *should* have, then it is likely to go into the main distro.
> Either as an extension to existing functions or as a separate function.
>
> "Style" counts for not-much, but "can't do it with what we have" counts
> for a lot.

Hmmm, making justify my comment ;-)

Well, OVERLAY is defined as:
     overlay(string placing string from integer [for integer])
and replace() is defined (by me at least) as:
     replace(inputstring, old-substr, new-substr)

OVERLAY requires that I know the "from" position and possibly the "for"
in advance. Other functions (such as strpos() and substr()) can be used
to help, but consider the following:

test=# create table strtest(f1 text);
CREATE TABLE
test=# insert into strtest values('/usr/local/pgsql/data');
INSERT 124955 1
test=# select replace(f1,'/local','') from strtest;
      replace
-----------------
  /usr/pgsql/data
(1 row)

Now, how can I do this with overlay()? If I happen to know in advance
that my only input string is '/usr/local/pgsql/data', then I can do:

test=# select overlay(f1 placing '' from 5 for 6) from strtest;
      overlay
-----------------
  /usr/pgsql/data
(1 row)

But what if now I do:
test=# insert into strtest values('/m1/usr/local/pgsql/data');
INSERT 124957 1

Now

test=# select replace(f1,'/local','') from strtest;
       replace
--------------------
  /usr/pgsql/data
  /m1/usr/pgsql/data
(2 rows)

works fine, but

test=# select overlay(f1 placing '' from 5 for 6) from strtest;
       overlay
--------------------
  /usr/pgsql/data
  /m1/cal/pgsql/data
(2 rows)

doesn't give the desired result. Of course you can work around this, but
it starts to get ugly:

test=# select overlay(f1 placing '' from strpos(f1,'/local') for 6) from
strtest;
       overlay
--------------------
  /usr/pgsql/data
  /m1/usr/pgsql/data
(2 rows)

But now what happens if you wanted to replace all of the '/' characters
with '\'?

test=# select replace(f1,'/','\\') from strtest;
          replace
--------------------------
  \usr\local\pgsql\data
  \m1\usr\local\pgsql\data
(2 rows)

You can't do this at all with overlay(), unless you want to write a
PL/pgSQL function and loop through each string. I started out with
exactly this, using strpos() and substr(), but I thought a C function
was cleaner, and it is certainly faster.


BTW, the other functions already in the string manipulation module are:

to_hex -- Accepts bigint and returns it as equivilent hex string
   to_hex(bigint inputnum) RETURNS text

test=# select to_hex(123456789::bigint);
  to_hex
---------
  75bcd15
(1 row)

and

extract_tok -- Extracts and returns individual token from delimited
                text
   extract_tok(text inputstring, text delimiter, int posn) RETURNS text

test=# select extract_tok(extract_tok('f=1&g=3&h=4','&',2),'=',2);
  extract_tok
-------------
  3
(1 row)

extract_tok() is actually already in dblink (dblink_strtok), because it
is useful in that context, but it probably belongs in a contrib for
string manipulation instead. In fact, now that I think about it, so is
replace() (dblink_replace).

Regards,

Joe


Re: workaround for lack of REPLACE() function

From
Thomas Lockhart
Date:
(crossposted to -hackers, should follow up on that list)

> Well, OVERLAY is defined as:
>      overlay(string placing string from integer [for integer])
> and replace() is defined (by me at least) as:
>      replace(inputstring, old-substr, new-substr)

OK.

> OVERLAY requires that I know the "from" position and possibly the "for"
> in advance. Other functions (such as strpos() and substr()) can be used
> to help...

Right. So you can do your example pretty easily:

thomas=# select overlay(f1 placing '' from position('/local' in f1)
thomas-# for length('/local')) from strtest;
      overlay
--------------------
 /usr/pgsql/data
 /m1/usr/pgsql/data

And if you don't like that much typing you can do:

thomas=# create function replace(text, text, text) returns text as '
thomas'# select overlay($1 placing $3 from position($2 in $1) for
length($2));
thomas'# ' language 'sql';
CREATE FUNCTION
thomas=# select replace(f1, '/local', '') from strtest;
      replace
--------------------
 /usr/pgsql/data
 /m1/usr/pgsql/data

> But now what happens if you wanted to replace all of the '/' characters
> with '\'?...
> You can't do this at all with overlay(), unless you want to write a
> PL/pgSQL function and loop through each string. I started out with
> exactly this, using strpos() and substr(), but I thought a C function
> was cleaner, and it is certainly faster.

OK, this is in the "can't do it what we have" category. Should we have
it accept a regular expression rather than a simple string? In either
case it should probably go into the main distro. Except that I see
"REPLACE" is mentioned as a reserved word in SQL99. But has no other
mention in my copy of the draft standard. Anyone else have an idea what
it might be used for in the standard?

The other functions look useful too, unless to_char() and varbit can be
evolved to support this functionality.

                     - Thomas