Home > mailing lists

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date	July 30, 2021 21:38:24
Msg-id	1124806.1627681104@sss.pgh.pa.us Whole thread Raw
In response to	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace (Gilles Darold <gilles@darold.net>)
Responses	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
List	pgsql-hackers

Tree view

Gilles Darold <gilles@darold.net> writes:
> Le 26/07/2021 à 21:56, Tom Lane a écrit :
>> I'm inclined to just drop the regexp_replace additions.  I don't think
>> that the extra parameters Oracle provides here are especially useful.
>> They're definitely not useful enough to justify creating compatibility
>> hazards for.

> I would not say that being able to replace the Nth occurrence of a
> pattern matching is not useful but i agree that this is not a common
> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
> and I have though that we can not have compatibility issues because of
> the different data type at the 4th parameter.

Well, here's an example of the potential issues:

regression=# create function rr(text,text,text,text) returns text
regression-# language sql as $$select 'text'$$;
CREATE FUNCTION
regression=# create function rr(text,text,text,int4) returns text
language sql as $$select 'int4'$$;
CREATE FUNCTION
regression=# select rr('a','b','c','d');
  rr  
------
 text
(1 row)

regression=# select rr('a','b','c',42);
  rr  
------
 int4
(1 row)

So far so good, but:

regression=# prepare rr as select rr('a','b','c',$1);
PREPARE
regression=# execute rr(12);  
  rr  
------
 text
(1 row)

So somebody trying to use the 4-parameter Oracle form from, say, JDBC
would get bit if they were sloppy about specifying parameter types.

The one saving grace is that digits aren't valid regexp flags,
so the outcome would be something like

regression=# select regexp_replace('a','b','c','12');
ERROR:  invalid regular expression option: "1"

which'd be less difficult to debug than silent misbehavior.
Conversely, if you thought you were passing flags but it somehow
got interpreted as a start position, that would fail too:

regression=# prepare rri as select rr('a','b','c', $1::int);
PREPARE
regression=# execute rri('gi');
ERROR:  invalid input syntax for type integer: "gi"
LINE 1: execute rri('gi');
                    ^

Still, I bet a lot that we'd see periodic bug reports complaining
that it doesn't work.

> Anyway, maybe we can just
> rename the function even if I would prefer that regexp_replace() be
> extended. For example:
>     regexp_replace(source, pattern, replacement [, flags ]);
>     regexp_substitute(source, pattern, replacement [, position ] [,
> occurrence ] [, flags ]);

Hmm.  Of course the entire selling point of this patch seems to be
bug-compatibility with Oracle, so using different names is largely
defeating the point :-(

Maybe we should just hold our noses and do it.  The point that
you'd get a recognizable failure if the wrong function were chosen
reassures me a little bit.  We've seen a lot of cases where this
sort of ambiguity results in the system just silently doing something
different from what you expected, and I was afraid that that could
happen here.

            regards, tom lane

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 30 July 2021, 21:33:34
Subject: Re: Split xlog.c

From: Tom Lane
Date: 30 July 2021, 21:49:09
Subject: Re: [PATCH] Hooks at XactCommand level

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace - Mailing list pgsql-hackers

Previous

Next