Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation - Mailing list pgsql-bugs

From Bruce Momjian
Subject Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date
Msg-id 201005290201.o4T21bk04708@momjian.us
Whole thread Raw
In response to Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Responses Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
List pgsql-bugs
Daniele Varrazzo wrote:
> "If there is no match to the pattern, the function returns no rows" is
> easily overlooked as "it returns null", or some other behaviour that
> don't change the returned set. The point is, because the function is
> listed in the string function, you would expect the function to
> manipulate text, not the dataset. The function as it is is not safe to
> be used in a construct
>
>     SELECT foo, bar, regexp_matches(bar, pattern) FROM table;
>
> unless you really wanted:
>
>     SELECT foo, bar, regexp_matches(bar, pattern) FROM table WHERE bar
> ~ pattern;
>
> otherwise you have to take measures to be able to deal with records in
> which the pattern is not matched, for example:
>
>     SELECT foo, bar, regexp_matches(bar, pattern || '|') FROM table;
>
> the latter still doesn't work when bar is NULL: in this case the
> record is dropped anyway, so I don't think it can be proposed as
> general solution.
>
> The characteristics of returning a set of text[] is useful when the
> user wants all the matches, not only the first one: the behaviour is
> selected specifying the flag 'g' as third argument.
>
> >From this point of view, I hope it can be stated that in its current
> form the regexp_matches() has not the most optimal interface. Please
> accept my apology for the tone being too rude in my previous message.

I found the description in the documentation quite confusing also.  I
have created the attached documention patch which is clearer about the
behavior of regexp_matches().

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.513
diff -c -c -r1.513 func.sgml
*** doc/src/sgml/func.sgml    7 Apr 2010 06:12:52 -0000    1.513
--- doc/src/sgml/func.sgml    29 May 2010 01:59:32 -0000
***************
*** 3450,3463 ****
       It has the syntax
       <function>regexp_matches</function>(<replaceable>string</>, <replaceable>pattern</>
       <optional>, <replaceable>flags</> </optional>).
!      If there is no match to the <replaceable>pattern</>, the function returns
!      no rows.  If there is a match, the function returns a text array whose
       <replaceable>n</>'th element is the substring matching the
       <replaceable>n</>'th parenthesized subexpression of the pattern
       (not counting <quote>non-capturing</> parentheses; see below for
!      details).  If the pattern does not contain any parenthesized
!      subexpressions, then the result is a single-element text array containing
!      the substring matching the whole pattern.
       The <replaceable>flags</> parameter is an optional text
       string containing zero or more single-letter flags that change the
       function's behavior.  Flag <literal>g</> causes the function to find
--- 3450,3466 ----
       It has the syntax
       <function>regexp_matches</function>(<replaceable>string</>, <replaceable>pattern</>
       <optional>, <replaceable>flags</> </optional>).
!      The function can return no rows, one row, or multiple rows (see
!      the <literal>g</> flag below).  If the <replaceable>pattern</>
!      does not match, the function returns no rows.  If the pattern
!      contains no parenthesized subexpressions, then each row
!      returned is a single-element text array containing the substring
!      matching the whole pattern.  If the pattern contains parenthesized
!      subexpressions, the function returns a text array whose
       <replaceable>n</>'th element is the substring matching the
       <replaceable>n</>'th parenthesized subexpression of the pattern
       (not counting <quote>non-capturing</> parentheses; see below for
!      details).
       The <replaceable>flags</> parameter is an optional text
       string containing zero or more single-letter flags that change the
       function's behavior.  Flag <literal>g</> causes the function to find

pgsql-bugs by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: psql or pgbouncer bug?
Next
From: Greg Stark
Date:
Subject: Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases