Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation - Mailing list pgsql-bugs

From Bruce Momjian
Subject Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date
Msg-id 201006011440.o51EejP16631@momjian.us
Whole thread Raw
In response to Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation  (Bruce Momjian <bruce@momjian.us>)
List pgsql-bugs
Robert Haas wrote:
> On Mon, May 31, 2010 at 10:11 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > I have updated the patch, attached, to clarify that this returns text
> >> > arrays, and that you can force it to always return one row using
> >> > COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).
> >>
> >> I don't find this part to be something we should include in the
> >> documentation. ?If we want to include a workaround, how about defining
> >> a non-SRF that just calls the SRF and returns the first row?
> >
> > Remember this has to return one row for no matches, so a simple SRF will
> > not work. ?I also have not seen enough demand for another function. ?A
> > single doc mention seemed the appropriate level of detail for this.
>
> Well, we can debate later whether to add another function to core, but
> what I meant was that the user having the problem could create a
> user-defined function that calls regexp_matches() and returns the
> first row, or NULL.
>
> But actually here's an even simpler workaround, which is IMHO less
> ugly than the original one:
>
> SELECT foo, bar, (SELECT regexp_matches(bar, pattern)) FROM table;

Good idea.   Simplified patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.513
diff -c -c -r1.513 func.sgml
*** doc/src/sgml/func.sgml    7 Apr 2010 06:12:52 -0000    1.513
--- doc/src/sgml/func.sgml    1 Jun 2010 14:40:22 -0000
***************
*** 3445,3463 ****
     </para>

      <para>
!      The <function>regexp_matches</> function returns all of the captured
!      substrings resulting from matching a POSIX regular expression pattern.
!      It has the syntax
       <function>regexp_matches</function>(<replaceable>string</>, <replaceable>pattern</>
       <optional>, <replaceable>flags</> </optional>).
!      If there is no match to the <replaceable>pattern</>, the function returns
!      no rows.  If there is a match, the function returns a text array whose
       <replaceable>n</>'th element is the substring matching the
       <replaceable>n</>'th parenthesized subexpression of the pattern
       (not counting <quote>non-capturing</> parentheses; see below for
!      details).  If the pattern does not contain any parenthesized
!      subexpressions, then the result is a single-element text array containing
!      the substring matching the whole pattern.
       The <replaceable>flags</> parameter is an optional text
       string containing zero or more single-letter flags that change the
       function's behavior.  Flag <literal>g</> causes the function to find
--- 3445,3466 ----
     </para>

      <para>
!      The <function>regexp_matches</> function returns a text array of
!      all of the captured substrings resulting from matching a POSIX
!      regular expression pattern.  It has the syntax
       <function>regexp_matches</function>(<replaceable>string</>, <replaceable>pattern</>
       <optional>, <replaceable>flags</> </optional>).
!      The function can return no rows, one row, or multiple rows (see
!      the <literal>g</> flag below).  If the <replaceable>pattern</>
!      does not match, the function returns no rows.  If the pattern
!      contains no parenthesized subexpressions, then each row
!      returned is a single-element text array containing the substring
!      matching the whole pattern.  If the pattern contains parenthesized
!      subexpressions, the function returns a text array whose
       <replaceable>n</>'th element is the substring matching the
       <replaceable>n</>'th parenthesized subexpression of the pattern
       (not counting <quote>non-capturing</> parentheses; see below for
!      details).
       The <replaceable>flags</> parameter is an optional text
       string containing zero or more single-letter flags that change the
       function's behavior.  Flag <literal>g</> causes the function to find
***************
*** 3490,3495 ****
--- 3493,3508 ----
  </programlisting>
     </para>

+    <para>
+     It is possible to force <function>regexp_matches()</> to always
+     return one row by using a sub-select;  this is particularly useful
+     in a <literal>SELECT</> target list when you want all rows
+     returned, even non-matching ones:
+ <programlisting>
+ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
+ </programlisting>
+    </para>
+
      <para>
       The <function>regexp_split_to_table</> function splits a string using a POSIX
       regular expression pattern as a delimiter.  It has the syntax

pgsql-bugs by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Next
From: "Kevin Grittner"
Date:
Subject: Re: BUG #5484: sum() bug