Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date
Msg-id AANLkTilIkP5cqXLiVahSCOYuyU6gMu8Bdcany2yHtzOH@mail.gmail.com
Whole thread Raw
In response to BUG #5469: regexp_matches() has poor behaviour and more poor documentation  ("Daniele Varrazzo" <daniele.varrazzo@gmail.com>)
Responses Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
List pgsql-bugs
On Mon, May 24, 2010 at 9:16 AM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
> regexp_matches() has been recently discussed
> (http://archives.postgresql.org/pgsql-bugs/2010-04/msg00026.php): it is a
> setof function and as such it can drop results.
>
> Unfortunately it is an useful function to newcomers who use SQL, use regexps
> but don't arrive to read the chapter 34.4.7. (i.e. Extending SQL -> Query
> Language (SQL) Functions -> SQL Functions Returning Sets) and are not so
> enlightened to know that "setof text[]" means "if it doesn't match, it drops
> the record". They just expect the function to be a LIKE on steroids.
>
> Please describe the behavior in the documentation of the function (i.e.
> table 9-6. and section 9.7.3), possibly provide a function with a saner
> interface, i.e. returning a text[] of the first match or NULL on no match,
> or document a workaround (suitable for an user knowing regexps but not
> setof-returning functions) to make the function not dropping record (e.g. I
> fixed the "bug" adding a "|" at the end of the pattern, so that the function
> returns an array of NULL in case of no match: I don't think it is a trivial
> workaround).

I'm not sure that it's very productive to refer to the behavior of our
code as insane.  We do document this in section 9.7.3, pretty
explicitly:

"The regexp_matches function returns all of the captured substrings
resulting from matching a POSIX regular expression pattern. It has the
syntax regexp_matches(string, pattern  [, flags  ]). If there is no
match to the pattern, the function returns no rows. If there is a
match, the function returns a text array whose n'th element is the
substring matching the n'th parenthesized subexpression of the pattern
(not counting "non-capturing" parentheses; see below for details)."

I think that's pretty clear.  Your mileage may vary, of course.

I'm less confident than what we have in table 9-6 (other string
functions, in section 9.4, string functions and operators) is clear on
first reading, but neither do I immediately know how to improve it.
Perhaps instead of critiquing our insanity you could provide some
specific suggestions for improvement.

Similarly, if you think we should have another function besides
regexp_matches(), rather than just complaining that we don't, it would
be more useful to suggest a name and a specific behavior and ideally
maybe even provide a patch (or just the docs portion of a patch) -
especially if you can point to a specific use-case that is hard to do
with the SRF but would be easier with a function with a different
interface.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

pgsql-bugs by date:

Previous
From: Robert Haas
Date:
Subject: Re: Ola
Next
From: Mark Kirkwood
Date:
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation