Re: patch adding new regexp functions - Mailing list pgsql-patches

From Peter Eisentraut
Subject Re: patch adding new regexp functions
Date
Msg-id 200702151057.46660.peter_e@gmx.net
In response to Re: patch adding new regexp functions  (Jeremy Drake <pgsql@jdrake.com>)
Responses Re: patch adding new regexp functions
Re: patch adding new regexp functions
List pgsql-patches
Jeremy Drake wrote:
> regexp_matches uses a text[] for the match groups.  If you specify
> the global flag, it could return multiple matches.  Couple this with
> the requested feature of pre- and postmatch returns (with its own
> flag) and the return would turn into some sort of nasty array of
> tuples, or multiple arrays.  It seems much cleaner to me to return a
> set of the matches found, and I find which order the matches occur in
> to be much less important than the fact that they did occur and their
> contents.

Then the fact that the flag-less matches function returns an array would
be a mistake.  They have to return the same category of object.

> regexp_split returns setof text.  This has, in my opinion, a much
> greater case to return an array.  However, there are several issues
> with this approach:

Any programming language I have ever seen returns the result of a
regular expression split as a structure with order.  That in turn
implies that there are use cases for having the order, which your
proposed function could not address.
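
As an illustration of that point, here is a minimal sketch in Python (chosen
only as one example language; it is not part of the patch under discussion,
and the input string is made up).  The split result is a list whose element
order follows the input string:

```python
import re

# Split on commas, allowing optional whitespace around each comma.
parts = re.split(r"\s*,\s*", "alpha, beta ,gamma")

# The result is an ordered list, so positional access is meaningful.
first, last = parts[0], parts[-1]
print(parts)
```

A caller that needs "the first field" or "the last field" depends on that
ordering being part of the result.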

> # My experience with the array code leads me to believe that building
> up an array is an expensive proposition.  I know I could code it
> smarter so that the array is only constructed in the end.

You can make any code arbitrarily fast if it doesn't have to give the
right answer.

> # With a set-returning function, it is possible to add a LIMIT
> clause, to prevent splitting up more of the string than is necessary.

You can also add such functionality to a function in the form of a
parameter.  In fact, relying on a LIMIT clause to do this seems pretty
fragile.  We argue elsewhere that LIMIT without ORDER BY is not
well-defined, and while it's hard to imagine in the current
implementation why the result of a set-returning function would come
back in arbitrary order, it is certainly possible in theory.  So you
still need to order the result set if you want reliable limits, but
that is not possible if the order is lost in the result.
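
For comparison, a sketch of the limit-as-parameter approach, again in Python
purely as an illustration (the proposed PostgreSQL functions may or may not
take such a parameter; the input string is hypothetical).  Python's re.split
accepts a maxsplit argument that bounds how many splits are performed:

```python
import re

text = "a,b,c,d,e"

# maxsplit=2 performs at most two splits; the unsplit remainder
# is kept as the final element of the ordered result.
limited = re.split(",", text, maxsplit=2)
print(limited)
```

The limit here is explicit in the call and does not depend on the order in
which a surrounding query happens to return rows.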

>  It is also immediately possible to insert them into a table, or do
> grouping on them, or call a function on each value.  Most of the time
> when I do a split, I intend to do something like this with each split
> value.

These sorts of arguments remind me of the contrib/xml2 module, which also
has a very, uh, pragmatic API.  Sure, these operations may seem useful
to you.  But when we offer a function as part of the core API, it is
also important that we offer a clean design that allows other users to
combine reasonably orthogonal functionality into tools that are useful
to them.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
