Re: patch adding new regexp functions - Mailing list pgsql-patches
From | Peter Eisentraut |
---|---|
Subject | Re: patch adding new regexp functions |
Date | |
Msg-id | 200702151057.46660.peter_e@gmx.net |
In response to | Re: patch adding new regexp functions (Jeremy Drake <pgsql@jdrake.com>) |
Responses |
Re: patch adding new regexp functions
Re: patch adding new regexp functions |
List | pgsql-patches |
Jeremy Drake wrote:
> regexp_matches uses a text[] for the match groups. If you specify
> the global flag, it could return multiple matches. Couple this with
> the requested feature of pre- and postmatch returns (with its own
> flag) and the return would turn into some sort of nasty array of
> tuples, or multiple arrays. It seems much cleaner to me to return a
> set of the matches found, and I find which order the matches occur in
> to be much less important than the fact that they did occur and their
> contents.

Then the fact that the flag-less matches function returns an array would
be a mistake. They have to return the same category of object.

> regexp_split returns setof text. This has, in my opinion, a much
> greater case to return an array. However, there are several issues
> with this approach:

Any programming language I have ever seen returns the result of a
regular expression split as a structure with order. That in turn
implies that there are use cases for having the order, which your
proposed function could not address.

> # My experience with the array code leads me to believe that building
> up an array is an expensive proposition. I know I could code it
> smarter so that the array is only constructed in the end.

You can make any code arbitrarily fast if it doesn't have to give the
right answer.

> # With a set-returning function, it is possible to add a LIMIT
> clause, to prevent splitting up more of the string than is necessary.

You can also add such functionality to a function in the form of a
parameter. In fact, relying on a LIMIT clause to do this seems pretty
fragile. We argue elsewhere that LIMIT without ORDER BY is not
well-defined, and while it's hard to imagine in the current
implementation why the result of a set-returning function would come
back in arbitrary order, it is certainly possible in theory. So you
still need to order the result set if you want reliable limits, but
that is not possible if the order is lost in the result.
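
To illustrate the two points above (a split that returns an ordered
structure, and a limit passed as a parameter rather than applied
afterwards), here is a minimal sketch in Python using the standard
library's re.split and its maxsplit argument. The sample string and
pattern are made up for the example and have nothing to do with the
patch under discussion.

```python
import re

# Hypothetical input and delimiter pattern, chosen only for illustration.
text = "alpha, beta;gamma ,delta"
pattern = r"\s*[,;]\s*"

# The result is a list, so the position of each piece in the input is kept.
parts = re.split(pattern, text)

# The limit is a parameter of the split itself: at most two splits are
# performed and the unsplit remainder is returned as the last element.
first_two = re.split(pattern, text, maxsplit=2)

print(parts)      # ['alpha', 'beta', 'gamma', 'delta']
print(first_two)  # ['alpha', 'beta', 'gamma ,delta']
```

Because the limit is applied inside the split, the caller does not
depend on the order in which a result set happens to be returned.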

> It is also immediately possible to insert them into a table, or do
> grouping on them, or call a function on each value. Most of the time
> when I do a split, I intend to do something like this with each split
> value.

These sorts of arguments remind me of the contrib/xml2 module, which also
has a very, uh, pragmatic API. Sure, these operations may seem useful
to you. But when we offer a function as part of the core API, it is
also important that we offer a clean design that allows other users to
combine reasonably orthogonal functionality into tools that are useful
to them.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/