Re: [HACKERS] writing new regexp functions - Mailing list pgsql-patches

From Jeremy Drake
Subject Re: [HACKERS] writing new regexp functions
Msg-id Pine.BSO.4.64.0702041254011.28908@resin.csoft.net
In response to Re: [HACKERS] writing new regexp functions  (David Fetter <david@fetter.org>)
List pgsql-patches
On Sun, 4 Feb 2007, David Fetter wrote:

> On Fri, Feb 02, 2007 at 07:01:33PM -0800, Jeremy Drake wrote:
>
> > Let me know if you see any bugs or issues with this code, and I am
> > open to suggestions for further regression tests ;)
>
> > Things that I still want to look into:
> > * regexp flags (a la regexp_replace).
>
> One more text field at the end is how the regexp_replace() one does
> it.

That's how I did it.

> > * maybe make regexp_matches return setof whatever, if given a 'g' flag
> >   return all matches in string.
>
> This is doable with current machinery, albeit a little clumsily.

I have implemented this too.
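David's "current machinery" presumably refers to PostgreSQL's set-returning-function API in funcapi.h, used in value-per-call mode. The function is invoked once per output row, sets up private state in a FuncCallContext on the first call, and signals the end of the set on the final call. The Python sketch below only models that protocol and is not Jeremy's code. The names regexp_matches_g and drive are hypothetical, and precomputing every match span on the first call is an assumed strategy. Python's re module also stands in for PostgreSQL's regexp engine. In C, the marked steps correspond to SRF_FIRSTCALL_INIT, SRF_RETURN_NEXT and SRF_RETURN_DONE.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class FuncCallContext:
    # Models three fields of PostgreSQL's FuncCallContext (funcapi.h).
    call_cntr: int = 0      # rows returned so far
    max_calls: int = 0      # total rows this call sequence will return
    user_fctx: Any = None   # function-private state kept between calls


def regexp_matches_g(ctx: Optional[FuncCallContext], s: str,
                     pattern: str) -> Tuple[FuncCallContext, Optional[str], bool]:
    """One invocation of a hypothetical value-per-call SRF.

    Returns (context, row, done). Python's re stands in for the
    PostgreSQL regexp engine, so pattern syntax may differ.
    """
    if ctx is None:
        # First call (cf. SRF_FIRSTCALL_INIT): find every match once and
        # keep the (start, end) spans as private state.
        ctx = FuncCallContext()
        ctx.user_fctx = [m.span() for m in re.finditer(pattern, s)]
        ctx.max_calls = len(ctx.user_fctx)
    if ctx.call_cntr < ctx.max_calls:
        # More rows remain (cf. SRF_RETURN_NEXT): emit the next match.
        start, end = ctx.user_fctx[ctx.call_cntr]
        ctx.call_cntr += 1
        return ctx, s[start:end], False
    # Set exhausted (cf. SRF_RETURN_DONE).
    return ctx, None, True


def drive(s: str, pattern: str) -> List[str]:
    """Play the executor's role: call repeatedly until the function is done."""
    ctx: Optional[FuncCallContext] = None
    rows: List[str] = []
    while True:
        ctx, row, done = regexp_matches_g(ctx, s, pattern)
        if done:
            return rows
        rows.append(row)
```

For example, drive("a1b22c", r"\d+") collects ["1", "22"], one element per call. Storing all spans up front keeps each later call trivial. An alternative design would keep only the last end offset and search again on every call.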

> > * maybe a join function that works as an aggregate
> >    SELECT join(',', col) FROM tbl
> >   currently can be written as
> >    SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')
>
> The array_accum() aggregate in the docs works OK for this purpose.

I have not tackled this yet; I think it may be better to stick with the
ARRAY() construct for now.
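The aggregate route discussed in the quoted exchange works by folding rows into a state value with a transition function, then optionally converting the final state with a final function. The Python sketch below models only that evaluation order. The names run_aggregate and join_final are hypothetical. array_append here mirrors the built-in function that the documentation's array_accum example uses as its transition function. No join aggregate like this exists in core, and NULL handling is not modelled. For non-NULL text values arriving in the same order, the joined result should match array_to_string(ARRAY(SELECT col FROM tbl), ',').

```python
from typing import Callable, Iterable, List, Optional, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def run_aggregate(rows: Iterable[T], sfunc: Callable[[S, T], S], initcond: S,
                  finalfunc: Optional[Callable[[S], object]] = None) -> object:
    """Fold rows into a state with sfunc, then apply finalfunc if given."""
    state = initcond
    for value in rows:
        state = sfunc(state, value)
    return finalfunc(state) if finalfunc is not None else state


def array_append(state: List[str], value: str) -> List[str]:
    """Transition function: return a new list with value appended."""
    return state + [value]


def join_final(delimiter: str) -> Callable[[List[str]], str]:
    """Final function for a hypothetical join(delimiter, col) aggregate."""
    return lambda state: delimiter.join(state)
```

Without a final function, run_aggregate(["a", "b"], array_append, []) returns the list ["a", "b"], which corresponds to what array_accum produces. A join aggregate would only add the final joining step. As with array_accum, the output follows the order in which rows reach the aggregate.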


So, here is the new version of the code, and also a new version of the
patch to core, which fixes some compile warnings that I did not see at
first because I was using ICC rather than GCC.

Here is the README.regexp_ext from the tar file:


This package contains regexp functions beyond those currently provided
in core PostgreSQL, utilizing the regexp engine built into core.  This
is still a work-in-progress.

The most recent version of this code can be found at
 http://www.jdrake.com/postgresql/regexp/regexp_ext.tar.gz
and the prerequisite patch to PostgreSQL core, which has been submitted
for review, can be found at
 http://www.jdrake.com/postgresql/regexp/regexp-export.patch

The .tar.gz file is meant to be unpacked in contrib/.  I have written some
regression tests that can be run with 'make installcheck', as usual for
contrib.  I think they exercise the corner cases in the code, but I may
very well have missed some.  The module requires the above-mentioned patch
to core in order to compile, because it uses functions newly exported from
src/backend/utils/adt/regexp.c.

Let me know if you see any bugs or issues with this code, and I am open to
suggestions for further regression tests ;)

Functions implemented in this module:
* regexp_split(str text, pattern text) RETURNS SETOF text
  regexp_split(str text, pattern text, flags text) RETURNS SETOF text
   returns each section of the string delimited by the pattern.
* regexp_matches(str text, pattern text) RETURNS text[]
   returns an array of all capture groups from matching pattern against str.
* regexp_matches(str text, pattern text, flags text) RETURNS SETOF
    (prematch text, fullmatch text, matches text[], postmatch text)
   returns an array of all capture groups from matching pattern against str,
   and also returns the entire match in fullmatch.  If the 'g' flag is given,
   returns all matches in the string.  If the 'r' flag is given, also returns
   the text before and after the match in prematch and postmatch respectively.
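The following Python approximation illustrates the sections that regexp_split(str, pattern) is described as returning. It is not the module's implementation. Python's re syntax differs from PostgreSQL's regular expressions, and only an 'i' (case-insensitive) flag is modelled. Skipping zero-length matches is an assumption. The module's regression tests define the real behaviour.

```python
import re
from typing import Iterator


def regexp_split(s: str, pattern: str, flags: str = "") -> Iterator[str]:
    """Yield each section of s delimited by matches of pattern.

    Python-re approximation; only the 'i' flag is modelled.
    """
    re_flags = re.IGNORECASE if "i" in flags else 0
    start = 0
    for m in re.finditer(pattern, s, re_flags):
        if m.start() == m.end():
            # Assumption: zero-length matches do not split the string.
            continue
        yield s[start:m.start()]  # text before this delimiter
        start = m.end()
    yield s[start:]  # text after the last delimiter (or the whole string)
```

With this sketch, list(regexp_split("a,b,,c", ",")) yields ["a", "b", "", "c"]. Adjacent delimiters produce an empty section, and a string with no match comes back whole. Using finditer instead of re.split avoids re.split's behaviour of inserting capture-group text into the output.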

See the regression tests for more details about usage and return values.
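The exact return values are defined by the regression tests, which are not reproduced here. The Python sketch below therefore only approximates the three-argument regexp_matches described in the function list. None of its assumptions is confirmed by the module. MatchRow and the function name are illustrative. prematch and postmatch are None (NULL) unless 'r' is given, and then hold the text before and after the match within the whole string. 'i' means case-insensitive, as in regexp_replace, and any other flag is rejected. Python's re stands in for PostgreSQL's regexp engine.

```python
import re
from typing import Iterator, List, NamedTuple, Optional


class MatchRow(NamedTuple):
    prematch: Optional[str]       # text before the match ('r' flag only)
    fullmatch: str                # the entire matched text
    matches: List[Optional[str]]  # capture groups; None for unmatched groups
    postmatch: Optional[str]      # text after the match ('r' flag only)


def regexp_matches(s: str, pattern: str, flags: str = "") -> Iterator[MatchRow]:
    """Approximate the SETOF rows of regexp_matches(str, pattern, flags)."""
    unknown = set(flags) - set("gir")
    if unknown:
        raise ValueError("unsupported flag(s): " + "".join(sorted(unknown)))
    rx = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    with_context = "r" in flags
    if "g" in flags:
        found = list(rx.finditer(s))       # every non-overlapping match
    else:
        first = rx.search(s)
        found = [first] if first else []   # at most one row
    for m in found:
        yield MatchRow(
            prematch=s[:m.start()] if with_context else None,
            fullmatch=m.group(0),
            matches=list(m.groups()),
            postmatch=s[m.end():] if with_context else None,
        )
```

For example, with flags "gr" the string "foo123bar45" and pattern ([a-z]+)(\d+) produce two rows. The first row has fullmatch "foo123" and postmatch "bar45". The second row has fullmatch "bar45" and prematch "foo123". Without 'g', only the first row is returned.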

Recent changes:
* I have put the pattern after the string in all of the functions, as
  discussed on the pgsql-hackers mailing list.

* Added regexp flags (a la regexp_replace).

* regexp_matches now returns SETOF a record; if given the 'g' flag, it
  returns all matches in the string.

Things that I still want to look into:
* maybe a join function that works as an aggregate
   SELECT join(',', col) FROM tbl
  currently can be written as
   SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')


--
Philogeny recapitulates erogeny; erogeny recapitulates philogeny.
