Re: [HACKERS] writing new regexp functions - Mailing list pgsql-patches

From Jeremy Drake
Subject Re: [HACKERS] writing new regexp functions
Msg-id Pine.BSO.4.64.0702011914480.28908@resin.csoft.net
List pgsql-patches
On Thu, 1 Feb 2007, Jeremy Drake wrote:

> On Thu, 1 Feb 2007, Tom Lane wrote:
>
> > Jeremy Drake <pgsql@jdrake.com> writes:
> > > Is there some specific reason that these functions are static,
> >
> > Yeah: not cluttering the global namespace.
>
> > Is there a reason for not putting your new code itself into regexp.c?
>
> Not really, I just figured it would be cleaner/easier to write it as an
> extension.  I also figure that it is unlikely that every regexp function
> that anyone could possibly want will be implemented in core in that one
> file.
<snip>

> Anyway, the particular thing I was writing was a function like
> substring(str FROM pattern) which instead of returning just the first
> match group, would return an array of text containing all of the match
> groups.  I exported the functions in my sandbox, and wrote a module with a
> function that does this.

I have attached the patch I have put together, which does the following:
* Expose the previously static RE_* functions from regexp.c which wrap
  the code in src/backend/regex with postgres-style errors, string
  conversion, and caching of patterns.

* Expose the regex_flavor GUC variable, which is needed to know how to
  interpret patterns when compiling them.

* Add a couple more RE_* functions in regexp.c to provide access
  to different levels of the process, which were necessary to avoid
  duplicating effort elsewhere.

* Update replace_text_regexp in varlena.c to use newly exposed functions
  from regexp.c instead of duplicating error handling code from there.
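
For readers without the attachment, here is a minimal standalone sketch of
the "compile once, cache, and report errors in one place" idea behind the
RE_* wrappers. It is not the code from the patch: it uses the POSIX
<regex.h> API rather than the library in src/backend/regex, and the names
cached_regcomp, cached_re, CACHE_SIZE, next_victim and cache_compiles are
all hypothetical. It also returns an error string where the backend would
raise an error.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SIZE 4            /* hypothetical cache capacity */

typedef struct
{
    char       *pattern;        /* NULL while the slot is unused */
    int         cflags;         /* flags the pattern was compiled with */
    regex_t     re;             /* compiled form, valid if pattern != NULL */
} cached_re;

static cached_re cache[CACHE_SIZE];
static int  next_victim = 0;    /* round-robin replacement index */
static int  cache_compiles = 0; /* counts regcomp() calls, for illustration */

/*
 * Return a compiled regex for (pattern, cflags), compiling only on a cache
 * miss.  On failure, write a message into errbuf and return NULL.  The
 * returned pointer stays valid only until its slot is reused.
 */
regex_t *
cached_regcomp(const char *pattern, int cflags, char *errbuf, size_t errlen)
{
    cached_re  *slot;
    size_t      len;
    int         i;
    int         rc;

    assert(pattern != NULL);

    /* Cache hit: same pattern text compiled with the same flags. */
    for (i = 0; i < CACHE_SIZE; i++)
    {
        if (cache[i].pattern != NULL && cache[i].cflags == cflags &&
            strcmp(cache[i].pattern, pattern) == 0)
            return &cache[i].re;
    }

    /* Cache miss: release whatever occupies the victim slot. */
    slot = &cache[next_victim];
    if (slot->pattern != NULL)
    {
        regfree(&slot->re);
        free(slot->pattern);
        slot->pattern = NULL;
    }

    rc = regcomp(&slot->re, pattern, cflags);
    cache_compiles++;
    if (rc != 0)
    {
        /* Central error reporting: translate the code into text once. */
        if (errbuf != NULL && errlen > 0)
            regerror(rc, &slot->re, errbuf, errlen);
        return NULL;
    }

    len = strlen(pattern) + 1;
    slot->pattern = malloc(len);
    if (slot->pattern == NULL)
    {
        regfree(&slot->re);
        if (errbuf != NULL && errlen > 0)
            snprintf(errbuf, errlen, "out of memory");
        return NULL;
    }
    memcpy(slot->pattern, pattern, len);
    slot->cflags = cflags;
    next_victim = (next_victim + 1) % CACHE_SIZE;
    return &slot->re;
}
```

A caller would pass REG_EXTENDED (or whatever flags correspond to the
pattern flavor in use) and hand the result straight to regexec(). Keying
the cache on the flags as well as the pattern text matters because the same
text can compile differently under different flavors.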

Also attached is the function I wrote to retrieve all of the capture
groups in a pattern match in a text[].  I also intend to put together a
function analogous to split_part which will take a string and a pattern to
split on, and return setof text.
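
To illustrate what "all of the capture groups" means, here is a small
standalone sketch, again using POSIX <regex.h> and plain C strings rather
than the backend regex code and text[] that the attached function uses. The
names regexp_match_groups and MAX_GROUPS are hypothetical, and the return
convention is just one reasonable choice.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 16           /* hypothetical cap on capture groups */

/*
 * Match pattern against str and copy every capture group (not the whole
 * match) into groups[0..n-1] as malloc'd strings.  A group that did not
 * participate in the match is stored as NULL.
 *
 * Returns the number of groups stored (possibly 0) on a match, or -1 if
 * there is no match, the pattern fails to compile, or memory runs out.
 */
int
regexp_match_groups(const char *str, const char *pattern,
                    char *groups[], int max_groups)
{
    regex_t     re;
    regmatch_t  pm[MAX_GROUPS + 1];     /* pm[0] is the whole match */
    size_t      len;
    int         ngroups;
    int         i;

    assert(str != NULL && pattern != NULL && groups != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* re_nsub is the number of parenthesized subexpressions. */
    ngroups = (int) re.re_nsub;
    if (ngroups > max_groups)
        ngroups = max_groups;
    if (ngroups > MAX_GROUPS)
        ngroups = MAX_GROUPS;

    if (regexec(&re, str, (size_t) ngroups + 1, pm, 0) != 0)
    {
        regfree(&re);
        return -1;
    }

    for (i = 0; i < ngroups; i++)
    {
        if (pm[i + 1].rm_so < 0)
        {
            groups[i] = NULL;           /* group did not participate */
            continue;
        }
        len = (size_t) (pm[i + 1].rm_eo - pm[i + 1].rm_so);
        groups[i] = malloc(len + 1);
        if (groups[i] == NULL)
        {
            while (--i >= 0)
                free(groups[i]);        /* free(NULL) is harmless */
            regfree(&re);
            return -1;
        }
        memcpy(groups[i], str + pm[i + 1].rm_so, len);
        groups[i][len] = '\0';
    }

    regfree(&re);
    return ngroups;
}
```

With the string "foobarbaz" and the pattern "(o+)(b.r)", this stores "oo"
and "bar", whereas a function that returns only the first match group would
stop at "oo".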

Let me know whether I should work under the assumption of the attached patch
and write the functions for contrib or pgfoundry, put the functions in
regexp.c and try to get them into core, or both. (While working on the
function, it made my life a lot easier not to have to restart the postmaster
every time I recompiled it; it may be nice in the future to be able to write
extensions like this.)

--
To err is human, to forgive, beyond the scope of the Operating System.

