Re: patch adding new regexp functions - Mailing list pgsql-patches

From Jeremy Drake
Subject Re: patch adding new regexp functions
Msg-id Pine.BSO.4.64.0702170005560.18849@resin.csoft.net
In response to Re: patch adding new regexp functions  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-patches
On Sat, 17 Feb 2007, Peter Eisentraut wrote:

> Jeremy Drake wrote:
> > In case you haven't noticed, I am rather averse to making this return
> > text[] because it is much easier in my experience to use the results
> > when returned in SETOF rather than text[],
>
> The primary use case I know for string splitting is parsing
> comma/pipe/whatever separated fields into a row structure, and the way
> I see it your API proposal makes that exceptionally difficult.

For this case see string_to_array:
http://developer.postgresql.org/pgdocs/postgres/functions-array.html

select string_to_array('a|b|c', '|');
 string_to_array
-----------------
 {a,b,c}
(1 row)
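
As a rough sketch of the row-structure case, the resulting array can be
subscripted to pull each field into its own column. The column names f1
to f3 and the alias parts are made up for illustration:

```sql
-- Hypothetical example: split a pipe-separated string once, then pick
-- the fields out by array subscript (PostgreSQL arrays are 1-based).
select parts[1] as f1,
       parts[2] as f2,
       parts[3] as f3
from (select string_to_array('a|b|c', '|') as parts) as s;
```

This only works for a single-character or fixed-string delimiter, since
string_to_array does not take a pattern.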


> I don't know what your use case is, though.  All of this is missing
> actual use cases.

The particular use case I had for this function was at a previous
employer, and I am not sure exactly how much detail is appropriate to
divulge.  Basically, the project was doing some text processing inside of
postgres, and getting all of the words from a string into a table with
some processing (excluding stopwords and so forth) as efficiently as
possible was a big concern.
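
A minimal sketch of that kind of query follows. It is not the original
code: it assumes the set-returning regexp_split_to_table(text, text)
found in released PostgreSQL rather than the regexp_split proposed in
this patch, and the sample sentence and stopword list are invented for
illustration:

```sql
-- Sketch only: regexp_split_to_table() is the released PostgreSQL
-- function, not the regexp_split() discussed in this thread.
-- The input string and the stopwords are made-up examples.
select lower(word) as word
from regexp_split_to_table(
         'The quick brown fox jumps over the lazy dog',
         E'\\s+'          -- split on runs of whitespace
     ) as word
where lower(word) not in ('the', 'over');
```

Because each word comes back as its own row, the result can be filtered,
joined, or inserted into a table directly, which is the convenience
argued for above over a text[] result.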

The regexp_split function code was based on some code that a friend of
mine wrote which used PCRE rather than postgres' internal regexp support.
I don't know exactly what his use-case was, but he probably had
one because he wrote the function and had it returning SETOF text ;)
Perhaps he can share a general idea of what it was (nudge nudge)?

> > While, if you
> > really really wanted a text[], you could use the (fully documented)
> > ARRAY(select resultstr from regexp_split(...) order by startpos)
> > construct.
>
> I think, however, that we should be providing simple primitives that can
> be combined into complex expressions rather than complex primitives
> that have to be dissected apart to get simple results.

The simplest primitive is string_to_array(text, text), which returns
text[], but it was not sufficient for our needs.

> > > As for the regexp_matches() function, it seems to me that it
> > > returns too much information at once.  What is the use case for
> > > getting all of prematch, fullmatch, matches, and postmatch in one
> > > call?
> >
> > It was requested by David Fetter:
> > http://archives.postgresql.org/pgsql-hackers/2007-02/msg00056.php
> >
> > It was not horribly difficult to provide, and it seemed reasonable to
> > me. I have no need for them personally.
>
> David Fetter has also repeatedly failed to offer a use case for this, so I
> hesitate to accept this.

I have no strong opinion either way, so I will let those who do argue it
out and wait for the dust to settle ;)

--
The Law, in its majestic equality, forbids the rich, as well as the
poor, to sleep under the bridges, to beg in the streets, and to steal
bread.
        -- Anatole France
