Alvaro Herrera-9 wrote
> Björn Harrtell wrote:
>> I've written a variant of regexp_matches called regexp_matches_positions
>> which instead of returning matching substrings will return matching
>> positions. I found use of this when processing OCR scanned text and
>> wanted to prioritize matches based on their position.
>
> Interesting. I didn't read the patch, but I wonder if it would be of
> more general applicability to return more info in one fell swoop: a
> function returning a set of (position, length, text of match) rather than
> an array. So instead of first calling one function to get the matches and
> then another to get their positions, do it all in one pass.
>
> (See pg_event_trigger_dropped_objects for a simple example of a function
> that returns in that fashion. There are several others but AFAIR that's
> the simplest one.)

I'm confused as to your thinking. Like regexp_matches, this returns "SETOF
type[]": integer[] in this case, where regexp_matches returns text[]. I could
see adding a generic function that returns a SETOF named composite (match
varchar[], position int[], length int[]) along with the corresponding type.
I can't imagine a situation where you'd want the position but not the text,
so having to evaluate the regexp twice seems wasteful. The length is probably
a waste, though, since it can readily be derived from the text and is less
often needed. But if it's pre-calculated anyway...
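
To make the composite idea concrete, here is a minimal sketch of the one-pass shape in Python. It is only a model: Python's re module stands in for the PostgreSQL regex engine (the two differ in syntax and matching rules), and the names MatchInfo and regexp_matches_info are hypothetical, not anything from the patch. Positions are assumed to be 1-based, as SQL string positions are.

```python
import re
from typing import Iterator, List, NamedTuple, Optional


class MatchInfo(NamedTuple):
    """Hypothetical row type: one array slot per parenthesized group."""
    match: List[Optional[str]]
    position: List[Optional[int]]  # assumed 1-based, like SQL string positions
    length: List[Optional[int]]


def regexp_matches_info(text: str, pattern: str, flags: str = "") -> Iterator[MatchInfo]:
    """Yield one row per match, collecting text, position and length together."""
    compiled = re.compile(pattern)
    if "g" in flags:
        found = compiled.finditer(text)
    else:
        first = compiled.search(text)
        found = iter([first] if first else [])
    # Like regexp_matches: with no () groups, report the whole match instead.
    groups = range(1, compiled.groups + 1) if compiled.groups else range(0, 1)
    for m in found:
        texts, positions, lengths = [], [], []
        for g in groups:
            if m.start(g) < 0:
                # This group did not participate in the match.
                texts.append(None)
                positions.append(None)
                lengths.append(None)
            else:
                texts.append(m.group(g))
                positions.append(m.start(g) + 1)
                lengths.append(m.end(g) - m.start(g))
        yield MatchInfo(texts, positions, lengths)


for row in regexp_matches_info("foobarbequebaz", "(bar)(beque)"):
    print(row)
# MatchInfo(match=['bar', 'beque'], position=[4, 7], length=[3, 5])
```

The three arrays stay parallel, so slot N of each describes the same group. The length column is redundant with the text, as noted above, but it costs nothing extra once the match boundaries are known.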

My question is: what position is returned in a multiple-match situation? The
supplied test only covers the simple, non-global case. It needs to exercise
empty sub-matches and global searches. One theory is that the first array
slot should hold the global position of match zero (i.e., the entire pattern)
within the larger document, while sub-matches would be relative offsets
within that single match. This conflicts, though, with the fact that
regexp_matches only returns array elements for () items and never for the
full match, the goal of this function being parallel un-nesting. But since
nesting is allowed, the full match can still show up as an array element.

How does this resolve in the patch?

SELECT regexp_matches('abcabc','((a)(b)(c))','g');
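
For reference, regexp_matches itself returns two rows for that query, each {abc,a,b,c}. The open question is what the parallel position arrays should look like. The sketch below models the two candidate conventions in Python. It is not the patch's behavior: Python's re module is only a stand-in for the PostgreSQL regex engine, group_positions is a hypothetical name, and the choices of 1-based absolute positions and 0-based relative offsets are assumptions made for illustration.

```python
import re
from typing import List, Optional


def group_positions(text: str, pattern: str, relative: bool = False) -> List[List[Optional[int]]]:
    """Return one list of group positions per match of a global search."""
    compiled = re.compile(pattern)
    rows = []
    for m in compiled.finditer(text):
        # Relative convention (assumed): slot 0 holds the 1-based position of
        # the whole match; each group follows as a 0-based offset from it.
        # Absolute convention (assumed): every slot is a 1-based position in
        # the full input, parallel to the array regexp_matches returns.
        row: List[Optional[int]] = [m.start() + 1] if relative else []
        for g in range(1, compiled.groups + 1):
            start = m.start(g)
            if start < 0:
                row.append(None)  # group did not participate
            elif relative:
                row.append(start - m.start())
            else:
                row.append(start + 1)
        rows.append(row)
    return rows


print(group_positions("abcabc", "((a)(b)(c))"))
# [[1, 1, 2, 3], [4, 4, 5, 6]]
print(group_positions("abcabc", "((a)(b)(c))", relative=True))
# [[1, 0, 0, 1, 2], [4, 0, 0, 1, 2]]
print(group_positions("ab", "(a)(x*)(b)"))
# [[1, 2, 2]]
```

Under the absolute convention the array has the same number of slots as the regexp_matches result, so the two un-nest in parallel. The relative convention adds a leading slot and breaks that alignment. The last call shows an empty sub-match: the middle group matches the empty string yet still has a well-defined position, which is the kind of case the regression test should pin down.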

David J.
--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Patch-regexp-matches-variant-returning-an-array-of-matching-positions-tp5789321p5789414.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.