Thread: Non-capturing expressions

Non-capturing expressions

From
Thom Brown
Date:
Hi all,

It must be that I haven't had enough caffeine today, but I can't figure out why the following expression captures the non-capturing part of the text:

# SELECT regexp_matches('postgres','(?:g)r');
 regexp_matches 
----------------
 {gr}
(1 row)

I'm expecting '{r}' in the output as I thought this would use ARE mode by default.

Thom

Re: Non-capturing expressions

From
Thom Brown
Date:
On 25 October 2014 11:49, Francisco Olarte <folarte@peoplecall.com> wrote:
Hi Thom:

On Sat, Oct 25, 2014 at 11:24 AM, Thom Brown <thom@linux.com> wrote:
It must be that I haven't had enough caffeine today, but I can't figure out why the following expression captures the non-capturing part of the text:
# SELECT regexp_matches('postgres','(?:g)r');
 regexp_matches 
----------------
 {gr}
(1 row)

Section 9.7.3, search for 'If the pattern contains no parenthesized subexpressions, then each row returned is a single-element text array containing the substring matching the whole pattern.'

Ah, I knew I missed something:

# SELECT regexp_matches('postgres','(?:g)(r)');
 regexp_matches 
----------------
 {r}
(1 row)

Although I can see it's redundant in this form.
 

I'm expecting '{r}' in the output as I thought this would use ARE mode by default.

Why r? Your pattern is exactly equivalent to 'gr': NOTHING gets captured. To get r you would need the opposite, 'g(r)', to capture it. By default nothing gets captured; the (?:...) construction exists because (...) does both GROUPING and CAPTURING, and sometimes you want grouping WITHOUT capturing.
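
To make the grouping-versus-capturing distinction concrete, here is a small sketch that runs the same input through the variants mentioned in this thread. The results in the comments follow from the sentence quoted from section 9.7.3 and from the outputs already shown above; nothing here goes beyond regexp_matches itself.

```sql
-- No parenthesized subexpressions at all: the whole match is returned.
SELECT regexp_matches('postgres', 'gr');       -- {gr}

-- (?:g) groups without capturing, so there is still nothing captured
-- and the whole match is returned, exactly as for 'gr'.
SELECT regexp_matches('postgres', '(?:g)r');   -- {gr}

-- A capturing group around r: only the captured substring is returned.
SELECT regexp_matches('postgres', 'g(r)');     -- {r}

-- A capturing group around g instead: only g is returned.
SELECT regexp_matches('postgres', '(g)r');     -- {g}
```

The shape of the result therefore depends on whether the pattern contains any capturing group, not on whether it contains parentheses.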

I'm familiar with regular expression syntax, just familiarising myself with PostgreSQL's syntax flavour.

Thanks

Thom

Re: Non-capturing expressions

From
Francisco Olarte
Date:
Hi Thom:

On Sat, Oct 25, 2014 at 11:55 AM, Thom Brown <thom@linux.com> wrote:
Ah, I knew I missed something:

# SELECT regexp_matches('postgres','(?:g)(r)');
...snip, snip...

Yes. It's one of the things I strongly dislike about the semantics of postgres' (and others') regular expression functions. The ''semantics'' of their return value depend on the data, which makes them difficult to use properly when the pattern argument is unknown. I would prefer it to always return a list with the full match in the first element and the grouped captures behind it (i.e., {gr} for '(?:g)r', {gr,g} for '(g)r'). But I think it's designed more for interactive use with constant patterns than for programmatic use.
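
One way to approximate the "full match first, captures behind it" behaviour described above is to wrap the pattern in one extra pair of capturing parentheses. This is only a sketch, not a built-in mode of regexp_matches, and it assumes the pattern contains no back-references (their numbers would shift by one) and no leading embedded options or director prefix, which must stay at the very start of the pattern.

```sql
-- The outer parentheses become capture 1 (the whole match);
-- the pattern's own capturing groups follow from capture 2 onwards.
SELECT regexp_matches('postgres', '(' || '(?:g)r' || ')');   -- {gr}
SELECT regexp_matches('postgres', '(' || '(g)r'   || ')');   -- {gr,g}
```

With this wrapper the first array element is always the full match, so code that receives an unknown pattern can index the result uniformly, within the assumptions stated above.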
....

I'm familiar with regular expression syntax, just familiarising myself with PostgreSQL's syntax flavour.

 
Sorry, I got confused by the question, and by the fact that I do not know of any regular expression engine with an access function which, when presented with non-capturing-group1+unmarked2, returns unmarked2. Even in perl I do not know how to do it.


Regards.
  Francisco Olarte.