Re: Can Postgres beat Oracle for regexp_count? - Mailing list pgsql-general

From David G. Johnston
Subject Re: Can Postgres beat Oracle for regexp_count?
Date
Msg-id CAKFQuwZ8PUnDWC645Oa1hhbT4LqRO2Kc3GuYyxNiuRU3OLhT1Q@mail.gmail.com
In response to Re: Can Postgres beat Oracle for regexp_count?  (Shaozhong SHI <shishaozhong@gmail.com>)
List pgsql-general
On Wed, Feb 2, 2022 at 10:26 PM Shaozhong SHI <shishaozhong@gmail.com> wrote:

select regexp_matches('My High Street', '([A-Z][a-z]+[\s]*)+', 'g')
It is intended to match 'My High Street', but it turned out that only 'Street' was matched.


I'm too tired to dig up the documentation that explains your result, but basically: you have only a single pair of capturing parentheses, and because you quantified that group with +, only the last string it captured is reported, which is Street. If you want to capture the entire matched phrase, the quantifier itself has to sit inside a capturing group. So put parentheses around the entire regexp.

select regexp_matches('My High Street', '(([A-Z][a-z]+[\s]*)+)', 'g')

You now have a two-element array, with the slots filled left to right in the order of the opening parentheses. So the result is {"My High Street",Street}.
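To make the slot ordering concrete, here is a minimal sketch that pulls both slots out of the two-group pattern separately. It puts the set-returning function in FROM with a column alias (`m`, chosen here purely for illustration) so each array element can be subscripted:

```sql
-- regexp_matches returns one text[] per match; alias that column as m
-- so its elements can be subscripted.
-- m[1] = outer group (first opening parenthesis): the whole run of words
-- m[2] = inner group (second opening parenthesis): only its last repetition
select m[1] as phrase, m[2] as last_word
from regexp_matches('My High Street', '(([A-Z][a-z]+[\s]*)+)', 'g') as t(m);
```

Slot 1 holds the outer capture, My High Street; slot 2 holds only the final repetition of the inner group, Street.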

To get rid of the unwanted Street and return only a single-element array, you need to make the inner parentheses non-capturing.

select regexp_matches('My High Street', '((?:[A-Z][a-z]+[\s]*)+)', 'g')
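A related sketch, under the same input. When a pattern has no capturing parentheses at all, regexp_matches returns the substring matching the whole pattern as a single-element array, so the outer group is optional once the inner one is non-capturing. For contrast, dropping the + quantifier and letting the 'g' flag iterate yields one match (one row) per capitalized word instead of one phrase:

```sql
-- No capturing groups: each row is a one-element array holding the whole match.
-- Expected: {"My High Street"}
select regexp_matches('My High Street', '(?:[A-Z][a-z]+[\s]*)+', 'g');

-- No group and no outer quantifier: the 'g' flag finds each word as a
-- separate match. Expected rows: {My}, {High}, {Street}
select regexp_matches('My High Street', '[A-Z][a-z]+', 'g');
```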

David J.
