Thread: BUG #18956: Observing an issue in regexp_count()

BUG #18956: Observing an issue in regexp_count()

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18956
Logged by:          Anudeep Vattikonda
Email address:      anudeepvattikonda0404@gmail.com
PostgreSQL version: 17.5
Operating system:   Mac
Description:

Hi
I am trying to run the below query
select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
I was expecting it to return 2 but I see Postgres is returning 0. I see that
there are two matches, cat and flat. All it should do is to look for the
word at whose left side shoudn't be a word boundary while the right side
should be a word boundary
Thank you


Re: BUG #18956: Observing an issue in regexp_count()

From
Tom Lane
Date:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote:
>> I am trying to run the below query
>> select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
>> I was expecting it to return 2 but I see Postgres is returning 0. I see that
>> there are two matches, cat and flat. All it should do is to look for the
>> word at whose left side shoudn't be a word boundary while the right side
>> should be a word boundary

> What makes you think that \B/\b has anything to do with word boundary?

Indeed, they do not.

> As far as I can tell pg regexps have nothing related to word boundaries.

Sure we do, see "Regular Expression Constraint Escapes":

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE

Unfortunately, since these are all way outside the POSIX regexp
standard, different systems have implemented these extensions
differently.  I don't doubt that \B/\b mean word boundaries
in some other system.

            regards, tom lane



Re: BUG #18956: Observing an issue in regexp_count()

From
hubert depesz lubaczewski
Date:
On Thu, Jun 12, 2025 at 09:54:46AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote:
> >> I am trying to run the below query
> >> select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
> >> I was expecting it to return 2 but I see Postgres is returning 0. I see that
> >> there are two matches, cat and flat. All it should do is to look for the
> >> word at whose left side shoudn't be a word boundary while the right side
> >> should be a word boundary
> 
> > What makes you think that \B/\b has anything to do with word boundary?
> 
> Indeed, they do not.
> 
> > As far as I can tell pg regexps have nothing related to word boundaries.
> 
> Sure we do, see "Regular Expression Constraint Escapes":
> 
> https://www.postgresql.org/docs/current/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE
> 
> Unfortunately, since these are all way outside the POSIX regexp
> standard, different systems have implemented these extensions
> differently.  I don't doubt that \B/\b mean word boundaries
> in some other system.

Oh, Missed that. Thanks.

So the regexp can be rewritten to:

=$ select REGEXP_COUNT('cat at the flat', '\Yat\M');
 regexp_count
──────────────
            2
(1 row)

Best regards,

depesz