Thread: BUG #18956: Observing an issue in regexp_count()
The following bug has been logged on the website: Bug reference: 18956 Logged by: Anudeep Vattikonda Email address: anudeepvattikonda0404@gmail.com PostgreSQL version: 17.5 Operating system: Mac Description: Hi I am trying to run the below query select REGEXP_COUNT('cat at the flat', '\Bat\b') ; I was expecting it to return 2 but I see Postgres is returning 0. I see that there are two matches, cat and flat. All it should do is to look for the word at whose left side shoudn't be a word boundary while the right side should be a word boundary Thank you
hubert depesz lubaczewski <depesz@depesz.com> writes: > On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote: >> I am trying to run the below query >> select REGEXP_COUNT('cat at the flat', '\Bat\b') ; >> I was expecting it to return 2 but I see Postgres is returning 0. I see that >> there are two matches, cat and flat. All it should do is to look for the >> word at whose left side shoudn't be a word boundary while the right side >> should be a word boundary > What makes you think that \B/\b has anything to do with word boundary? Indeed, they do not. > As far as I can tell pg regexps have nothing related to word boundaries. Sure we do, see "Regular Expression Constraint Escapes": https://www.postgresql.org/docs/current/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE Unfortunately, since these are all way outside the POSIX regexp standard, different systems have implemented these extensions differently. I don't doubt that \B/\b mean word boundaries in some other system. regards, tom lane
On Thu, Jun 12, 2025 at 09:54:46AM -0400, Tom Lane wrote: > hubert depesz lubaczewski <depesz@depesz.com> writes: > > On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote: > >> I am trying to run the below query > >> select REGEXP_COUNT('cat at the flat', '\Bat\b') ; > >> I was expecting it to return 2 but I see Postgres is returning 0. I see that > >> there are two matches, cat and flat. All it should do is to look for the > >> word at whose left side shoudn't be a word boundary while the right side > >> should be a word boundary > > > What makes you think that \B/\b has anything to do with word boundary? > > Indeed, they do not. > > > As far as I can tell pg regexps have nothing related to word boundaries. > > Sure we do, see "Regular Expression Constraint Escapes": > > https://www.postgresql.org/docs/current/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE > > Unfortunately, since these are all way outside the POSIX regexp > standard, different systems have implemented these extensions > differently. I don't doubt that \B/\b mean word boundaries > in some other system. Oh, Missed that. Thanks. So the regexp can be rewritten to: =$ select REGEXP_COUNT('cat at the flat', '\Yat\M'); regexp_count ────────────── 2 (1 row) Best regards, depesz