Re: Match 2 words and more - Mailing list pgsql-general

From Alvaro Herrera
Subject Re: Match 2 words and more
Msg-id 202111280049.bmeoep6puysk@alvherre.pgsql
In response to Match 2 words and more  (Shaozhong SHI <shishaozhong@gmail.com>)
Responses Re: Match 2 words and more
List pgsql-general

On 2021-Nov-28, Shaozhong SHI wrote:

> this is supposed to find those to have 2 words and more.
> 
> select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$';
> 
> But, it finds only one word as well.

How about something like this?

'^([[:<:]][[:alpha:]]+[[:>:]]( |$)){2}$'

Breaking it down:
- the ^ is a constraint that matches start of string
- the ( ... ){2}$ construct means "match the group exactly twice" and
  then match end-of-string
- Inside the parens of that construct, you match:
  - [[:<:]] which is a constraint matching start-of-word
  - [[:alpha:]]+ which means "one or more alphabetic characters"
  - [[:>:]] which is a constraint matching end-of-word
  - ( |$) for "either a space or end-of-string"

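You can perhaps simplify by removing the [[:<:]] and [[:>:]]
constraints, since each run of letters is already delimited by the start
of the string, a space, or the end of the string.  That gives
'^([[:alpha:]]+( |$)){2}$'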

To mean "between two and four", change the {2} to {2,4}.  If you want
"two or more", try {2,}.
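
As a rough sketch of how the three bounds behave, the query below runs
the simplified pattern over a few street names.  The VALUES list and the
column aliases are invented for illustration; they are not taken from
the original table.

```sql
-- Sample street names are made up for this example.
SELECT s.street,
       s.street ~ '^([[:alpha:]]+( |$)){2}$'   AS exactly_two,
       s.street ~ '^([[:alpha:]]+( |$)){2,4}$' AS two_to_four,
       s.street ~ '^([[:alpha:]]+( |$)){2,}$'  AS two_or_more
FROM (VALUES ('Broadway'),
             ('High Street'),
             ('Old Kent Road'),
             ('A B C D E')) AS s(street);
```

'Broadway' should be false in all three columns, 'High Street' true in
all three, 'Old Kent Road' true only for {2,4} and {2,}, and the
five-word value true only for {2,}.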

You could change the ( |$) to ([[:space:]]+|$) in order to accept more
than one space between words, or combinations of spaces, tabs,
newlines and so on.
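
A minimal comparison of the two separators, assuming the POSIX class
[[:space:]] for "any whitespace" and using made-up sample values (the
E'...' literal is only there to embed a tab character):

```sql
-- Compare a single literal space against a run of any whitespace.
SELECT s.street,
       s.street ~ '^([[:alpha:]]+( |$)){2,}$'            AS single_space_only,
       s.street ~ '^([[:alpha:]]+([[:space:]]+|$)){2,}$' AS any_whitespace
FROM (VALUES ('High Street'),
             ('High  Street'),        -- two spaces
             (E'High\tStreet')) AS s(street);  -- tab
```

Only 'High Street' should match the single-space pattern, while all
three values should match the [[:space:]]+ variant.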

With a realistic set of data, you will probably notice some other
problems in this regexp, but it should at least be a reasonable start.
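
To illustrate the kind of problem that real data might expose, here is a
small sketch with invented values.  Which of these results count as
wrong depends on what you want from the data, so treat it as a
checklist rather than a list of bugs.

```sql
-- Edge cases for the {2,} variant; sample values are made up.
SELECT s.street,
       s.street ~ '^([[:alpha:]]+( |$)){2,}$' AS two_or_more
FROM (VALUES ('High Street '),      -- trailing space
             (' High Street'),      -- leading space
             ('St John''s Road'),   -- apostrophe is not [[:alpha:]]
             ('Route 66')) AS s(street);  -- digits are not [[:alpha:]]
```

The value with a trailing space should match, because the final ( |$)
can consume that space before end-of-string.  The other three should not
match.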

> It appears that regex is not robust.

Nah.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/