On 2021-Nov-28, Shaozhong SHI wrote:
> this is supposed to find those to have 2 words and more.
>
> select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$';
>
> But, it finds only one word as well.
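The second [:alpha:] is missing its outer brackets, so it is just a
bracket expression matching any of the characters ':', 'a', 'l', 'p'
and 'h'.  The first bracket expression accepts letters, '+' and spaces
in any mix, so nothing in the pattern actually requires a space.

How about something like this?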
'^([[:<:]][[:alpha:]]+[[:>:]]( |$)){2}$'
You have:
- ^ is a constraint that matches start-of-string
- a ( ... ){2}$ construct, which means "match exactly twice" and
  then match end-of-string
- Inside the parens of that construct, you match:
  - [[:<:]] which means start-of-word
  - [[:alpha:]]+ which means "one or more alphabetic chars"
  - [[:>:]] which means end-of-word
  - ( |$) for "either a space or end-of-string"
You can perhaps simplify by removing the [[:<:]] and [[:>:]]
constraints, so '^([[:alpha:]]+( |$)){2}$'
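
A quick way to compare the two patterns is to run them against a few
literal values.  This is only a sketch: the sample strings and the
column aliases are made up for illustration and are not taken from
your table.

```sql
-- Hypothetical sample values; replace with a query on your own table.
SELECT s,
       s ~ '^([[:<:]][[:alpha:]]+[[:>:]]( |$)){2}$' AS with_word_anchors,
       s ~ '^([[:alpha:]]+( |$)){2}$'               AS simplified
FROM (VALUES
        ('Main'),              -- one word: false for both
        ('Main Street'),       -- two words: true for both
        ('Old Main Street')    -- three words: false for both, {2} is exact
     ) AS t(s);
```

Both columns should agree on these inputs, which is why dropping the
word-boundary constraints looks safe here.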
To mean "between two and four", change the {2} to {2,4}. If you want
"two or more", try {2,}.
You could change the ( |$) to ([[:space:]]+|$) in order to accept more
than one space between words, or combinations of spaces and tabs and
newlines and so on.
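
The variations above can be tried side by side.  Again only a sketch
with made-up sample strings and aliases; it uses the POSIX class
[[:space:]] for "any whitespace".

```sql
-- Hypothetical sample values; the comments give the three result columns in order.
SELECT s,
       s ~ '^([[:alpha:]]+( |$)){2,4}$'            AS two_to_four,
       s ~ '^([[:alpha:]]+( |$)){2,}$'             AS two_or_more,
       s ~ '^([[:alpha:]]+([[:space:]]+|$)){2,}$'  AS any_whitespace
FROM (VALUES
        ('Main'),            -- f, f, f  (only one word)
        ('Main Street'),     -- t, t, t
        ('A B C D E'),       -- f, t, t  (five words exceeds {2,4})
        ('Main  Street'),    -- f, f, t  (two spaces between words)
        (E'Main\tStreet')    -- f, f, t  (tab between words)
     ) AS t(s);
```

To apply one of these to the original query, put the chosen pattern in
the WHERE clause in place of the old one, for example
where "STREET_NAME" ~ '^([[:alpha:]]+([[:space:]]+|$)){2,}$'.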
With a decent set of data, you could probably notice some other problems
in this regexp, but at least it should be a decent start.
> It appears that regex is not robust.
Nah.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/