Thread: Match 2 words and more
this is supposed to find those to have 2 words and more.
select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$';
But, it finds only one word as well.
It appears that regex is not robust.
Can anyone shed light on this?
Regards,
David
> On Nov 27, 2021, at 5:27 PM, Shaozhong SHI <shishaozhong@gmail.com> wrote: > > > this is supposed to find those to have 2 words and more. > > select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$'; > > But, it finds only one word as well. > > It appears that regex is not robust. > > Can anyone shed light on this? > > Regards, > Show the results of the query. Describe how those results differ from what you desire or from what you expect. > David
On 2021-Nov-28, Shaozhong SHI wrote: > this is supposed to find those to have 2 words and more. > > select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$'; > > But, it finds only one word as well. How about something like this? '^([[:<:]][[:alpha:]]+[[:>:]]( |$)){2}$' You have: - the ^ is a constraint that matches start of string - you have a ( ... ){2}$ construct which means "match exactly twice" and then match end-of-string - Inside the parens of that construct, you match: - [[:<:]] which means start-of-word - [[:alpha:]]+ which means "a non-empty set of alphabetical chars" - [[:>:]] which means end-of-word - ( |$) for "either a space or end-of-string" You can perhaps simplify by removing the [[:<:]] and [[:>:]] constraints, so '^([[:alpha:]]+( |$)){2}$' To mean "between two and four", change the {2} to {2,4}. If you want "two or more", try {2,}. You could change the ( |$) to ([[:white:]]+|$) in order to accept more than one space between words, or combinations of space and tabs and newlines and so on. With a decent set of data, you could probably notice some other problems in this regexp, but at least it should be a decent start. > It appears that regex is not robust. Nah. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
On Nov 27, 2021, at 16:27 , Shaozhong SHI <shishaozhong@gmail.com> wrote:
select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$’;
The simplest thing that does what you says is I think:
select name FROM a_table where "STREET_NAME" ~ ‘^([[:alpha:]]+\s)+[[:alpha:]]+$’;
On Sat, Nov 27, 2021 at 5:27 PM Shaozhong SHI <shishaozhong@gmail.com> wrote:
this is supposed to find those to have 2 words and more.select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$';But, it finds only one word as well.It appears that regex is not robust.Can anyone shed light on this?
You put the space inside the first character class (the plus sign within that class is also a literal plus, not a "one or more" indicator) thus you've made the space itself optional. Did you maybe mean to use ( ) parens instead of [ ]?
When asking regex questions it is usually more helpful to formulate them like so:
SELECT 'this should match' ~ '<regex here>';
David J.
On 11/27/21 6:49 PM, Alvaro Herrera wrote:
This was my attitude when I first started to learn computer programming. My code is perfect, it can't be my fault!! There must be a compiler bug!!!
Just incomprehensible line noise!!
[snip]On 2021-Nov-28, Shaozhong SHI wrote:
It appears that regex is not robust.
This was my attitude when I first started to learn computer programming. My code is perfect, it can't be my fault!! There must be a compiler bug!!!
Nah.
Just incomprehensible line noise!!
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On 2021-11-28 00:27:34 +0000, Shaozhong SHI wrote: > this is supposed to find those to have 2 words and more. > > select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[:alpha:]+$'; I think you meant select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ ]+[[:alpha:]]+$'; Note the extra two brackets. The character classes (like [:alpha:] or [:digit:] can only be used within bracket expressions. So you have to put brackets around the second [[:alpha:], too (like you did for the first one). But if you look more closely at the first one, you will notice that it doesn't do what you want, either: It matches any non-empty sequence of alpabetic characters, plus signs and spaces. But you almost certainly don't want to match a plus sign, and you don't want space and alphabetic characters to be interchangable. You want some alphabetic characters followed by a space. So this becomes select name FROM a_table where "STREET_NAME" ~ '^([[:alpha:]]+ )+[[:alpha:]]+$'; hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 2021-11-28 00:27:34 +0000, Shaozhong SHI wrote: > It appears that regex is not robust. It does reliably what you tell it to do. I would agree that after almost 50 years of extending it, the syntax has become a bit of a mess. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
Am 28.11.21 um 01:27 schrieb Shaozhong SHI: > this is supposed to find those to have 2 words and more. > > select name FROM a_table where "STREET_NAME" ~ '^[[:alpha:]+ > ]+[:alpha:]+$'; > > But, it finds only one word as well. > > It appears that regex is not robust. > > Can anyone shed light on this? > > Regards, > > David Hi, It's robust, but syntax is sometimes weired for words I would use something like (contains numbers too) "STREET_NAME" ~ '(\w+\s+)+\w+'; or alpha only "STREET_NAME" ~ '([[:alpha:]]+\s+)+[[:alpha:]]+' regards Thomas