Thread: Quick Regex Question
Hi all,

I don't understand the last result:

select 'Ho Ho Ho' ~* '^Ho'; returns true
select 'Ho Ho Ho' ~* ' Ho'; returns true
select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a space between ^ and ])

From my limited experience of regex, the last one is searching for either
'Ho' preceded by a space or
'Ho' at the beginning of a string.

How come it returns false?

Thanks.

P.S. The Ho's are Santa type Ho's - Not the other kind.
hi

> select 'Ho Ho Ho' ~* '^Ho'; returns true
> select 'Ho Ho Ho' ~* ' Ho'; returns true
> select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
> space between ^ and ])

"A /bracket expression/ is a list of characters enclosed in []. It normally matches any single character from the list (but see below). If the list begins with ^, it matches any single character /not/ from the rest of the list."

from: http://www.postgresql.org/docs/8.3/static/functions-matching.html#POSIX-BRACKET-EXPRESSIONS

Regards
Florian
--
Media Ventures GmbH
Jabber-ID faumeier@mabber.de
Phone +49 (0) 2236 480 10 22
Howard Cole wrote:
> Hi all,
>
> I don't understand the last result:
>
> select 'Ho Ho Ho' ~* '^Ho'; returns true
> select 'Ho Ho Ho' ~* ' Ho'; returns true
> select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
> space between ^ and ])
>
> From my limited experience of regex, the last one is searching for either
> 'Ho' preceded by space or
> 'Ho' at the beginning of a string.

No, it's searching for a not-space character followed by 'Ho'; the ^ inverts the meaning of the square brackets. You probably want something like '(^Ho)|( Ho)'

--
Richard Huxton
Archonet Ltd
On Thu, 20 Dec 2007 09:56:00 +0000 Howard Cole <howardnews@selestial.com> wrote:
> Hi all,
>
> I don't understand the last result:
>
> select 'Ho Ho Ho' ~* '^Ho'; returns true

There is actually a Ho at the beginning of the string.

> select 'Ho Ho Ho' ~* ' Ho'; returns true

There are actually 2 ' Ho'.

> select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is
> a space between ^ and ])

There is no character other than a space followed by Ho.

What you're missing is that you're asking for some character before Ho. The first Ho doesn't have a character preceding it. The 2 other Ho have one... but it is a space, and you excluded that.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
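For anyone who wants to poke at this outside the database, here is a minimal sketch in Python rather than PostgreSQL. It assumes Python's `re` module agrees with the POSIX engine on these simple patterns (it does for `^` and `[^ ]`), and it uses `re.IGNORECASE` as a stand-in for `~*`. The helper name `matches` is made up for this illustration.

```python
import re

def matches(text, pattern):
    """Rough stand-in for PostgreSQL's `text ~* pattern`:
    case-insensitive, and the pattern may match anywhere in the text."""
    return re.search(pattern, text, re.IGNORECASE) is not None

text = "Ho Ho Ho"
print(matches(text, "^Ho"))          # True: 'Ho' at the start of the string
print(matches(text, " Ho"))          # True: 'Ho' preceded by a space
print(matches(text, "[^ ]Ho"))       # False: needs a non-space character before 'Ho'
print(matches(text, "(^Ho)|( Ho)"))  # True: start of string OR a space before 'Ho'
print(matches("OHo", "[^ ]Ho"))      # True: 'O' is a non-space character before 'Ho'
```

The last call shows what '[^ ]Ho' is really asking for: one actual character, anything but a space, sitting directly in front of 'Ho'.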
Florian, Richard, Ivan. Fantastic response thank you very much.
Richard Huxton wrote:
> Howard Cole wrote:
>> Hi all,
>>
>> I don't understand the last result:
>>
>> select 'Ho Ho Ho' ~* '^Ho'; returns true
>> select 'Ho Ho Ho' ~* ' Ho'; returns true
>> select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
>> space between ^ and ])
>>
>> From my limited experience of regex, the last one is searching for
>> either
>> 'Ho' preceded by space or
>> 'Ho' at the beginning of a string.
>
> No, it's searching for not-space, the ^ inverts the meaning of the
> square brackets. You probably want something like '(^Ho)|( Ho)'
>

Your expression works fine, Richard, as does '(^| )ho', but can you tell me why '[ ^]ho' doesn't work?
On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:
> Your expression works fine Richard, as does '(^| )ho', but can you tell
> me why '[ ^]ho' doesn't work?

By ^ you mean an anchor, but within the brackets it's a simple character.

Andreas
--
Andreas Kretschmer
Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header)
GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net
On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
> On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:
> > Your expression works fine Richard, as does '(^| )ho', but can you tell
> > me why '[ ^]ho' doesn't work?
>
> With ^ you means an anchor, but within the brackets it's a simple char.

Err no, it inverts the test. [^ ] means any character *except* a space.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy
Martijn van Oosterhout wrote:
> On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
>> On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:
>>> Your expression works fine Richard, as does '(^| )ho', but can you tell
>>> me why '[ ^]ho' doesn't work?
>> With ^ you means an anchor, but within the brackets it's a simple char.
>
> Err no, it inverts the test. [^ ] means any character *except* a space.

But only if it's the first character within the brackets. Which is the opposite of how "-" behaves inside square-brackets of course.

Aren't regexps fun :-)

--
Richard Huxton
Archonet Ltd
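The "only if it's the first character" rule is easy to see side by side. Again a small sketch in Python, not PostgreSQL, on the assumption that both engines treat a non-leading ^ inside brackets as a literal caret; the variable names are invented for the example.

```python
import re

FLAGS = re.IGNORECASE  # stand-in for the case-insensitive ~* operator

# '[ ^]ho': the caret is NOT first inside the brackets, so it is a literal '^'.
literal_caret = re.compile(r"[ ^]ho", FLAGS)

# '(^| )ho': the caret is outside any brackets, so it anchors to the start.
anchor_or_space = re.compile(r"(^| )ho", FLAGS)

print(bool(literal_caret.search("Ho")))      # False: no space or '^' before 'Ho'
print(bool(literal_caret.search("^Ho")))     # True: a literal '^' precedes 'Ho'
print(bool(literal_caret.search("Ho Ho")))   # True: a space precedes the second 'Ho'
print(bool(anchor_or_space.search("Ho")))    # True: 'Ho' sits at the start
print(bool(anchor_or_space.search("OHo")))   # False: 'Ho' is mid-word
```

So '[ ^]ho' only finds a 'ho' at the start of the string if the string happens to contain an actual '^' character there.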
On Thu, 20.12.2007, at 12:03:57 +0100, Martijn van Oosterhout wrote:
> On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
> > On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:
> > > Your expression works fine Richard, as does '(^| )ho', but can you tell
> > > me why '[ ^]ho' doesn't work?
> >
> > With ^ you means an anchor, but within the brackets it's a simple char.
>
> Err no, it inverts the test. [^ ] means any character *except* a space.

I know, but only if the ^ is at the beginning, no?

Andreas
--
Andreas Kretschmer
Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header)
GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net
Martijn van Oosterhout wrote:
> On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
>
>> On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:
>>
>>> Your expression works fine Richard, as does '(^| )ho', but can you tell
>>> me why '[ ^]ho' doesn't work?
>>>
>> With ^ you means an anchor, but within the brackets it's a simple char.
>>
>
> Err no, it inverts the test. [^ ] means any character *except* a space.
>
> Have a nice day,
>

Hi Martijn, Andreas,

I think Andreas is right; note that the characters in the above example are ordered [ ^] rather than [^ ].
So if the '^' is taken as a literal '^', can I check for the beginning of a string inside the brackets, or am I forced to use the (^| ) syntax?

Is it just me or are regular expressions crazy?

Howard
Howard Cole wrote:
> Martijn van Oosterhout wrote:
>> On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
>>
>>> On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:
>>>
>>>> Your expression works fine Richard, as does '(^| )ho', but can you
>>>> tell me why '[ ^]ho' doesn't work?
>>>>
>>> With ^ you means an anchor, but within the brackets it's a simple char.
>>>
>>
>> Err no, it inverts the test. [^ ] means any character *except* a space.
>>
>> Have a nice day,
>>
> Hi Martijn, Andreas,
>
> I think Andreas is right, note the ordering of characters in the above
> example as [ ^] rather than [^ ].
> So if the '^' is taken as literal '^', can I check for the beginning
> of a string in the brackets, or am I forced to use the (^| ) syntax?
>
> Is it just me or are regular expressions crazy?
>
> Howard
>

Sorry - I have just read the relevant section of the manual again and it is starting to make sense. I shall use the (^| ) syntax as suggested.

Thanks for all the help.
<Snip>

Howard Cole wrote:
> Hi Martijn, Andreas,
>
> I think Andreas is right, note the ordering of characters in the above
> example as [ ^] rather than [^ ].
> So if the '^' is taken as literal '^', can I check for the beginning
> of a string in the brackets,

Why do you need to? Check for the beginning of the string BEFORE the set brackets. The point of set brackets is "match one character from a set of chars". Since "beginning of string" can only match in one place, it has no meaning as a member of a set. Or in other words, if it has meaning, it needs to be matched FIRST, ahead of the set, and therefore you can just remove it from the set and put it before the set brackets.

> or am I forced to use the (^| ) syntax?
>
> Is it just me or are regular expressions crazy?

Complicated, not crazy.

Terry

>
> Howard
>
Terry Fielder wrote:
> Why do you need to? Check for the beginning of the string BEFORE the
> set brackets. The point of set brackets is "match from a set of
> chars". Since "beginning of string" can only match one place, it has
> no meaning as a member of a set. Or in other words, if it has
> meaning, it needs to be matched FIRST out of the set, and therefore
> you can just remove from the set and put before the set brackets.
>> or am I forced to use the (^| ) syntax?
>>
>> Is it just me or are regular expressions crazy?
> Complicated, not crazy.
>
> Terry

Hmm. Still think they are crazy - sometimes the characters are interpreted as literals - other times not? That's crazy in my book! It would make more sense to me if you had to escape the characters inside the [ ] as you seem to have to everywhere else. There is possibly a good reason for this - but perhaps they are just crazy!!! ;)

I am trying to match the beginning of a name, so to search for
'how' in 'Howard Cole' should match
'col' in 'Howard Cole' should match
'ole' in 'Howard Cole' should NOT match,

So using ~* '(^| )col' works for me! As would '(^col| col)' etc.

Just as an aside, is there a function that escapes my search string so that any special regex characters are replaced? For example, if I was going to search for 'howard.cole' in the search string it would convert to 'howard[:.:]cole' or 'howard\.cole' - and then convert that into a postgres compatible string!
Howard Cole wrote:

> Hmm. Still think they are crazy - sometimes the characters are interpreted
> as literals - other times not? Thats crazy in my book!

Yeah. ^, like a lot of other chars, means different things in different places: at the beginning of a [] it means "negate the character class", at any other position inside the [] it means "a literal ^", and outside [] it means "anchor to beginning of string".

> I am trying to match the beginning of a name, so to search for
> 'how' in 'Howard Cole' should match
> 'col' in 'Howard Cole' should match
> 'ole' in 'Howard Cole' should NOT match,
>
> So using ~* '(^| )col' works for me! As would '(^col| col)' etc.

I think you are looking for [[:<:]] which means "beginning of word":

alvherre=# select 'Howard Cole' ~* '[[:<:]]ole';
 ?column?
----------
 f
(1 row)

alvherre=# select 'Howard Cole' ~* '[[:<:]]col';
 ?column?
----------
 t
(1 row)

I used to know the symbol as \< on other regex engines. It is also known as \m on Postgres. It is not specified by the standard, so be careful with it. Note that a double backslash is needed:

alvherre=# select 'Howard Cole' ~* e'\\mcol';
 ?column?
----------
 t
(1 row)

alvherre=# select 'Howard Cole' ~* e'\\mole';
 ?column?
----------
 f
(1 row)

> Just as an aside, is there a function that escapes my search string so that
> any special regex characters are replaced? For example, if I was going to
> search for 'howard.cole' in the search string it would convert to
> 'howard[:.:]cole' or 'howard\.cole' - and then convert that into a postgres
> compatible string!

Hmm, I have no idea about that.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
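On the escaping question, nobody in the thread names a PostgreSQL function, so the following is only a sketch of the idea on the client side, in Python. `re.escape` is Python's own escaping routine and its output is meant for Python's `re`, not necessarily for the PostgreSQL engine. Python's `\b` is used as an approximation of `\m`: it matches at both the start and the end of a word, whereas `\m` matches only at the start. The function name `word_starts_with` is hypothetical.

```python
import re

def word_starts_with(text, prefix):
    """True if some word in `text` starts with `prefix` (case-insensitive).

    The prefix is escaped first, so characters such as '.' match literally.
    Assumes the prefix itself begins with a letter, digit or underscore;
    otherwise the leading word boundary behaves differently.
    """
    pattern = r"\b" + re.escape(prefix)
    return re.search(pattern, text, re.IGNORECASE) is not None

print(word_starts_with("Howard Cole", "how"))          # True
print(word_starts_with("Howard Cole", "col"))          # True
print(word_starts_with("Howard Cole", "ole"))          # False
print(word_starts_with("howard.cole", "howard.cole"))  # True
print(word_starts_with("howardXcole", "howard.cole"))  # False: '.' is literal here
```

Without the escaping step the last call would return True, because an unescaped '.' matches any single character.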