Thread: Quick Regex Question

Quick Regex Question

From
Howard Cole
Date:
Hi all,

I don't understand the last result:

select 'Ho Ho Ho' ~* '^Ho'; returns true
select 'Ho Ho Ho' ~* ' Ho'; returns true
select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
space between ^ and ])

 From my limited experience of regex, the last one is searching for either
    'Ho' preceded by space or
    'Ho' at the beginning of a string.

How come it returns false?

Thanks.

P.S. The Ho's are Santa type Ho's - Not the other kind.

Re: Quick Regex Question

From
Florian Aumeier
Date:
hi
> select 'Ho Ho Ho' ~* '^Ho'; returns true
> select 'Ho Ho Ho' ~* ' Ho'; returns true
> select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
> space between ^ and ])

"A /bracket expression/ is a list of characters enclosed in []. It
normally matches any single character from the list (but see below). If
the list begins with ^, it matches any single character /not/ from the
rest of the list."

from:
http://www.postgresql.org/docs/8.3/static/functions-matching.html#POSIX-BRACKET-EXPRESSIONS

Regards
Florian

--
Media Ventures GmbH
Jabber-ID faumeier@mabber.de
Telefon +49 (0) 2236 480 10 22


Re: Quick Regex Question

From
Richard Huxton
Date:
Howard Cole wrote:
> Hi all,
>
> I don't understand the last result:
>
> select 'Ho Ho Ho' ~* '^Ho'; returns true
> select 'Ho Ho Ho' ~* ' Ho'; returns true
> select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
> space between ^ and ])
>
>  From my limited experience of regex, the last one is searching for either
>    'Ho' preceded by space or
>    'Ho' at the beginning of a string.

No, it's searching for not-space, the ^ inverts the meaning of the
square brackets. You probably want something like '(^Ho)|( Ho)'

--
   Richard Huxton
   Archonet Ltd

Re: Quick Regex Question

From
Ivan Sergio Borgonovo
Date:
On Thu, 20 Dec 2007 09:56:00 +0000
Howard Cole <howardnews@selestial.com> wrote:

> Hi all,
>
> I don't understand the last result:
>
> select 'Ho Ho Ho' ~* '^Ho'; returns true

There is actually a Ho at the beginning of the string.

> select 'Ho Ho Ho' ~* ' Ho'; returns true

There are actually two occurrences of ' Ho' in the string.

> select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is
> a space between ^ and ])

There is no character other than a space that is followed by Ho.
What's missing is that you're asking for some character before Ho.
The first Ho doesn't have a character preceding it.
The two other Ho's have one, but it is a space, which the bracket
expression excludes.
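
The three results can be reproduced outside the database. The sketch below
uses Python's `re` module as a stand-in for PostgreSQL's `~*` operator. That
substitution is an assumption made for illustration: the two engines agree on
these particular patterns, but they are not identical in general. The helper
name `matches_ci` is hypothetical.

```python
import re

def matches_ci(text: str, pattern: str) -> bool:
    """Rough stand-in for `text ~* pattern`: case-insensitive, unanchored search."""
    return re.search(pattern, text, re.IGNORECASE) is not None

text = "Ho Ho Ho"
results = {
    "^Ho": matches_ci(text, "^Ho"),        # anchor: Ho at the start of the string
    " Ho": matches_ci(text, " Ho"),        # a space followed by Ho
    "[^ ]Ho": matches_ci(text, "[^ ]Ho"),  # one non-space character, then Ho
}
for pattern, matched in results.items():
    print(f"{pattern!r}: {matched}")
```

The last pattern needs exactly one character in front of Ho, and that
character must not be a space, so none of the three Ho's qualify.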

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Quick Regex Question

From
Howard Cole
Date:
Florian, Richard, Ivan.

Fantastic response, thank you very much.



Re: Quick Regex Question

From
Howard Cole
Date:
Richard Huxton wrote:
> Howard Cole wrote:
>> Hi all,
>>
>> I don't understand the last result:
>>
>> select 'Ho Ho Ho' ~* '^Ho'; returns true
>> select 'Ho Ho Ho' ~* ' Ho'; returns true
>> select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
>> space between ^ and ])
>>
>>  From my limited experience of regex, the last one is searching for
>> either
>>    'Ho' preceded by space or
>>    'Ho' at the beginning of a string.
>
> No, it's searching for not-space, the ^ inverts the meaning of the
> square brackets. You probably want something like '(^Ho)|( Ho)'
>
Your expression works fine Richard, as does '(^| )ho', but can you tell
me why '[ ^]ho' doesn't work?

Re: Quick Regex Question

From
"A. Kretschmer"
Date:
On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote the following:
> Your expression works fine Richard, as does '(^| )ho', but can you tell
> me why '[ ^]ho' doesn't work?

You mean ^ as an anchor, but within the brackets it's a plain character.


Andreas
--
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: -> Header)
GnuPG-ID:   0x3FFF606C, private 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Quick Regex Question

From
Martijn van Oosterhout
Date:
On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
> On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote the following:
> > Your expression works fine Richard, as does '(^| )ho', but can you tell
> > me why '[ ^]ho' doesn't work?
>
> You mean ^ as an anchor, but within the brackets it's a plain character.

Err no, it inverts the test. [^ ] means any character *except* a space.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Re: Quick Regex Question

From
Richard Huxton
Date:
Martijn van Oosterhout wrote:
> On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
>> On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote the following:
>>> Your expression works fine Richard, as does '(^| )ho', but can you tell
>>> me why '[ ^]ho' doesn't work?
>> You mean ^ as an anchor, but within the brackets it's a plain character.
>
> Err no, it inverts the test. [^ ] means any character *except* a space.

But only if it's the first character within the brackets.

Which is the opposite of how "-" behaves inside square-brackets of course.

Aren't regexps fun :-)
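
To make the position rules for "^" and "-" concrete, here is a small sketch.
It assumes Python's `re` module, whose bracket expressions follow the same
position rules for these two characters; it is not PostgreSQL itself. The
helper `in_class` is a hypothetical name.

```python
import re

def in_class(bracket: str, ch: str) -> bool:
    """True if the single character `ch` is matched by the bracket expression."""
    return re.fullmatch(bracket, ch) is not None

checks = {
    # "^" first: negation, so the set is "anything except a or b"
    ("[^ab]", "a"): in_class("[^ab]", "a"),
    ("[^ab]", "z"): in_class("[^ab]", "z"),
    # "^" anywhere else: a literal caret
    ("[a^b]", "^"): in_class("[a^b]", "^"),
    # "-" first: a literal dash, so the set is "-", "a", "c"
    ("[-ac]", "-"): in_class("[-ac]", "-"),
    ("[-ac]", "b"): in_class("[-ac]", "b"),
    # "-" between two characters: a range from a to c
    ("[a-c]", "b"): in_class("[a-c]", "b"),
}
for (bracket, ch), matched in checks.items():
    print(f"{bracket} vs {ch!r}: {matched}")
```

So "^" is special only in first position, while "-" is special everywhere
except first (or last) position.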

--
   Richard Huxton
   Archonet Ltd

Re: Quick Regex Question

From
"A. Kretschmer"
Date:
On Thu, 20.12.2007, at 12:03:57 +0100, Martijn van Oosterhout wrote the following:
> On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
> > On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote the following:
> > > Your expression works fine Richard, as does '(^| )ho', but can you tell
> > > me why '[ ^]ho' doesn't work?
> >
> > You mean ^ as an anchor, but within the brackets it's a plain character.
>
> Err no, it inverts the test. [^ ] means any character *except* a space.

I know, but only if the ^ is at the beginning, isn't it?


Andreas
--
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: -> Header)
GnuPG-ID:   0x3FFF606C, private 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Quick Regex Question

From
Howard Cole
Date:
Martijn van Oosterhout wrote:
> On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
>
>> On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote the following:
>>
>>> Your expression works fine Richard, as does '(^| )ho', but can you tell
>>> me why '[ ^]ho' doesn't work?
>>>
>> You mean ^ as an anchor, but within the brackets it's a plain character.
>>
>
> Err no, it inverts the test. [^ ] means any character *except* a space.
>
> Have a nice day,
>
Hi Martijn, Andreas,

I think Andreas is right, note the ordering of characters in the above
example as [ ^] rather than [^ ].
So if the '^' is taken as literal '^', can I check for the beginning of
a string in the brackets, or am I forced to use the (^| ) syntax?

Is it just me or are regular expressions crazy?

Howard

Re: Quick Regex Question

From
Howard Cole
Date:
Howard Cole wrote:
> Martijn van Oosterhout wrote:
>> On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:
>>
>>> On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote the
>>> following:
>>>
>>>> Your expression works fine Richard, as does '(^| )ho', but can you
>>>> tell me why '[ ^]ho' doesn't work?
>>>>
>>> You mean ^ as an anchor, but within the brackets it's a plain character.
>>>
>>
>> Err no, it inverts the test. [^ ] means any character *except* a space.
>>
>> Have a nice day,
>>
> Hi Martijn, Andreas,
>
> I think Andreas is right, note the ordering of characters in the above
> example as [ ^] rather than [^ ].
> So if the '^' is taken as literal '^', can I check for the beginning
> of a string in the brackets, or am I forced to use the (^| ) syntax?
>
> Is it just me or are regular expressions crazy?
>
> Howard
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>
Sorry - I have just read the relevant section of the manual again and it
is starting to make sense. I shall use the (^| ) syntax as suggested.
Thanks for all the help.

Re: Quick Regex Question

From
Terry Fielder
Date:
<Snip>
Howard Cole wrote:
>>
> Hi Martijn, Andreas,
>
> I think Andreas is right, note the ordering of characters in the above
> example as [ ^] rather than [^ ].
> So if the '^' is taken as literal '^', can I check for the beginning
> of a string in the brackets,
Why do you need to?  Check for the beginning of the string BEFORE the
set brackets.  The point of set brackets is "match from a set of
chars".  Since "beginning of string" can only match one place, it has no
meaning as a member of a set.  Or in other words, if it has meaning, it
needs to be matched FIRST out of the set, and therefore you can just
remove it from the set and put it before the set brackets.
> or am I forced to use the (^| ) syntax?

>
> Is it just me or are regular expressions crazy?
Complicated, not crazy.

Terry

>
> Howard
>

Re: Quick Regex Question

From
Howard Cole
Date:
Terry Fielder wrote:
> Why do you need to?  Check for the beginning of the string BEFORE the
> set brackets.  The point of set brackets is "match from a set of
> chars".  Since "beginning of string" can only match one place, it has
> no meaning as a member of a set.  Or in other words, if it has
> meaning, it needs to be matched FIRST out of the set, and therefore
> you can just remove it from the set and put it before the set brackets.
>> or am I forced to use the (^| ) syntax?
>>
>> Is it just me or are regular expressions crazy?
> Complicated, not crazy.
>
> Terry
Hmm. Still think they are crazy - sometimes the characters are
interpreted as literals - other times not? That's crazy in my book! It
would make more sense to me if you had to escape the characters inside
the [ ] as they seem to be everywhere else. There is possibly a good
reason for this - But perhaps they are just crazy!!!
;)

I am trying to match the beginning of a name, so to search for
'how' in 'Howard Cole' should match
'col' in 'Howard Cole' should match
'ole' in 'Howard Cole' should NOT match,

So using ~* '(^| )col' works for me! As would '(^col| col)' etc.
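
As a quick check of the three cases above, here is a minimal sketch of the
'(^| )' approach. It uses Python's `re` module as a stand-in for `~*`, which
is an assumption for illustration, and a hypothetical helper name
`name_prefix_match`. The fragment is assumed to contain no regex
metacharacters.

```python
import re

def name_prefix_match(fragment: str, full_name: str) -> bool:
    """True if `fragment` starts the string or follows a space (case-insensitive)."""
    # (^| ) : either the start of the string or a single space, then the fragment
    pattern = "(^| )" + fragment
    return re.search(pattern, full_name, re.IGNORECASE) is not None

name = "Howard Cole"
outcomes = {frag: name_prefix_match(frag, name) for frag in ("how", "col", "ole")}
print(outcomes)
```

Only a space counts as a separator here, so a name part that follows a
hyphen or an opening parenthesis would not be found by this pattern.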

Just as an aside, is there a function that escapes my search string so
that any special regex characters are replaced? For example, if I was
going to search for 'howard.cole' in the search string it would convert
to 'howard[:.:]cole' or 'howard\.cole' - and then convert that into a
postgres compatible string!





Re: Quick Regex Question

From
Alvaro Herrera
Date:
Howard Cole wrote:

> Hmm. Still think they are crazy - sometimes the characters are interpreted
> as literals - other times not? That's crazy in my book!

Yeah.  ^, like a lot of other chars, means different things depending on
where it appears: at the beginning of a [] it means "negate the character
class", at any other position inside the [] it means "a literal ^", and
outside [] it means "anchor to beginning of string".
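
The three readings of ^ can be seen side by side in the sketch below. It
assumes Python's `re` module as a stand-in for PostgreSQL's regex engine,
since both treat ^ the same way in these three positions; the variable names
are hypothetical.

```python
import re

FLAGS = re.IGNORECASE
name = "Howard Cole"

# Outside brackets: anchor to the beginning of the string.
anchor = re.search("^how", name, FLAGS)

# Inside brackets, not first: a literal "^". The set is {space, ^},
# and "how" in "Howard Cole" is preceded by neither.
literal_miss = re.search("[ ^]how", name, FLAGS)
literal_hit = re.search("[ ^]how", "^Howard", FLAGS)

# Inside brackets, first: negation. "col" is preceded by a space (rejected),
# while "ole" is preceded by "C" (accepted).
negated_miss = re.search("[^ ]col", name, FLAGS)
negated_hit = re.search("[^ ]ole", name, FLAGS)

print(bool(anchor), bool(literal_miss), bool(literal_hit),
      bool(negated_miss), bool(negated_hit))
```

This is why '[ ^]ho' cannot stand in for '(^| )ho': inside the brackets the
caret never acts as an anchor.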

> I am trying to match the beginning of a name, so to search for
> 'how' in 'Howard Cole' should match
> 'col' in 'Howard Cole' should match
> 'ole' in 'Howard Cole' should NOT match,
>
> So using ~* '(^| )col' works for me! As would '(^col| col)' etc.

I think you are looking for [[:<:]] which means "beginning of word":

alvherre=# select 'Howard Cole' ~* '[[:<:]]ole';
 ?column?
----------
 f
(1 row)

alvherre=# select 'Howard Cole' ~* '[[:<:]]col';
 ?column?
----------
 t
(1 row)

I used to know this symbol as \< in other regex engines.  It is also
known as \m in Postgres.  It is not specified by the standard, so be
careful with it.  Note that a double backslash is needed:

alvherre=# select 'Howard Cole' ~* e'\\mcol';
 ?column?
----------
 t
(1 row)

alvherre=# select 'Howard Cole' ~* e'\\mole';
 ?column?
----------
 f
(1 row)
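
For comparison with other engines, here is a rough sketch of the same test
in Python's `re` module. This is an assumption made for illustration, not
PostgreSQL: Python has no `[[:<:]]` or `\m`, and its `\b` matches at either
end of a word. Because the fragment starts with a word character, `\b` in
front of it behaves like a beginning-of-word test. In PostgreSQL the
either-end boundary is written `\y`, and `\b` means backspace. The helper
name `starts_a_word` is hypothetical.

```python
import re

def starts_a_word(fragment: str, text: str) -> bool:
    """True if `fragment` occurs at the start of some word in `text`."""
    # re.escape keeps the fragment literal; \b requires a word boundary before it.
    pattern = r"\b" + re.escape(fragment)
    return re.search(pattern, text, re.IGNORECASE) is not None

name = "Howard Cole"
word_results = {frag: starts_a_word(frag, name) for frag in ("how", "col", "ole")}
print(word_results)

# Unlike '(^| )col', a word-boundary test also accepts punctuation as a separator.
print(starts_a_word("col", "Howard (Cole)"))
```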



> Just as an aside, is there a function that escapes my search string so that
> any special regex characters are replaced? For example, if I was going to
> search for 'howard.cole' in the search string it would convert to
> 'howard[:.:]cole' or 'howard\.cole' - and then convert that into a postgres
> compatible string!

Hmm, I have no idea about that.
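
The thread does not name a PostgreSQL function for this. One option, if the
search string is prepared in client code, is to escape it there before it
reaches the query. The sketch below uses Python's `re.escape` purely as an
illustration of the idea; its escaping rules target Python's own engine, and
whether its output is valid for every PostgreSQL pattern is not verified
here. The escaped string should still be passed to the query as a bound
parameter.

```python
import re

raw = "howard.cole"
escaped = re.escape(raw)  # backslash-escapes the "." so it is no longer a wildcard
print(escaped)

# Unescaped, "." matches any character, so this unintended string matches too.
unescaped_hit = re.search(raw, "howardXcole")

# Escaped, only a literal "." is accepted.
escaped_miss = re.search(escaped, "howardXcole")
escaped_hit = re.search(escaped, "howard.cole")

print(bool(unescaped_hit), bool(escaped_miss), bool(escaped_hit))
```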

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support