Re: Regexp match with accented character problem - Mailing list pgsql-novice

From Laslo Forro
Subject Re: Regexp match with accented character problem
Date
Msg-id AANLkTikUhCF0bU-gWO9T8H-TcSjdcBb2HCkN12qYf3iN@mail.gmail.com
In response to Re: Regexp match with accented character problem  (Laslo Forro <getforum@gmail.com>)
List pgsql-novice
And one more thing:

This is all strange.

test=# select * from text where a_text ~* E'\\mmacskacic\\W\\s';
    title     |          a_text          
--------------+--------------------------
 A macskacicó | A bah macskacicóca
 A macskacicó | A bah macskacicó és a ló
(2 rows)

Strange, because I expect this to match the 'macskacic+NON_WORD+WSPACE' pattern.
The corresponding Perl regexp does not match:
macskacic\W\s

I am really lost.
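
One way to picture the difference described above is to run the same pattern with and without Unicode-aware character classes. The sketch below uses Python rather than Perl or PostgreSQL, purely as an illustration. It assumes (and this is only a hypothesis, not something the thread confirms) that the server classifies 'ó' as a non-word character, roughly the way an ASCII-only engine would.

```python
import re

# Sample value taken from the second row of the query result above.
text = "A bah macskacic\u00f3 \u00e9s a l\u00f3"
pattern = r"macskacic\W\s"

# Unicode-aware classes (the default for str patterns in Python 3):
# 'ó' counts as a word character, so \W cannot consume it.
unicode_match = re.search(pattern, text)

# ASCII-only classes: 'ó' falls outside [a-zA-Z0-9_], so \W consumes it
# and the following space satisfies \s.
ascii_match = re.search(pattern, text, flags=re.ASCII)

print(unicode_match)                                  # None
print(ascii_match.group(0) if ascii_match else None)  # 'macskacicó '
```

If the server behaves like the ASCII-only case, that would account for this particular row matching. It does not by itself explain why 'macskacicóca' is also returned.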

And stop spamming.

On Tue, Jun 8, 2010 at 2:28 PM, Laslo Forro <getforum@gmail.com> wrote:
More:
given the string 'macskacicóca', this matches:
'\\mmacskacic\\Wca'
and so does:
'\\mmacskacic\\W'
indicating that 'ó' is treated as a non-alphanumeric character. Strangely enough, though,
this does not match:
'\\mmacskacic\\W\\M'
unless \M has a * quantifier.

Any idea or hint is highly appreciated.

Thanks in advance,
Laslo

On Tue, Jun 8, 2010 at 1:59 PM, Laslo Forro <getforum@gmail.com> wrote:
Perhaps this helps:

'ó' matches 
\M
\M\M\M
\.*

but not \M\M\M\M or \M\M\M\W

These match:
E'\\mmacskacicó\M*'
E'\\mmacskacicó\s*'
E'\\mmacskacicó\W*'

with the * quantifier, but not with the + quantifier or without any quantifier.
These also match:

E'\\mmacskacicó\\Y'     (!!!)
E'\\mmacskacicó$'

The text was typed via psql in a urxvt terminal.
Perhaps it is some kind of Unicode / wide-character mess?


On Tue, Jun 8, 2010 at 1:26 PM, Laslo Forro <getforum@gmail.com> wrote:
The problem might be that 'ó' is not recognized as \w.
Actually, I do not know which class 'ó' is in. This matches:

test=# select * from texts where title ~* E'\\mmacskacic\\M';
    title     |           a_text           
--------------+----------------------------
 A macskacicó | A blah blah macskacicónak.
(1 row)

As if the end of the word were at the last 'c'. ???

If the hex code of 'ó' is 97 (dec. 151), could someone give me a hint on how to insert it into the expression?
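
For what it is worth, in Unicode the precomposed 'ó' is U+00F3 (decimal 243), and its UTF-8 encoding is the two bytes C3 B3, so 0x97 looks like a value from some other, single-byte encoding. A minimal Python sketch for checking the code point and bytes, and for placing the character into a pattern by its escape, follows. It is only an illustration of the idea, not PostgreSQL syntax, and the sample string is taken from the title column above.

```python
import re

o_acute = "\u00f3"  # precomposed 'ó', written as an escape to avoid terminal issues

code_point = ord(o_acute)
utf8_bytes = o_acute.encode("utf-8")

print(hex(code_point))   # 0xf3
print(utf8_bytes.hex())  # c3b3

# Build a pattern that contains the character by its escape, with word
# boundaries on both sides (Python's \b, not PostgreSQL's \m and \M).
pattern = r"\bmacskacic" + o_acute + r"\b"
found = re.search(pattern, "A macskacic\u00f3 \u00e9s a l\u00f3")
print(found is not None)
```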

On Tue, Jun 8, 2010 at 1:17 PM, Laslo Forro <getforum@gmail.com> wrote:
Thanks a lot, anyway!


On Tue, Jun 8, 2010 at 12:56 PM, Thom Brown <thombrown@gmail.com> wrote:
On 8 June 2010 11:54, Laslo Forro <getforum@gmail.com> wrote:
> test=# \l
>                                   List of databases
>    Name    |  Owner   | Encoding |  Collation  |    Ctype    |   Access privileges
> -----------+----------+----------+-------------+-------------+-----------------------
>  postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>  template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
>                                                              : postgres=CTc/postgres
>  template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
>                                                              : postgres=CTc/postgres
>  test      | salmonix | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
> (5 rows)
>

Okay, I'm not sure what the problem is there then. :S  Hopefully
someone else can shed some light on it for you.

Thom




