Re: BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8) - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)
Date
Msg-id 11041.1329244091@sss.pgh.pa.us
In response to BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)  (albert.cieszkowski@cc.com.pl)
Responses Re: BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)
List pgsql-bugs
albert.cieszkowski@cc.com.pl writes:
> peimp=> select 'Świnoujście' ~* '\mŚwinoujście\M';
>  ?column?
> ----------
>  f
> (1 row)

Oh, I see the reason for this: the code in cclass() in regc_locale.c
doesn't go further up than U+00FF, so no codes above that will be
thought to be letters (or members of any other character class).
Clearly we need to go further when we are dealing with UTF8.
I'm not sure what a sane limit would be though.
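
To illustrate the effect of such a scan limit, here is a minimal Python sketch. It is not the regc_locale.c code: the function names and the 0xFFFF upper bound are assumptions made for illustration, and Python's `str.isalpha` stands in for the locale's letter test. It builds a "letter" class by iterating code points up to a limit and shows that a word containing U+015A and U+015B is only recognized once the scan goes past U+00FF.

```python
MAX_SCAN_LATIN1 = 0xFF   # the limit described above
MAX_SCAN_WIDE = 0xFFFF   # hypothetical wider limit, chosen only for illustration


def alpha_class(max_code):
    """Collect every character up to max_code that Python classifies as a letter."""
    return {chr(c) for c in range(max_code + 1) if chr(c).isalpha()}


def is_word(text, letters):
    """Return True if every character of text is in the scanned letter class."""
    return all(ch in letters for ch in text)


narrow = alpha_class(MAX_SCAN_LATIN1)
wide = alpha_class(MAX_SCAN_WIDE)

# "Świnoujście" written with escapes so the precomposed code points are explicit.
word = "\u015awinouj\u015bcie"

print(is_word(word, narrow))  # the two Polish letters lie beyond the narrow scan
print(is_word(word, wide))
```

With the narrow class, the first and eighth characters are not treated as letters, which is consistent with the word-boundary escapes failing to match in the report above.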

(It would be nice if there were a more efficient way to get this
information than laboriously iterating through all the possible
character codes.  It doesn't look like we're even trying to cache
the results, ick.)
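
One way to avoid repeating the iteration is to memoize the result per character class. The following is a hedged Python sketch of that idea only; `cached_class`, the `PREDICATES` table, and the 0xFFFF bound are hypothetical names and values, not anything taken from the PostgreSQL source.

```python
from functools import lru_cache

# Hypothetical mapping from class name to a per-character test.
PREDICATES = {
    "alpha": str.isalpha,
    "digit": str.isdigit,
    "space": str.isspace,
}


@lru_cache(maxsize=None)
def cached_class(name, max_code):
    """Scan code points 0..max_code once per (name, max_code) and remember the result."""
    predicate = PREDICATES[name]
    return frozenset(chr(c) for c in range(max_code + 1) if predicate(chr(c)))


first = cached_class("alpha", 0xFFFF)   # performs the full scan
second = cached_class("alpha", 0xFFFF)  # served from the cache, no rescan

print(first is second)
print(cached_class.cache_info())
```

The cache trades memory for time: each class is held as a set of characters, so the cost of a wide scan is paid once rather than on every regular expression that uses the class.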

            regards, tom lane
