Re: \w doesn't match non-ASCII letters - Mailing list pgsql-bugs

From Tom Lane
Subject Re: \w doesn't match non-ASCII letters
Msg-id 3801.1087231716@sss.pgh.pa.us
In response to Re: \w doesn't match non-ASCII letters  (Markus Bertheau <twanger@bluetwanger.de>)
List pgsql-bugs
Markus Bertheau <twanger@bluetwanger.de> writes:
> Is there something planned to support UTF-8 in regexps?

It'd be relatively easy to use the <wctype.h> functions here if we
were convinced that pg_mb2wchar() generated exactly the same
wide-character encoding as the C library is expecting for the current
LC_CTYPE setting.  In the absence of such a guarantee I think we'd
have to convert the pg_wchar back to multibyte form and then apply
mbstowcs(), which is rather painful, not least because our wide
character support doesn't seem to have any function for converting
back to multibyte form ...

Tatsuo, any thoughts here?

            regards, tom lane
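
The fallback route described above (multibyte form, then the C library's own conversion, then a <wctype.h> test) could look roughly like the sketch below. It is an illustration only, not PostgreSQL code: is_word_char_mb() is a hypothetical name, it assumes the character is already available in multibyte form in the encoding that LC_CTYPE expects, and it uses mbrtowc(), the single-character restartable relative of mbstowcs(), so the input need not be NUL-terminated. Treating '_' as a word character is also an assumption about what \w should mean.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not a PostgreSQL function): report whether the
 * single multibyte character at mb, occupying exactly len bytes, counts
 * as a "word" character under the C library's current LC_CTYPE.
 * Returns 1 for a word character, 0 otherwise, including invalid or
 * incomplete input.
 */
int
is_word_char_mb(const char *mb, size_t len)
{
    mbstate_t state;
    wchar_t   wc;
    size_t    n;

    assert(mb != NULL);
    if (len == 0)
        return 0;

    /* Start from the initial conversion state for every character. */
    memset(&state, 0, sizeof(state));

    /* Let the C library produce its own wide-character value. */
    n = mbrtowc(&wc, mb, len, &state);

    /* Reject invalid or incomplete sequences and length mismatches. */
    if (n == (size_t) -1 || n == (size_t) -2 || n != len)
        return 0;

    /* Assumption: \w means alphanumeric or underscore. */
    return wc == L'_' || iswalnum((wint_t) wc) != 0;
}
```

For non-ASCII input the caller would first need setlocale(LC_CTYPE, "") (or an explicit locale name) so that the C library's encoding matches the data. The missing piece noted in the message, turning a pg_wchar back into multibyte form, is not covered here.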
