Thread: UTF-8 and Regular expression

UTF-8 and Regular expression

From
Håvard Wahl Kongsgård
Date:
Hi, in 8.4 how do the regular expression functions in PostgreSQL handle special UTF-8 characters?

for example:
SELECT name,substring(name from E'\\w+\\s(\\w+)$') from nodes;
fails to select characters like ü ø æ å

--
Håvard Wahl Kongsgård

http://havard.security-review.net/

Re: UTF-8 and Regular expression

From
Tom Lane
Date:
Håvard Wahl Kongsgård <haavard.kongsgaard@gmail.com> writes:
> Hi, in 8.4 how do the regular expression functions in PostgreSQL handle
> special UTF-8 characters?

Badly :-(

> for example:
> SELECT name,substring(name from E'\\w+\\s(\\w+)$') from nodes;
> fails to select characters like ü ø æ å

Should work in 9.0, but no chance in earlier releases.  You need to use
a single-byte encoding such as LATIN1 if you need to do this in older
releases.  In any release, make sure you're using an LC_COLLATE setting
that's appropriate for the language and encoding.

            regards, tom lane
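
The workaround described above for pre-9.0 releases might be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested recipe: the database name nodes_latin1 is hypothetical, the locale name nb_NO.ISO-8859-1 is a guess at a Norwegian single-byte locale and will differ by operating system, and LC_CTYPE is set to match LC_COLLATE as an assumption (the reply only mentions LC_COLLATE).

```sql
-- Inspect what the current database uses (works on 8.4 and 9.0).
SHOW server_encoding;
SHOW lc_collate;
SHOW lc_ctype;

-- Hypothetical database using a single-byte encoding.
-- The locale name is an assumption; pick one installed on your system
-- that matches both the language and the LATIN1 encoding.
-- template0 is needed when the encoding or locale differs from template1.
CREATE DATABASE nodes_latin1
    TEMPLATE template0
    ENCODING 'LATIN1'
    LC_COLLATE 'nb_NO.ISO-8859-1'
    LC_CTYPE 'nb_NO.ISO-8859-1';

-- After loading the nodes table into that database, rerun the
-- original query unchanged.
SELECT name, substring(name from E'\\w+\\s(\\w+)$') FROM nodes;
```

Any data that cannot be represented in LATIN1 would fail to load into such a database, so this only suits text that stays within that character set.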