Re: UTF-8 and Regular expression - Mailing list pgsql-general

From Tom Lane
Subject Re: UTF-8 and Regular expression
Date
Msg-id 911.1306877181@sss.pgh.pa.us
In response to UTF-8 and Regular expression  (Håvard Wahl Kongsgård <haavard.kongsgaard@gmail.com>)
List pgsql-general
Håvard Wahl Kongsgård <haavard.kongsgaard@gmail.com> writes:
> Hi, in 8.4, how do the regular expression functions in PostgreSQL handle
> special UTF-8 characters?

Badly :-(

> for example:
> SELECT name,substring(name from E'\\w+\\s(\\w+)$') from nodes;
> fails to select non-ASCII characters

Should work in 9.0, but no chance in earlier releases.  You need to use
a single-byte encoding such as LATIN1 if you need to do this in older
releases.  In any release, make sure you're using an LC_COLLATE setting
that's appropriate for the language and encoding.

            regards, tom lane
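
A minimal sketch of how one might check the settings mentioned above and retry the pattern from the question. The sample name and the `sample` alias are made up for illustration and stand in for the `nodes` table; the catalog columns used are those of `pg_database` as of 8.4. Whether the second query returns the last word depends on the server version, encoding, and locale, as described in the reply.

```sql
-- Encoding and locale settings of the current database.
-- datcollate and datctype are per-database columns in 8.4 and later.
SELECT pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
WHERE datname = current_database();

-- The pattern from the question, applied to a hypothetical sample value
-- rather than the original nodes table.
SELECT name,
       substring(name from E'\\w+\\s(\\w+)$') AS last_word
FROM (VALUES ('Håvard Wahl Kongsgård'::text)) AS sample(name);
```

If the first query reports UTF8 on a pre-9.0 server, the reply above suggests the second query will not treat the non-ASCII letters as word characters.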
