Thread: BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char

BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char

From

"Grzegorz Daniluk"

Date:

24 November 2010, 15:52:16

The following bug has been logged online:

Bug reference:      5766
Logged by:          Grzegorz Daniluk
Email address:      gdaniluk@gmail.com
PostgreSQL version: 9.0.1
Operating system:   Windows 7 64-bit
Description:        regexp \y doesn't work properly when a word starts or
ends with a UTF-8 char
Details:

select regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '');

Above query doesn't replace the word 'Pasaż'. It returns full 'Foo Pasaż
Bar' string, when the correct behavior is to return 'Foo  Bar'.

When the 'ż' is replaced with normal ASCII character like 'z',
regexp_replace works as expected.

My db details:
ENCODING = 'UTF8'
LC_COLLATE = 'Polish_Poland.1250'
LC_CTYPE = 'Polish_Poland.1250'
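
The settings listed above can be read back from the system catalog, which helps confirm what the server is actually using. A minimal sketch, assuming PostgreSQL 8.4 or later, where `pg_database` carries per-database `datcollate` and `datctype` columns:

```sql
-- Show the encoding and locale settings of the database
-- the current session is connected to.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate                    AS lc_collate,
       datctype                      AS lc_ctype
FROM pg_database
WHERE datname = current_database();
```

On the reporter's system this would be expected to show UTF8 together with the Polish_Poland.1250 locale names, matching the values quoted above.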

Re: BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char

From

Tom Lane

Date:

24 November 2010, 16:57:22

"Grzegorz Daniluk" <gdaniluk@gmail.com> writes:
> select regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '');

> Above query doesn't replace the word 'Pasaż'. It returns full 'Foo Pasaż
> Bar' string, when the correct behavior is to return 'Foo  Bar'.

Is this problem limited to \y, or do other regex operations that depend
on locale-specific character classification also not work for you?

            regards, tom lane
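
One way to answer the question above is to try several regex features that rely on character classification against the same letter. The queries below are only a sketch and are not part of the original exchange. They assume the same UTF8 database and a client encoding that transmits 'ż' correctly. The choice of features to probe is illustrative, not exhaustive.

```sql
-- Each query exercises a different classification-dependent regex feature.
-- E'' strings with doubled backslashes are used, as in the original report.

-- 1. Word-boundary escapes other than \y:
--    \m matches at the start of a word, \M at the end of a word.
SELECT regexp_replace('Foo Pasaż Bar', E'\\mPasaż\\M', '');

-- 2. Word-character class escape \w.
SELECT 'ż' ~ E'^\\w$' AS matches_word_char;

-- 3. POSIX bracket character class.
SELECT 'ż' ~ '^[[:alpha:]]$' AS matches_alpha;

-- 4. Case-insensitive matching, which depends on case mapping.
SELECT 'Ż' ~* '^ż$' AS matches_case_insensitive;
```

If only the first query misbehaves, the problem is probably specific to the word-boundary constraints. If the others also fail to treat 'ż' as a letter, that would point to character classification more generally.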