Thread: BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char
From: "Grzegorz Daniluk"
Date:
The following bug has been logged online:

Bug reference:      5766
Logged by:          Grzegorz Daniluk
Email address:      gdaniluk@gmail.com
PostgreSQL version: 9.0.1
Operating system:   Windows 7 64-bit
Description:        regexp \y doesn't work properly when a word starts or ends with a UTF-8 char
Details:

select regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '');

The above query doesn't replace the word 'Pasaż'. It returns the full 'Foo Pasaż Bar' string, when the correct behavior is to return 'Foo Bar'. When the 'ż' is replaced with a plain ASCII character such as 'z', regexp_replace works as expected.

My database settings:
ENCODING = 'UTF8'
LC_COLLATE = 'Polish_Poland.1250'
LC_CTYPE = 'Polish_Poland.1250'
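In PostgreSQL's regex syntax, \y asserts a word boundary: the character on one side of the position must be a word character and the one on the other side must not. Whether 'ż' counts as a word character depends on locale-specific character classification (LC_CTYPE). If 'ż' were classified as a non-word character, there would be no boundary between it and the following space, so the trailing \y would fail and nothing would be replaced, while the leading \y (space followed by 'P') would still succeed. That is consistent with the reported symptom, but it is a hypothesis, not a confirmed diagnosis. The sketch below is only an analogy: Python's `re` spells the boundary `\b`, and its `re.ASCII` flag stands in for a classifier that does not recognize 'ż' as a letter. It does not exercise PostgreSQL's regex engine.

```python
import re

text = "Foo Pasaż Bar"
# Python spells the word-boundary assertion \b; PostgreSQL's regex syntax spells it \y.
pattern = r"\bPasaż\b"

# Default for str patterns: Unicode-aware classification, so 'ż' is a word
# character and a boundary exists between 'ż' and the following space.
unicode_result = re.sub(pattern, "", text)

# re.ASCII restricts \w and \b to [a-zA-Z0-9_]; 'ż' becomes a non-word
# character, so there is no boundary between 'ż' and ' ' and nothing matches.
ascii_result = re.sub(pattern, "", text, flags=re.ASCII)

print(repr(unicode_result))  # 'Foo  Bar' (two spaces: only the word is removed)
print(repr(ascii_result))    # 'Foo Pasaż Bar' (unchanged, like the report)
```

Note that removing just the word leaves both surrounding spaces in place, so a literal result is 'Foo  Bar' with two spaces.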
Re: BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char
From: Tom Lane
Date:
"Grzegorz Daniluk" <gdaniluk@gmail.com> writes:
> select regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '');
> Above query doesn't replace the word 'Pasaż'. It returns full 'Foo Pasaż
> Bar' string, when the correct behavior is to return 'Foo Bar'.

Is this problem limited to \y, or do other regex operations that depend on locale-specific character classification also not work for you?

regards, tom lane
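The question amounts to asking whether 'ż' is misclassified everywhere or only by \y. In PostgreSQL, candidate checks would be queries such as `select 'Pasaż' ~ '^[[:alpha:]]+$';` and `select 'PASAŻ' ~* 'pasaż';`, which rely on the same letter and case classification. The Python sketch below is a hedged analogy of that comparison, not PostgreSQL behavior: it runs a word-character test and a case-insensitive test once with Unicode-aware classification and once under `re.ASCII`, which stands in for a classifier that ignores 'ż'.

```python
import re

word = "Pasaż"

def classification_checks(flags=0):
    """Run two classification-dependent regex tests on `word` with the given flags."""
    return {
        # Does \w+ cover the whole word, including the trailing 'ż'?
        "word_chars": re.fullmatch(r"\w+", word, flags) is not None,
        # Does case-insensitive matching treat 'Ż' and 'ż' as equal?
        "ignorecase": re.fullmatch("PASAŻ", word, flags | re.IGNORECASE) is not None,
    }

unicode_checks = classification_checks()       # default Unicode classification
ascii_checks = classification_checks(re.ASCII)  # ASCII-only \w and case folding
print(unicode_checks)  # {'word_chars': True, 'ignorecase': True}
print(ascii_checks)    # {'word_chars': False, 'ignorecase': False}
```

If both kinds of test fail together, the fault likely lies in character classification in general rather than in the \y operator specifically.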