Thread: BUG #5743: Regexp engine fails to case-insensitively match multi-byte codepoints
BUG #5743: Regexp engine fails to case-insensitively match multi-byte codepoints
From
"Vlad Romascanu"
Date:
The following bug has been logged online:

Bug reference: 5743
Logged by: Vlad Romascanu
Email address: vromascanu@accurev.com
PostgreSQL version: 8.4.3
Operating system: Windows, Linux
Description: Regexp engine fails to case-insensitively match multi-byte codepoints
Details:

Already reported in 2006 but seems to have fallen through the cracks (I can find no followup.) Problem still exists in v8.4.3.

Problem still appears to be pg_wc_tolower downcasting to char before calling tolower() (instead of calling towlower().)

This is one of several inconsistencies unfortunately still present in case-insensitive regexp vs. LOWER(str) [str_lower] treatment (including char to wchar conversion using MultiByteToWideChar/mbstowcs vs. char2wchar, or towlower vs. pg_wc_tolower.)

Current workaround is to use LOWER(str) ~ LOWER('regexp').
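The difference the report describes can be sketched in C. This is a simplified illustration, not the PostgreSQL source: narrow_style_tolower and wide_style_tolower are hypothetical names, and the range check in the first function is an assumption about how a char-based implementation would typically guard its tolower() call.

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical char-based folding: only code points that fit in an
 * unsigned char are passed to tolower(); anything wider is returned
 * unchanged, so it can never match its lowercase counterpart.
 */
wchar_t narrow_style_tolower(wchar_t c)
{
    if ((unsigned long) c <= UCHAR_MAX)
        return (wchar_t) tolower((unsigned char) c);
    return c;
}

/*
 * Hypothetical wide folding: the full code point goes to towlower(),
 * whose result for non-ASCII input depends on the current locale.
 */
wchar_t wide_style_tolower(wchar_t c)
{
    return (wchar_t) towlower((wint_t) c);
}
```

With the char-based version, a code point such as U+0416 (CYRILLIC CAPITAL LETTER ZHE) comes back unchanged, which is consistent with a case-insensitive match failing on it. Whether the wide version folds it depends on the locale selected with setlocale().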
Re: BUG #5743: Regexp engine fails to case-insensitively match multi-byte codepoints
From
Tom Lane
Date:
"Vlad Romascanu" <vromascanu@accurev.com> writes:
> Description: Regexp engine fails to case-insensitively match
> multi-byte codepoints
> Already reported in 2006 but seems to have fallen through the cracks (I can
> find no followup.) Problem still exists in v8.4.3.

It's fixed in 9.0, at least for cases using UTF8 encoding.

regards, tom lane