Thread: BUG #3433: regexp \m and \M don't work for cyrillic

BUG #3433: regexp \m and \M don't work for cyrillic

From
"Andriy Rysin"
Date:
The following bug has been logged online:

Bug reference:      3433
Logged by:          Andriy Rysin
Email address:      arysin@gmail.com
PostgreSQL version: 8.2.4
Operating system:   Linux
Description:        regexp \m and \M don't work for cyrillic
Details:

psql krym
krym=> \encoding
UTF8
krym=> create table test (txt varchar);
CREATE TABLE
krym=> insert into test values ('latin');
INSERT 0 1
krym=> insert into test values ('кирилиця');
INSERT 0 1
krym=> select * from test;
   txt
----------
 latin
 кирилиця
(2 rows)

krym=> select * from test where txt ~* E'\\mla';
  txt
-------
 latin
(1 row)

krym=> select * from test where txt ~* E'\\mкир';
 txt
-----
(0 rows)

escaping specials in regular expressions \m and \M for beginning of word and
end of word work for latin symbols but don't for cyrillic

Re: BUG #3433: regexp \m and \M don't work for cyrillic

From
Tom Lane
Date:
"Andriy Rysin" <arysin@gmail.com> writes:
> escaping specials in regular expressions \m and \M for beginning of word and
> end of word work for latin symbols but don't for cyrillic

Sorry, the locale-specific regex features only work on single-byte
characters at the moment.  In any case you'd need to be using a Russian
locale (maybe you are, but you didn't say).  I'd expect this feature
to work with Cyrillic letters in ru_RU locale + KOI8 encoding, but not
elsewhere.

            regards, tom lane
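
The locale question raised in this reply can be checked from psql. This is a minimal sketch, assuming a server from the 8.2 series named in the report, where lc_ctype and lc_collate are fixed when the cluster is initialized and exposed as read-only settings; other releases may expose them differently.

```sql
-- Locale settings chosen when the cluster was initialized (read-only here).
SHOW lc_ctype;
SHOW lc_collate;

-- Encoding of the database and of the current client session.
SHOW server_encoding;
SHOW client_encoding;
```

The report shows UTF8 only as the client encoding; the server-side values are the ones that decide how the regex engine classifies letters.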

Re: BUG #3433: regexp \m and \M don't work for cyrillic

From
"Andriy Rysin"
Date:
2007/7/7, Tom Lane <tgl@sss.pgh.pa.us>:
>
> "Andriy Rysin" <arysin@gmail.com> writes:
> > escaping specials in regular expressions \m and \M for beginning of word and
> > end of word work for latin symbols but don't for cyrillic
>
> Sorry, the locale-specific regex features only work on single-byte
> characters at the moment.  In any case you'd need to be using a Russian
> locale (maybe you are, but you didn't say).  I'd expect this feature
> to work with Cyrillic letters in ru_RU locale + KOI8 encoding, but not
> elsewhere.

Hi Tom,

I was using the en_US.UTF-8 locale, but you're right: even if I create my
cluster with uk_UA.UTF-8, \m still does not work for cyrillic but continues to
work for latin chars. I can't work with single-byte encodings as I have some
symbols from Unicode in my project and everything else is in Unicode, so
converting data forth and back would be quite a drag.

So currently my only workaround for \m is to use (^|[^[:alpha:]]), though
[:alpha:] even in uk_UA.UTF-8 means a latin character, thus I have to specify
symbols directly, e.g. (^|[^а-яієїґ]), which may be good if I don't care to
separate Russian and Ukrainian, but if I do I'd have to be even more specific
for pure Ukrainian: (^|[^а-ьюяієїґ]) (assuming I remember about
case-sensitivity of my regexp and assuming I know UTF-8 codes).

Though I agree I missed the fact that \m is locale-specific (as it has to
know proper non-word and word chars for the locale) and thus can't work for all
locales even if using Unicode, and my original test in en_US locale was not
valid, it still would be nice to have two things:
1) multibyte support for locale-specific regexps like \m and [:alpha:]
2) be able to tell regexp which LC_CTYPE to use for a specific invocation, at
least on SQL-statement level. This would be extremely useful for
multi-lingual projects, e.g. dictionaries (which is the type of my project
BTW); hopefully they are not too tightly connected to the LC_CTYPE of the
cluster.
I understand though that these two are not quite just bug fixes and will
require some effort to implement.

Thanks,
Andriy
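
The workaround from this message, replacing \m and \M with an explicit character class, might look like the following against the test table from the original report. This is only a sketch: the letter set is the lowercase one quoted in the message, the match is case-sensitive (~ rather than ~*), and uppercase letters would have to be added to the class by hand if they matter.

```sql
-- Start of word without \m: the match must begin at the start of the string
-- or directly after a character that is not a lowercase Cyrillic letter
-- (Russian range plus the Ukrainian letters listed in the message).
SELECT * FROM test WHERE txt ~ E'(^|[^а-яієїґ])кир';

-- End of word without \M: the same idea mirrored on the right-hand side.
SELECT * FROM test WHERE txt ~ E'иця($|[^а-яієїґ])';
```

Unlike \m and \M, which match at a position without consuming anything, these bracket expressions consume one character when they match inside the string. That makes no difference for a WHERE filter, but it would change what substring() or regexp_replace() returns.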