Re: BUG #1931: ILIKE and LIKE fails on Turkish locale - Mailing list pgsql-bugs
From | Victor Snezhko |
---|---|
Subject | Re: BUG #1931: ILIKE and LIKE fails on Turkish locale |
Date | |
Msg-id | upsdoxr70.fsf@indorsoft.ru Whole thread Raw |
In response to | Re: BUG #1931: ILIKE and LIKE fails on Turkish locale (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #1931: ILIKE and LIKE fails on Turkish locale
|
List | pgsql-bugs |
Tom Lane <tgl@sss.pgh.pa.us> writes: > Victor Snezhko <snezhko@indorsoft.ru> writes: >> 2) When I try to create a stored procedure to create table (e.g., for >> creating table only if it doesn't already exist), it fails to >> compile if i use cyrillic letter "=C8" (unicode id: 0x0445, utf-8 >> representation is D1 85) with the following weird error: > >> ERROR: expected dot between identifiers: =C8 >> CONTEXT: compile of PL/pgSQL function "createoraltertable" near line 2 > >> the following query fails: > >> CREATE OR REPLACE FUNCTION TestFunction() >> RETURNS int >> AS $$ >> BEGIN >> SELECT =C8 FROM test_table; >> RETURN 0; >> END; >> $$ LANGUAGE plpgsql; > > I tried to duplicate this without success on Fedora Core 5. Either your > ru_RU.utf8 locale works differently from Fedora's, or something else is > wrong.=20=20 > Looking at the code, the only very plausible theory is that > isspace() is doing something different than we expect it to. I wonder > whether you have the database encoding set to something else than what > the postmaster's LC_CTYPE locale expects? No, the database encoding is UTF-8, the cluster was built with ru_RU.UTF-8 locale, and I haven't modified any defaults. It's on FreeBSD 6.1, and it's quite possible that it has something wrong with utf-8 locale, I saw some complaints at least about collation. However, isspace doesn't seem like a culprit to me. Let's ensure that I have done everything right to come to this conclusion. I have applied the following patch to pl_funcs.c: But, if I'm doing everything right, isspace seems not to be the culprit to me. I have run the above query on the server with the following applied patch (the server is 8.1.4, but I don't see any changes to the plpgsql_convert_ident() in revs 1.46-1.54): *** src/pl/plpgsql/src/pl_funcs.c.orig Wed Nov 23 00:23:30 2005 --- src/pl/plpgsql/src/pl_funcs.c Fri Sep 22 14:48:09 2006 *************** *** 363,368 **** --- 363,381 ---- { const char *sstart =3D s; int identctr =3D 0; + char *isspacebuf; + int i; + + isspacebuf =3D (char*)malloc(strlen(s) + 1); + for (i=3D0; s[i]; ++i) { + if (isspace(s[i])) + isspacebuf[i] =3D 't'; + else + isspacebuf[i] =3D 'f'; + } + + ereport(LOG, + (errmsg("plpgsql_convert_ident: ident=3D%s, isspace array= =3D%s, isspace(0)=3D%c", s, isspacebuf, isspace(0)?'t':'f'))); /* Outer loop over identifiers */ while (*s) and got the following line in the logfile: LOG: plpgsql_convert_ident: ident=3D=C8, isspace array=3Dff, isspace(0)= =3Df The symbol after "input=3D" is that 0x0445 (D1 85 in utf-8) letter which I mentioned above. I also double-checked this: got a stack trace (by inserting NULL-pointer assignment instead of reporting "expected dot between identifiers") and saw that the value passed to the plpgsql_convert_ident points to the following bytes: D1 85 00. So, isspace returns false on all three of them. I have also run the same query on another cluster (and the same binaries), initialized with a single-byte encoding: ru_RU.KOI8-R, where everything works. isspace returns the same results: LOG: plpgsql_convert_ident: ident=3D=C8, isspace array=3Df, isspace(0)=3Df LOG: plpgsql_convert_ident: ident=3Dtest_table, isspace array=3Dffffffffff= , isspace(0)=3Df Strange... --=20 WBR, Victor V. Snezhko E-mail: snezhko@indorsoft.ru
pgsql-bugs by date: