Re: BUG #1931: ILIKE and LIKE fails on Turkish locale - Mailing list pgsql-bugs

From Victor Snezhko
Subject Re: BUG #1931: ILIKE and LIKE fails on Turkish locale
Date
Msg-id upsdoxr70.fsf@indorsoft.ru
Whole thread Raw
In response to Re: BUG #1931: ILIKE and LIKE fails on Turkish locale  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #1931: ILIKE and LIKE fails on Turkish locale
List pgsql-bugs
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Victor Snezhko <snezhko@indorsoft.ru> writes:
>> 2) When I try to create a stored procedure to create table (e.g., for
>>    creating table only if it doesn't already exist), it fails to
>>    compile if i use cyrillic letter "=C8" (unicode id: 0x0445, utf-8
>>    representation is D1 85) with the following weird error:
>
>> ERROR:  expected dot between identifiers: =C8
>> CONTEXT:  compile of PL/pgSQL function "createoraltertable" near line 2
>
>> the following query fails:
>
>> CREATE OR REPLACE FUNCTION TestFunction()
>> RETURNS int
>> AS $$
>> BEGIN
>>   SELECT =C8 FROM test_table;
>>   RETURN 0;
>> END;
>> $$ LANGUAGE plpgsql;
>
> I tried to duplicate this without success on Fedora Core 5.  Either your
> ru_RU.utf8 locale works differently from Fedora's, or something else is
> wrong.=20=20
> Looking at the code, the only very plausible theory is that
> isspace() is doing something different than we expect it to.  I wonder
> whether you have the database encoding set to something else than what
> the postmaster's LC_CTYPE locale expects?

No, the database encoding is UTF-8, the cluster was built with
ru_RU.UTF-8 locale, and I haven't modified any defaults.

It's on FreeBSD 6.1, and it's quite possible that it has something
wrong with utf-8 locale, I saw some complaints at least about
collation.

However, isspace doesn't seem like a culprit to me. Let's ensure that
I have done everything right to come to this conclusion. I have
applied the following patch to pl_funcs.c:


But, if I'm doing everything right, isspace seems not to be the
culprit to me. I have run the above query on the server with the
following applied patch (the server is 8.1.4, but I don't see any
changes to the plpgsql_convert_ident() in revs 1.46-1.54):

*** src/pl/plpgsql/src/pl_funcs.c.orig  Wed Nov 23 00:23:30 2005
--- src/pl/plpgsql/src/pl_funcs.c       Fri Sep 22 14:48:09 2006
***************
*** 363,368 ****
--- 363,381 ----
  {
        const char *sstart =3D s;
        int                     identctr =3D 0;
+       char                    *isspacebuf;
+       int                     i;
+
+       isspacebuf =3D (char*)malloc(strlen(s) + 1);
+       for (i=3D0; s[i]; ++i) {
+               if (isspace(s[i]))
+                       isspacebuf[i] =3D 't';
+               else
+                       isspacebuf[i] =3D 'f';
+       }
+
+       ereport(LOG,
+               (errmsg("plpgsql_convert_ident: ident=3D%s, isspace array=
=3D%s, isspace(0)=3D%c", s, isspacebuf, isspace(0)?'t':'f')));

        /* Outer loop over identifiers */
        while (*s)


and got the following line in the logfile:

  LOG:  plpgsql_convert_ident: ident=3D=C8, isspace array=3Dff, isspace(0)=
=3Df

The symbol after "input=3D" is that 0x0445 (D1 85 in utf-8) letter which
I mentioned above. I also double-checked this: got a stack trace (by
inserting NULL-pointer assignment instead of reporting "expected dot
between identifiers") and saw that the value passed to the
plpgsql_convert_ident points to the following bytes: D1 85 00.

So, isspace returns false on all three of them.

I have also run the same query on another cluster (and the same
binaries), initialized with a single-byte encoding: ru_RU.KOI8-R,
where everything works. isspace returns the same results:

LOG:  plpgsql_convert_ident: ident=3D=C8, isspace array=3Df, isspace(0)=3Df
LOG:  plpgsql_convert_ident: ident=3Dtest_table, isspace array=3Dffffffffff=
, isspace(0)=3Df

Strange...

--=20
WBR, Victor V. Snezhko
E-mail: snezhko@indorsoft.ru

pgsql-bugs by date:

Previous
From: Christoph Zwerschke
Date:
Subject: Re: BUG #2642: Connection problems without IPv6
Next
From: Victor Snezhko
Date:
Subject: Re: BUG #1931: ILIKE and LIKE fails on Turkish locale