Corruption of multibyte identifiers on UTF-8 locale - Mailing list pgsql-bugs

From Victor Snezhko
Subject Corruption of multibyte identifiers on UTF-8 locale
Date
Msg-id u4puynao7.fsf@indorsoft.ru
List pgsql-bugs
Hello,

Looks like we have a more serious problem with multibyte identifiers.
When I run the following sequence of queries:

CREATE OR REPLACE FUNCTION CreateOrAlterTable()
RETURNS int
AS $$
BEGIN
  if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т1' AND relkind = 'r') then
    CREATE TABLE т1 (
           к1 int NOT NULL,
           PRIMARY KEY (к1)
    );
  end if;
  return 0;
END;
$$ LANGUAGE plpgsql;

SELECT CreateOrAlterTable();

CREATE OR REPLACE FUNCTION CreateOrAlterTable()
RETURNS int
AS $$
BEGIN
  if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т2' AND relkind = 'r') then
    CREATE TABLE т2 (
           к2 int NOT NULL,
           PRIMARY KEY (к2)
    );
  end if;
  return 0;
END;
$$ LANGUAGE plpgsql;

and then try to create the second table:

  SELECT CreateOrAlterTable();

this gives me the following error (on HEAD as well as on patched 8.1.4):

ERROR:  invalid byte sequence for encoding "UTF8": 0xf18231
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT:  SQL statement "SELECT not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE '?1' AND relkind = 'r')"
PL/pgSQL function "createoraltertable" line 2 at if

The correct UTF-8 byte sequence is 0xd18231, so it looks like we call
tolower() somewhere on individual bytes of multibyte characters, and it
does the same thing as isspace(): it interprets its argument as a wide
character and converts it.
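
The suspected mechanism can be reproduced outside the server. The sketch below is only a simulation in Python, not PostgreSQL code: the helper bytewise_tolower is hypothetical and mimics a tolower() that treats each byte value as a wide-character code point, which is the behaviour suspected here. It is applied to the UTF-8 encoding of the identifier т1 (bytes d1 82 31).

```python
def bytewise_tolower(data: bytes) -> bytes:
    """Hypothetical stand-in for calling tolower() on each byte separately.

    Every byte value is treated as a code point, lowercased, and written
    back if the result still fits in a single byte.
    """
    out = bytearray()
    for b in data:
        lowered = chr(b).lower()
        if len(lowered) == 1 and ord(lowered) < 256:
            out.append(ord(lowered))
        else:
            out.append(b)
    return bytes(out)


name = "т1".encode("utf-8")      # Cyrillic letter te followed by '1'
corrupted = bytewise_tolower(name)

print(name.hex())                 # d18231
print(corrupted.hex())            # f18231

try:
    corrupted.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
print(still_valid)                # False
```

The lead byte 0xd1, read as the code point U+00D1, has the lowercase form U+00F1, so it becomes 0xf1 and the sequence is no longer valid UTF-8, which matches the 0xf18231 in the error message.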

Plain CREATE TABLE statements work, as do CREATE TABLE statements that are
executed inside a function without the "if not EXISTS" check.

So either we don't support UTF-8 on BSDs for now (BTW, this needs to be
checked on less popular BSD flavors), or we need to fix this somehow,
e.g. by calling only wide-character checks, which will complicate
things...
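
To illustrate what "only wide-character checks" would mean, here is a minimal Python sketch under the same caveat: it is not the server's code, and widechar_tolower is a hypothetical name. The idea is to decode the whole byte string into characters first, lowercase the characters, and only then encode back, so that no byte of a multibyte character is ever case-folded on its own.

```python
def widechar_tolower(data: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: lowercase whole characters, never single bytes."""
    text = data.decode(encoding)          # bytes -> characters
    return text.lower().encode(encoding)  # fold case, then re-encode


upper_name = "Т1".encode("utf-8")   # uppercase Cyrillic Te followed by '1'
lowered = widechar_tolower(upper_name)

print(upper_name.hex())              # d0a231
print(lowered.hex())                 # d18231
print(lowered.decode("utf-8"))       # т1
```

The cost hinted at above is visible even in this toy version: every comparison needs a decode step and knowledge of the encoding, instead of a simple per-byte call.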

-- 
WBR, Victor V. Snezhko
E-mail: snezhko@indorsoft.ru
