Re: BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem - Mailing list pgsql-bugs

From	Tom Lane
Subject	Re: BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem
Date	October 20, 2005 00:07:58
Msg-id	3472.1129777666@sss.pgh.pa.us
In response to	BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem ("Stanislav Sukholet" <ctac113@mail.ru>)
Responses	Re: BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem
List	pgsql-bugs

Stanislav Sukholet <ctac@osib.so-cdu.ru> writes:
>> Can't reproduce this here.  What locale settings are you using in the
>> database?  (Particularly lc_ctype and lc_messages)

> mydb=> SHOW client_encoding ;
>  client_encoding
> -----------------
>  KOI8
> (1 запись)

> mydb=> show LC_CTYPE;
>   lc_ctype
> -------------
>  ru_RU.koi8r
> (1 запись)

> mydb=> show LC_MESSAGES;
>  lc_messages
> -------------
>  ru_RU.koi8r
> (1 запись)

> mydb=> CREATE TABLE a (b INTEGER PRIMARY KEY);
> ERROR:  ignoring unconvertible UTF-8 character 0xd3cf

OK, with that I can reproduce it in 7.4, but more recent releases
produce a bunch of "WARNING:  ignoring unconvertible UTF-8 character"
notices and then complete the operation successfully.

This is basically the same problem discussed in this thread:
http://archives.postgresql.org/pgsql-patches/2005-08/msg00037.php
namely that gettext() converts the translated error message to the
encoding implied by LC_CTYPE ... but the error reporting machinery
expects the string to be in the encoding specified for the database.
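
The mismatch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample word is a hypothetical stand-in for whatever translated text gettext() returned (the actual message is not shown in this thread), and Python's strict UTF-8 decoder stands in for the server's conversion code, which behaves differently in detail. The point is simply that bytes produced in KOI8-R are not valid UTF-8. Note that the two bytes 0xd3 0xcf from the reported error are the KOI8-R encoding of the Cyrillic letters "со".

```python
# Hypothetical sample text; assumed for illustration, not taken from the thread.
message = "создаст"

# What a KOI8-R locale would hand back: the text encoded as KOI8-R bytes.
koi8_bytes = message.encode("koi8_r")

# The first two bytes match the character reported in the error message.
print(koi8_bytes[:2].hex())  # d3cf


def decodes_as_utf8(raw):
    """Return True if the byte string is valid UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Treating KOI8-R bytes as UTF-8 fails: 0xd3 announces a two-byte
# sequence, but 0xcf is not a valid continuation byte.
print(decodes_as_utf8(koi8_bytes))

# When the producer and the consumer agree on the encoding, nothing breaks.
print(decodes_as_utf8(message.encode("utf-8")))
```

With a locale whose character set matches the database encoding, the second case is the only one that ever arises.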

I have applied a minor tweak to the 7.4 branch to make it behave more
like the later releases, i.e., you get a WARNING not an ERROR.  However
this is certainly not really a solution --- the only reason the behavior
isn't worse is that the ru_RU message catalog doesn't try to translate
"ignoring unconvertible UTF-8 character" and so you don't get into the
recursive failure discussed in the above thread.

The bottom line is that this is one of several reasons why it's a bad
idea to use a database encoding that's incompatible with the underlying
locale settings.  I doubt that we'll really be able to fix that until
we replace all our dependence on the C library's locale facilities
... which is something that will probably happen someday, but don't
hold your breath waiting :-(

In short, if you want to use UTF8 database encoding, specify a
UTF8-based locale setting when you initdb.  Don't try to change
the database encoding via -E.

            regards, tom lane
