BUG #2685: Wrong charset of server messages on client [PATCH] - Mailing list pgsql-bugs

From Sergiy Vyshnevetskiy
Subject BUG #2685: Wrong charset of server messages on client [PATCH]
Msg-id 200610101455.k9AEtTTd085210@wwwmaster.postgresql.org
List pgsql-bugs
The following bug has been logged online:

Bug reference:      2685
Logged by:          Sergiy Vyshnevetskiy
Email address:      serg@vostok.net
PostgreSQL version: 8.1
Operating system:   FreeBSD-6 stable
Description:        Wrong charset of server messages on client [PATCH]


The PostgreSQL backend uses gettext() to localize its messages. By default,
the charset of the localized messages is determined by LC_CTYPE.

The message is then processed through a sprintf-like mechanism (with database
data as possible arguments) and fed to send_message_to_frontend(), which
converts the data from the _database_charset_(!) to the client charset.

If the LC_CTYPE charset is not the same as (or at least binary compatible
with) the database charset, the client gets garbage characters in server
messages. If the database charset is UTF-8, the cluster may recursively
generate "invalid byte sequence for encoding" errors until it fills up
errordata[ERRORDATA_STACK_SIZE], and then it panics.
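
To make the failure mode concrete, here is a minimal sketch in C. It is not
PostgreSQL code: is_valid_utf8(), msg_koi8r and msg_utf8 are hypothetical
names, and the example assumes a server whose LC_CTYPE charset is KOI8-R while
the database charset is UTF-8. The check is structural only (lead and
continuation bytes), which is enough to show that a KOI8-R message is not a
valid UTF-8 byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* The Russian word for "error" as gettext would return it under a
 * KOI8-R LC_CTYPE (assumed setup, for illustration only). */
const unsigned char msg_koi8r[] = { 0xCF, 0xDB, 0xC9, 0xC2, 0xCB, 0xC1 };

/* The same word encoded as UTF-8, which a UTF-8 database expects. */
const unsigned char msg_utf8[] = {
    0xD0, 0xBE, 0xD1, 0x88, 0xD0, 0xB8,
    0xD0, 0xB1, 0xD0, 0xBA, 0xD0, 0xB0
};

/*
 * Hypothetical helper: return 1 if the n bytes at s have a well-formed
 * UTF-8 lead/continuation structure, 0 otherwise. This is a simplified
 * check; it does not reject every overlong or surrogate encoding.
 */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < n) {
        unsigned char c = s[i];
        size_t extra;   /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80)
            extra = 0;
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;
        else
            return 0;   /* stray continuation byte or invalid lead byte */

        if (extra > n - i - 1)
            return 0;   /* sequence is truncated */

        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* expected a continuation byte */

        i += extra + 1;
    }
    return 1;
}
```

Here is_valid_utf8(msg_utf8, sizeof msg_utf8) returns 1, while
is_valid_utf8(msg_koi8r, sizeof msg_koi8r) returns 0: the first byte 0xCF
looks like a two-byte lead, but 0xDB is not a continuation byte. A conversion
routine that assumes its input is in the database charset would reject such a
message in the same way.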


Proposed fix: convert server messages to the database charset.


--- src/backend/utils/mb/mbutils.c.o0 Tue Oct 10 11:51:13 2006
+++ src/backend/utils/mb/mbutils.c  Tue Oct 10 11:49:22 2006
@@ -615,6 +615,7 @@
  DatabaseEncoding = &pg_enc2name_tbl[encoding];
  Assert(DatabaseEncoding->encoding == encoding);
 #ifdef USE_ICU

This, however, uncovers another bug: PostgreSQL writes messages to
stderr/syslog as-is, without converting database data from the database
charset to the charset given by LC_MESSAGES. After this patch, the message
text itself will be written unconverted as well. The fix should be trivial:
set up a conversion from the database charset to the server charset. I will
post a patch for it later.


I used pg_enc2iananame_tbl instead of pg_enc2name_tbl, because gettext
doesn't accept many

Possible TODO:
Change PostgreSQL charset names to IANA-standard names.
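
The contents of pg_enc2iananame_tbl are not shown in this message, so the
following is only a sketch of the general idea: a lookup from PostgreSQL
encoding names to the corresponding IANA charset names. The struct, the table
entries and pg_name_to_iana() are hypothetical and are not taken from the
patch; only a handful of encodings are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One PostgreSQL encoding name paired with its IANA charset name. */
struct enc_alias {
    const char *pg_name;
    const char *iana_name;
};

/* Illustrative subset only; a real table would cover every encoding. */
const struct enc_alias enc_aliases[] = {
    { "UTF8",    "UTF-8" },
    { "LATIN1",  "ISO-8859-1" },
    { "KOI8",    "KOI8-R" },
    { "WIN1251", "windows-1251" },
    { "EUC_JP",  "EUC-JP" },
};

/*
 * Hypothetical helper: return the IANA name for a PostgreSQL encoding
 * name, or NULL if the name is not in the table above.
 */
const char *pg_name_to_iana(const char *pg_name)
{
    size_t i;

    assert(pg_name != NULL);
    for (i = 0; i < sizeof enc_aliases / sizeof enc_aliases[0]; i++)
        if (strcmp(enc_aliases[i].pg_name, pg_name) == 0)
            return enc_aliases[i].iana_name;
    return NULL;
}
```

With this table, pg_name_to_iana("UTF8") yields "UTF-8" and an unknown name
yields NULL. If PostgreSQL used IANA names directly, as the TODO suggests, a
translation step like this would not be needed.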
