Re: [bug fix] multibyte messages are displayed incorrectly on the client - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: [bug fix] multibyte messages are displayed incorrectly on the client |
Date | |
Msg-id | 20131230030207.GA1551279@tornado.leadboat.com |
In response to | Re: [bug fix] multibyte messages are displayed incorrectly on the client ("MauMau" <maumau307@gmail.com>) |
Responses | Re: [bug fix] multibyte messages are displayed incorrectly on the client |
List | pgsql-hackers |
On Sun, Dec 22, 2013 at 07:51:55PM +0900, MauMau wrote:
> From: "Noah Misch" <noah@leadboat.com>
> >Better to attack that directly.  Arrange to apply any client_encoding
> >named in the startup packet earlier, before authentication.  This
> >relates to the TODO item "Let the client indicate character encoding of
> >database names, user names, and passwords".  (I expect such an endeavor
> >to be tricky.)
>
> Unfortunately, character set conversion is not possible until the
> database session is established, since it requires system catalog
> access.  Please see the comment in src/backend/utils/mb/mbutils.c:
>
>  * During backend startup we can't set client encoding because we (a)
>  * can't look up the conversion functions, and (b) may not know the database
>  * encoding yet either.  So SetClientEncoding() just accepts anything and
>  * remembers it for InitializeClientEncoding() to apply later.

Yes, changing that is the tricky part.

> I guess that's why Tom-san suggested the same solution as my patch
> (as a compromise) in the below thread, which is also a TODO item:
>
> Re: encoding of PostgreSQL messages
> http://www.postgresql.org/message-id/19896.1234107496@sss.pgh.pa.us

That's fair for the necessarily-earliest messages, like 'invalid value for
parameter "client_encoding"' and messages pertaining to the physical structure
of the startup packet.  The client's encoding expectation is unknowable.  An
error that mentions "client_encoding" will hopefully put users on the right
track regardless of how we translate and encode the surrounding words.  The
other affected messages are quite technical, making a casual user unlikely to
fix or even see them.  Not so for authentication messages, so I'm wary of
forcing use of ASCII that late in the handshake.

Note that choosing to use ASCII need not imply wholly declining to translate.
If the build uses GNU libiconv, gettext can emit ASCII approximations for
translations that conform to a Latin-derived alphabet, falling back to no
translation where the alphabet differs too much.
pg_perm_setlocale(LC_CTYPE, "C") requests such behavior.  (The inferior iconv
//TRANSLIT implementation of GNU libc will convert non-ASCII characters to
question marks, though.)

> From: "Alvaro Herrera" <alvherre@2ndquadrant.com>
> >The problem is that if there's an encoding mismatch, the message might
> >be impossible to figure out.  If the message is in english, at least it
> >can be searched for in the web, or something -- the user might even find
> >a page in which the english error string appears, with a native language
> >explanation.
>
> I feel like this, too.  Being readable in English is better than
> being unrecognizable.

I agree that English consistently beats mojibake.  I question whether that
makes up for the loss of translation when encodings do happen to match,
particularly for non-technical errors like a mistyped password.  The
everything-UTF8 scenario appears often, perhaps explaining infrequent
complaints about the status quo.  If 90% of translated message users have
client_encoding != server_encoding, then +1 for your patch's strategy.  If the
figure is only 60%, I'd vote for holding out for a more-extensive fix that
allows us to encoding-convert localized authentication failure messages.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
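
For readers unfamiliar with the deferral described in the quoted mbutils.c comment, here is a toy model of the "remember now, apply later" pattern. It is only a sketch and not PostgreSQL source: ToySession and the toy_* functions are hypothetical names invented for illustration, and the "SQL_ASCII" starting value is an assumption of the model. What it is meant to show is that any message sent before the apply step (authentication failures, for instance) still goes out in the pre-startup encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only; every name below is hypothetical, not a PostgreSQL API. */
typedef struct ToySession
{
    bool catalogs_ready;        /* can conversion functions be looked up yet? */
    char pending_encoding[32];  /* value remembered from the startup packet */
    char active_encoding[32];   /* encoding actually used for client messages */
} ToySession;

void
toy_session_init(ToySession *s)
{
    assert(s != NULL);
    s->catalogs_ready = false;
    s->pending_encoding[0] = '\0';
    strcpy(s->active_encoding, "SQL_ASCII");    /* assumed pre-startup default */
}

/*
 * Before the session is established, accept the name and remember it.
 * Once catalogs are available, apply it immediately as well.
 */
bool
toy_set_client_encoding(ToySession *s, const char *name)
{
    assert(s != NULL && name != NULL);
    if (strlen(name) >= sizeof(s->pending_encoding))
        return false;           /* name too long for this toy buffer */
    strcpy(s->pending_encoding, name);
    if (s->catalogs_ready)
        strcpy(s->active_encoding, name);
    return true;
}

/* Called once the session is established: apply whatever was remembered. */
void
toy_initialize_client_encoding(ToySession *s)
{
    assert(s != NULL);
    s->catalogs_ready = true;
    if (s->pending_encoding[0] != '\0')
        strcpy(s->active_encoding, s->pending_encoding);
}
```

In this model, calling toy_set_client_encoding(&s, "SJIS") right after toy_session_init(&s) leaves active_encoding untouched until toy_initialize_client_encoding(&s) runs. Applying client_encoding before authentication would mean shrinking that window.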