Re: BUG #7493: Postmaster messages unreadable in a Windows console - Mailing list pgsql-hackers

From Noah Misch
Subject Re: BUG #7493: Postmaster messages unreadable in a Windows console
Msg-id 20130210210259.GA7401@tornado.leadboat.com
In response to Re: BUG #7493: Postmaster messages unreadable in a Windows console  (Alexander Law <exclusion@gmail.com>)
Responses Re: BUG #7493: Postmaster messages unreadable in a Windows console  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #7493: Postmaster messages unreadable in a Windows console  (Alexander Law <exclusion@gmail.com>)
List pgsql-hackers
On Wed, Jan 30, 2013 at 10:00:01AM +0400, Alexander Law wrote:
> 30.01.2013 05:51, Noah Misch wrote:
>> On Tue, Jan 29, 2013 at 09:54:04AM -0500, Tom Lane wrote:
>>> Alexander Law <exclusion@gmail.com> writes:
>>>> Please look at the following l10n bug:
>>>> http://www.postgresql.org/message-id/502A26F1.6010109@gmail.com
>>>> and the proposed patch.

>> Even then, I wouldn't be surprised to find problematic consequences beyond
>> error display.  What if all the databases are EUC_JP, the platform encoding is
>> KOI8, and some postgresql.conf settings contain EUC_JP characters?  Does the
>> postmaster not rely on its use of SQL_ASCII to allow those values?
>>
>> I would look at fixing this by making the error output machinery smarter in
>> this area before changing the postmaster's notion of server_encoding.

With your proposed change, the problem will resurface in an actual SQL_ASCII
database.  At the problem's root is write_console()'s assumption that messages
are in the database encoding.  pg_bind_textdomain_codeset() tries to make that
so, but it only works for encodings with a pg_enc2gettext_tbl entry.  That
excludes SQL_ASCII, MULE_INTERNAL, and others.  write_console() needs to
behave differently in such cases.
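
To make the condition above concrete, here is a minimal sketch of the check it implies. It is not PostgreSQL source: demo_codeset_tbl is a hypothetical stand-in for pg_enc2gettext_tbl, its three entries are illustrative, and the function names are invented for this example. It only answers whether messages can be assumed to be in the database encoding; what write_console() should do when the answer is no is left open.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_enc2gettext_tbl: maps a database encoding
 * name to the codeset name handed to gettext.  Entries are illustrative.
 */
struct demo_codeset_entry
{
    const char *encoding;
    const char *gettext_codeset;
};

const struct demo_codeset_entry demo_codeset_tbl[] = {
    {"UTF8", "UTF-8"},
    {"EUC_JP", "EUC-JP"},
    {"KOI8R", "KOI8-R"},
};

/* Return the gettext codeset for an encoding, or NULL if it has no entry. */
const char *
demo_lookup_codeset(const char *encoding)
{
    size_t n = sizeof(demo_codeset_tbl) / sizeof(demo_codeset_tbl[0]);
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (strcmp(demo_codeset_tbl[i].encoding, encoding) == 0)
            return demo_codeset_tbl[i].gettext_codeset;
    }
    return NULL;
}

/*
 * Return 1 if translated messages can be assumed to arrive in the database
 * encoding (the codeset could be bound), 0 if that assumption is unsafe.
 */
int
demo_messages_match_db_encoding(const char *db_encoding)
{
    return demo_lookup_codeset(db_encoding) != NULL;
}
```

With this table, demo_messages_match_db_encoding("EUC_JP") returns 1, while "SQL_ASCII" and "MULE_INTERNAL" return 0. Those are the cases where console output cannot treat message bytes as database-encoded text.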

> Maybe I am still missing something, but I thought that
> postinit.c/CheckMyDatabase would switch the message encoding to EUC_JP
> via pg_bind_textdomain_codeset, so there would be no issue with it.
> But until then KOI8 should be used.
> Regarding postgresql.conf, as it has no explicit encoding specification,
> it should be interpreted as having the platform encoding. So in your
> example it should contain KOI8, not EUC_JP characters.
> example it should contain KOI8, not EUC_JP characters.

Following some actual testing, I see that we treat postgresql.conf values as
byte sequences; any reinterpretation as encoded text happens later.  Hence,
contrary to my earlier suspicion, your patch does not make that situation
worse.  The present situation is bad; among other things, current_setting() is
a vector for injecting invalid text data.  But unconditionally validating
postgresql.conf values in the platform encoding would not be an improvement.
Suppose you have a UTF-8 platform encoding and KOI8R databases.  You may wish
to put KOI8R strings in a GUC, say search_path.  That's possible today; if we
required that postgresql.conf conform to the platform encoding and no other,
it would become impossible.  This area warrants improvement, but doing so will
entail careful design.
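
As an illustration of why validating against the platform encoding alone would reject such a setting, the sketch below checks a byte string for UTF-8 well-formedness. It is a standalone example, not PostgreSQL code; demo_is_valid_utf8 and the sample arrays are names invented here. The KOI8-R sample is five bytes from the 0xC0-0xDF range, where KOI8-R places lowercase Cyrillic letters; the second array holds the same five letters encoded as UTF-8.

```c
#include <stddef.h>

/* Five lowercase Cyrillic letters as KOI8-R bytes (illustrative sample). */
const unsigned char demo_koi8r_word[] = {0xD3, 0xC8, 0xC5, 0xCD, 0xC1};

/* The same five letters encoded as UTF-8. */
const unsigned char demo_utf8_word[] = {
    0xD1, 0x81, 0xD1, 0x85, 0xD0, 0xB5, 0xD0, 0xBC, 0xD0, 0xB0
};

/*
 * Return 1 if s[0..len) is well-formed UTF-8, else 0.  Rejects stray
 * continuation bytes, truncated sequences, overlong forms, surrogates,
 * and code points above U+10FFFF.
 */
int
demo_is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        unsigned long cp;
        unsigned long min;
        size_t need;
        size_t k;

        if (c < 0x80)
        {
            i++;                /* plain ASCII byte */
            continue;
        }
        else if ((c & 0xE0) == 0xC0)
        {
            need = 1; cp = c & 0x1F; min = 0x80;
        }
        else if ((c & 0xF0) == 0xE0)
        {
            need = 2; cp = c & 0x0F; min = 0x800;
        }
        else if ((c & 0xF8) == 0xF0)
        {
            need = 3; cp = c & 0x07; min = 0x10000;
        }
        else
            return 0;           /* continuation byte or invalid lead byte */

        if (len - i - 1 < need)
            return 0;           /* sequence runs past the end of input */

        for (k = 1; k <= need; k++)
        {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;       /* expected a continuation byte */
            cp = (cp << 6) | (unsigned long) (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;           /* overlong, out of range, or surrogate */

        i += need + 1;
    }
    return 1;
}
```

The KOI8-R bytes fail the check at the second byte: 0xD3 looks like a two-byte lead, but 0xC8 is not a continuation byte. A server that insisted on UTF-8 for every postgresql.conf value would therefore refuse a string that a KOI8R database could use as is.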

Thanks,
nm


