Re: BUG #7493: Postmaster messages unreadable in a Windows console - Mailing list pgsql-hackers

From: Noah Misch
Subject: Re: BUG #7493: Postmaster messages unreadable in a Windows console
Msg-id: 20130212021045.GA7600@tornado.leadboat.com
In response to: Re: BUG #7493: Postmaster messages unreadable in a Windows console  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: BUG #7493: Postmaster messages unreadable in a Windows console  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Sun, Feb 10, 2013 at 06:47:30PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Following some actual testing, I see that we treat postgresql.conf values as
> > byte sequences; any reinterpretation as encoded text happens later.  Hence,
> > contrary to my earlier suspicion, your patch does not make that situation
> > worse.  The present situation is bad; among other things, current_setting() is
> > a vector for injecting invalid text data.  But unconditionally validating
> > postgresql.conf values in the platform encoding would not be an improvement.
> > Suppose you have a UTF-8 platform encoding and KOI8R databases.  You may wish
> > to put KOI8R strings in a GUC, say search_path.  That's possible today; if we
> > required that postgresql.conf conform to the platform encoding and no other,
> > it would become impossible.  This area warrants improvement, but doing so will
> > entail careful design.
> 
> The key problem, ISTM, is that it's not at all clear what encoding to
> expect the incoming data to be in.  I'm concerned about trying to fix
> that by assuming it's in some "platform encoding" --- for one thing,
> while that might be a well-defined concept on Windows, I don't believe
> it is anywhere else.

GetPlatformEncoding() imposes a sufficiently portable definition.  I just
don't think that definition leads to a value that can be presumed desirable
and adequate for postgresql.conf.
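
To make the quoted UTF-8-platform / KOI8R-database scenario concrete, here is
a minimal Python sketch (not PostgreSQL code).  The schema name is made up for
illustration, and the helper valid_in() is hypothetical.  It shows only that a
byte string which is perfectly good KOI8-R can fail validation as UTF-8, which
is why validating postgresql.conf against the platform encoding alone would
reject such a value.

```python
# Hypothetical schema name; any Cyrillic identifier would behave similarly.
schema = "схема"

# The bytes a KOI8R database would expect to see in, say, search_path.
koi8r_bytes = schema.encode("koi8_r")


def valid_in(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(valid_in(koi8r_bytes, "koi8_r"))  # valid as KOI8-R
print(valid_in(koi8r_bytes, "utf-8"))   # rejected by a UTF-8 check
```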

> If we knew that postgresql.conf was stored in, say, UTF8, then it would
> probably be possible to perform encoding conversion to get string
> variables into the database encoding.  Perhaps we should allow some
> magic syntax to tell us the encoding of a config file?
> 
>     file_encoding = 'utf8'    # must precede any non-ASCII in the file
> 
> There would still be a lot of practical problems to solve, like what to
> do if we fail to convert some string into the database encoding.  But at
> least the problems would be somewhat well-defined.

Agreed.  That's a promising direction.
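
A rough Python sketch of how such a marker might behave, purely as an
illustration: file_encoding is only the syntax proposed above, not an existing
setting, and convert_config() and its error behavior are assumptions made for
this example.  Lines before the marker must be pure ASCII; later lines are
decoded in the declared encoding and re-encoded into the database encoding,
and a string that cannot be converted raises an error rather than being
silently mangled.

```python
import re

# Matches the proposed (hypothetical) marker line: file_encoding = 'utf8'
DIRECTIVE = re.compile(rb"^\s*file_encoding\s*=\s*'([A-Za-z0-9_-]+)'")


def convert_config(raw: bytes, database_encoding: str) -> bytes:
    """Re-encode config text from its declared encoding to database_encoding.

    Raises UnicodeDecodeError if non-ASCII bytes precede the marker (or the
    bytes are invalid in the declared encoding), and UnicodeEncodeError if a
    character has no representation in database_encoding.
    """
    file_encoding = "ascii"  # nothing but ASCII is allowed before the marker
    out = []
    for line in raw.split(b"\n"):
        match = DIRECTIVE.match(line)
        if match:
            file_encoding = match.group(1).decode("ascii")
        text = line.decode(file_encoding)
        out.append(text.encode(database_encoding))
    return b"\n".join(out)


source = "file_encoding = 'utf8'\nsearch_path = 'схема'\n"
converted = convert_config(source.encode("utf-8"), "koi8_r")
print(converted == source.encode("koi8_r"))
```

What to do on a conversion failure (reject the file, skip the setting, or
something else) is exactly the open question raised above; raising an
exception here is just the simplest choice for a sketch.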

> While we're thinking about this, it'd be nice to fix our handling (or
> rather lack of handling) of encoding considerations for database names,
> user names, and passwords.  I could imagine adding some sort of encoding
> marker to connection request packets, which could fix the don't-know-
> the-encoding problem as far as incoming data is concerned.

That deserves a TODO entry under Wire Protocol Changes to avoid losing it.

> But how
> shall we deal with storing the strings in shared catalogs, which have to
> be readable from multiple databases possibly of different encodings?

I suppose we would pick an encoding sufficient for all values we intend to
support (UTF8?  MULE_INTERNAL?), then store the data in that encoding using
either bytea or a new type, say "omnitext".
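
As a hedged illustration of that idea in Python (not PostgreSQL code): the
sketch below assumes UTF-8 as the common encoding, and to_catalog() and
from_catalog() are hypothetical helpers standing in for whatever the bytea or
"omnitext" storage would do.  A name is normalized once on the way in, and
each database converts it to its own encoding on the way out, failing if the
name cannot be represented there.

```python
COMMON_ENCODING = "utf-8"  # assumption for this sketch; could be another superset


def to_catalog(name: bytes, source_encoding: str) -> bytes:
    """Normalize an incoming name to the common shared-catalog encoding."""
    return name.decode(source_encoding).encode(COMMON_ENCODING)


def from_catalog(stored: bytes, database_encoding: str) -> bytes:
    """Render a stored name for one database.

    Raises UnicodeEncodeError if the name has no representation in
    database_encoding.
    """
    return stored.decode(COMMON_ENCODING).encode(database_encoding)


# A made-up role name arriving from a KOI8-R client.
stored = to_catalog("мария".encode("koi8_r"), "koi8_r")

# Readable from a database using a different Cyrillic encoding...
print(from_catalog(stored, "cp1251") == "мария".encode("cp1251"))

# ...but not from a Latin-1 database, which has no Cyrillic letters.
try:
    from_catalog(stored, "latin-1")
except UnicodeEncodeError:
    print("not representable in latin-1")
```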

Thanks,
nm


