Re: garbage in psql -l - Mailing list pgsql-hackers

From Roger Leigh
Subject Re: garbage in psql -l
Msg-id 20091125001431.GD14791@codelibre.net
In response to Re: garbage in psql -l  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: garbage in psql -l
List pgsql-hackers
On Tue, Nov 24, 2009 at 05:43:00PM -0500, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:
> >> I wonder whether the most prudent solution wouldn't be to prevent
> >> default use of linestyle=unicode if ~/.psqlrc hasn't been read.
>
> > This problem is caused when there's a mismatch between the
> > client encoding and the user's locale.  We can detect this at
> > runtime and fall back to ASCII if we know they are incompatible.
>
> Well, no, that is *one* of the possible failure modes.  I've hit others
> already in the short time that the patch has been installed.  The one
> that's bit me most is that the locale environment seen by psql doesn't
> necessarily match what my xterm at the other end of an ssh connection
> is prepared to do --- which is something that psql simply doesn't have
> a way to detect.  Again, this is something that's never mattered before
> unless one was really pushing non-ASCII data around, and even then it
> was often possible to be sloppy.

Sure, but this type of misconfiguration is entirely outside the
purview of psql.  Everything else on the system, from man(1) to gcc,
emacs and vi, will be sending UTF-8 codes to your terminal for any
non-ASCII character they display.  While psql using UTF-8 for its
tables is certainly exposing the problem, in reality it was already
broken, and it's not psql's "fault" for using functionality the
system said was available.  It would equally break if you stored
non-ASCII characters in your UTF-8-encoded database and then ran
a SELECT query, since UTF-8 codes would again be sent to the
terminal.

For the specific case here, where the locale is KOI8-R, we can
determine at runtime that this isn't a UTF-8 locale and keep
using ASCII.  I'll be happy to send in a patch to correct this
specific case.

At least on GNU/Linux, checking nl_langinfo(CODESET) is considered
definitive for testing which character set is available, and it's
the standard SUS/POSIX interface for querying the locale.
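
As a rough illustration of the check described above (a sketch only,
not the actual patch): query the locale codeset and use the Unicode
line style only when it is UTF-8.  It is written in Python for brevity;
psql itself is C and would call nl_langinfo(CODESET) directly.  The
function names linestyle_for_codeset and detect_linestyle are
hypothetical, and treating every non-UTF-8 codeset as "ascii" is an
assumption.

```python
import locale


def linestyle_for_codeset(codeset: str) -> str:
    """Return "unicode" only when the locale codeset is UTF-8, else "ascii"."""
    # Normalise spellings such as "utf8" or "UTF-8" before comparing.
    normalised = codeset.replace("-", "").replace("_", "").upper()
    return "unicode" if normalised == "UTF8" else "ascii"


def detect_linestyle() -> str:
    """Query the user's locale and pick a default line style (hypothetical helper)."""
    try:
        # Adopt LC_CTYPE from the environment, as a C program does with setlocale().
        locale.setlocale(locale.LC_CTYPE, "")
        codeset = locale.nl_langinfo(locale.CODESET)
    except (locale.Error, AttributeError):
        # Unknown locale, or a platform without nl_langinfo: stay with ASCII.
        return "ascii"
    return linestyle_for_codeset(codeset)
```

With a KOI8-R locale the codeset is not UTF-8, so this falls back to
"ascii"; it cannot detect a terminal that disagrees with the locale.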

> I'd be more excited about finding a way to use linestyle=unicode by
> default if it had anything beyond cosmetic benefits.  But it doesn't,
> and it's hard to justify ratcheting up the requirements for users to get
> their configurations exactly straight when that's all they'll get for it.

Once the missing nl_langinfo check is added, we will be going out
of our way to make sure that the system is capable of handling
UTF-8.  This is, IMHO, the limit of how far /any/ tool should go to
handle things.  Worrying about misconfigured terminals, something
which is entirely the user's responsibility, is I think a step too
far--going down this road means you'll be artificially limited to
ASCII, and the whole point of using nl_langinfo is to allow sensible
autoconfiguration, which almost all programs do nowadays.  I don't
think it makes sense to "penalise" the majority of users with
correctly-configured systems because a small minority have a
misconfigured terminal input encoding.  It is 2009, and all
contemporary systems support Unicode, and for the majority it is the
default.

Every one of the GNU utilities, plus most other free software,
localises itself using gettext, which in a UTF-8 locale, even
English locales, will transparently recode its output into the
locale codeset.  This hasn't resulted in major problems for
people using these tools; it's been this way for years now.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
