Re: Unicode UTF-8 table formatting for psql text output - Mailing list pgsql-hackers

From Roger Leigh
Subject Re: Unicode UTF-8 table formatting for psql text output
Date
Msg-id 20090930144111.GA4486@codelibre.net
In response to Re: Unicode UTF-8 table formatting for psql text output  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Unicode UTF-8 table formatting for psql text output
List pgsql-hackers
On Tue, Sep 29, 2009 at 04:28:57PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote:
> >> The bigger question is exactly how we expect this stuff to interact with
> >> pg_regress' --no-locale switch.  We already do clear all these variables
> >> when --no-locale is specified.  I am wondering just what --locale is
> >> supposed to do, and whether selectively lobotomizing the LC stuff has
> >> any real use at all.
> 
> > We should do the LANG or LC_CTYPE thing only on the client,
> > unconditionally.  The --no-locale/--locale options should primarily
> > determine what the temporary server uses.
> 
> Well, that seems fairly reasonable, but it's going to require some
> refactoring of pg_regress.  The initialize_environment function
> determines what happens in both the client and the temp server.

Two possible approaches to fix the tests are as follows:

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index f2f9603..74cdaa2 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -711,8 +711,7 @@ initialize_environment(void)
      * is actually called.)
      */
     unsetenv("LANGUAGE");
-    unsetenv("LC_ALL");
-    putenv("LC_MESSAGES=C");
+    putenv("LC_ALL=C");
 
     /*
      * Set multibyte as requested

Here we just force the locale to C.  This does have the disadvantage
that --no-locale is made redundant, and any tests which are dependent
upon locale (if any?) will be run in the C locale.
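
To illustrate why LC_ALL=C overrides everything else, here is a minimal
sketch of the POSIX lookup order for the LC_CTYPE category.  The helper
name effective_ctype_setting is hypothetical and not part of pg_regress,
and the "C" fallback is an assumption (POSIX leaves the default
implementation-defined).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pg_regress): return the environment
 * value that decides LC_CTYPE when a program calls setlocale(LC_CTYPE, "").
 * The lookup order is LC_ALL, then LC_CTYPE, then LANG; empty values are
 * treated as unset.  Falling back to "C" is an assumption made here.
 */
const char *
effective_ctype_setting(void)
{
    static const char *const vars[] = {"LC_ALL", "LC_CTYPE", "LANG"};
    size_t      i;

    for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++)
    {
        const char *val = getenv(vars[i]);

        if (val != NULL && val[0] != '\0')
            return val;
    }
    return "C";
}
```

With LC_ALL set, neither LC_CTYPE nor LANG is consulted, so any
per-category setting has to be made with LC_ALL unset.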

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index f2f9603..65fb49a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -712,6 +712,7 @@ initialize_environment(void)
      */
     unsetenv("LANGUAGE");
     unsetenv("LC_ALL");
+    putenv("LC_CTYPE=C");
     putenv("LC_MESSAGES=C");
 
     /*

Here we set LC_CTYPE to C in addition to LC_MESSAGES (and for much the
same reasons).  However, if you test on non-C locales to check for
issues with other locale codesets, those tests are all going to be
forced to use ASCII.  Is this a problem in practice?
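
As a rough illustration of the effect, the sketch below recreates the
environment this second patch leaves behind and asks the C library which
codeset a client would then see.  The function name is made up for the
example, and the en_US.UTF-8 value for LANG is only an assumption used to
show that LANG no longer matters for LC_CTYPE.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Illustrative only: mimic the environment set up by the second patch,
 * then report the codeset seen after setlocale(LC_CTYPE, "").
 * Returns NULL if the locale could not be set.
 */
const char *
codeset_with_ctype_forced_to_c(void)
{
    unsetenv("LC_ALL");                 /* LC_ALL would override LC_CTYPE */
    setenv("LC_CTYPE", "C", 1);         /* same effect as putenv("LC_CTYPE=C") */
    setenv("LANG", "en_US.UTF-8", 1);   /* assumed value; ignored for LC_CTYPE */

    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;

    /* The exact name of the C locale's codeset varies by platform. */
    return nl_langinfo(CODESET);
}
```

Whatever name the platform reports for the C locale's codeset, it should
not be UTF-8, so a client that keys off the codeset falls back to ASCII.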

From the POV of my patch, it's working as designed: if the locale
codeset is UTF-8 it's outputting UTF-8.  But, due to the way the
test machinery is looking at the output, this is breaking the tests.
I'm not sure what I can change in my patch so that it stays compatible
with its intended use and does not break the tests.  The patch is really
just breaking an assumption in the testing code that the test output
will always be ASCII.

Forcing the LC_CTYPE to C will force ASCII output and work around this
problem with the tests.  Another approach would be to let psql know
it's being run in a test environment with a PG_TEST or some other
environment variable which we can check for and use to turn off UTF-8
output if set.  Would that be better?
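
A minimal sketch of what such a check might look like follows.  PG_TEST
is only the name floated above, not an existing psql setting, and
want_utf8_table_format is a hypothetical function; the codeset argument
stands for whatever nl_langinfo(CODESET) returns in the client.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical check: decide whether to draw tables with UTF-8 characters.
 * PG_TEST is the variable name proposed in this mail, not a real psql
 * setting.  "codeset" is the client's locale codeset name.
 */
bool
want_utf8_table_format(const char *codeset)
{
    const char *in_test = getenv("PG_TEST");

    /* Any non-empty PG_TEST value turns UTF-8 output off. */
    if (in_test != NULL && in_test[0] != '\0')
        return false;

    /* Otherwise follow the locale codeset, as the patch does today. */
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}
```

The test driver would then only need to export PG_TEST before starting
psql, leaving the locale variables alone.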


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.