Thread: Rough sketch for locale by default

Rough sketch for locale by default

From
Peter Eisentraut
Date:
I've mentioned a while ago that I wanted to make the --enable-locale
switch the default (and remove the switch), and make the choice of
locale-awareness a run-time choice.  Here is how that might work. I've
already explained how I plan to get around the performance problems, so
this will just focus on the user interface.

We currently have two kinds of locale categories:  Those that must be
fixed at initdb-time and those that may be changed at run-time.

I suggest that initdb always defaults its locales to C and that we
provide command line options to set a different locale.  E.g.,

initdb --lc-collate=en_US

This makes the change transparent for those who like the C locale. It is
also much clearer than figuring out which of LANG, LC_COLLATE, LC_ALL will
get in your way.

Personally, I also find it better to separate the locale settings in your
login account meant for interactive use from those meant for PostgreSQL
servers.  In other words, if I'm the "postgres" account and administering
a bunch of databases I'd still like to set LC_ALL=de_DE so all the shell
commands print their things formatted right, and I don't want to change
this every time I start a server from within that account.

In particular, I'd like the following set of options:

--lc-collate
--lc-ctype
--locale  (allows specifying all in one, but may be overridden by specific options)

It might actually work to say

initdb --locale=''

to force inheriting the settings from the environment.
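
To make the intended precedence concrete, here is a rough sketch in Python. It is an illustration only, not initdb code: the function name resolve_initdb_locale and its parameters are hypothetical, and treating an empty string as "inherit from the environment" follows the tentative idea above. The environment lookup assumes the usual POSIX order (LC_ALL, then the category variable, then LANG).

```python
def resolve_initdb_locale(category, specific=None, general=None, environ=None):
    """Return the locale name for one category, e.g. "LC_COLLATE".

    specific -- value of the per-category option (e.g. --lc-collate), or None
    general  -- value of --locale, or None
    environ  -- mapping of environment variables (hypothetical parameter)
    """
    environ = environ or {}

    # A category-specific option overrides --locale.
    chosen = specific if specific is not None else general

    # No option given at all: default to the C locale.
    if chosen is None:
        return "C"

    # Empty string: inherit from the environment (assumed semantics),
    # using the POSIX lookup order LC_ALL > LC_xxx > LANG.
    if chosen == "":
        for var in ("LC_ALL", category, "LANG"):
            value = environ.get(var)
            if value:
                return value
        return "C"

    return chosen
```

With this sketch, a call such as resolve_initdb_locale("LC_COLLATE", specific="de_DE", general="en_US") picks de_DE, while passing neither option yields C regardless of the environment.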

In the post-initdb stage, we'd add a bunch of GUC variables, such as

lc_numeric
lc_monetary
lc_time
locale

These all default to "C".  For a start we'd make them fixed for the
life-time of the postmaster, but we could evaluate other options later.

This again makes this change hidden for users that didn't use locale
support.  Also, it prevents accidentally changing the locale when you
(or someone else) fiddle with your environment variables.

Note that you get the same kind of command line options as in initdb:
--lc-numeric, --locale, etc.  You can also run SHOW lc_numeric to see
what's going on.

Comments?

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Rough sketch for locale by default

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> [ good stuff snipped ]

> ... Also, it prevents accidentally changing the locale when you
> (or someone else) fiddle with your environment variables.

If I follow this correctly, the behavior would be that PG would not pay
attention to *any* LC_xxx environment variables?  Although I agree with
that principle in the abstract, it bothers me that PG will be out of
step with every single other locale-using program in the Unix world.
We ought to think twice about whether that's really a good idea.

> Note that you get the same kind of command line options as in initdb:
> --lc-numeric, --locale, etc.  You can also run SHOW lc_numeric to see
> what's going on.

Probably you thought of this already: please also support SHOW for the
initdb-time variables (lc_collate, etc), so that one can find out the
active locale settings without having to resort to
contrib/pg_controldata.
        regards, tom lane


Re: Rough sketch for locale by default

From
Hannu Krosing
Date:
On Wed, 2002-03-27 at 19:26, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > [ good stuff snipped ]
> 
> > ... Also, it prevents accidentally changing the locale when you
> > (or someone else) fiddle with your environment variables.
> 
> If I follow this correctly, the behavior would be that PG would not pay
> attention to *any* LC_xxx environment variables?  Although I agree with
> that principle in the abstract, it bothers me that PG will be out of
> step with every single other locale-using program in the Unix world.

IIRC Oracle uses NLS_LANG and not any LC_* (even on Unix ;)

It is set to something like NLS_LANG=ESTONIAN_ESTONIA.WE8ISO8859P15


> We ought to think twice about whether that's really a good idea.
> 
> > Note that you get the same kind of command line options as in initdb:
> > --lc-numeric, --locale, etc.  You can also run SHOW lc_numeric to see
> > what's going on.
> 
> Probably you thought of this already: please also support SHOW for the
> initdb-time variables (lc_collate, etc), so that one can find out the
> active locale settings without having to resort to
> contrib/pg_controldata.

------------
Hannu




Re: Rough sketch for locale by default

From
Hannu Krosing
Date:
On Wed, 2002-03-27 at 19:05, Peter Eisentraut wrote:
> I've mentioned a while ago that I wanted to make the --enable-locale
> switch the default (and remove the switch), and make the choice of
> locale-awareness a run-time choice.  Here is how that might work. I've
> already explained how I plan to get around the performance problems, so
> this will just focus on the user interface.
> 
> We currently have two kinds of locale categories:  Those that must be
> fixed at initdb-time and those that may be changed at run-time.

As a more radical idea, we should get rid of those that are fixed at
initdb time (except the database's storage charset) and do proper NCHAR
types for anything not in the C locale.

-----------
Hannu




Re: Rough sketch for locale by default

From
Peter Eisentraut
Date:
Tom Lane writes:

> If I follow this correctly, the behavior would be that PG would not pay
> attention to *any* LC_xxx environment variables?  Although I agree with
> that principle in the abstract, it bothers me that PG will be out of
> step with every single other locale-using program in the Unix world.

During earlier discussions people had objected to enabling locale support
by default on the grounds that it is very hard to follow which locale is
getting activated when.  Especially from Japan I heard that a lot of
people have some locale settings in their environment, but that most
locales are unsuitable ("broken") for use in the PostgreSQL server.  So
this approach would keep the behavior backward compatible with the
--disable-locale case.

Here's a possible compromise for the postmaster:

We let initdb figure out what locales the user wants and then not only
initialize pg_control appropriately, but also write the run-time
changeable categories into the postgresql.conf file.  That way, the
postmaster executable could still consult the LC_* variables, but in the
common case it would just be overridden when the postgresql.conf file is
read.

This way we also hide the details of what locale category gets what
treatment from users that only want one locale for all categories and
don't want to change it.  Furthermore, it all but eliminates the problem I'm
concerned about that the locale may accidentally be changed when the
postmaster is restarted.

How does initdb figure out what locale is wanted?  I agree it makes sense
to use the setting in the environment, because in many cases the database
will want to use the same locale as everything else on the system.  We
could provide a flag --no-locale, which sets all locale categories to "C",
as a clear and simple way to turn this off.
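
As a rough sketch of how this compromise might fit together (in Python, purely for illustration): initdb reads the environment once, or ignores it under --no-locale, and records the run-time changeable categories; the postmaster then prefers the recorded values over whatever is in its environment. All function names here are hypothetical, the dict stands in for the parsed postgresql.conf, and the environment lookup assumes the usual POSIX order (LC_ALL, then the category variable, then LANG).

```python
# Categories that may be changed at run time (from the proposal).
RUNTIME_CATEGORIES = ("lc_numeric", "lc_monetary", "lc_time")


def env_locale(category, environ):
    """Look up one category in the environment: LC_ALL > LC_xxx > LANG."""
    for var in ("LC_ALL", category.upper(), "LANG"):
        value = environ.get(var)
        if value:
            return value
    return "C"


def initdb_runtime_settings(environ, no_locale=False):
    """Settings initdb would write into postgresql.conf (hypothetical)."""
    if no_locale:
        # --no-locale: every category is forced to "C".
        return {cat: "C" for cat in RUNTIME_CATEGORIES}
    return {cat: env_locale(cat, environ) for cat in RUNTIME_CATEGORIES}


def postmaster_locale(category, conf, environ):
    """Effective locale at postmaster start: the conf file wins if set."""
    if category in conf:
        return conf[category]
    # Only when the conf file is silent does the environment matter.
    return env_locale(category, environ)
```

In this sketch, a cluster initialized with LANG=de_DE keeps lc_numeric at de_DE even if the postmaster is later restarted from a shell with LC_ALL=ja_JP, which is the accidental change the proposal is meant to prevent.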

-- 
Peter Eisentraut   peter_e@gmx.net