Thread: Locale by default?

Locale by default?

From
Peter Eisentraut
Date:
It occurred to me that a server with locale features that is started in
the C locale is going to behave the same as a server without locale
features.  The exceptions are a few extra memory moving operations.  (I
sincerely hope that all systems' libcs have optimized paths for the C
locale.)  So we could get rid of this --enable-locale switch altogether.
Given our international user base, this would be an appropriate step and
move the locale support out of the "cumbersome secondary feature"
compartment.  What do you think?

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Locale by default?

From
Justin Clift
Date:
Hi Peter,

Any idea of how many "extra memory moving operations" that would be?

:-)

Regards and best wishes,

Justin Clift


Peter Eisentraut wrote:
> 
> It occurred to me that a server with locale features that is started in
> the C locale is going to behave the same as a server without locale
> features.  The exceptions are a few extra memory moving operations.  (I
> sincerely hope that all systems' libcs have optimized paths for the C
> locale.)  So we could get rid of this --enable-locale switch altogether.
> Given our international user base, this would be an appropriate step and
> move the locale support out of the "cumbersome secondary feature"
> compartment.  What do you think?
> 
> --
> Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
> 

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."    - Indira Gandhi


Re: Locale by default?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> (I sincerely hope that all systems' libcs have optimized paths for the C
> locale.)  So we could get rid of this --enable-locale switch
> altogether.

Some experimental evidence to support the claim that --enable-locale has
zero cost would be good before taking this step.

If any hotspots turn up, we could possibly do runtime checks:

        if (locale_is_c())
                strcmp()
        else
                strcoll()

        regards, tom lane
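
A minimal sketch of the runtime check suggested above, written out in C. The name `locale_is_c` comes from the message; its body and the wrapper `compare_text` are assumptions made for illustration and are not taken from the PostgreSQL source. The sketch queries the current collation setting with `setlocale(LC_COLLATE, NULL)` and treats the names "C" and "POSIX" as the plain byte-order case.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when collation uses the plain C/POSIX locale. */
bool locale_is_c(void)
{
    /* Passing NULL queries the current setting without changing it. */
    const char *name = setlocale(LC_COLLATE, NULL);

    return name != NULL &&
           (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical wrapper: compare two NUL-terminated strings,
 * skipping strcoll() when the locale is known to be C. */
int compare_text(const char *a, const char *b)
{
    if (locale_is_c())
        return strcmp(a, b);    /* byte-wise comparison */
    return strcoll(a, b);       /* locale-aware comparison */
}
```

In a real hot path the result of the locale query would presumably be computed once and cached rather than looked up on every comparison.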


Re: Locale by default?

From
Justin Clift
Date:
If it's of any assistance, I'm working with the Open Source Database
Benchmark guys (osdb.sourceforge.net) to get an AS3AP-based benchmark
for PostgreSQL 7.1.x+ up-and-running reliably.

It's working on my Mandrake Linux 8.0 system here, but I need the main
OSDB guy to get back from holidays to review and commit things to their
CVS.  ETA of around a week from right now.  :)

My point is, if we've got decent benchmarking software (and we can
actually freely use it), we can do real-world validation tests when
considering things like Peter's suggestion.

Sounds good to me.

Regards and best wishes,

Justin Clift


Tom Lane wrote:
> 
> Peter Eisentraut <peter_e@gmx.net> writes:
> > (I sincerely hope that all systems' libcs have optimized paths for the C
> > locale.)  So we could get rid of this --enable-locale switch
> > altogether.
> 
> Some experimental evidence to support the claim that --enable-locale has
> zero cost would be good before taking this step.
> 
> If any hotspots turn up, we could possibly do runtime checks:
> 
>         if (locale_is_c())
>                 strcmp()
>         else
>                 strcoll()
> 
>                         regards, tom lane
> 

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."    - Indira Gandhi


Re: Locale by default?

From
Tatsuo Ishii
Date:
> It occurred to me that a server with locale features that is started in
> the C locale is going to behave the same as a server without locale
> features.  The exceptions are a few extra memory moving operations.  (I
> sincerely hope that all systems' libcs have optimized paths for the C
> locale.)  So we could get rid of this --enable-locale switch altogether.
> Given our international user base, this would be an appropriate step and
> move the locale support out of the "cumbersome secondary feature"
> compartment.  What do you think?

I wouldn't object to it if there is a way to disable locale support.  We
in Japan are always troubled by broken Japanese locales on some
systems.  I'm afraid we will hear more complaints if there is no way to
disable the locale support.  Moreover, collation for Japanese locales
is broken on all platforms as far as I know.  I'm not sure about other
Asian languages, though.
--
Tatsuo Ishii


Re: Locale by default?

From
Peter Eisentraut
Date:
Tatsuo Ishii writes:

> I wouldn't object to it if there is a way to disable locale support.

export LC_ALL=C

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Locale by default?

From
Hiroshi Inoue
Date:
Peter Eisentraut wrote:
> 
> Tatsuo Ishii writes:
> 
> > I wouldn't object to it if there is a way to disable locale support.
> 
> export LC_ALL=C

I would object even if there's such a way.
People in Japan have hardly noticed that the strange
behavior is due to the strange locale (LC_COLLATE).

regards,
Hiroshi Inoue


Re: Locale by default?

From
Tatsuo Ishii
Date:
> Tatsuo Ishii writes:
> 
> > I wouldn't object to it if there is a way to disable locale support.
> 
> export LC_ALL=C

It's not a solution. My point is that people should not be troubled by a
feature that is useless (at least for Japanese) even if they set their
locale to something other than C.
--
Tatsuo Ishii



Re: Locale by default?

From
Peter Eisentraut
Date:
Hiroshi Inoue writes:

> I would object even if there's such a way.
> People in Japan have hardly noticed that the strange
> behavior is due to the strange locale (LC_COLLATE).

I don't think we should design our systems in a way that inconveniences
many users because some users are using broken operating systems.  If
Japanese users have not realized yet that the locale support they are
using is broken, then it's not the right solution to disable it in
PostgreSQL by default.  In that case the problem would just persist for
the system as a whole.  The right solution is for them to turn off locale
support in their operating system, the way it's supposed to be done.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Locale by default?

From
Peter Eisentraut
Date:
Tatsuo Ishii writes:

> > Tatsuo Ishii writes:
> >
> > > I wouldn't object to it if there is a way to disable locale support.
> >
> > export LC_ALL=C
>
> It's not a solution. My point is that people should not be troubled by a
> feature that is useless (at least for Japanese) even if they set their
> locale to something other than C.

If people set their locale to something other than C they have evidently
judged that locale is not useless.  Why would they set it otherwise?  I
don't think hiding away a feature because you think it's useless is a good
idea.  If people don't like it, allow them to turn it off.  If there are
potential problems related to the feature, document them.

Face it, everything has locale support these days.  PostgreSQL is one of
the few packages that even has it as an option to turn it off.  Users of
binary packages of PostgreSQL are all invariably faced with locale
features.  So it's not like sudden unasked-for locale support is going to
be a major shock.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



RE: Locale by default?

From
"Zeugswetter Andreas SB SD"
Date:
> > I would object even if there's such a way.
> > People in Japan have hardly noticed that the strange
> > behavior is due to the strange locale (LC_COLLATE).
> 
> I don't think we should design our systems in a way that inconveniences
> many users because some users are using broken operating systems.  If
> Japanese users have not realized yet that the locale support they are
> using is broken, then it's not the right solution to disable it in
> PostgreSQL by default.  In that case the problem would just persist for
> the system as a whole.  The right solution is for them to turn off locale
> support in their operating system, the way it's supposed to be done.

I do not agree with your statement above.  I would also want a way to
turn it off in PostgreSQL alone and leave the OS and everything else as
is, without needing to worry about it.  (Our admins use C, En_US, De_DE,
and De_AT here, but no locale support in the db.)

Imho we also need to keep in mind that other DBs don't create
locale-aware char columns by default either (they have nchar or some
other extended create table syntax).

Andreas


RE: Locale by default?

From
"Zeugswetter Andreas SB SD"
Date:
> Face it, everything has locale support these days.  PostgreSQL is one of
> the few packages that even has it as an option to turn it off.  Users of
> binary packages of PostgreSQL are all invariably faced with locale
> features.  So it's not like sudden unasked-for locale support is going to
> be a major shock.

What makes you so opposed to a GUC for disabling locale support ?

Andreas


RE: Locale by default?

From
Peter Eisentraut
Date:
Zeugswetter Andreas SB SD writes:

> What makes you so opposed to a GUC for disabling locale support ?

Nothing.  It may in fact be the best solution.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Locale by default?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Zeugswetter Andreas SB SD writes:
>> What makes you so opposed to a GUC for disabling locale support ?

> Nothing.  It may in fact be the best solution.

As long as locale has to be an initdb-time setting, a GUC var won't
help much.
        regards, tom lane


Re: Locale by default?

From
Hiroshi Inoue
Date:
Peter Eisentraut wrote:
> 
> Hiroshi Inoue writes:
> 
> > I would object even if there's such a way.
> > People in Japan have hardly noticed that the strange
> > behavior is due to the strange locale (LC_COLLATE).
> 
> I don't think we should design our systems in a way that inconveniences
> many users because some users are using broken operating systems. If
> Japanese users have not realized yet that the locale support they are
> using is broken,

I don't know if the locale support is broken in Japan.
I can't think of any reasonable Japanese collating sequence
right now (maybe forever).  I don't think people should know
about the existence of collating sequences in Japan.

> then it's not the right solution to disable it in
> PostgreSQL by default.  In that case the problem would just persist for
> the system as a whole.  The right solution is for them to turn off locale
> support in their operating system, the way it's supposed to be done.

DBMS should be independent from the OS settings as far as
possible especially in the handling of data. Currently we
could hardly judge if we are running on a locale or not from
the dbms POV and it doesn't seem a dbms kind of thing in the
first place. I'm a dbms guy not an OS guy and really dislike
the requirement for users to export LC_ALL=C. 

regards,
Hiroshi Inoue


Re: Locale by default?

From
Tatsuo Ishii
Date:
> If people set their locale to something other than C they have evidently
> judged that locale is not useless.  Why would they set it otherwise?

As Hiroshi pointed out, the broken thing is LC_COLLATE; the other
things in the locale are working.

> I
> don't think hiding away a feature because you think it's useless is a good
> idea.  If people don't like it, allow them to turn it off.  If there are
> potential problems related to the feature, document them.

I don't object to the idea of letting users turn it off.  I said we need
a way to turn it off at configuration/compile time.

> Face it, everything has locale support these days.  PostgreSQL is one of
> the few packages that even has it as an option to turn it off.  Users of
> binary packages of PostgreSQL are all invariably faced with locale
> features.  So it's not like sudden unasked-for locale support is going to
> be a major shock.

I would say it's a misunderstanding that the locale (more precisely
LC_COLLATE) is useful for *any* language/encoding.
--
Tatsuo Ishii


Re: Locale by default?

From
Tatsuo Ishii
Date:
> Hiroshi Inoue writes:
> 
> > I would object even if there's such a way.
> > People in Japan have hardly noticed that the strange
> > behavior is due to the strange locale (LC_COLLATE).
> 
> I don't think we should design our systems in a way that inconveniences
> many users because some users are using broken operating systems.  If

I don't understand why you object to the idea of giving PostgreSQL the
ability to turn off the locale support at configuration/compile
time.  That way, there are no inconveniences for "many users".
--
Tatsuo Ishii


RE: Locale by default?

From
"Zeugswetter Andreas SB SD"
Date:
> DBMS should be independent from the OS settings as far as
> possible especially in the handling of data. Currently we
> could hardly judge if we are running on a locale or not from
> the dbms POV and it doesn't seem a dbms kind of thing in the
> first place. I'm a dbms guy not an OS guy and really dislike
> the requirement for users to export LC_ALL=C. 

Yup, I can second that.
Also note that currently a locale-aware index might get corrupted if
you do an OS upgrade that changes the collation (e.g. adds the ?
symbol).
I sort of think that pg locale support is not yet ready for prime time.
If we had something that conformed to the spec (per-column language and
collation), then yes, I would make it mainstream, but as is?

Andreas
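
The index concern above can be sketched in C: an index keeps its keys in the order that the collation gave when the entries were written, so if the collation later changes, the stored order may no longer count as sorted and lookups that rely on it can miss rows. The two comparators below are stand-ins chosen for illustration (plain byte order versus an ASCII case-insensitive order); they are assumptions, not the actual before/after behavior of any OS locale.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*collate_fn)(const char *, const char *);

/* Stand-in for the "old" collation: plain byte order. */
int collate_bytes(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Stand-in for the "new" collation after an upgrade:
 * ASCII case-insensitive order. */
int collate_nocase(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return ca - cb;
        if (ca == '\0')
            return 0;
    }
}

/* True if keys[0..n-1] are in non-decreasing order under cmp. */
bool is_sorted(const char *const *keys, size_t n, collate_fn cmp)
{
    for (size_t i = 1; i < n; i++)
        if (cmp(keys[i - 1], keys[i]) > 0)
            return false;
    return true;
}
```

With the keys "Apple", "Zebra", "apple", "zebra" stored in byte order, `is_sorted` holds under `collate_bytes` but fails under `collate_nocase`, which is the situation a binary search over the old order would run into.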


Re: Locale by default?

From
Peter Eisentraut
Date:
Tatsuo Ishii writes:

> I don't understand why you object to the idea of giving PostgreSQL the
> ability to turn off the locale support at configuration/compile
> time.  That way, there are no inconveniences for "many users".

I don't mind at all the ability to turn it off.  My point is that the
compile time is the wrong time to do it.  Many users use binary
packages these days, many more users would like to use binary packages.
But the creators of these packages have to make configuration choices to
satisfy all of their users.  So they turn on the locale support, because
that way if you don't want it you can turn it off.  The other way around
doesn't work.

The more appropriate way to handle this situation is to make it a runtime
option.  I agree that the LC_ALL/LC_COLLATE/LANG lattice is confusing and
fragile.  But there can be other ways, e.g.,

initdb --locale=en_US
initdb --locale-collate=C --locale-ctype=en_US
initdb # defaults to --locale=C

or in postgresql.conf

locale=C
locale_numeric=en_US
etc.

or

SHOW locale;
SHOW locale_numeric;

That way you always know exactly what situation you're in.  I think this
was Hiroshi's main concern, the reliance on export LC_ALL, and I agree
that this is bad.

You say locale in Japan works, except for LC_COLLATE.  This concern would
be satisfied by the above approach.

Comments?

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
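
To illustrate why the LC_ALL/LC_COLLATE/LANG lattice is easy to get wrong, here is a rough C sketch of the usual POSIX precedence for the collation category: LC_ALL overrides LC_COLLATE, which overrides LANG. The function names are hypothetical, and the fallback to "C" when nothing is set is an assumption (POSIX leaves that default to the implementation).

```c
#include <stddef.h>
#include <stdlib.h>

/* Return s if it is a non-empty string, otherwise NULL. */
static const char *nonempty(const char *s)
{
    return (s != NULL && s[0] != '\0') ? s : NULL;
}

/* Hypothetical helper: pick the locale name that governs collation,
 * given the values of the three environment variables (NULL = unset). */
const char *resolve_collate(const char *lc_all,
                            const char *lc_collate,
                            const char *lang)
{
    if (nonempty(lc_all))
        return lc_all;          /* LC_ALL overrides every category */
    if (nonempty(lc_collate))
        return lc_collate;      /* category-specific setting */
    if (nonempty(lang))
        return lang;            /* general fallback */
    return "C";                 /* assumed default when nothing is set */
}

/* Same resolution, reading the process environment. */
const char *effective_collate_from_env(void)
{
    return resolve_collate(getenv("LC_ALL"),
                           getenv("LC_COLLATE"),
                           getenv("LANG"));
}
```

An explicit setting recorded by the database itself, as proposed above, would avoid depending on whichever of these three variables happens to be set in the environment that starts the server.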



RE: Locale by default?

From
"Zeugswetter Andreas SB SD"
Date:
> > I don't understand why you object to the idea of giving PostgreSQL the
> > ability to turn off the locale support at configuration/compile
> > time.  That way, there are no inconveniences for "many users".
> 
> I don't mind at all the ability to turn it off.  My point is that the
> compile time is the wrong time to do it.  Many users use binary
> packages these days, many more users would like to use binary packages.
> But the creators of these packages have to make configuration choices to
> satisfy all of their users.  So they turn on the locale support, because
> that way if you don't want it you can turn it off.  The other way around
> doesn't work.

Yup, imho we all understood that and the only (to be validated) concern
is performance.

> 
> The more appropriate way to handle this situation is to make it a runtime
> option.  I agree that the LC_ALL/LC_COLLATE/LANG lattice is confusing and
> fragile.  But there can be other ways, e.g.,

Yes, that was the (or at least my) main concern.

> initdb --locale=en_US
> initdb --locale-collate=C --locale-ctype=en_US
> initdb # defaults to --locale=C
> 
> or in postgresql.conf
> 
> locale=C
> locale_numeric=en_US
> etc.
> 
> or
> 
> SHOW locale;
> SHOW locale_numeric;
> 
> That way you always know exactly what situation you're in.  I think this
> was Hiroshi's main concern, the reliance on export LC_ALL, and I agree
> that this is bad.
> 
> You say locale in Japan works, except for LC_COLLATE.  This concern would
> be satisfied by the above approach.
> 
> Comments?

I think that's it :-)

Andreas


Re: Locale by default?

From
Thomas Lockhart
Date:
> Face it, everything has locale support these days.  PostgreSQL is one of
> the few packages that even has it as an option to turn it off.  Users of
> binary packages of PostgreSQL are all invariably faced with locale
> features.  So it's not like sudden unasked-for locale support is going to
> be a major shock.

Certainly everyone would agree that "locale support" is desirable.
Tatsuo has been one of the earliest and most vocal participants in
design speculations on how to support the SQL9x concept of character
sets and collations, which for purposes of long range planning seem to
be synonymous with "locale" afaict.

The question is whether and how to continue to extend the use of
OS-supplied features to accomplish this support, with the severe
restrictions (from an SQL9x pov) which come with the OS implementation.
                  - Thomas


Re: Locale by default?

From
Tatsuo Ishii
Date:
> > Face it, everything has locale support these days.  PostgreSQL is one of
> > the few packages that even has it as an option to turn it off.  Users of
> > binary packages of PostgreSQL are all invariably faced with locale
> > features.  So it's not like sudden unasked-for locale support is going to
> > be a major shock.
> 
> Certainly everyone would agree that "locale support" is desirable.

No. At least for Japanese, LC_COLLATE is not useful at all.  Let me
explain why. Japanese has three kinds of characters: the first is
called "Kanji", the second is "Hiragana", and the last is "Katakana".
Much of the data stored in databases is usually written in Kanji. The
problem is that Kanji are ideograms, and there is no algorithm to guess
the correct pronunciation of Kanji letters. The only solution for
this is to add a separate column holding Hiragana or Katakana letters
that represent the pronunciation of the Kanji column (Hiragana and
Katakana are phonograms). Sorting is then done on the additional
Hiragana/Katakana column, which can be done according to the code points
of Hiragana/Katakana. So no locale support (LC_COLLATE) is necessary
at all for Japanese.

> Tatsuo has been one of the earliest and most vocal participants in
> design speculations on how to support the SQL9x concept of character
> sets and collations, which for purposes of long range planning seem to
> be synonymous with "locale" afaict.
> 
> The question is whether and how to continue to extend the use of
> OS-supplied features to accomplish this support, with the severe
> restrictions (from an SQL9x pov) which come with the OS implementation.

In my opinion, once SQL99 collate support is available, the current
locale support should go away.
--
Tatsuo Ishii
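
The reading-column approach described above can be sketched in C as follows. The `struct person` layout and the function names are hypothetical, and the sketch assumes the strings are UTF-8, where byte-wise `strcmp` order matches code point order; the thread itself does not say which encoding is in use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical row: a name in Kanji plus its pronunciation in
 * Hiragana.  Only the reading is used as the sort key. */
struct person {
    int         id;
    const char *name;      /* Kanji, as displayed */
    const char *reading;   /* Hiragana, assumed UTF-8 */
};

/* Order rows by the reading column in plain byte order.
 * For UTF-8 text this is the same as code point order. */
int by_reading(const void *a, const void *b)
{
    const struct person *pa = a;
    const struct person *pb = b;

    return strcmp(pa->reading, pb->reading);
}

/* Sort n rows in place by their reading; no LC_COLLATE involved. */
void sort_by_reading(struct person *rows, size_t n)
{
    qsort(rows, n, sizeof rows[0], by_reading);
}
```

Because the comparison never calls `strcoll`, the result does not depend on the server's locale setting at all, which is the point being made about Japanese data.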


Re: Locale by default?

From
Thomas Lockhart
Date:
> > Certainly everyone would agree that "locale support" is desirable.
> No...

That is why I put "locale support" in double-quotes. Sorry that I was
cryptic, but I do understand your concern that OS-specific locale
support is suspect for some languages.

> In my opinion, once SQL99 collate support is available, the current
> locale support should go away.

Right. That is my opinion too.
                  - Thomas