Thread: locale support

locale support

From: Jodi Kanter
Just a simple question regarding which locale setting applies to which type of sorting. I have two machines, production and development. The development machine has LC_COLLATE and LC_CTYPE set to C and the other has them both set to en_US.
We originally changed the development machine from en_US to C because we were having a problem with spaces being sorted incorrectly. That has been fixed on the development box, but I now noticed it is sorting lower case letters after all capital letters. I would prefer that case is ignored when alphabetical sorts are completed. This type of sort is working correctly on the machine where both locales are set to en_US.
Does this mean that one of the above mentioned locales needs to be set back to en_US? I would like to have case ignored but I want to be careful not to mess up the sorting of spaces, which we have already fixed. I hate to have to test on my own and then have to reinitialize more than once! Please advise if you can.
Thanks
Jodi
--

_______________________________
Jodi L Kanter
BioInformatics Database Administrator
University of Virginia
(434) 924-2846
jkanter@virginia.edu

Re: locale support

From: Peter Eisentraut
Jodi Kanter writes:

> We originally changed the development machine from en_US to C because we
> were having a problem with spaces being sorted incorrectly. That has
> been fixed on the development box, but I now noticed it is sorting lower
> case letters after all capital letters. I would prefer that case is
> ignored when alphabetical sorts are completed. This type of sort is
> working correctly on the machine where both locales are set to en_US.

"C" gives you byte sort order (which happens to come out A..Za..z), en_US
(or any other "real" locale) gives you a more natural order that matches
what a typical dictionary would use.  If you have very particular
requirements, you can try to create your own locales.  Most modern
operating systems have support for that.
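To illustrate the byte-order point, here is a minimal Python sketch. It assumes plain ASCII input, for which Python's default string comparison (by code point) gives the same order as the C locale's byte comparison; the sample words are made up for the example.

```python
# Python compares str values code point by code point. For plain ASCII
# text this matches the C locale's byte order: 'A'-'Z' (65-90) all come
# before 'a'-'z' (97-122).
words = ["banana", "Apple", "cherry", "Banana", "apple"]

c_order = sorted(words)
print(c_order)  # ['Apple', 'Banana', 'apple', 'banana', 'cherry']

# The same order, made explicit by comparing the raw ASCII bytes.
byte_order = sorted(words, key=lambda w: w.encode("ascii"))
assert byte_order == c_order
```

A real locale such as en_US would instead interleave the two cases, which Python's default comparison cannot show without locale support on the host.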

--
Peter Eisentraut   peter_e@gmx.net


Re: locale support

From: Bruno Wolff III
On Wed, May 07, 2003 at 15:34:52 -0400,
  Jodi Kanter <jkanter@virginia.edu> wrote:
> Just a simple question regarding which locale setting applies to which
> type of sorting. I have two machines, production and development. The
> development machine has LC_COLLATE and LC_CTYPE set to C and the other
> has them both set to en_US.
> We originally changed the development machine from en_US to C because we
> were having a problem with spaces being sorted incorrectly. That has
> been fixed on the development box, but I now noticed it is sorting lower
> case letters after all capital letters. I would prefer that case is
> ignored when alphabetical sorts are completed. This type of sort is
> working correctly on the machine where both locales are set to en_US.
> Does this mean that one of the above mentioned locales needs to be set
> back to en_US? I would like to have case ignored but I want to be
> careful not to mess up the sorting of spaces, which we have already
> fixed. I hate to have to test on my own and then have to reinitialize
> more than once! Please advise if you can.

You could order by lower() applied to the column and still keep the C
locale. If you need an index to support that ordering, you can use a
functional index on the same lower() expression.
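In PostgreSQL terms that would look something like `ORDER BY lower(name)`, optionally backed by `CREATE INDEX people_lower_name_idx ON people (lower(name))`; the table `people`, column `name` and index name are hypothetical. The Python sketch below only mimics the resulting order on made-up data: the key is the lower-cased string, compared byte-wise as under C, so case is ignored while spaces still sort before letters. The original string is added as a tie-breaker purely to make the sketch deterministic; the SQL above would leave such ties unordered.

```python
# Emulates ORDER BY lower(name), name under the C locale: compare the
# lower-cased strings by code point (byte order for ASCII), and fall back
# to the original string when two names differ only in case.
def case_insensitive_c_order(names):
    return sorted(names, key=lambda s: (s.lower(), s))

names = ["Fred Buck", "alice", "FredBuck", "Bob", "fred brooks"]

print(sorted(names))
# ['Bob', 'Fred Buck', 'FredBuck', 'alice', 'fred brooks']  (plain C order)

print(case_insensitive_c_order(names))
# ['alice', 'Bob', 'fred brooks', 'Fred Buck', 'FredBuck']
```

Note that "Fred Buck" still sorts before "FredBuck", so the C-locale treatment of spaces is preserved while case no longer splits the list in two.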


Re: locale support

From: Oliver Elphick
On Wed, 2003-05-07 at 20:34, Jodi Kanter wrote:
> Just a simple question regarding which locale setting applies to which
> type of sorting. I have two machines, production and development. The
> development machine has LC_COLLATE and LC_CTYPE set to C and the other
> has them both set to en_US.
> We originally changed the development machine from en_US to C because
> we were having a problem with spaces being sorted incorrectly. That
> has been fixed on the development box, but I now noticed it is sorting
> lower case letters after all capital letters. I would prefer that case
> is ignored when alphabetical sorts are completed. This type of sort is
> working correctly on the machine where both locales are set to en_US.
> Does this mean that one of the above mentioned locales needs to be set
> back to en_US? I would like to have case ignored but I want to be
> careful not to mess up the sorting of spaces, which we have already
> fixed. I hate to have to test on my own and then have to reinitialize
> more than once! Please advise if you can.

The sorting characteristics of C are strict ASCII order, spaces
significant.

The characteristics of en_* are dictionary order, in which a string with
more spaces sorts after an otherwise identical string with fewer spaces:

        $ LANG=en_GB sort /tmp/ol
        fredbrooks
        fred brooks
        Fredbrooks
        Fred Brooks
        Fred  Brooks
        FredBuck
        Fred Buck
        Fred  Buck

        $ LANG=C sort /tmp/ol
        Fred  Brooks
        Fred  Buck
        Fred Brooks
        Fred Buck
        FredBuck
        Fredbrooks
        fred brooks
        fredbrooks
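The C listing can be checked with a short Python sketch, since Python compares ASCII strings in the same code-point order. The second key is only a toy model assumed here for illustration, not how glibc actually implements en_GB collation: it compares letters with spaces and case ignored, then prefers lowercase, then prefers fewer spaces. It happens to reproduce the en_GB listing for these eight lines, which shows the kind of multi-level rules a custom locale would have to encode.

```python
# The eight lines from /tmp/ol, listed in the en_GB order shown above.
lines = [
    "fredbrooks", "fred brooks", "Fredbrooks", "Fred Brooks",
    "Fred  Brooks", "FredBuck", "Fred Buck", "Fred  Buck",
]

# C locale: plain code-point (byte) comparison; space (32) sorts before
# 'A'-'Z' (65-90), which sort before 'a'-'z' (97-122).
c_sorted = sorted(lines)

# Toy approximation of dictionary order (NOT glibc's real algorithm):
#   1. letters only, case ignored
#   2. then lowercase before uppercase (swapcase flips the code-point order)
#   3. then fewer spaces before more spaces
def toy_dictionary_key(s):
    letters = s.replace(" ", "")
    return (letters.lower(), letters.swapcase(), s.count(" "))

dict_sorted = sorted(lines, key=toy_dictionary_key)

print("\n".join(c_sorted))
print()
print("\n".join(dict_sorted))
```

Running it prints the two listings in the same order as the `LANG=C` and `LANG=en_GB` runs above.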

So I suspect that if you want a mixture of these characteristics, you
will have to write your own locale. Don't ask me how...

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver