Thread: locale support
Just a simple question regarding which locale setting applies to which type of sorting. I have two machines, production and development. The development machine has LC_COLLATE and LC_CTYPE set to C and the other has them both set to en_US.
We originally changed the development machine from en_US to C because we were having a problem with spaces being sorted incorrectly. That has been fixed on the development box, but I now noticed it is sorting lower case letters after all capital letters. I would prefer that case is ignored when alphabetical sorts are completed. This type of sort is working correctly on the machine where both locales are set to en_US.
Does this mean that one of the above-mentioned locales needs to be set back to en_US? I would like to have case ignored but I want to be careful not to mess up the sorting of spaces, which we have already fixed. I hate to have to test on my own and then have to reinitialize more than once! Please advise if you can.
Thanks
Jodi
--
_______________________________
Jodi L Kanter
BioInformatics Database Administrator
University of Virginia
(434) 924-2846
jkanter@virginia.edu
Jodi Kanter writes:
> We originally changed the development machine from en_US to C because we
> were having a problem with spaces being sorted incorrectly. That has
> been fixed on the development box, but I now noticed it is sorting lower
> case letters after all capital letters. I would prefer that case is
> ignored when alphabetical sorts are completed. This type of sort is
> working correctly on the machine where both locales are set to en_US.
"C" gives you byte sort order (which happens to come out A..Za..z); en_US (or any other "real" locale) gives you a more natural order that matches what a typical dictionary would use.
If you have very particular requirements, you can try to create your own locales. Most modern operating systems have support for that.
--
Peter Eisentraut   peter_e@gmx.net
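To make the "byte sort order" point concrete, here is a small sketch in Python. It is only an illustration, not how the database sorts internally: the word list is made up, and sorting on the encoded ASCII bytes is used as a stand-in for what LC_COLLATE=C does with plain ASCII text.

```python
# Illustrative word list (an assumption, not data from this thread).
words = ["banana", "Apple", "cherry", "Banana", "apple"]

# Sorting on the raw ASCII bytes mimics byte-wise comparison, which is
# what the "C" locale does for ASCII text: every uppercase letter
# (65..90) compares lower than every lowercase letter (97..122).
byte_order = sorted(words, key=lambda w: w.encode("ascii"))

print(byte_order)            # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(ord("Z") < ord("a"))   # True: this is why the order comes out A..Za..z
```

A dictionary-style order like en_US depends on the locale data installed on the machine, so it is not reproduced here.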
On Wed, May 07, 2003 at 15:34:52 -0400, Jodi Kanter <jkanter@virginia.edu> wrote:
> Just a simple question regarding which locale setting applies to which
> type of sorting. I have two machines, production and development. The
> development machine has LC_COLLATE and LC_CTYPE set to C and the other
> has them both set to en_US.
> We originally changed the development machine from en_US to C because we
> were having a problem with spaces being sorted incorrectly. That has
> been fixed on the development box, but I now noticed it is sorting lower
> case letters after all capital letters. I would prefer that case is
> ignored when alphabetical sorts are completed. This type of sort is
> working correctly on the machine where both locales are set to en_US.
> Does this mean that one of the above-mentioned locales needs to be set
> back to en_US? I would like to have case ignored but I want to be
> careful not to mess up the sorting of spaces, which we have already
> fixed. I hate to have to test on my own and then have to reinitialize
> more than once! Please advise if you can.
You could order by the lower() function applied to the column and still keep the C locale. If you need an index for that ordering, you can use a functional index on the same expression.
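The suggestion above can be tried out with a rough sketch like the following. It uses Python's built-in sqlite3 module as a stand-in for the real database, on the assumption that SQLite's default binary collation behaves like the C locale for ASCII text. The table name, index name, and sample rows are hypothetical. In PostgreSQL the equivalent statements would be an ORDER BY lower(column) clause and a CREATE INDEX on (lower(column)).

```python
import sqlite3

# Hypothetical sample data; "people" and "people_lower_name_idx" are
# made-up names used only for this illustration.
names = ["fred brooks", "Fred Buck", "Fredbrooks", "Fred Brooks"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])

# Index on the expression lower(name), the SQLite analogue of a
# functional index, so the case-folded ordering can be indexed.
conn.execute("CREATE INDEX people_lower_name_idx ON people (lower(name))")

# Plain byte-wise ordering: capitals first, spaces significant.
plain = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY name")]

# Case-folded ordering; the second sort key only makes ties deterministic.
folded = [
    row[0]
    for row in conn.execute("SELECT name FROM people ORDER BY lower(name), name")
]

print(plain)   # ['Fred Brooks', 'Fred Buck', 'Fredbrooks', 'fred brooks']
print(folded)  # ['Fred Brooks', 'fred brooks', 'Fred Buck', 'Fredbrooks']
conn.close()
```

In the folded result the space still sorts before any letter, so the earlier fix for spaces is kept while case no longer decides the order.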
On Wed, 2003-05-07 at 20:34, Jodi Kanter wrote:
> Just a simple question regarding which locale setting applies to which
> type of sorting. I have two machines, production and development. The
> development machine has LC_COLLATE and LC_CTYPE set to C and the other
> has them both set to en_US.
> We originally changed the development machine from en_US to C because
> we were having a problem with spaces being sorted incorrectly. That
> has been fixed on the development box, but I now noticed it is sorting
> lower case letters after all capital letters. I would prefer that case
> is ignored when alphabetical sorts are completed. This type of sort is
> working correctly on the machine where both locales are set to en_US.
> Does this mean that one of the above-mentioned locales needs to be set
> back to en_US? I would like to have case ignored but I want to be
> careful not to mess up the sorting of spaces, which we have already
> fixed. I hate to have to test on my own and then have to reinitialize
> more than once! Please advise if you can.
The sorting characteristics of C are strict ASCII order, with spaces significant. The characteristics of en_* are dictionary order, where more spaces sort after fewer spaces:
    $ LANG=en_GB sort /tmp/ol
    fredbrooks
    fred brooks
    Fredbrooks
    Fred Brooks
    Fred  Brooks
    FredBuck
    Fred Buck
    Fred  Buck
    $ LANG=C sort /tmp/ol
    Fred  Brooks
    Fred  Buck
    Fred Brooks
    Fred Buck
    FredBuck
    Fredbrooks
    fred brooks
    fredbrooks
So I suspect that if you want a mixture of these characteristics, you will have to write your own locale. Don't ask me how...
--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
========================================
"Dearly beloved, avenge not yourselves, but rather give place unto wrath. For it is written, Vengeance is mine; I will repay, saith the Lord. Therefore if thine enemy hunger, feed him; if he thirst, give him drink; for in so doing thou shalt heap coals of fire on his head. Be not overcome of evil, but overcome evil with good." Romans 12:19-21
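As a rough check on the C ordering shown in this message, and on the "mixture" being asked for, here is a small Python sketch. It assumes the two repeated entries in the sort listing differ only by a doubled space (which is what the C ordering implies), and it uses Python's code-point comparison as a stand-in for LANG=C on ASCII input. The case-folding sort key is the same idea as ordering by lower(), not a custom locale.

```python
# Sample lines modelled on the sort listing in this message; the entries
# with two spaces are an assumption about the original file contents.
lines = [
    "fredbrooks", "fred brooks", "Fredbrooks", "Fred Brooks",
    "Fred  Brooks", "FredBuck", "Fred Buck", "Fred  Buck",
]

# Code-point order: for ASCII this matches strict byte order, so a space
# (32) sorts before any letter and capitals sort before lowercase.
c_order = sorted(lines)

# Case ignored, spaces still significant: compare on the lowercased
# string, and fall back to the original string only to break ties.
mixed_order = sorted(lines, key=lambda s: (s.lower(), s))

for label, result in (("C-like", c_order), ("case-folded", mixed_order)):
    print(label)
    for entry in result:
        print("   ", repr(entry))
```

With this key, "Fred Brooks" and "fred brooks" end up next to each other, while entries with more spaces still sort ahead of entries with fewer, as they do under C.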