Thread: Locale by default?
It occurred to me that a server with locale features that is started in the C locale is going to behave the same as a server without locale features. The exceptions are a few extra memory moving operations. (I sincerely hope that all systems' libcs have optimized paths for the C locale.) So we could get rid of this --enable-locale switch altogether. Given our international user base, this would be an appropriate step and move the locale support out of the "cumbersome secondary feature" compartment. What do you think? -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Hi Peter, Any idea of how many "extra memory moving operations" that would be? :-) Regards and best wishes, Justin Clift
Peter Eisentraut <peter_e@gmx.net> writes: > (I sincerely hope that all systems' libcs have optimized paths for the C > locale.) So we could get rid of this --enable-locale switch > altogether. Some experimental evidence to support the claim that --enable-locale has zero cost would be good before taking this step. If any hotspots turn up, we could possibly do runtime checks: if (locale_is_c()) strcmp() else strcoll() regards, tom lane
If it's of any assistance, I'm working with the Open Source Database Benchmark guys (osdb.sourceforge.net) to get an AS3AP-based benchmark for PostgreSQL 7.1.x+ up-and-running reliably. It's working on my Mandrake Linux 8.0 system here, but I need the main OSDB guy to get back from holidays to review and commit things to their CVS. ETA of around a week from right now. :) My point is, if we've got decent benchmarking software (and we can actually freely use it), we can do real-world validation tests when considering things like Peter's suggestion. Sounds good to me. Regards and best wishes, Justin Clift
> It occurred to me that a server with locale features that is started in > the C locale is going to behave the same as a server without locale > features. The exceptions are a few extra memory moving operations. (I > sincerely hope that all systems' libcs have optimized paths for the C > locale.) So we could get rid of this --enable-locale switch altogether. > Given our international user base, this would be an appropriate step and > move the locale support out of the "cumbersome secondary feature" > compartment. What do you think? I wouldn't object to it if there is a way to disable locale support. We in Japan are always troubled by broken Japanese locales on some systems. I'm afraid we will hear more complaints if there is no way to disable the locale support. Moreover, collation in Japanese locales is broken on all platforms as far as I know. I'm not sure about other Asian languages though. -- Tatsuo Ishii
Tatsuo Ishii writes: > I wouldn't object to it if there is a way to disable locale support. export LC_ALL=C -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut wrote: > > Tatsuo Ishii writes: > > > I wouldn't object to it if there is a way to disable locale support. > > export LC_ALL=C I would object even if there's such a way. People in Japan have hardly noticed that the strange behavior is due to the strange locale (LC_COLLATE). regards, Hiroshi Inoue
> Tatsuo Ishii writes: > > > I wouldn't object to it if there is a way to disable locale support. > > export LC_ALL=C It's not a solution. My point is that people should not be troubled by the useless feature (at least for Japanese) even if they set their locale to something other than C. -- Tatsuo Ishii
Hiroshi Inoue writes: > I would object even if there's such a way. > People in Japan have hardly noticed that the strange > behavior is due to the strange locale (LC_COLLATE). I don't think we should design our systems in a way that inconveniences many users because some users are using broken operating systems. If Japanese users have not realized yet that the locale support they are using is broken, then it's not the right solution to disable it in PostgreSQL by default. In that case the problem would just persist for the system as a whole. The right solution is for them to turn off locale support in their operating system, the way it's supposed to be done. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Tatsuo Ishii writes: > > Tatsuo Ishii writes: > > > > > I wouldn't object to it if there is a way to disable locale support. > > > > export LC_ALL=C > > It's not a solution. My point is that people should not be troubled by the > useless feature (at least for Japanese) even if they set their locale > to something other than C. If people set their locale to something other than C they have evidently judged that locale is not useless. Why would they set it otherwise? I don't think hiding away a feature because you think it's useless is a good idea. If people don't like it, allow them to turn it off. If there are potential problems related to the feature, document them. Face it, everything has locale support these days. PostgreSQL is one of the few packages that even has it as an option to turn it off. Users of binary packages of PostgreSQL are all invariably faced with locale features. So it's not like sudden unasked-for locale support is going to be a major shock. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
> > I would object even if there's such a way. > > People in Japan have hardly noticed that the strange > > behavior is due to the strange locale (LC_COLLATE). > > I don't think we should design our systems in a way that inconveniences > many users because some users are using broken operating systems. If > Japanese users have not realized yet that the locale support they are > using is broken, then it's not the right solution to disable it in > PostgreSQL by default. In that case the problem would just persist for > the system as a whole. The right solution is for them to turn off locale > support in their operating system, the way it's supposed to be done. I do not agree with your statement above; I would also want a way to turn it off in PostgreSQL alone and leave the OS and everything else as is (without needing to worry about it). (Our admins use C, En_US, De_DE, De_AT here, but no locale support in the db.) Imho we also need to keep in mind that other DBs don't create locale-aware char columns by default either (they have nchar or some other extended create table syntax). Andreas
> Face it, everything has locale support these days. PostgreSQL is one of > the few packages that even has it as an option to turn it off. Users of > binary packages of PostgreSQL are all invariably faced with locale > features. So it's not like sudden unasked-for locale support is going to > be a major shock. What makes you so opposed to a GUC for disabling locale support? Andreas
Zeugswetter Andreas SB SD writes: > What makes you so opposed to a GUC for disabling locale support? Nothing. It may in fact be the best solution. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes: > Zeugswetter Andreas SB SD writes: >> What makes you so opposed to a GUC for disabling locale support? > Nothing. It may in fact be the best solution. As long as locale has to be an initdb-time setting, a GUC var won't help much. regards, tom lane
Peter Eisentraut wrote: > > Hiroshi Inoue writes: > > > I would object even if there's such a way. > > People in Japan have hardly noticed that the strange > > behavior is due to the strange locale (LC_COLLATE). > > I don't think we should design our systems in a way that inconveniences > many users because some users are using broken operating systems. If > Japanese users have not realized yet that the locale support they are > using is broken, I don't know if the locale support is broken in Japan. I can't think of any reasonable Japanese collating sequence offhand (maybe not ever). I don't think people in Japan should have to know about the existence of collating sequences. > then it's not the right solution to disable it in > PostgreSQL by default. In that case the problem would just persist for > the system as a whole. The right solution is for them to turn off locale > support in their operating system, the way it's supposed to be done. DBMS should be independent from the OS settings as far as possible especially in the handling of data. Currently we could hardly judge if we are running on a locale or not from the dbms POV and it doesn't seem a dbms kind of thing in the first place. I'm a dbms guy not an OS guy and really dislike the requirement for users to export LC_ALL=C. regards, Hiroshi Inoue
> If people set their locale to something other than C they have evidently > judged that locale is not useless. Why would they set it otherwise? As Hiroshi pointed out, the broken part is LC_COLLATE; the other parts of the locale are working. > I > don't think hiding away a feature because you think it's useless is a good > idea. If people don't like it, allow them to turn it off. If there are > potential problems related to the feature, document them. I don't object to the idea of letting users turn it off. I said we need a way to turn it off at configuration/compile time. > Face it, everything has locale support these days. PostgreSQL is one of > the few packages that even has it as an option to turn it off. Users of > binary packages of PostgreSQL are all invariably faced with locale > features. So it's not like sudden unasked-for locale support is going to > be a major shock. I would say it's a misunderstanding to think that the locale (more precisely LC_COLLATE) is useful for *every* language/encoding. -- Tatsuo Ishii
> Hiroshi Inoue writes: > > > I would object even if there's such a way. > > People in Japan have hardly noticed that the strange > > behavior is due to the strange locale (LC_COLLATE). > > I don't think we should design our systems in a way that inconveniences > many users because some users are using broken operating systems. If I don't understand why you object to the idea of giving PostgreSQL the ability to turn off the locale support at configuration/compile time. That way, there's no inconvenience for "many users". -- Tatsuo Ishii
> DBMS should be independent from the OS settings as far as > possible especially in the handling of data. Currently we > could hardly judge if we are running on a locale or not from > the dbms POV and it doesn't seem a dbms kind of thing in the > first place. I'm a dbms guy not an OS guy and really dislike > the requirement for users to export LC_ALL=C. Yup, I can second that. Also note that currently a locale-aware index might get corrupted if you do an OS upgrade that changes the collation (e.g. one that adds the ? symbol). I sort of think that pg locale support is not yet ready for prime time. If we had something that conformed to the spec (per-column language and collation), then yes, I would make it mainstream, but as is? Andreas
Tatsuo Ishii writes: > I don't understand why you object to the idea of giving PostgreSQL the > ability to turn off the locale support at configuration/compile > time. That way, there's no inconvenience for "many users". I don't mind at all the ability to turn it off. My point is that the compile time is the wrong time to do it. Many users use binary packages these days, many more users would like to use binary packages. But the creators of these packages have to make configuration choices to satisfy all of their users. So they turn on the locale support, because that way if you don't want it you can turn it off. The other way around doesn't work. The more appropriate way to handle this situation is to make it a runtime option. I agree that the LC_ALL/LC_COLLATE/LANG lattice is confusing and fragile. But there can be other ways, e.g.,
    initdb --locale=en_US
    initdb --locale-collate=C --locale-ctype=en_US
    initdb    # defaults to --locale=C
or in postgresql.conf
    locale=C
    locale_numeric=en_US
    etc.
or
    SHOW locale;
    SHOW locale_numeric;
That way you always know exactly what situation you're in. I think this was Hiroshi's main concern, the reliance on export LC_ALL, and I agree that this is bad. You say locale in Japan works, except for LC_COLLATE. This concern would be satisfied by the above approach. Comments? -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
> > I don't understand why you object to the idea of giving PostgreSQL the > > ability to turn off the locale support at configuration/compile > > time. That way, there's no inconvenience for "many users". > > I don't mind at all the ability to turn it off. My point is that the > compile time is the wrong time to do it. Many users use binary > packages these days, many more users would like to use binary packages. > But the creators of these packages have to make configuration choices to > satisfy all of their users. So they turn on the locale support, because > that way if you don't want it you can turn it off. The other way around > doesn't work. Yup, imho we all understood that and the only (to be validated) concern is performance. > > The more appropriate way to handle this situation is to make it a runtime > option. I agree that the LC_ALL/LC_COLLATE/LANG lattice is confusing and > fragile. But there can be other ways, e.g., Yes, that was the (or at least my) main concern. > initdb --locale=en_US > initdb --locale-collate=C --locale-ctype=en_US > initdb # defaults to --locale=C > > or in postgresql.conf > > locale=C > locale_numeric=en_US > etc. > > or > > SHOW locale; > SHOW locale_numeric; > > That way you always know exactly what situation you're in. I think this > was Hiroshi's main concern, the reliance on export LC_ALL, and I agree > that this is bad. > > You say locale in Japan works, except for LC_COLLATE. This concern would > be satisfied by the above approach. > > Comments? I think that's it :-) Andreas
> Face it, everything has locale support these days. PostgreSQL is one of > the few packages that even has it as an option to turn it off. Users of > binary packages of PostgreSQL are all invariably faced with locale > features. So it's not like sudden unasked-for locale support is going to > be a major shock. Certainly everyone would agree that "locale support" is desirable. Tatsuo has been one of the earliest and most vocal participants in design speculations on how to support the SQL9x concept of character sets and collations, which for purposes of long-range planning seem to be synonymous with "locale" afaict. The question is whether and how to continue to extend the use of OS-supplied features to accomplish this support, with the severe restrictions (from an SQL9x pov) which come with the OS implementation. - Thomas
> > Face it, everything has locale support these days. PostgreSQL is one of > > the few packages that even has it as an option to turn it off. Users of > > binary packages of PostgreSQL are all invariably faced with locale > > features. So it's not like sudden unasked-for locale support is going to > > be a major shock. > > Certainly everyone would agree that "locale support" is desirable. No. At least for Japanese, LC_COLLATE is not useful at all. Let me explain why. Japanese has three kinds of characters: the first is called "Kanji", the second "Hiragana", and the last "Katakana". Much of the data stored in databases is written in Kanji. The problem is that Kanji are ideograms, and there is no algorithm to determine the correct pronunciation of Kanji characters. The only solution is to add a separate column holding Hiragana or Katakana characters that represent the pronunciation of the Kanji column (Hiragana and Katakana are phonograms). Sorting is then done on the additional Hiragana/Katakana column, which can be done according to the code points of the Hiragana/Katakana characters. So no locale support (LC_COLLATE) is necessary at all for Japanese. > Tatsuo has been one of the earliest and most vocal participants in > design speculations on how to support the SQL9x concept of character > sets and collations, which for purposes of long-range planning seem to > be synonymous with "locale" afaict. > > The question is whether and how to continue to extend the use of > OS-supplied features to accomplish this support, with the severe > restrictions (from an SQL9x pov) which come with the OS implementation. In my opinion, with the SQL99 collate support, the current locale support should be removed. -- Tatsuo Ishii
> > Certainly everyone would agree that "locale support" is desirable. > No... That is why I put "locale support" in double-quotes. Sorry that I was cryptic, but I do understand your concern that OS-specific locale support is suspect for some languages. > In my opinion, with the SQL99 collate support, the current locale > support should be removed. Right. That is my opinion too. - Thomas