Thread: Default Locale in initdb
Is it me or has the default locale of created databases changed at some point? Currently, on Linux, if one does not specify a locale, the locale is taken from the system environment and it is not "C." While I can see both sides of a discussion, I think that choosing a "locale" without one being specified is a bad idea, even if it is the locale of the machine. The reason why it is a bad idea is that certain features of the database which only work correctly with a locale of "C" will not work by default.
pgsql@mohawksoft.com wrote:
> Is it me or has the default locale of created databases changed at some point?
>
> Currently, on Linux, if one does not specify a locale, the locale is taken
> from the system environment and it is not "C."
>
> While I can see both sides of a discussion, I think that choosing a "locale"
> without one being specified is a bad idea, even if it is the locale of the
> machine. The reason why it is a bad idea is that certain features of the
> database which only work correctly with a locale of "C" will not work by
> default.

This is not new behaviour.

(Why are you the only person who posts here who is nameless?)

cheers

andrew
Just because it is not new does not mean that it is good.

When this new behavior was introduced, and I migrated our databases to the new PgSQL version (dump/restore), the locale of all my databases was silently changed from C to US_en. This broke one application in a very subtle way because of slightly different sort behavior in the different locale. Tracking it down was quite tricky.

PgSQL was just a little too helpful in this case.

Andrew Dunstan wrote:
> pgsql@mohawksoft.com wrote:
>
>> Is it me or has the default locale of created databases changed at some
>> point?
>>
>> Currently, on Linux, if one does not specify a locale, the locale is
>> taken from the system environment and it is not "C."
>>
>> While I can see both sides of a discussion, I think that choosing a "locale"
>> without one being specified is a bad idea, even if it is the locale of
>> the machine. The reason why it is a bad idea is that certain features of the
>> database which only work correctly with a locale of "C" will not work by
>> default.
>
> This is not new behaviour.
>
> (Why are you the only person who posts here who is nameless?)
>
> cheers
>
> andrew

-- 
       __
      /
      | Paul Ramsey
      | Refractions Research
      | Email: pramsey@refractions.net
      | Phone: (250) 885-0632
      \_
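[Editor's illustration] The "slightly different sort behavior" described above can be sketched in a few lines of Python. This is only a rough model: plain string comparison in Python goes by code point, which matches "C" ordering for ASCII text, while the casefold-based key is an assumed stand-in for a dictionary-style locale and is not the real collation algorithm of any system locale. The sample data is made up.

```python
# Illustrative only. sorted() without a key compares by code point, which
# is what a "C" locale does for ASCII text. The casefold key is a rough
# stand-in for a dictionary-style locale, NOT the real en_US collation.
names = ["apple", "Zebra", "banana", "Apple"]

# "C"-style ordering: all uppercase letters sort before any lowercase one.
c_order = sorted(names)

# Dictionary-style ordering: letter case is mostly ignored, so "Zebra"
# moves to the end instead of sitting ahead of "apple".
dictionary_like = sorted(names, key=lambda s: (s.casefold(), s))

print("C-style:         ", c_order)
print("dictionary-style:", dictionary_like)
```

An application that depends on the first ordering (for example, one that merges a sorted query result with a list sorted in application code) can quietly misbehave when the database starts returning the second.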
On Wed, 2 Jun 2004 pgsql@mohawksoft.com wrote:
> Is it me or has the default locale of created databases changed at some point?
>
> Currently, on Linux, if one does not specify a locale, the locale is taken
> from the system environment and it is not "C."
>
> While I can see both sides of a discussion, I think that choosing a "locale"
> without one being specified is a bad idea, even if it is the locale of the
> machine. The reason why it is a bad idea is that certain features of the
> database which only work correctly with a locale of "C" will not work by
> default.

The same is true with not taking the locale. Other unix applications will sort "correctly" without additional work, but PostgreSQL will not. The LIKE optimization can be "fixed" in recent versions by adding an index and leaving the locale, but getting correct sorting is going to require a reinitdb.
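[Editor's illustration] A simplified model of why the LIKE optimization is tied to "C" ordering: a prefix pattern such as 'abc%' can be answered by an index range scan only if "starts with abc" and "sorts between abc and abd" select the same rows. The sketch below assumes code-point comparison for "C" and uses casefold as a stand-in for a case-insensitive locale; the helper name `prefix_upper_bound` and the sample words are hypothetical, and this is not how the PostgreSQL planner is actually implemented.

```python
def prefix_upper_bound(prefix):
    """Smallest string that sorts after every string starting with `prefix`
    under code-point ordering (assumes the last character can be incremented)."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

words = ["abc", "abcd", "ABD", "abd", "Abc"]
prefix = "abc"
lo, hi = prefix, prefix_upper_bound(prefix)  # "abc" <= w < "abd"

# What LIKE 'abc%' should return.
like_matches = [w for w in words if w.startswith(prefix)]

# Range scan under "C"-style (code-point) ordering: same rows as LIKE.
c_range = [w for w in words if lo <= w < hi]

# Range scan under a case-insensitive stand-in ordering: picks up "Abc",
# which LIKE 'abc%' does not match, so the rewrite is no longer valid.
key = str.casefold
locale_like_range = [w for w in words if key(lo) <= key(w) < key(hi)]

print(like_matches, c_range, locale_like_range)
```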
Paul Ramsey wrote:
> Just because it is not new does not mean that it is good.

Sure. I've been caught by it too. Once. :-)

> When this new behavior was introduced, and I migrated our databases to
> the new PgSQL version (dump/restore), the locale of all my databases
> was silently changed from C to US_en. This broke one application in a
> very subtle way because of slightly different sort behavior in the
> different locale. Tracking it down was quite tricky.
>
> PgSQL was just a little too helpful in this case.

It doesn't happen silently - initdb tells you what it is doing.

Ignoring the current environment and using a default value of "C" would be a very simple change to make, if that's what people want.

cheers

andrew

> Andrew Dunstan wrote:
>
>> pgsql@mohawksoft.com wrote:
>>
>>> Is it me or has the default locale of created databases changed at
>>> some point?
>>>
>>> Currently, on Linux, if one does not specify a locale, the locale is
>>> taken from the system environment and it is not "C."
>>>
>>> While I can see both sides of a discussion, I think that choosing a
>>> "locale" without one being specified is a bad idea, even if it is the
>>> locale of the machine. The reason why it is a bad idea is that certain
>>> features of the database which only work correctly with a locale of "C"
>>> will not work by default.
>>
>> This is not new behaviour.
>>
>> (Why are you the only person who posts here who is nameless?)
>>
>> cheers
>>
>> andrew
> When this new behavior was introduced, and I migrated our databases to
> the new PgSQL version (dump/restore), the locale of all my databases
> was silently changed from C to US_en. This broke one application in a
> very subtle way because of slightly different sort behavior in the
> different locale. Tracking it down was quite tricky.
>
> PgSQL was just a little too helpful in this case.

Seems pretty nasty thing to do. I would so vote for making -E and -W and --locale required flags to initdb. Oh the amount of time I've spent with people in IRC..

Chris
Christopher Kings-Lynne wrote:
> > When this new behavior was introduced, and I migrated our databases to
> > the new PgSQL version (dump/restore), the locale of all my databases
> > was silently changed from C to US_en. This broke one application in a
> > very subtle way because of slightly different sort behavior in the
> > different locale. Tracking it down was quite tricky.
> >
> > PgSQL was just a little too helpful in this case.
>
> Seems pretty nasty thing to do. I would so vote for making -E and -W
> and --locale required flags to initdb. Oh the amount of time I've spent
> with people in IRC..

What about folks who don't use locales?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> Christopher Kings-Lynne wrote:
>> > When this new behavior was introduced, and I migrated our databases to
>> > the new PgSQL version (dump/restore), the locale of all my databases
>> > was silently changed from C to US_en. This broke one application in a
>> > very subtle way because of slightly different sort behavior in the
>> > different locale. Tracking it down was quite tricky.
>> >
>> > PgSQL was just a little too helpful in this case.
>>
>> Seems pretty nasty thing to do. I would so vote for making -E and -W
>> and --locale required flags to initdb. Oh the amount of time I've spent
>> with people in IRC..
>
> What about folks who don't use locales?

This has bitten me a couple times. In what version did it change?

My feeling, and I'd like to see what everyone else thinks, is that if you do not specify a locale, you get "C." That way things work as you'd expect in most cases.
> This has bitten me a couple times. In what version did it change?
>
> My feeling, and I'd like to see what everyone else thinks, is that if you
> do not specify a locale, you get "C."

I think that initdb should default to something, and do the following:

* Have an explicit warning if no locale specified, and what it is defaulting to

* Same for encoding. NO-ONE knows about the -E option when they first use postgres. Trust me on this.

* Same for -W. NO-ONE knows this exists. Then they change their trusts to md5 and they can't login to their postgres account anymore.
Christopher Kings-Lynne wrote:
>> This has bitten me a couple times. In what version did it change?
>>
>> My feeling, and I'd like to see what everyone else thinks, is that if
>> you do not specify a locale, you get "C."
>
> I think that initdb should default to something, and do the following:
>
> * Have an explicit warning if no locale specified, and what it is
> defaulting to
>
> * Same for encoding. NO-ONE knows about the -E option when they first
> use postgres. Trust me on this.
>
> * Same for -W. NO-ONE knows this exists. Then they change their
> trusts to md5 and they can't login to their postgres account anymore.

Of these, encoding can be overridden when you create a db, and the password issue can be recovered from very quickly. Only the lc-ctype and lc-collate settings are written in stone by initdb. So I think we can split up the cases.

ISTM there's a good case for defaulting at least lc-collate and lc-ctype to "C" rather than whatever the environment says (the other locale settings can be reset in the config file anyway).

cheers

andrew
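[Editor's illustration] Until the default changes, one way to avoid inheriting the environment's locale is to spell the settings out whenever initdb is invoked. The sketch below only builds the argument list and does not run anything; the function name `build_initdb_command` and the example data directory are hypothetical, while `-D`, `--lc-collate`, `--lc-ctype`, `-E`, and `-W` are existing initdb options.

```python
def build_initdb_command(datadir, locale="C", encoding=None, pwprompt=True):
    """Return an initdb argument list with the settings discussed in this
    thread written out explicitly (hypothetical helper, for illustration)."""
    cmd = [
        "initdb",
        "-D", datadir,
        # The two settings that initdb fixes for the whole cluster.
        f"--lc-collate={locale}",
        f"--lc-ctype={locale}",
    ]
    if encoding is not None:
        # Default encoding; can still be overridden per database.
        cmd += ["-E", encoding]
    if pwprompt:
        # Prompt for a superuser password at cluster creation time.
        cmd.append("-W")
    return cmd

# Example: print the command line instead of executing it.
print(" ".join(build_initdb_command("/tmp/pgdata-example")))
```

Passing the result to something like `subprocess.run` would actually create the cluster; that step is left out here on purpose.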