Thread: per-database locale: createdb switches

per-database locale: createdb switches

From
Alvaro Herrera
Date:
Hi,

I just noticed that the interface for choosing a different locale at db
creation time is
createdb --lc-collate=X --lc-ctype=X.  Is there a reason for having
these two separate switches?  It seems awkward; why can't we just have a
single --locale switch that selects both settings at once?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: per-database locale: createdb switches

From
Teodor Sigaev
Date:
Alvaro Herrera wrote:
> Hi,
> 
> I just noticed that the interface for choosing a different locale at db
> creation time is
> createdb --lc-collate=X --lc-ctype=X.  Is there a reason for having
> these two separate switches?  It seems awkward; why can't we just have a
> single --locale switch that selects both settings at once?
> 

Sometimes you need C collation with a non-C ctype. But for most 
users a single locale switch is enough. What about a
[--locale=X|--lc-collate=X --lc-ctype=X] option?
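
The proposed rule can be sketched in shell. This is only an illustration:
resolve_locale is a hypothetical helper, not part of createdb, and treating
the shorthand and the individual switches as mutually exclusive is an
assumption read from the "|" in the proposal.

```shell
# Hypothetical helper (not part of PostgreSQL): resolve the proposed
# [--locale=X | --lc-collate=X --lc-ctype=X] switches into one pair of settings.
resolve_locale() {
    locale= collate= ctype=
    for arg in "$@"; do
        case $arg in
            --locale=*)     locale=${arg#--locale=} ;;
            --lc-collate=*) collate=${arg#--lc-collate=} ;;
            --lc-ctype=*)   ctype=${arg#--lc-ctype=} ;;
        esac
    done
    # Assumption: the shorthand cannot be combined with the individual switches.
    if [ -n "$locale" ] && { [ -n "$collate" ] || [ -n "$ctype" ]; }; then
        echo "error: use either --locale or the individual --lc-* switches" >&2
        return 1
    fi
    # The shorthand simply fills in both settings with the same value.
    if [ -n "$locale" ]; then
        collate=$locale
        ctype=$locale
    fi
    echo "collate=${collate:-<default>} ctype=${ctype:-<default>}"
}

resolve_locale --locale=es_ES.UTF-8
resolve_locale --lc-collate=C --lc-ctype=en_US.UTF-8
```

The first call covers the common case of one locale for both settings; the
second keeps the "C collation with a non-C ctype" case available.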


Re: per-database locale: createdb switches

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> Alvaro Herrera wrote:
>> It seems awkward; why can't we just have a
>> single --locale switch that selects both settings at once?

> Sometimes you need C collation with a non-C ctype. But for most 
> users a single locale switch is enough. What about a
> [--locale=X|--lc-collate=X --lc-ctype=X] option?

Seems to me there's one there already.
        regards, tom lane


Re: per-database locale: createdb switches

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
> > Alvaro Herrera wrote:
> >> It seems awkward; why can't we just have a
> >> single --locale switch that selects both settings at once?
> 
> > Sometimes you need C collation with a non-C ctype. But for most 
> > users a single locale switch is enough. What about a
> > [--locale=X|--lc-collate=X --lc-ctype=X] option?
> 
> Seems to me there's one there already.

You're thinking of initdb maybe?  I'm talking about createdb.


$ LC_ALL=C createdb --version
createdb (PostgreSQL) 8.4devel

$ LC_ALL=C createdb --help
createdb creates a PostgreSQL database.

Usage: createdb [OPTION]... [DBNAME] [DESCRIPTION]

Options:
  -D, --tablespace=TABLESPACE  default tablespace for the database
  -E, --encoding=ENCODING      encoding for the database
  --lc-collate=LOCALE          LC_COLLATE setting for the database
  --lc-ctype=LOCALE            LC_CTYPE setting for the database
  -O, --owner=OWNER            database user to own the new database
  -T, --template=TEMPLATE      template database to copy
  -e, --echo                   show the commands being sent to the server
  --help                       show this help, then exit
  --version                    output version information, then exit

Connection options:
  -h, --host=HOSTNAME          database server host or socket directory
  -p, --port=PORT              database server port
  -U, --username=USERNAME      user name to connect as
  -W, --password               force password prompt

By default, a database with the same name as the current user is created.

Report bugs to <pgsql-bugs@postgresql.org>.


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: per-database locale: createdb switches

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Seems to me there's one there already.

> You're thinking of initdb maybe?  I'm talking about createdb.

Oh, okay.  But how often is someone going to be changing locales during
createdb?  I think the most common case might well be like Teodor said,
where you need to tweak them individually anyway.
        regards, tom lane


Re: per-database locale: createdb switches

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> Seems to me there's one there already.
> 
> > You're thinking of initdb maybe?  I'm talking about createdb.
> 
> Oh, okay.  But how often is someone going to be changing locales during
> createdb?  I think the most common case might well be like Teodor said,
> where you need to tweak them individually anyway.

Frequently, I think.  In fact I think creating a database in a different
language is going to be more frequent than tweaking the settings
individually.

I like Teodor's proposal; I'll see about implementing that.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: per-database locale: createdb switches

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> I like Teodor's proposal; I'll see about implementing that.

Attached.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: per-database locale: createdb switches

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Alvaro Herrera wrote:
>> I like Teodor's proposal; I'll see about implementing that.

> Attached.

You missed updating the sgml docs, and personally I'd be inclined to
list -l before the individual --lc switches; otherwise it looks fine.
        regards, tom lane


Re: per-database locale: createdb switches

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Alvaro Herrera wrote:
> >> I like Teodor's proposal; I'll see about implementing that.
> 
> > Attached.
> 
> You missed updating the sgml docs, and personally I'd be inclined to
> list -l before the individual --lc switches; otherwise it looks fine.

Thanks, committed that way.  I noticed that --lc-ctype and --lc-collate
were forgotten in SGML docs, so I added them too.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: per-database locale: createdb switches

From
Heikki Linnakangas
Date:
Alvaro Herrera wrote:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Alvaro Herrera wrote:
>>>> I like Teodor's proposal; I'll see about implementing that.
>>> Attached.
>> You missed updating the sgml docs, and personally I'd be inclined to
>> list -l before the individual --lc switches; otherwise it looks fine.
> 
> Thanks, committed that way.  I noticed that --lc-ctype and --lc-collate
> were forgotten in SGML docs, so I added them too.

Should we have a shorthand CREATE DATABASE option like that as well?

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: per-database locale: createdb switches

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
> > Tom Lane wrote:
> >> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >>> Alvaro Herrera wrote:
> >>>> I like Teodor's proposal; I'll see about implementing that.
> >>> Attached.
> >> You missed updating the sgml docs, and personally I'd be inclined to
> >> list -l before the individual --lc switches; otherwise it looks fine.
> > 
> > Thanks, committed that way.  I noticed that --lc-ctype and --lc-collate
> > were forgotten in SGML docs, so I added them too.
> 
> Should we have a shorthand CREATE DATABASE option like that as well?

createdb is really about convenience;  not sure it is warranted for
CREATE DATABASE.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: per-database locale: createdb switches

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>> Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>>>> Alvaro Herrera wrote:
>>>>>> I like Teodor's proposal; I'll see about implementing that.
>>>>> Attached.
>>>> You missed updating the sgml docs, and personally I'd be inclined to
>>>> list -l before the individual --lc switches; otherwise it looks fine.
>>> Thanks, committed that way.  I noticed that --lc-ctype and --lc-collate
>>> were forgotten in SGML docs, so I added them too.
>> Should we have a shorthand CREATE DATABASE option like that as well?
> 
> createdb is really about convenience;  not sure it is warranted for
> CREATE DATABASE.

I think unless you are doing something completely funny, you would 
usually want to have COLLATE and CTYPE equal.  The fact that you now 
have to enter both to get that result could be pretty annoying in 
practice, I would think.



Re: per-database locale: createdb switches

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > Heikki Linnakangas wrote:
> >> Alvaro Herrera wrote:
> >>> Tom Lane wrote:
> >>>> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >>>>> Alvaro Herrera wrote:
> >>>>>> I like Teodor's proposal; I'll see about implementing that.
> >>>>> Attached.
> >>>> You missed updating the sgml docs, and personally I'd be inclined to
> >>>> list -l before the individual --lc switches; otherwise it looks fine.
> >>> Thanks, committed that way.  I noticed that --lc-ctype and --lc-collate
> >>> were forgotten in SGML docs, so I added them too.
> >> Should we have a shorthand CREATE DATABASE option like that as well?
> > 
> > createdb is really about convenience;  not sure it is warranted for
> > CREATE DATABASE.
> 
> I think unless you are doing something completely funny, you would 
> usually want to have COLLATE and CTYPE equal.  The fact that you now 
> have to enter both to get that result could be pretty annoying in 
> practice, I  would think.

I agree but I can't think of many cases where we offer one option which
controls two other options;  can you?

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: per-database locale: createdb switches

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
>>>>>> You missed updating the sgml docs, and personally I'd be inclined to
>>>>>> list -l before the individual --lc switches; otherwise it looks fine.
>>>>> Thanks, committed that way.  I noticed that --lc-ctype and --lc-collate
>>>>> were forgotten in SGML docs, so I added them too.
>>>> Should we have a shorthand CREATE DATABASE option like that as well?
>>> createdb is really about convenience;  not sure it is warranted for
>>> CREATE DATABASE.
>> I think unless you are doing something completely funny, you would 
>> usually want to have COLLATE and CTYPE equal.  The fact that you now 
>> have to enter both to get that result could be pretty annoying in 
>> practice, I  would think.
> 
> I agree but I can't think of many cases where we offer one option which
> controls two other options;  can you?

We have cases like that:

initdb --locale
createdb --locale

It looks to me, however, that there is possible confusion about what 
createdb --locale (as well as any possible option to be added to CREATE 
DATABASE) really affects:

initdb --locale controls --lc-ctype, --lc-collate, --lc-messages, 
--lc-monetary, --lc-numeric, --lc-time.

createdb --locale only controls --lc-ctype and --lc-collate.  The 
functionality to have database-specific settings of the other locale 
categories already exists, so why shouldn't those be set as well?

Which raises yet another question, why CTYPE and COLLATE have to be 
hardcoded settings and catalog columns instead of being stored in 
datconfig as database-startup-only settings?
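
One reading of "database-specific settings of the other locale categories"
is the per-database GUC mechanism (ALTER DATABASE ... SET). Under that
assumption, the sketch below prints the statements one would issue by hand
for the four categories that createdb --locale leaves alone. The helper
emit_locale_settings and the database name are hypothetical, and the script
only prints SQL; it does not connect to a server.

```shell
# Hypothetical helper: print per-database settings for the locale categories
# that createdb --locale does not set (messages, monetary, numeric, time).
emit_locale_settings() {
    db=$1 loc=$2
    for guc in lc_messages lc_monetary lc_numeric lc_time; do
        printf "ALTER DATABASE \"%s\" SET %s = '%s';\n" "$db" "$guc" "$loc"
    done
}

# Example: statements for a database named mydb using a Spanish locale.
emit_locale_settings mydb es_ES.UTF-8
```

Unlike CTYPE and COLLATE, these four can be changed after the database
exists, which is why they fit the ordinary settings mechanism.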


Re: per-database locale: createdb switches

From
Heikki Linnakangas
Date:
Peter Eisentraut wrote:
> Which raises yet another question, why CTYPE and COLLATE have to be 
> hardcoded settings and catalog columns instead of being stored in 
> datconfig as database-startup-only settings?

Because changing CTYPE or COLLATE in an existing database would render 
indexes broken.

Perhaps we could've put them in datconfig, and forbidden changing them 
after CREATE DATABASE. Then again, encoding is a similar setting too, 
and that's stored in a catalog column.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
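
The index problem comes from sort order depending on the collation. A rough
illustration outside the database, using sort(1): the helper name sorted_in
is made up for this sketch, and only the C locale result is shown because
other locales depend on what is installed on the machine.

```shell
# Sort the same four strings under a given locale and print them on one line.
sorted_in() {
    printf 'b\nA\na\nB\n' | LC_ALL=$1 sort | tr '\n' ' '
}

# Under the C locale the order is by byte value: uppercase before lowercase.
sorted_in C; echo
# prints: A B a b
```

Running sorted_in with a language locale such as en_US.UTF-8, if installed,
typically gives a different order. A btree index built under one ordering
is no longer correctly ordered if the collation changes afterwards.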


Re: per-database locale: createdb switches

From
Peter Eisentraut
Date:
Heikki Linnakangas wrote:
> Peter Eisentraut wrote:
>> Which raises yet another question, why CTYPE and COLLATE have to be 
>> hardcoded settings and catalog columns instead of being stored in 
>> datconfig as database-startup-only settings?
> 
> Because changing CTYPE or COLLATE in an existing database would render 
> indexes broken.
> 
> Perhaps we could've put them in datconfig, and forbidden changing them 
> after CREATE DATABASE. Then again, encoding is a similar setting too, 
> and that's stored in a catalog column.

Yeah, it's a tricky case somewhere in between all the facilities that we 
already have.

I notice in the documentation that the createdb --lc-ctype sets the 
lc_ctype setting for the database, but the corresponding parameter for 
CREATE DATABASE is CTYPE, but the global GUC setting is lc_ctype. 
Should that be more consistent?


Re: per-database locale: createdb switches

From
Heikki Linnakangas
Date:
Peter Eisentraut wrote:
> Heikki Linnakangas wrote:
>> Peter Eisentraut wrote:
>>> Which raises yet another question, why CTYPE and COLLATE have to be 
>>> hardcoded settings and catalog columns instead of being stored in 
>>> datconfig as database-startup-only settings?
>>
>> Because changing CTYPE or COLLATE in an existing database would render 
>> indexes broken.
>>
>> Perhaps we could've put them in datconfig, and forbidden changing them 
>> after CREATE DATABASE. Then again, encoding is a similar setting too, 
>> and that's stored in a catalog column.
> 
> Yeah, it's a tricky case somewhere in between all the facilities that we 
> already have.
> 
> I notice in the documentation that the createdb --lc-ctype sets the 
> lc_ctype setting for the database, but the corresponding parameter for 
> CREATE DATABASE is CTYPE, but the global GUC setting is lc_ctype. Should 
> that be more consistent?

Hmm, I remember I pondered for a long time if it should be COLLATE and 
CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was 
that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or 
some other collation implementation, the association with LC_* 
environment variables becomes misleading.

Being consistent would be nice, though.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: per-database locale: createdb switches

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:
> Peter Eisentraut wrote:

>> I notice in the documentation that the createdb --lc-ctype sets the  
>> lc_ctype setting for the database, but the corresponding parameter for  
>> CREATE DATABASE is CTYPE, but the global GUC setting is lc_ctype. 
>> Should that be more consistent?
>
> Hmm, I remember I pondered for a long time if it should be COLLATE and  
> CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was  
> that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or  
> some other collation implementation, the association with LC_*  
> environment variables becomes misleading.
>
> Being consistent would be nice, though.

I think consistency could be reached by renaming the GUC setting to
ctype.  We could add a "lc_ctype" synonym for backwards compatibility
(like sort_mem) -- or maybe not.

Since the createdb setting is new as of 8.4, we should just rename that
to ctype as well.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: per-database locale: createdb switches

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Heikki Linnakangas wrote:
>> Hmm, I remember I pondered for a long time if it should be COLLATE and  
>> CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was  
>> that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or  
>> some other collation implementation, the association with LC_*  
>> environment variables becomes misleading.
>> 
>> Being consistent would be nice, though.

> I think consistency could be reached by renaming the GUC setting to
> ctype.

I think this is a bad idea, particularly if you also rename the other
GUC to COLLATE (which is a reserved word that we're going to have to
implement someday).  People know what LC_CTYPE and LC_COLLATE do,
at least if they've heard of Unix locale support at all (and if not
they can google those names successfully).

If we want consistency then the right answer is to rename the *new*
things to lc_xxx, not break compatibility on the names of the
existing things.
        regards, tom lane


Re: per-database locale: createdb switches

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> >>>>>> You missed updating the sgml docs, and personally I'd be inclined to
> >>>>>> list -l before the individual --lc switches; otherwise it looks fine.
> >>>>> Thanks, committed that way.  I noticed that --lc-ctype and --lc-collate
> >>>>> were forgotten in SGML docs, so I added them too.
> >>>> Should we have a shorthand CREATE DATABASE option like that as well?
> >>> createdb is really about convenience;  not sure it is warranted for
> >>> CREATE DATABASE.
> >> I think unless you are doing something completely funny, you would 
> >> usually want to have COLLATE and CTYPE equal.  The fact that you now 
> >> have to enter both to get that result could be pretty annoying in 
> >> practice, I  would think.
> > 
> > I agree but I can't think of many cases where we offer one option which
> > controls two other options;  can you?
> 
> We have cases like that:
> 
> initdb --locale
> createdb --locale
> 
> It looks to me, however, that there is possible confusion about what 
> createdb --locale (as well as any possible option to be added to CREATE 
> DATABASE) really affects:
> 
> initdb --locale controls --lc-ctype, --lc-collate, --lc-messages, 
> --lc-monetary, --lc-numeric, --lc-time.
> 
> createdb --locale only controls --lc-ctype and --lc-collate.  The 
> functionality to have database-specific settings of the other locale 
> categories already exists, so why shouldn't those be set as well?
> 
> Which raises yet another question, why CTYPE and COLLATE have to be 
> hardcoded settings and catalog columns instead of being stored in 
> datconfig as database-startup-only settings?

I was asking for cases where _SQL_ commands have one parameter that
controls two others, not command-line examples.  Can you think of any?

FYI, I am fine adding the SQL-level option, I was just asking.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: per-database locale: createdb switches

From
Bruce Momjian
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Heikki Linnakangas wrote:
> >> Hmm, I remember I pondered for a long time if it should be COLLATE and  
> >> CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was  
> >> that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or  
> >> some other collation implementation, the association with LC_*  
> >> environment variables becomes misleading.
> >> 
> >> Being consistent would be nice, though.
> 
> > I think consistency could be reached by renaming the GUC setting to
> > ctype.
> 
> I think this is a bad idea, particularly if you also rename the other
> GUC to COLLATE (which is a reserved word that we're going to have to
> implement someday).  People know what LC_CTYPE and LC_COLLATE do,
> at least if they've heard of Unix locale support at all (and if not
> they can google those names successfully).
> 
> If we want consistency then the right answer is to rename the *new*
> things to lc_xxx, not break compatibility on the names of the
> existing things.

Is anyone working on resolving this?

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: per-database locale: createdb switches

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Heikki Linnakangas wrote:
>>>> Hmm, I remember I pondered for a long time if it should be COLLATE and  
>>>> CTYPE or LC_COLLATE and LC_CTYPE. I think the rationale in the end was  
>>>> that a) COLLATE/CTYPE looks nicer and b) if we add support for ICU or  
>>>> some other collation implementation, the association with LC_*  
>>>> environment variables becomes misleading.
>>>>
>>>> Being consistent would be nice, though.
>>> I think consistency could be reached by renaming the GUC setting to
>>> ctype.
>> I think this is a bad idea, particularly if you also rename the other
>> GUC to COLLATE (which is a reserved word that we're going to have to
>> implement someday).  People know what LC_CTYPE and LC_COLLATE do,
>> at least if they've heard of Unix locale support at all (and if not
>> they can google those names successfully).
>>
>> If we want consistency then the right answer is to rename the *new*
>> things to lc_xxx, not break compatibility on the names of the
>> existing things.
> 
> Is anyone working on resolving this?

I think we can just leave it for now.