Thread: WIP patch: Collation support

WIP patch: Collation support

From
"Radek Strnad"
Date:
I'm implementing collation support. Proposal can be found at http://archives.postgresql.org/pgsql-hackers/2008-07/msg00557.php

Progress so far:
- created catalogs pg_collation a pg_charset which are filled with three standard collations
- initdb changes rows called "DEFAULT" in both catalogs during the bki bootstrap phase with current system LC_COLLATE and LC_CTYPE or those set by command line.
- new collations can be defined with command CREATE COLLATION <collation name> FOR <character set specification>  FROM <existing collation name> [STRCOLFN <fn name>]
[ <pad characteristic> ] [ <case sensitive> ] [ LCCOLLATE <lc_collate> ] [ LCCTYPE <lc_ctype> ]
- because of pg_collation and pg_charset are catalogs individual for each database, if you want to create a database with collation other than specified, create it in template1 and then create database
- when connecting to database, it retrieves locales from pg_database and sets them

Design & functionality changes left:
- move retrieveing collation from pg_database to pg_type
- get case sensitivity and pad characteristic working
- wrap locale setting and text comparison into functions that will be referenced from pg_collation as regproc (make the code nice)
- when creating database with different collation than database cluster, the database has to be reindexed. Any idea how to do it? Function ReindexDatabase works only when database is opened.
- minor tweaks & fixes

Ideas? Comments?

Regards

Radek Strnad
Attachment

Re: WIP patch: Collation support

From
Alvaro Herrera
Date:
Radek Strnad escribió:

> - when creating database with different collation than database cluster, the
> database has to be reindexed. Any idea how to do it? Function
> ReindexDatabase works only when database is opened.

We have this Todo item:
 Set proper permissions on non-system schemas during db creation   Currently all schemas are owned by the super-user
becausethey are   copied from the template1 database. However, since all objects are   inherited from the template
database,it is not clear that setting   schemas to the db owner is correct. 
 

When this was discussed years ago, one proposed idea was that on the
first connection the backend should, as a first task, ensure that
certain administrative chores be done.  The first of those was changing
the ownership of schemas.  It sounds like this reindexing you propose
falls into the same category.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: WIP patch: Collation support

From
"Radek Strnad"
Date:
<div dir="ltr">Ok, so do you suggest to leave it with a notice "reindex database" or start to solve it somehow?<br
/><br/>Regards<br /><br />Radek Strnad<br /><br /><div class="gmail_quote">On Mon, Sep 1, 2008 at 12:08 AM, Alvaro
Herrera<span dir="ltr"><<a href="mailto:alvherre@commandprompt.com">alvherre@commandprompt.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex;
padding-left:1ex;">Radek Strnad escribió:<br /><div class="Ih2E3d"><br /> > - when creating database with different
collationthan database cluster, the<br /> > database has to be reindexed. Any idea how to do it? Function<br /> >
ReindexDatabaseworks only when database is opened.<br /><br /></div>We have this Todo item:<br /><br />  Set proper
permissionson non-system schemas during db creation<br />    Currently all schemas are owned by the super-user because
theyare<br />    copied from the template1 database. However, since all objects are<br />    inherited from the
templatedatabase, it is not clear that setting<br />    schemas to the db owner is correct.<br /><br /> When this was
discussedyears ago, one proposed idea was that on the<br /> first connection the backend should, as a first task,
ensurethat<br /> certain administrative chores be done.  The first of those was changing<br /> the ownership of
schemas. It sounds like this reindexing you propose<br /> falls into the same category.<br /><font color="#888888"><br
/>--<br /> Alvaro Herrera                                <a href="http://www.CommandPrompt.com/"
target="_blank">http://www.CommandPrompt.com/</a><br/> PostgreSQL Replication, Consulting, Custom Development, 24x7
support<br/></font></blockquote></div><br /></div> 

Re: WIP patch: Collation support

From
Peter Eisentraut
Date:
Radek Strnad wrote:
> - new collations can be defined with command CREATE COLLATION <collation 
> name> FOR <character set specification>  FROM <existing collation name> 
> [STRCOLFN <fn name>]
> [ <pad characteristic> ] [ <case sensitive> ] [ LCCOLLATE <lc_collate> ] 
> [ LCCTYPE <lc_ctype> ]

How do you plan to make a collation case sensitive or accent sensitive?  I have previously commented that this is not a
realisticview on how 
 
collations work.  Since you are apparently planning to use the system 
locales, I don't see how you can make this work.



Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Tue, Sep 02, 2008 at 02:50:47PM +0300, Peter Eisentraut wrote:
> Radek Strnad wrote:
> >- new collations can be defined with command CREATE COLLATION <collation
> >name> FOR <character set specification>  FROM <existing collation name>
> >[STRCOLFN <fn name>]
> >[ <pad characteristic> ] [ <case sensitive> ] [ LCCOLLATE <lc_collate> ]
> >[ LCCTYPE <lc_ctype> ]
>
> How do you plan to make a collation case sensitive or accent sensitive?
>  I have previously commented that this is not a realistic view on how
> collations work.  Since you are apparently planning to use the system
> locales, I don't see how you can make this work.

While it's true POSIX locales don't handle this, other collation
libraries do and we should support them if the user wants.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
"Radek Strnad"
Date:
I think at least case sensitivity can be done by comparing two strings converted to upper case with toupper() function.

Regards

Radek Strnad

On Tue, Sep 2, 2008 at 2:00 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Tue, Sep 02, 2008 at 02:50:47PM +0300, Peter Eisentraut wrote:
> Radek Strnad wrote:
> >- new collations can be defined with command CREATE COLLATION <collation
> >name> FOR <character set specification>  FROM <existing collation name>
> >[STRCOLFN <fn name>]
> >[ <pad characteristic> ] [ <case sensitive> ] [ LCCOLLATE <lc_collate> ]
> >[ LCCTYPE <lc_ctype> ]
>
> How do you plan to make a collation case sensitive or accent sensitive?
>  I have previously commented that this is not a realistic view on how
> collations work.  Since you are apparently planning to use the system
> locales, I don't see how you can make this work.

While it's true POSIX locales don't handle this, other collation
libraries do and we should support them if the user wants.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIvSrIIB7bNG8LQkwRAnkWAJ9FaiR9cOHFN2vkVmQaK5y7N9OJoQCbB+Ks
e0E4722hY/Q+Cz8tpzA0CGs=
=2Svh
-----END PGP SIGNATURE-----



Re: WIP patch: Collation support

From
Peter Eisentraut
Date:
Martijn van Oosterhout wrote:
> On Tue, Sep 02, 2008 at 02:50:47PM +0300, Peter Eisentraut wrote:
>> Radek Strnad wrote:
>>> - new collations can be defined with command CREATE COLLATION <collation 
>>> name> FOR <character set specification>  FROM <existing collation name> 
>>> [STRCOLFN <fn name>]
>>> [ <pad characteristic> ] [ <case sensitive> ] [ LCCOLLATE <lc_collate> ] 
>>> [ LCCTYPE <lc_ctype> ]
>> How do you plan to make a collation case sensitive or accent sensitive? 
>>  I have previously commented that this is not a realistic view on how 
>> collations work.  Since you are apparently planning to use the system 
>> locales, I don't see how you can make this work.
> 
> While it's true POSIX locales don't handle this, other collation
> libraries do and we should support them if the user wants.

Do they handle exactly those two attributes specifically?  Can you point 
out references?  Or do you mean, other collation libraries allow their 
collations to be configured/customized?  I think linguistically it is a 
very narrow view of the world to hardcode those two attributes.


Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Tue, Sep 02, 2008 at 04:46:16PM +0300, Peter Eisentraut wrote:
> >While it's true POSIX locales don't handle this, other collation
> >libraries do and we should support them if the user wants.
>
> Do they handle exactly those two attributes specifically?  Can you point
> out references?  Or do you mean, other collation libraries allow their
> collations to be configured/customized?  I think linguistically it is a
> very narrow view of the world to hardcode those two attributes.

Well, yes. Accents and case are attributes of a character. (I'm using
the unicode model here). So, to do a case insensetive match you take
the characters, strip the attributes and then do the comparison. There
are specialised routines which handle the denormalisation of the string
for you so in theory you could even get specific about which accents
you ignore. In practice I don't think people do that.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Gregory Stark
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:

> On Tue, Sep 02, 2008 at 04:46:16PM +0300, Peter Eisentraut wrote:
>> >While it's true POSIX locales don't handle this, other collation
>> >libraries do and we should support them if the user wants.

I think that's backwards. We have to go with the lowest common denominator
functionality of those libraries if we're going to be portable. As long as
it's a superset of the SQL standard functionality. If we support features of
some of them that can't be emulated with others then users end up with SQL
code that will only work on some builds and not others. That might be worth it
for some features but I'm not sure this is one.

> Well, yes. Accents and case are attributes of a character. (I'm using
> the unicode model here). So, to do a case insensetive match you take
> the characters, strip the attributes and then do the comparison. There
> are specialised routines which handle the denormalisation of the string
> for you so in theory you could even get specific about which accents
> you ignore. In practice I don't think people do that.

I don't think composable unicode characters are really about collations. I
think it had more to do with representing glyphs in UTF32 before they gave up
on that. Does anyone still use composable characters?

Note that we don't currently support composable characters at all. I'm not
sure if that's a "nobody really cares" issue or a bug we should aim to fix
with real collation support.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication
support!


Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Tue, Sep 02, 2008 at 05:42:13PM +0100, Gregory Stark wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Tue, Sep 02, 2008 at 04:46:16PM +0300, Peter Eisentraut wrote:
> >> >While it's true POSIX locales don't handle this, other collation
> >> >libraries do and we should support them if the user wants.
>
> I think that's backwards. We have to go with the lowest common denominator
> functionality of those libraries if we're going to be portable.

And I think that's backwards. Why can we only use a feature once every
OS out there implements it? We still run on systems that don't have SSL
support. LC_TYPE settings are not portable between systems, yet that
doesn't bother anyone. Why should we have a problem with collate
settings not being portable?

> I don't think composable unicode characters are really about collations. I
> think it had more to do with representing glyphs in UTF32 before they gave up
> on that. Does anyone still use composable characters?

Lookup the various normalisations forms:
http://en.wikipedia.org/wiki/Unicode_normalization
In particular Normal Form D.

Sure, composable characters have nothing to do with collation, but they
provide a uniform way of doing accent insensetive collation.

> Note that we don't currently support composable characters at all.

Any character which is an accent on a latin character is a decomposable
character. And last I checked we supported those.

Have a niceday,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Alvaro Herrera
Date:
Radek Strnad escribió:
> Ok, so do you suggest to leave it with a notice "reindex database" or start
> to solve it somehow?

I don't know.  If there are two tasks that need the same treatment, it
seems a safe conclusion that they need a common solution.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: WIP patch: Collation support

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> And I think that's backwards. Why can we only use a feature once every
> OS out there implements it? We still run on systems that don't have SSL
> support. LC_TYPE settings are not portable between systems, yet that
> doesn't bother anyone. Why should we have a problem with collate
> settings not being portable?

If we're going to approach it that way, we need a syntax for CREATE
COLLATION that doesn't hard-wire what the possible collation modifiers
are.
        regards, tom lane


Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Radek Strnad wrote:
> Progress so far:
> - created catalogs pg_collation a pg_charset which are filled with three
> standard collations
> - initdb changes rows called "DEFAULT" in both catalogs during the bki
> bootstrap phase with current system LC_COLLATE and LC_CTYPE or those set by
> command line.
> - new collations can be defined with command CREATE COLLATION <collation
> name> FOR <character set specification>  FROM <existing collation name>
> [STRCOLFN <fn name>]
> [ <pad characteristic> ] [ <case sensitive> ] [ LCCOLLATE <lc_collate> ] [
> LCCTYPE <lc_ctype> ]
> - because of pg_collation and pg_charset are catalogs individual for each
> database, if you want to create a database with collation other than
> specified, create it in template1 and then create database

I have to wonder, is all that really necessary? The feature you're 
trying to implement is to support database-level collation at first, and 
perhaps column-level collation later. We don't need support for 
user-defined collations and charsets for that.

If leave all that out of the patch for now, we'll have a much slimmer, 
and just as useful patch, implementing database-level collation. We can 
add those catalogs later if we need them, but I don't think there's much 
point in adding all that infrastructure if they just reflect the locales 
installed in the operating system.

> - when connecting to database, it retrieves locales from pg_database and
> sets them

This is the real gist of this patch.

> Design & functionality changes left:
> - move retrieveing collation from pg_database to pg_type

I don't understand this item. What will you move?

> - get case sensitivity and pad characteristic working

I feel we should leave this to the collation implementation.

> - when creating database with different collation than database cluster, the
> database has to be reindexed. Any idea how to do it? Function
> ReindexDatabase works only when database is opened.

That's a tricky one. One idea is to prohibit choosing a different 
collation than the one in the template database, unless we know it's 
safe to do so without reindexing. The problem is that we don't know 
whether it's safe. A simple but limiting solution would be to require 
that the template database has the same collation as the database that's 
being created, except that template0 can always be used as template. 
template0 is safe, because there's no indexes on text columns there.

Note that we already have the same problem with encodings. If you create 
a database with LATIN1 encoding, load it with data, and then use that as 
a template for a database with UTF-8 encoding, the text data will be 
incorrectly encoded. We should probably fix that too.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Wed, Sep 10, 2008 at 11:29:14AM +0300, Heikki Linnakangas wrote:
> Radek Strnad wrote:
> >- because of pg_collation and pg_charset are catalogs individual for each
> >database, if you want to create a database with collation other than
> >specified, create it in template1 and then create database
>
> I have to wonder, is all that really necessary? The feature you're
> trying to implement is to support database-level collation at first, and
> perhaps column-level collation later. We don't need support for
> user-defined collations and charsets for that.

Since the set of collations isn't exactly denumerable, we need some way
to allow the user to specify the collation they want. The only
collation PostgreSQL knows about is the C collation. Anything else is
user-defined.

> >Design & functionality changes left:
> >- move retrieveing collation from pg_database to pg_type
>
> I don't understand this item. What will you move?

Long term, the collation is a property of the type, but I agree, I'm not
sure why this patch needs it.

> That's a tricky one. One idea is to prohibit choosing a different
> collation than the one in the template database, unless we know it's
> safe to do so without reindexing.

But that put us back where we started: every database having the same
collation. We're trying to move away from that. Just reindex everything
and be done with it.

> Note that we already have the same problem with encodings. If you create
> a database with LATIN1 encoding, load it with data, and then use that as
> a template for a database with UTF-8 encoding, the text data will be
> incorrectly encoded. We should probably fix that too.

I'd say forbid more than one encoding in a cluster, but that's just my
opinion :)

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Martijn van Oosterhout wrote:
> On Wed, Sep 10, 2008 at 11:29:14AM +0300, Heikki Linnakangas wrote:
>> Radek Strnad wrote:
>>> - because of pg_collation and pg_charset are catalogs individual for each
>>> database, if you want to create a database with collation other than
>>> specified, create it in template1 and then create database
>> I have to wonder, is all that really necessary? The feature you're 
>> trying to implement is to support database-level collation at first, and 
>> perhaps column-level collation later. We don't need support for 
>> user-defined collations and charsets for that.
> 
> Since the set of collations isn't exactly denumerable, we need some way
> to allow the user to specify the collation they want. The only
> collation PostgreSQL knows about is the C collation. Anything else is
> user-defined.

Let's just use the name of the OS locale, like we do now. Having a 
pg_collation catalog just moves the problem elsewhere: we'd still need 
something in pg_collation to tie the collation to the OS locale.

>>> Design & functionality changes left:
>>> - move retrieveing collation from pg_database to pg_type
>> I don't understand this item. What will you move?
> 
> Long term, the collation is a property of the type, ...

You might want to provide a default collation for a type as well, but 
the very finest grade is that you can specify collation for every (text) 
comparison operator in your query. Of course you don't want to do that 
for every query, which is why we should provide defaults at different 
levels: columns, tables, database. And perhaps types as well, but that's 
not the most interesting case.

I'm not sure what the SQL spec says about that, but I believe it 
provides syntax and rules for all that.

>> That's a tricky one. One idea is to prohibit choosing a different 
>> collation than the one in the template database, unless we know it's 
>> safe to do so without reindexing.
> 
> But that put us back where we started: every database having the same
> collation. We're trying to move away from that. Just reindex everything
> and be done with it.

That's easier said than done, unfortunately.

>> Note that we already have the same problem with encodings. If you create 
>> a database with LATIN1 encoding, load it with data, and then use that as 
>> a template for a database with UTF-8 encoding, the text data will be 
>> incorrectly encoded. We should probably fix that too.
> 
> I'd say forbid more than one encoding in a cluster, but that's just my
> opinion :)

Yeah, that's pretty useless, at least without support for different 
locales on different databases. But might as well keep it unless there's 
a pressing reason to drop it.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Zdenek Kotala
Date:
Heikki Linnakangas napsal(a):

>> Design & functionality changes left:
>> - move retrieveing collation from pg_database to pg_type
> 

The problem there is that pg_collation is local catalog, but pg_database 
is global catalog. IIRC, It was discussed during last commitfest. I 
think it is bad idea to make this kind of dependency. It seems to me 
better to implement something what we will use later to avoid useless 
and confusing dependency.
    Zdenek



Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Wed, Sep 10, 2008 at 12:51:02PM +0300, Heikki Linnakangas wrote:
> >Since the set of collations isn't exactly denumerable, we need some way
> >to allow the user to specify the collation they want. The only
> >collation PostgreSQL knows about is the C collation. Anything else is
> >user-defined.
>
> Let's just use the name of the OS locale, like we do now. Having a
> pg_collation catalog just moves the problem elsewhere: we'd still need
> something in pg_collation to tie the collation to the OS locale.

There's not a one-to-one mapping between collation and locale name. A
locale name includes information about the charset and a collation may
have paramters like case-sensetivity and pad-attribute which are not
present in the locale name. You need a mapping anyway, which is what
this table is for.

The difference in collation between CHAR() and VARCHAR() is the usual
example here.

> >Long term, the collation is a property of the type, ...
>
> I'm not sure what the SQL spec says about that, but I believe it
> provides syntax and rules for all that.

The spec is quite detailed about and I posted code to do it years ago.
The point is that we don't need to go that far with this patch.

> >But that put us back where we started: every database having the same
> >collation. We're trying to move away from that. Just reindex everything
> >and be done with it.
>
> That's easier said than done, unfortunately.

I don't see an alternative.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Martijn van Oosterhout wrote:
> On Wed, Sep 10, 2008 at 12:51:02PM +0300, Heikki Linnakangas wrote:
>>> Since the set of collations isn't exactly denumerable, we need some way
>>> to allow the user to specify the collation they want. The only
>>> collation PostgreSQL knows about is the C collation. Anything else is
>>> user-defined.
>> Let's just use the name of the OS locale, like we do now. Having a
>> pg_collation catalog just moves the problem elsewhere: we'd still need
>> something in pg_collation to tie the collation to the OS locale.
>
> There's not a one-to-one mapping between collation and locale name. A
> locale name includes information about the charset and a collation may
> have paramters like case-sensetivity and pad-attribute which are not
> present in the locale name. You need a mapping anyway, which is what
> this table is for.

Ideally, we would delegate the case-sensitivity and padding to the
collation implementation (ie. OS setlocale() or ICU). That said, I don't
think operating systems normally ship case-insensitive variants of
locales by default, so I agree it would be nice if we could implement
that ourselves. Still, we could identify case-sensitive locale names for
example by a suffix, like "en_GB.UTF8.case-insensitive".

I agree we will eventually need a way to give shorthand names for
collations, and a pg_collation catalog will then come handy. But that
can wait until we have the basic infrastructure ready to support column
and query-level collation.

>>> But that put us back where we started: every database having the same
>>> collation. We're trying to move away from that. Just reindex everything
>>> and be done with it.
>> That's easier said than done, unfortunately.
>
> I don't see an alternative.

Well, I proposed disallowing using a different collation than the source
database, except for using template0 as the source. That's pretty
limited, but is trivial to implement and still let's you have databases
with different collations in the same cluster.

I worked a bit on Radek's patch, stripping out all the pg_collate and
pg_charset catalog changes and commands, leaving just the core
functionality of database-level collations. It needs some cleanup and
documentation, but something like this I'd like to commit in this commit
fest. The new catalogs can wait until we have a real need for them.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
*** src/backend/access/transam/xlog.c
--- src/backend/access/transam/xlog.c
***************
*** 3847,3853 **** WriteControlFile(void)
  {
      int            fd;
      char        buffer[PG_CONTROL_SIZE];        /* need not be aligned */
-     char       *localeptr;

      /*
       * Initialize version and compatibility-check fields
--- 3847,3852 ----
***************
*** 3875,3893 **** WriteControlFile(void)
  #endif
      ControlFile->float4ByVal = FLOAT4PASSBYVAL;
      ControlFile->float8ByVal = FLOAT8PASSBYVAL;
!
!     ControlFile->localeBuflen = LOCALE_NAME_BUFLEN;
!     localeptr = setlocale(LC_COLLATE, NULL);
!     if (!localeptr)
!         ereport(PANIC,
!                 (errmsg("invalid LC_COLLATE setting")));
!     StrNCpy(ControlFile->lc_collate, localeptr, LOCALE_NAME_BUFLEN);
!     localeptr = setlocale(LC_CTYPE, NULL);
!     if (!localeptr)
!         ereport(PANIC,
!                 (errmsg("invalid LC_CTYPE setting")));
!     StrNCpy(ControlFile->lc_ctype, localeptr, LOCALE_NAME_BUFLEN);
!
      /* Contents are protected with a CRC */
      INIT_CRC32(ControlFile->crc);
      COMP_CRC32(ControlFile->crc,
--- 3874,3880 ----
  #endif
      ControlFile->float4ByVal = FLOAT4PASSBYVAL;
      ControlFile->float8ByVal = FLOAT8PASSBYVAL;
!
      /* Contents are protected with a CRC */
      INIT_CRC32(ControlFile->crc);
      COMP_CRC32(ControlFile->crc,
***************
*** 4126,4159 **** ReadControlFile(void)
                             " but the server was compiled without USE_FLOAT8_BYVAL."),
                   errhint("It looks like you need to recompile or initdb.")));
  #endif
-
-     if (ControlFile->localeBuflen != LOCALE_NAME_BUFLEN)
-         ereport(FATAL,
-                 (errmsg("database files are incompatible with server"),
-                  errdetail("The database cluster was initialized with LOCALE_NAME_BUFLEN %d,"
-                   " but the server was compiled with LOCALE_NAME_BUFLEN %d.",
-                            ControlFile->localeBuflen, LOCALE_NAME_BUFLEN),
-                  errhint("It looks like you need to recompile or initdb.")));
-     if (pg_perm_setlocale(LC_COLLATE, ControlFile->lc_collate) == NULL)
-         ereport(FATAL,
-             (errmsg("database files are incompatible with operating system"),
-              errdetail("The database cluster was initialized with LC_COLLATE \"%s\","
-                        " which is not recognized by setlocale().",
-                        ControlFile->lc_collate),
-              errhint("It looks like you need to initdb or install locale support.")));
-     if (pg_perm_setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
-         ereport(FATAL,
-             (errmsg("database files are incompatible with operating system"),
-         errdetail("The database cluster was initialized with LC_CTYPE \"%s\","
-                   " which is not recognized by setlocale().",
-                   ControlFile->lc_ctype),
-              errhint("It looks like you need to initdb or install locale support.")));
-
-     /* Make the fixed locale settings visible as GUC variables, too */
-     SetConfigOption("lc_collate", ControlFile->lc_collate,
-                     PGC_INTERNAL, PGC_S_OVERRIDE);
-     SetConfigOption("lc_ctype", ControlFile->lc_ctype,
-                     PGC_INTERNAL, PGC_S_OVERRIDE);
  }

  void
--- 4113,4118 ----
*** src/backend/commands/dbcommands.c
--- src/backend/commands/dbcommands.c
***************
*** 69,75 **** static bool get_db_info(const char *name, LOCKMODE lockmode,
              Oid *dbIdP, Oid *ownerIdP,
              int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
              Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
!             Oid *dbTablespace);
  static bool have_createdb_privilege(void);
  static void remove_dbtablespaces(Oid db_id);
  static bool check_db_file_conflict(Oid db_id);
--- 69,75 ----
              Oid *dbIdP, Oid *ownerIdP,
              int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
              Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
!             Oid *dbTablespace, char **dbCollation, char **dbCtype);
  static bool have_createdb_privilege(void);
  static void remove_dbtablespaces(Oid db_id);
  static bool check_db_file_conflict(Oid db_id);
***************
*** 87,92 **** createdb(const CreatedbStmt *stmt)
--- 87,94 ----
      Oid            src_dboid;
      Oid            src_owner;
      int            src_encoding;
+     char       *src_collation;
+     char       *src_ctype;
      bool        src_istemplate;
      bool        src_allowconn;
      Oid            src_lastsysoid;
***************
*** 104,113 **** createdb(const CreatedbStmt *stmt)
--- 106,119 ----
      DefElem    *downer = NULL;
      DefElem    *dtemplate = NULL;
      DefElem    *dencoding = NULL;
+     DefElem    *dcollation = NULL;
+     DefElem    *dctype = NULL;
      DefElem    *dconnlimit = NULL;
      char       *dbname = stmt->dbname;
      char       *dbowner = NULL;
      const char *dbtemplate = NULL;
+     char       *lc_collate = NULL;
+     char       *lc_ctype = NULL;
      int            encoding = -1;
      int            dbconnlimit = -1;
      int            ctype_encoding;
***************
*** 152,157 **** createdb(const CreatedbStmt *stmt)
--- 158,179 ----
                           errmsg("conflicting or redundant options")));
              dencoding = defel;
          }
+         else if (strcmp(defel->defname, "collate") == 0)
+         {
+             if (dcollation)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             dcollation = defel;
+         }
+         else if (strcmp(defel->defname, "ctype") == 0)
+         {
+             if (dctype)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             dctype = defel;
+         }
          else if (strcmp(defel->defname, "connectionlimit") == 0)
          {
              if (dconnlimit)
***************
*** 205,210 **** createdb(const CreatedbStmt *stmt)
--- 227,247 ----
              elog(ERROR, "unrecognized node type: %d",
                   nodeTag(dencoding->arg));
      }
+     /*
+     if (dlc_collate && dlc_collate->arg) {
+         lc_collate = strVal(dlc_collate->arg);
+         if ((locale_collate_assign(lc_collate, false, (GucSource)NULL)) == NULL)
+             ereport(ERROR,
+                 (errcode(ERRCODE_UNDEFINED_OBJECT),
+                 errmsg("%s is not a valid LC_COLLATE name",
+                         lc_collate)));
+     }
+     */
+     if (dcollation && dcollation->arg)
+         lc_collate = strVal(dcollation->arg);
+     if (dctype && dctype->arg)
+         lc_ctype = strVal(dctype->arg);
+
      if (dconnlimit && dconnlimit->arg)
          dbconnlimit = intVal(dconnlimit->arg);

***************
*** 243,249 **** createdb(const CreatedbStmt *stmt)
      if (!get_db_info(dbtemplate, ShareLock,
                       &src_dboid, &src_owner, &src_encoding,
                       &src_istemplate, &src_allowconn, &src_lastsysoid,
!                      &src_frozenxid, &src_deftablespace))
          ereport(ERROR,
                  (errcode(ERRCODE_UNDEFINED_DATABASE),
                   errmsg("template database \"%s\" does not exist",
--- 280,286 ----
      if (!get_db_info(dbtemplate, ShareLock,
                       &src_dboid, &src_owner, &src_encoding,
                       &src_istemplate, &src_allowconn, &src_lastsysoid,
!                      &src_frozenxid, &src_deftablespace, &src_collation, &src_ctype))
          ereport(ERROR,
                  (errcode(ERRCODE_UNDEFINED_DATABASE),
                   errmsg("template database \"%s\" does not exist",
***************
*** 305,310 **** createdb(const CreatedbStmt *stmt)
--- 342,359 ----
               errdetail("The server's LC_CTYPE setting requires encoding %s.",
                         pg_encoding_to_char(ctype_encoding))));

+     /*
+      * Message about reindexing new database
+      *
+      * We know that template0 doesn't contain any indexes that depend on
+      * collation or ctype.
+      */
+     if (strcmp(dbtemplate, "template0") != 0 &&
+         (strcmp(lc_collate, src_collation) || strcmp(lc_ctype, src_ctype)))
+         ereport(NOTICE,
+                 (errmsg("database \"%s\" needs to be reindexed manually (REINDEX DATABASE)",
+                         dbname)));
+
      /* Resolve default tablespace for new database */
      if (dtablespacename && dtablespacename->arg)
      {
***************
*** 369,374 **** createdb(const CreatedbStmt *stmt)
--- 418,432 ----
          /* Note there is no additional permission check in this path */
      }

+     /*
+      * If collation is specified for database, use it, otherwise inherit
+      * database cluster's collation.
+      */
+     if (lc_collate == NULL)
+         lc_collate = src_collation;
+     if (lc_ctype == NULL)
+         lc_ctype = src_ctype;
+
      /*
       * Check for db name conflict.    This is just to give a more friendly error
       * message than "unique index violation".  There's a race condition but
***************
*** 421,426 **** createdb(const CreatedbStmt *stmt)
--- 479,486 ----
          DirectFunctionCall1(namein, CStringGetDatum(dbname));
      new_record[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(datdba);
      new_record[Anum_pg_database_encoding - 1] = Int32GetDatum(encoding);
+     new_record[Anum_pg_database_collation - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_collate));
+     new_record[Anum_pg_database_ctype - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_ctype));
      new_record[Anum_pg_database_datistemplate - 1] = BoolGetDatum(false);
      new_record[Anum_pg_database_datallowconn - 1] = BoolGetDatum(true);
      new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
***************
*** 629,635 **** dropdb(const char *dbname, bool missing_ok)
      pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock);

      if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
!                      &db_istemplate, NULL, NULL, NULL, NULL))
      {
          if (!missing_ok)
          {
--- 689,695 ----
      pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock);

      if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
!                      &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL))
      {
          if (!missing_ok)
          {
***************
*** 781,787 **** RenameDatabase(const char *oldname, const char *newname)
      rel = heap_open(DatabaseRelationId, RowExclusiveLock);

      if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
!                      NULL, NULL, NULL, NULL, NULL))
          ereport(ERROR,
                  (errcode(ERRCODE_UNDEFINED_DATABASE),
                   errmsg("database \"%s\" does not exist", oldname)));
--- 841,847 ----
      rel = heap_open(DatabaseRelationId, RowExclusiveLock);

      if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
!                      NULL, NULL, NULL, NULL, NULL, NULL, NULL))
          ereport(ERROR,
                  (errcode(ERRCODE_UNDEFINED_DATABASE),
                   errmsg("database \"%s\" does not exist", oldname)));
***************
*** 1168,1174 **** get_db_info(const char *name, LOCKMODE lockmode,
              Oid *dbIdP, Oid *ownerIdP,
              int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
              Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
!             Oid *dbTablespace)
  {
      bool        result = false;
      Relation    relation;
--- 1228,1234 ----
              Oid *dbIdP, Oid *ownerIdP,
              int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
              Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
!             Oid *dbTablespace, char **dbCollation, char **dbCtype)
  {
      bool        result = false;
      Relation    relation;
***************
*** 1259,1264 **** get_db_info(const char *name, LOCKMODE lockmode,
--- 1319,1329 ----
                  /* default tablespace for this database */
                  if (dbTablespace)
                      *dbTablespace = dbform->dattablespace;
+                  /* default locale settings for this database */
+                  if (dbCollation)
+                      *dbCollation = pstrdup(NameStr(dbform->collation));
+                  if (dbCtype)
+                      *dbCtype = pstrdup(NameStr(dbform->ctype));
                  ReleaseSysCache(tuple);
                  result = true;
                  break;
*** src/backend/parser/gram.y
--- src/backend/parser/gram.y
***************
*** 398,404 **** static TypeName *TableFuncTypeName(List *columns);
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
      CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
!     CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
      CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
--- 398,404 ----
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
      CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
!     CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
      CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
***************
*** 5458,5463 **** createdb_opt_item:
--- 5458,5479 ----
                  {
                      $$ = makeDefElem("encoding", NULL);
                  }
+             | COLLATE opt_equal Sconst
+                 {
+                     $$ = makeDefElem("collate", (Node *)makeString($3));
+                 }
+             | COLLATE opt_equal DEFAULT
+                 {
+                     $$ = makeDefElem("collate", NULL);
+                 }
+             | CTYPE opt_equal Sconst
+                 {
+                     $$ = makeDefElem("ctype", (Node *)makeString($3));
+                 }
+             | CTYPE opt_equal DEFAULT
+                 {
+                     $$ = makeDefElem("ctype", NULL);
+                 }
              | CONNECTION LIMIT opt_equal SignedIconst
                  {
                      $$ = makeDefElem("connectionlimit", (Node *)makeInteger($4));
***************
*** 9216,9221 **** unreserved_keyword:
--- 9232,9238 ----
              | CREATEROLE
              | CREATEUSER
              | CSV
+             | CTYPE
              | CURRENT_P
              | CURSOR
              | CYCLE
*** src/backend/parser/keywords.c
--- src/backend/parser/keywords.c
***************
*** 114,119 **** const ScanKeyword ScanKeywords[] = {
--- 114,120 ----
      {"createuser", CREATEUSER, UNRESERVED_KEYWORD},
      {"cross", CROSS, TYPE_FUNC_NAME_KEYWORD},
      {"csv", CSV, UNRESERVED_KEYWORD},
+     {"ctype", CTYPE, UNRESERVED_KEYWORD},
      {"current", CURRENT_P, UNRESERVED_KEYWORD},
      {"current_date", CURRENT_DATE, RESERVED_KEYWORD},
      {"current_role", CURRENT_ROLE, RESERVED_KEYWORD},
*** src/backend/utils/adt/pg_locale.c
--- src/backend/utils/adt/pg_locale.c
***************
*** 228,233 **** locale_xxx_assign(int category, const char *value, bool doit, GucSource source)
--- 228,245 ----
      return value;
  }

+ const char *
+ locale_collate_assign(const char *value, bool doit, GucSource source)
+ {
+     return locale_xxx_assign(LC_COLLATE, value, doit, source);
+ }
+
+ const char *
+ locale_ctype_assign(const char *value, bool doit, GucSource source)
+ {
+     return locale_xxx_assign(LC_CTYPE, value, doit, source);
+ }
+

  const char *
  locale_monetary_assign(const char *value, bool doit, GucSource source)
*** src/backend/utils/init/postinit.c
--- src/backend/utils/init/postinit.c
***************
*** 159,164 **** CheckMyDatabase(const char *name, bool am_superuser)
--- 159,166 ----
  {
      HeapTuple    tup;
      Form_pg_database dbform;
+     char       *collate;
+     char       *ctype;

      /* Fetch our real pg_database row */
      tup = SearchSysCache(DATABASEOID,
***************
*** 240,245 **** CheckMyDatabase(const char *name, bool am_superuser)
--- 242,270 ----
      /* If we have no other source of client_encoding, use server encoding */
      SetConfigOption("client_encoding", GetDatabaseEncodingName(),
                      PGC_BACKEND, PGC_S_DEFAULT);
+
+     /* assign locale variables */
+     collate = NameStr(dbform->collation);
+     ctype = NameStr(dbform->ctype);
+
+     if (setlocale(LC_COLLATE, collate) == NULL)
+         ereport(FATAL,
+             (errmsg("database local is incompatible with OS"),
+             errdetail("The database was initialized with LC_COLLATE \"%s\", "
+                         " which is not recognized by setlocale().", collate),
+             errhint("Try to recreate the database or install locale support.")));
+
+     if (setlocale(LC_CTYPE, ctype) == NULL)
+         ereport(FATAL,
+             (errmsg("database local is incompatible with OS"),
+             errdetail("The database was initialized with LC_CTYPE \"%s\", "
+                         " which is not recognized by setlocale().", ctype),
+             errhint("Try to recreate the database or install locale support.")));
+
+     /* Record it as a GUC internal option, too */
+
+     SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DATABASE);
+     SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DATABASE);

      /*
       * Lastly, set up any database-specific configuration variables.
*** src/bin/initdb/initdb.c
--- src/bin/initdb/initdb.c
***************
*** 1171,1176 **** setup_config(void)
--- 1171,1184 ----
      conflines = replace_token(conflines, "#port = 5432", repltok);
  #endif

+     snprintf(repltok, sizeof(repltok), "lc_collate = '%s'",
+              escape_quotes(lc_collate));
+     conflines = replace_token(conflines, "#lc_collate = 'C'", repltok);
+
+     snprintf(repltok, sizeof(repltok), "lc_ctype = '%s'",
+              escape_quotes(lc_collate));
+     conflines = replace_token(conflines, "#lc_ctype = 'C'", repltok);
+
      snprintf(repltok, sizeof(repltok), "lc_messages = '%s'",
               escape_quotes(lc_messages));
      conflines = replace_token(conflines, "#lc_messages = 'C'", repltok);
***************
*** 1353,1358 **** bootstrap_template1(char *short_version)
--- 1361,1370 ----

      bki_lines = replace_token(bki_lines, "ENCODING", encodingid);

+     bki_lines = replace_token(bki_lines, "LC_COLLATE", lc_collate);
+
+     bki_lines = replace_token(bki_lines, "LC_CTYPE", lc_ctype);
+
      /*
       * Pass correct LC_xxx environment to bootstrap.
       *
***************
*** 2806,2815 **** main(int argc, char *argv[])
          strcmp(lc_ctype, lc_numeric) == 0 &&
          strcmp(lc_ctype, lc_monetary) == 0 &&
          strcmp(lc_ctype, lc_messages) == 0)
!         printf(_("The database cluster will be initialized with locale %s.\n"), lc_ctype);
      else
      {
!         printf(_("The database cluster will be initialized with locales\n"
                   "  COLLATE:  %s\n"
                   "  CTYPE:    %s\n"
                   "  MESSAGES: %s\n"
--- 2818,2827 ----
          strcmp(lc_ctype, lc_numeric) == 0 &&
          strcmp(lc_ctype, lc_monetary) == 0 &&
          strcmp(lc_ctype, lc_messages) == 0)
!         printf(_("The database template1 will be initialized with locale %s.\n"), lc_ctype);
      else
      {
!         printf(_("The database template1 will be initialized with locales\n"
                   "  COLLATE:  %s\n"
                   "  CTYPE:    %s\n"
                   "  MESSAGES: %s\n"
*** src/bin/pg_controldata/pg_controldata.c
--- src/bin/pg_controldata/pg_controldata.c
***************
*** 220,231 **** main(int argc, char *argv[])
             (ControlFile.float4ByVal ? _("by value") : _("by reference")));
      printf(_("Float8 argument passing:              %s\n"),
             (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-     printf(_("Maximum length of locale name:        %u\n"),
-            ControlFile.localeBuflen);
-     printf(_("LC_COLLATE:                           %s\n"),
-            ControlFile.lc_collate);
-     printf(_("LC_CTYPE:                             %s\n"),
-            ControlFile.lc_ctype);
-
      return 0;
  }
--- 220,224 ----
*** src/bin/pg_resetxlog/pg_resetxlog.c
--- src/bin/pg_resetxlog/pg_resetxlog.c
***************
*** 493,514 **** GuessControlValues(void)
  #endif
      ControlFile.float4ByVal = FLOAT4PASSBYVAL;
      ControlFile.float8ByVal = FLOAT8PASSBYVAL;
-     ControlFile.localeBuflen = LOCALE_NAME_BUFLEN;
-
-     localeptr = setlocale(LC_COLLATE, "");
-     if (!localeptr)
-     {
-         fprintf(stderr, _("%s: invalid LC_COLLATE setting\n"), progname);
-         exit(1);
-     }
-     strlcpy(ControlFile.lc_collate, localeptr, sizeof(ControlFile.lc_collate));
-     localeptr = setlocale(LC_CTYPE, "");
-     if (!localeptr)
-     {
-         fprintf(stderr, _("%s: invalid LC_CTYPE setting\n"), progname);
-         exit(1);
-     }
-     strlcpy(ControlFile.lc_ctype, localeptr, sizeof(ControlFile.lc_ctype));

      /*
       * XXX eventually, should try to grovel through old XLOG to develop more
--- 493,498 ----
***************
*** 584,595 **** PrintControlValues(bool guessed)
             (ControlFile.float4ByVal ? _("by value") : _("by reference")));
      printf(_("Float8 argument passing:              %s\n"),
             (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-     printf(_("Maximum length of locale name:        %u\n"),
-            ControlFile.localeBuflen);
-     printf(_("LC_COLLATE:                           %s\n"),
-            ControlFile.lc_collate);
-     printf(_("LC_CTYPE:                             %s\n"),
-            ControlFile.lc_ctype);
  }


--- 568,573 ----
*** src/bin/scripts/createdb.c
--- src/bin/scripts/createdb.c
***************
*** 32,37 **** main(int argc, char *argv[])
--- 32,39 ----
          {"tablespace", required_argument, NULL, 'D'},
          {"template", required_argument, NULL, 'T'},
          {"encoding", required_argument, NULL, 'E'},
+         {"lc-collate", required_argument, NULL, 1},
+         {"lc-ctype", required_argument, NULL, 2},
          {NULL, 0, NULL, 0}
      };

***************
*** 50,55 **** main(int argc, char *argv[])
--- 52,59 ----
      char       *tablespace = NULL;
      char       *template = NULL;
      char       *encoding = NULL;
+     char       *lc_collate = NULL;
+     char       *lc_ctype = NULL;

      PQExpBufferData sql;

***************
*** 95,100 **** main(int argc, char *argv[])
--- 99,110 ----
              case 'E':
                  encoding = optarg;
                  break;
+             case 1:
+                 lc_collate = optarg;
+                 break;
+             case 2:
+                 lc_ctype = optarg;
+                 break;
              default:
                  fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
                  exit(1);
***************
*** 152,159 **** main(int argc, char *argv[])
          appendPQExpBuffer(&sql, " ENCODING '%s'", encoding);
      if (template)
          appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
      appendPQExpBuffer(&sql, ";\n");
!
      conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
                             host, port, username, password, progname);

--- 162,174 ----
          appendPQExpBuffer(&sql, " ENCODING '%s'", encoding);
      if (template)
          appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+     if (lc_collate)
+         appendPQExpBuffer(&sql, " LCCOLLATE %s", fmtId(lc_collate));
+     if (lc_ctype)
+         appendPQExpBuffer(&sql, " LCCTYPE %s", fmtId(lc_ctype));
+
      appendPQExpBuffer(&sql, ";\n");
!
      conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
                             host, port, username, password, progname);

***************
*** 207,224 **** help(const char *progname)
      printf(_("Usage:\n"));
      printf(_("  %s [OPTION]... [DBNAME] [DESCRIPTION]\n"), progname);
      printf(_("\nOptions:\n"));
!     printf(_("  -D, --tablespace=TABLESPACE  default tablespace for the database\n"));
!     printf(_("  -E, --encoding=ENCODING      encoding for the database\n"));
!     printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
!     printf(_("  -T, --template=TEMPLATE      template database to copy\n"));
!     printf(_("  -e, --echo                   show the commands being sent to the server\n"));
!     printf(_("  --help                       show this help, then exit\n"));
!     printf(_("  --version                    output version information, then exit\n"));
      printf(_("\nConnection options:\n"));
!     printf(_("  -h, --host=HOSTNAME          database server host or socket directory\n"));
!     printf(_("  -p, --port=PORT              database server port\n"));
!     printf(_("  -U, --username=USERNAME      user name to connect as\n"));
!     printf(_("  -W, --password               force password prompt\n"));
      printf(_("\nBy default, a database with the same name as the current user is created.\n"));
      printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
  }
--- 222,240 ----
      printf(_("Usage:\n"));
      printf(_("  %s [OPTION]... [DBNAME] [DESCRIPTION]\n"), progname);
      printf(_("\nOptions:\n"));
!     printf(_("  -D, --tablespace=TABLESPACE      default tablespace for the database\n"));
!     printf(_("  -E, --encoding=ENCODING          encoding for the database\n"));
!     printf(_("    --lc-collate, --lc-ctype=LOCALE    locale for the database\n"));
!     printf(_("  -O, --owner=OWNER                database user to own the new database\n"));
!     printf(_("  -T, --template=TEMPLATE          template database to copy\n"));
!     printf(_("  -e, --echo                       show the commands being sent to the server\n"));
!     printf(_("  --help                           show this help, then exit\n"));
!     printf(_("  --version                        output version information, then exit\n"));
      printf(_("\nConnection options:\n"));
!     printf(_("  -h, --host=HOSTNAME              database server host or socket directory\n"));
!     printf(_("  -p, --port=PORT                  database server port\n"));
!     printf(_("  -U, --username=USERNAME          user name to connect as\n"));
!     printf(_("  -W, --password                   force password prompt\n"));
      printf(_("\nBy default, a database with the same name as the current user is created.\n"));
      printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
  }
*** src/include/catalog/pg_control.h
--- src/include/catalog/pg_control.h
***************
*** 144,154 **** typedef struct ControlFileData
      bool        float4ByVal;    /* float4 pass-by-value? */
      bool        float8ByVal;    /* float8, int8, etc pass-by-value? */

-     /* active locales */
-     uint32        localeBuflen;
-     char        lc_collate[LOCALE_NAME_BUFLEN];
-     char        lc_ctype[LOCALE_NAME_BUFLEN];
-
      /* CRC of all above ... MUST BE LAST! */
      pg_crc32    crc;
  } ControlFileData;
--- 144,149 ----
*** src/include/catalog/pg_database.h
--- src/include/catalog/pg_database.h
***************
*** 33,38 **** CATALOG(pg_database,1262) BKI_SHARED_RELATION
--- 33,40 ----
      NameData    datname;        /* database name */
      Oid            datdba;            /* owner of database */
      int4        encoding;        /* character encoding */
+     NameData    collation;        /* LC_COLLATE of database */
+     NameData    ctype;            /* LC_CTYPE of database */
      bool        datistemplate;    /* allowed as CREATE DATABASE template? */
      bool        datallowconn;    /* new connections allowed? */
      int4        datconnlimit;    /* max connections allowed (-1=no limit) */
***************
*** 54,73 **** typedef FormData_pg_database *Form_pg_database;
   *        compiler constants for pg_database
   * ----------------
   */
! #define Natts_pg_database                11
  #define Anum_pg_database_datname        1
  #define Anum_pg_database_datdba            2
  #define Anum_pg_database_encoding        3
! #define Anum_pg_database_datistemplate    4
! #define Anum_pg_database_datallowconn    5
! #define Anum_pg_database_datconnlimit    6
! #define Anum_pg_database_datlastsysoid    7
! #define Anum_pg_database_datfrozenxid    8
! #define Anum_pg_database_dattablespace    9
! #define Anum_pg_database_datconfig        10
! #define Anum_pg_database_datacl            11

! DATA(insert OID = 1 (  template1 PGUID ENCODING t t -1 0 0 1663 _null_ _null_ ));
  SHDESCR("default template database");
  #define TemplateDbOid            1

--- 56,77 ----
   *        compiler constants for pg_database
   * ----------------
   */
! #define Natts_pg_database                13
  #define Anum_pg_database_datname        1
  #define Anum_pg_database_datdba            2
  #define Anum_pg_database_encoding        3
! #define Anum_pg_database_collation        4
! #define Anum_pg_database_ctype            5
! #define Anum_pg_database_datistemplate    6
! #define Anum_pg_database_datallowconn    7
! #define Anum_pg_database_datconnlimit    8
! #define Anum_pg_database_datlastsysoid    9
! #define Anum_pg_database_datfrozenxid    10
! #define Anum_pg_database_dattablespace    11
! #define Anum_pg_database_datconfig        12
! #define Anum_pg_database_datacl            13

! DATA(insert OID = 1 (  template1 PGUID ENCODING "LC_COLLATE" "LC_CTYPE" t t -1 0 0 1663 _null_ _null_));
  SHDESCR("default template database");
  #define TemplateDbOid            1

*** src/interfaces/ecpg/preproc/preproc.y
--- src/interfaces/ecpg/preproc/preproc.y
***************
*** 428,434 **** add_typedef(char *name, char * dimension, char * length, enum ECPGttype type_enu
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
      CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
!     CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
      CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
--- 428,434 ----
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
      CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
!     CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
      CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS

Re: WIP patch: Collation support

From
Gregory Stark
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

> Well, I proposed disallowing using a different collation than the source
> database, except for using template0 as the source. That's pretty limited, but
> is trivial to implement and still let's you have databases with different
> collations in the same cluster.

+     if (strcmp(dbtemplate, "template0") != 0 &&
+         (strcmp(lc_collate, src_collation) || strcmp(lc_ctype, src_ctype)))
+         ereport(NOTICE,
+                 (errmsg("database \"%s\" needs to be reindexed manually (REINDEX DATABASE)",
+                         dbname)));        
+ 

This isn't what you described but I think I prefer it this way as just a
warning not an error. I can easily imagine cases where the admin knows there
are either no indexes or no data in their database or where they're perfectly
happy to reindex. A flat prohibition seems annoying.

That said it's worth noting that this would tie our hands with proposals like
retail vacuum which risk database corruption if they fail to refind an index
pointer for a tuple...

Incidentally it seems like the warning should actually explain *why* it needs
to be reindexed manually and perhaps what the consequences are until it is.

AFAIK we can't easily connect to the new database and do some fiddling with
it, can we? If we could we could check if there are any non-empty indexes
which depend on the collation and only print the warning if we find any (and
even mark them invalid).

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication
support!


Re: WIP patch: Collation support

From
Zdenek Kotala
Date:
Gregory Stark napsal(a):
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> 

<snip>

> AFAIK we can't easily connect to the new database and do some fiddling with
> it, can we? If we could we could check if there are any non-empty indexes
> which depend on the collation and only print the warning if we find any (and
> even mark them invalid).
> 

Autovacum uses InitPostgres function to swith to another database. I'm 
not sure how much safe it is in create database command and when we are 
already switched we can reindex affected indexes.
Zdenek


Re: WIP patch: Collation support

From
Zdenek Kotala
Date:
Heikki Linnakangas napsal(a):
> Martijn van Oosterhout wrote:
>> On Wed, Sep 10, 2008 at 12:51:02PM +0300, Heikki Linnakangas wrote:
>>>> Since the set of collations isn't exactly denumerable, we need some way
>>>> to allow the user to specify the collation they want. The only
>>>> collation PostgreSQL knows about is the C collation. Anything else is
>>>> user-defined.
>>> Let's just use the name of the OS locale, like we do now. Having a 
>>> pg_collation catalog just moves the problem elsewhere: we'd still 
>>> need something in pg_collation to tie the collation to the OS locale.
>>
>> There's not a one-to-one mapping between collation and locale name. A
>> locale name includes information about the charset and a collation may
>> have paramters like case-sensetivity and pad-attribute which are not
>> present in the locale name. You need a mapping anyway, which is what
>> this table is for.
> 
> Ideally, we would delegate the case-sensitivity and padding to the 
> collation implementation (ie. OS setlocale() or ICU). That said, I don't 
> think operating systems normally ship case-insensitive variants of 
> locales by default, so I agree it would be nice if we could implement 
> that ourselves. Still, we could identify case-sensitive locale names for 
> example by a suffix, like "en_GB.UTF8.case-insensitive".

The idea was to call to_upper (or to_lower) before case-sensitive 
collation processing. It is difficult to determine from suffix if it is 
sensitive or not.
    Zdenek

PS: We can discuss it in Prato


Re: WIP patch: Collation support

From
Alvaro Herrera
Date:
Zdenek Kotala escribió:
> Gregory Stark napsal(a):

>> AFAIK we can't easily connect to the new database and do some fiddling with
>> it, can we? If we could we could check if there are any non-empty indexes
>> which depend on the collation and only print the warning if we find any (and
>> even mark them invalid).
>
> Autovacum uses InitPostgres function to swith to another database. I'm  
> not sure how much safe it is in create database command and when we are  
> already switched we can reindex affected indexes.

It's only supposed to work if you're not previously connected to any
database.  (Autovacuum never "switches to another database"; any
particular worker only connects to a single database).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Gregory Stark wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>
>> Well, I proposed disallowing using a different collation than the source
>> database, except for using template0 as the source. That's pretty limited, but
>> is trivial to implement and still let's you have databases with different
>> collations in the same cluster.
>
> +     if (strcmp(dbtemplate, "template0") != 0 &&
> +         (strcmp(lc_collate, src_collation) || strcmp(lc_ctype, src_ctype)))
> +         ereport(NOTICE,
> +                 (errmsg("database \"%s\" needs to be reindexed manually (REINDEX DATABASE)",
> +                         dbname)));
> +
>
> This isn't what you described but I think I prefer it this way as just a
> warning not an error.

Well, I'd prefer to make it an error, but I'm willing to listen if
others feel otherwise. I don't think the inconvenience of having to use
template0 is that big, compared to the potential of strange behavior
people would run into if they ignore the advice to reindex.

One weakness with a straight strcmp comparison is that it won't
recognize aliases of the same locale. For example, "fi_FI.UTF8" and
"fi_FI.UTF-8".

> AFAIK we can't easily connect to the new database and do some fiddling with
> it, can we? If we could we could check if there are any non-empty indexes
> which depend on the collation and only print the warning if we find any (and
> even mark them invalid).

I don't see that happening, unfortunately..

Attached is an updated version of the stripped-down patch. I've cleaned
it up a bit, and added more sanity checks. Documentation is still
missing and I haven't test it much.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
*** doc/src/sgml/ref/create_database.sgml
--- doc/src/sgml/ref/create_database.sgml
***************
*** 24,29 **** CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
--- 24,31 ----
      [ [ WITH ] [ OWNER [=] <replaceable class="parameter">dbowner</replaceable> ]
             [ TEMPLATE [=] <replaceable class="parameter">template</replaceable> ]
             [ ENCODING [=] <replaceable class="parameter">encoding</replaceable> ]
+            [ COLLATE [=] <replaceable class="parameter">collation</replaceable> ]
+            [ CTYPE [=] <replaceable class="parameter">ctype</replaceable> ]
             [ TABLESPACE [=] <replaceable class="parameter">tablespace</replaceable> ]
             [ CONNECTION LIMIT [=] <replaceable class="parameter">connlimit</replaceable> ] ]
  </synopsis>
***************
*** 113,118 **** CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
--- 115,136 ----
        </listitem>
       </varlistentry>
       <varlistentry>
+       <term><replaceable class="parameter">collation</replaceable></term>
+       <listitem>
+        <para>
+         LC_COLLATE setting to use in the new database.  XXX
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
+       <term><replaceable class="parameter">ctype</replaceable></term>
+       <listitem>
+        <para>
+         LC_CTYPE setting to use in the new database.  XXX
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
        <term><replaceable class="parameter">tablespace</replaceable></term>
        <listitem>
         <para>
*** src/backend/access/transam/xlog.c
--- src/backend/access/transam/xlog.c
***************
*** 3847,3853 **** WriteControlFile(void)
  {
      int            fd;
      char        buffer[PG_CONTROL_SIZE];        /* need not be aligned */
-     char       *localeptr;

      /*
       * Initialize version and compatibility-check fields
--- 3847,3852 ----
***************
*** 3876,3893 **** WriteControlFile(void)
      ControlFile->float4ByVal = FLOAT4PASSBYVAL;
      ControlFile->float8ByVal = FLOAT8PASSBYVAL;

-     ControlFile->localeBuflen = LOCALE_NAME_BUFLEN;
-     localeptr = setlocale(LC_COLLATE, NULL);
-     if (!localeptr)
-         ereport(PANIC,
-                 (errmsg("invalid LC_COLLATE setting")));
-     StrNCpy(ControlFile->lc_collate, localeptr, LOCALE_NAME_BUFLEN);
-     localeptr = setlocale(LC_CTYPE, NULL);
-     if (!localeptr)
-         ereport(PANIC,
-                 (errmsg("invalid LC_CTYPE setting")));
-     StrNCpy(ControlFile->lc_ctype, localeptr, LOCALE_NAME_BUFLEN);
-
      /* Contents are protected with a CRC */
      INIT_CRC32(ControlFile->crc);
      COMP_CRC32(ControlFile->crc,
--- 3875,3880 ----
***************
*** 4126,4159 **** ReadControlFile(void)
                             " but the server was compiled without USE_FLOAT8_BYVAL."),
                   errhint("It looks like you need to recompile or initdb.")));
  #endif
-
-     if (ControlFile->localeBuflen != LOCALE_NAME_BUFLEN)
-         ereport(FATAL,
-                 (errmsg("database files are incompatible with server"),
-                  errdetail("The database cluster was initialized with LOCALE_NAME_BUFLEN %d,"
-                   " but the server was compiled with LOCALE_NAME_BUFLEN %d.",
-                            ControlFile->localeBuflen, LOCALE_NAME_BUFLEN),
-                  errhint("It looks like you need to recompile or initdb.")));
-     if (pg_perm_setlocale(LC_COLLATE, ControlFile->lc_collate) == NULL)
-         ereport(FATAL,
-             (errmsg("database files are incompatible with operating system"),
-              errdetail("The database cluster was initialized with LC_COLLATE \"%s\","
-                        " which is not recognized by setlocale().",
-                        ControlFile->lc_collate),
-              errhint("It looks like you need to initdb or install locale support.")));
-     if (pg_perm_setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
-         ereport(FATAL,
-             (errmsg("database files are incompatible with operating system"),
-         errdetail("The database cluster was initialized with LC_CTYPE \"%s\","
-                   " which is not recognized by setlocale().",
-                   ControlFile->lc_ctype),
-              errhint("It looks like you need to initdb or install locale support.")));
-
-     /* Make the fixed locale settings visible as GUC variables, too */
-     SetConfigOption("lc_collate", ControlFile->lc_collate,
-                     PGC_INTERNAL, PGC_S_OVERRIDE);
-     SetConfigOption("lc_ctype", ControlFile->lc_ctype,
-                     PGC_INTERNAL, PGC_S_OVERRIDE);
  }

  void
--- 4113,4118 ----
*** src/backend/commands/dbcommands.c
--- src/backend/commands/dbcommands.c
***************
*** 53,58 ****
--- 53,59 ----
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
  #include "utils/lsyscache.h"
+ #include "utils/pg_locale.h"
  #include "utils/syscache.h"
  #include "utils/tqual.h"

***************
*** 69,75 **** static bool get_db_info(const char *name, LOCKMODE lockmode,
              Oid *dbIdP, Oid *ownerIdP,
              int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
              Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
!             Oid *dbTablespace);
  static bool have_createdb_privilege(void);
  static void remove_dbtablespaces(Oid db_id);
  static bool check_db_file_conflict(Oid db_id);
--- 70,76 ----
              Oid *dbIdP, Oid *ownerIdP,
              int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
              Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
!             Oid *dbTablespace, char **dbCollation, char **dbCtype);
  static bool have_createdb_privilege(void);
  static void remove_dbtablespaces(Oid db_id);
  static bool check_db_file_conflict(Oid db_id);
***************
*** 87,92 **** createdb(const CreatedbStmt *stmt)
--- 88,95 ----
      Oid            src_dboid;
      Oid            src_owner;
      int            src_encoding;
+     char       *src_collation;
+     char       *src_ctype;
      bool        src_istemplate;
      bool        src_allowconn;
      Oid            src_lastsysoid;
***************
*** 104,113 **** createdb(const CreatedbStmt *stmt)
--- 107,120 ----
      DefElem    *downer = NULL;
      DefElem    *dtemplate = NULL;
      DefElem    *dencoding = NULL;
+     DefElem    *dcollation = NULL;
+     DefElem    *dctype = NULL;
      DefElem    *dconnlimit = NULL;
      char       *dbname = stmt->dbname;
      char       *dbowner = NULL;
      const char *dbtemplate = NULL;
+     char       *lc_collate = NULL;
+     char       *lc_ctype = NULL;
      int            encoding = -1;
      int            dbconnlimit = -1;
      int            ctype_encoding;
***************
*** 152,157 **** createdb(const CreatedbStmt *stmt)
--- 159,180 ----
                           errmsg("conflicting or redundant options")));
              dencoding = defel;
          }
+         else if (strcmp(defel->defname, "collate") == 0)
+         {
+             if (dcollation)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             dcollation = defel;
+         }
+         else if (strcmp(defel->defname, "ctype") == 0)
+         {
+             if (dctype)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_SYNTAX_ERROR),
+                          errmsg("conflicting or redundant options")));
+             dctype = defel;
+         }
          else if (strcmp(defel->defname, "connectionlimit") == 0)
          {
              if (dconnlimit)
***************
*** 205,210 **** createdb(const CreatedbStmt *stmt)
--- 228,238 ----
              elog(ERROR, "unrecognized node type: %d",
                   nodeTag(dencoding->arg));
      }
+     if (dcollation && dcollation->arg)
+         lc_collate = strVal(dcollation->arg);
+     if (dctype && dctype->arg)
+         lc_ctype = strVal(dctype->arg);
+
      if (dconnlimit && dconnlimit->arg)
          dbconnlimit = intVal(dconnlimit->arg);

***************
*** 243,249 **** createdb(const CreatedbStmt *stmt)
      if (!get_db_info(dbtemplate, ShareLock,
                       &src_dboid, &src_owner, &src_encoding,
                       &src_istemplate, &src_allowconn, &src_lastsysoid,
!                      &src_frozenxid, &src_deftablespace))
          ereport(ERROR,
                  (errcode(ERRCODE_UNDEFINED_DATABASE),
                   errmsg("template database \"%s\" does not exist",
--- 271,278 ----
      if (!get_db_info(dbtemplate, ShareLock,
                       &src_dboid, &src_owner, &src_encoding,
                       &src_istemplate, &src_allowconn, &src_lastsysoid,
!                      &src_frozenxid, &src_deftablespace,
!                      &src_collation, &src_ctype))
          ereport(ERROR,
                  (errcode(ERRCODE_UNDEFINED_DATABASE),
                   errmsg("template database \"%s\" does not exist",
***************
*** 262,270 **** createdb(const CreatedbStmt *stmt)
                              dbtemplate)));
      }

!     /* If encoding is defaulted, use source's encoding */
      if (encoding < 0)
          encoding = src_encoding;

      /* Some encodings are client only */
      if (!PG_VALID_BE_ENCODING(encoding))
--- 291,303 ----
                              dbtemplate)));
      }

!     /* If encoding or locales are defaulted, use source's setting */
      if (encoding < 0)
          encoding = src_encoding;
+     if (lc_collate == NULL)
+         lc_collate = src_collation;
+     if (lc_ctype == NULL)
+         lc_ctype = src_ctype;

      /* Some encodings are client only */
      if (!PG_VALID_BE_ENCODING(encoding))
***************
*** 272,277 **** createdb(const CreatedbStmt *stmt)
--- 305,320 ----
                  (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                   errmsg("invalid server encoding %d", encoding)));

+     /* Check that the chosen locales are valid */
+     if (!check_locale(LC_COLLATE, lc_collate))
+         ereport(ERROR,
+                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                  errmsg("invalid locale name %s", lc_collate)));
+     if (!check_locale(LC_CTYPE, lc_ctype))
+         ereport(ERROR,
+                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                  errmsg("invalid locale name %s", lc_ctype)));
+
      /*
       * Check whether encoding matches server locale settings.  We allow
       * mismatch in three cases:
***************
*** 290,296 **** createdb(const CreatedbStmt *stmt)
       *
       * Note: if you change this policy, fix initdb to match.
       */
!     ctype_encoding = pg_get_encoding_from_locale(NULL);

      if (!(ctype_encoding == encoding ||
            ctype_encoding == PG_SQL_ASCII ||
--- 333,339 ----
       *
       * Note: if you change this policy, fix initdb to match.
       */
!     ctype_encoding = pg_get_encoding_from_locale(lc_ctype);

      if (!(ctype_encoding == encoding ||
            ctype_encoding == PG_SQL_ASCII ||
***************
*** 299,310 **** createdb(const CreatedbStmt *stmt)
  #endif
            (encoding == PG_SQL_ASCII && superuser())))
          ereport(ERROR,
!                 (errmsg("encoding %s does not match server's locale %s",
                          pg_encoding_to_char(encoding),
!                         setlocale(LC_CTYPE, NULL)),
!              errdetail("The server's LC_CTYPE setting requires encoding %s.",
                         pg_encoding_to_char(ctype_encoding))));

      /* Resolve default tablespace for new database */
      if (dtablespacename && dtablespacename->arg)
      {
--- 342,373 ----
  #endif
            (encoding == PG_SQL_ASCII && superuser())))
          ereport(ERROR,
!                 (errmsg("encoding %s does not match locale %s",
                          pg_encoding_to_char(encoding),
!                         lc_ctype),
!              errdetail("The chosen LC_CTYPE setting requires encoding %s.",
                         pg_encoding_to_char(ctype_encoding))));

+     /*
+      * Check that the new locale is compatible with the source database.
+      *
+      * We know that template0 doesn't contain any indexes that depend on
+      * collation or ctype, so template0 can be used as template for
+      * any locale.
+      */
+     if (strcmp(dbtemplate, "template0") != 0)
+     {
+         if (strcmp(lc_collate, src_collation))
+             ereport(ERROR,
+                     (errmsg("new collation is incompatible with the collation of the template database (%s)",
src_collation),
+                      errhint("Use the same collation as in the template database, or use template0 as template")));
+
+         if (strcmp(lc_ctype, src_ctype))
+             ereport(ERROR,
+                     (errmsg("new ctype is incompatible with the ctype of the template database (%s)", src_ctype),
+                      errhint("Use the same ctype as in the template database, or use template0 as template")));
+     }
+
      /* Resolve default tablespace for new database */
      if (dtablespacename && dtablespacename->arg)
      {
***************
*** 421,426 **** createdb(const CreatedbStmt *stmt)
--- 484,491 ----
          DirectFunctionCall1(namein, CStringGetDatum(dbname));
      new_record[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(datdba);
      new_record[Anum_pg_database_encoding - 1] = Int32GetDatum(encoding);
+     new_record[Anum_pg_database_collation - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_collate));
+     new_record[Anum_pg_database_ctype - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_ctype));
      new_record[Anum_pg_database_datistemplate - 1] = BoolGetDatum(false);
      new_record[Anum_pg_database_datallowconn - 1] = BoolGetDatum(true);
      new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
***************
*** 629,635 **** dropdb(const char *dbname, bool missing_ok)
      pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock);

      if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
!                      &db_istemplate, NULL, NULL, NULL, NULL))
      {
          if (!missing_ok)
          {
--- 694,700 ----
      pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock);

      if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
!                      &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL))
      {
          if (!missing_ok)
          {
***************
*** 781,787 **** RenameDatabase(const char *oldname, const char *newname)
      rel = heap_open(DatabaseRelationId, RowExclusiveLock);

      if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
!                      NULL, NULL, NULL, NULL, NULL))
          ereport(ERROR,
                  (errcode(ERRCODE_UNDEFINED_DATABASE),
                   errmsg("database \"%s\" does not exist", oldname)));
--- 846,852 ----
      rel = heap_open(DatabaseRelationId, RowExclusiveLock);

      if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
!                      NULL, NULL, NULL, NULL, NULL, NULL, NULL))
          ereport(ERROR,
                  (errcode(ERRCODE_UNDEFINED_DATABASE),
                   errmsg("database \"%s\" does not exist", oldname)));
***************
*** 1168,1174 **** get_db_info(const char *name, LOCKMODE lockmode,
              Oid *dbIdP, Oid *ownerIdP,
              int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
              Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
!             Oid *dbTablespace)
  {
      bool        result = false;
      Relation    relation;
--- 1233,1239 ----
              Oid *dbIdP, Oid *ownerIdP,
              int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
              Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
!             Oid *dbTablespace, char **dbCollation, char **dbCtype)
  {
      bool        result = false;
      Relation    relation;
***************
*** 1259,1264 **** get_db_info(const char *name, LOCKMODE lockmode,
--- 1324,1334 ----
                  /* default tablespace for this database */
                  if (dbTablespace)
                      *dbTablespace = dbform->dattablespace;
+                  /* default locale settings for this database */
+                  if (dbCollation)
+                      *dbCollation = pstrdup(NameStr(dbform->collation));
+                  if (dbCtype)
+                      *dbCtype = pstrdup(NameStr(dbform->ctype));
                  ReleaseSysCache(tuple);
                  result = true;
                  break;
*** src/backend/parser/gram.y
--- src/backend/parser/gram.y
***************
*** 398,404 **** static TypeName *TableFuncTypeName(List *columns);
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
      CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
!     CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
      CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
--- 398,404 ----
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
      CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
!     CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
      CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
***************
*** 5458,5463 **** createdb_opt_item:
--- 5458,5479 ----
                  {
                      $$ = makeDefElem("encoding", NULL);
                  }
+             | COLLATE opt_equal Sconst
+                 {
+                     $$ = makeDefElem("collate", (Node *)makeString($3));
+                 }
+             | COLLATE opt_equal DEFAULT
+                 {
+                     $$ = makeDefElem("collate", NULL);
+                 }
+             | CTYPE opt_equal Sconst
+                 {
+                     $$ = makeDefElem("ctype", (Node *)makeString($3));
+                 }
+             | CTYPE opt_equal DEFAULT
+                 {
+                     $$ = makeDefElem("ctype", NULL);
+                 }
              | CONNECTION LIMIT opt_equal SignedIconst
                  {
                      $$ = makeDefElem("connectionlimit", (Node *)makeInteger($4));
***************
*** 9216,9221 **** unreserved_keyword:
--- 9232,9238 ----
              | CREATEROLE
              | CREATEUSER
              | CSV
+             | CTYPE
              | CURRENT_P
              | CURSOR
              | CYCLE
*** src/backend/parser/keywords.c
--- src/backend/parser/keywords.c
***************
*** 114,119 **** const ScanKeyword ScanKeywords[] = {
--- 114,120 ----
      {"createuser", CREATEUSER, UNRESERVED_KEYWORD},
      {"cross", CROSS, TYPE_FUNC_NAME_KEYWORD},
      {"csv", CSV, UNRESERVED_KEYWORD},
+     {"ctype", CTYPE, UNRESERVED_KEYWORD},
      {"current", CURRENT_P, UNRESERVED_KEYWORD},
      {"current_date", CURRENT_DATE, RESERVED_KEYWORD},
      {"current_role", CURRENT_ROLE, RESERVED_KEYWORD},
*** src/backend/utils/adt/pg_locale.c
--- src/backend/utils/adt/pg_locale.c
***************
*** 189,194 **** pg_perm_setlocale(int category, const char *locale)
--- 189,218 ----
  }


+ /*
+  * Is the locale name valid for the locale category?
+  */
+ bool
+ check_locale(int category, const char *value)
+ {
+     char       *save;
+     bool        ret;
+
+     save = setlocale(category, NULL);
+     if (!save)
+         return false;            /* won't happen, we hope */
+
+     /* save may be pointing at a modifiable scratch variable, see above */
+     save = pstrdup(save);
+
+     ret = (setlocale(category, value) != NULL);
+
+     setlocale(category, save);    /* assume this won't fail */
+     pfree(save);
+
+     return ret;
+ }
+
  /* GUC assign hooks */

  /*
***************
*** 203,223 **** pg_perm_setlocale(int category, const char *locale)
  static const char *
  locale_xxx_assign(int category, const char *value, bool doit, GucSource source)
  {
!     char       *save;
!
!     save = setlocale(category, NULL);
!     if (!save)
!         return NULL;            /* won't happen, we hope */
!
!     /* save may be pointing at a modifiable scratch variable, see above */
!     save = pstrdup(save);
!
!     if (!setlocale(category, value))
          value = NULL;            /* set failure return marker */

-     setlocale(category, save);    /* assume this won't fail */
-     pfree(save);
-
      /* need to reload cache next time? */
      if (doit && value != NULL)
      {
--- 227,235 ----
  static const char *
  locale_xxx_assign(int category, const char *value, bool doit, GucSource source)
  {
!     if (!check_locale(category, value))
          value = NULL;            /* set failure return marker */

      /* need to reload cache next time? */
      if (doit && value != NULL)
      {
*** src/backend/utils/init/postinit.c
--- src/backend/utils/init/postinit.c
***************
*** 159,164 **** CheckMyDatabase(const char *name, bool am_superuser)
--- 159,166 ----
  {
      HeapTuple    tup;
      Form_pg_database dbform;
+     char       *collate;
+     char       *ctype;

      /* Fetch our real pg_database row */
      tup = SearchSysCache(DATABASEOID,
***************
*** 240,245 **** CheckMyDatabase(const char *name, bool am_superuser)
--- 242,269 ----
      /* If we have no other source of client_encoding, use server encoding */
      SetConfigOption("client_encoding", GetDatabaseEncodingName(),
                      PGC_BACKEND, PGC_S_DEFAULT);
+
+     /* assign locale variables */
+     collate = NameStr(dbform->collation);
+     ctype = NameStr(dbform->ctype);
+
+     if (setlocale(LC_COLLATE, collate) == NULL)
+         ereport(FATAL,
+             (errmsg("database locale is incompatible with operating system"),
+             errdetail("The database was initialized with LC_COLLATE \"%s\", "
+                         " which is not recognized by setlocale().", collate),
+             errhint("Recreate the database with another locale or install the missing locale.")));
+
+     if (setlocale(LC_CTYPE, ctype) == NULL)
+         ereport(FATAL,
+             (errmsg("database locale is incompatible with operating system"),
+             errdetail("The database was initialized with LC_CTYPE \"%s\", "
+                         " which is not recognized by setlocale().", ctype),
+             errhint("Recreate the database with another locale or install the missing locale.")));
+
+     /* Make the locale settings visible as GUC variables, too */
+     SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DATABASE);
+     SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DATABASE);

      /*
       * Lastly, set up any database-specific configuration variables.
*** src/bin/initdb/initdb.c
--- src/bin/initdb/initdb.c
***************
*** 1353,1358 **** bootstrap_template1(char *short_version)
--- 1353,1362 ----

      bki_lines = replace_token(bki_lines, "ENCODING", encodingid);

+     bki_lines = replace_token(bki_lines, "LC_COLLATE", lc_collate);
+
+     bki_lines = replace_token(bki_lines, "LC_CTYPE", lc_ctype);
+
      /*
       * Pass correct LC_xxx environment to bootstrap.
       *
***************
*** 2378,2389 **** usage(const char *progname)
      printf(_("\nOptions:\n"));
      printf(_(" [-D, --pgdata=]DATADIR     location for this database cluster\n"));
      printf(_("  -E, --encoding=ENCODING   set default encoding for new databases\n"));
!     printf(_("  --locale=LOCALE           initialize database cluster with given locale\n"));
      printf(_("  --lc-collate, --lc-ctype, --lc-messages=LOCALE\n"
               "  --lc-monetary, --lc-numeric, --lc-time=LOCALE\n"
!              "                            initialize database cluster with given locale\n"
!              "                            in the respective category (default taken from\n"
!              "                            environment)\n"));
      printf(_("  --no-locale               equivalent to --locale=C\n"));
      printf(_("  -T, --text-search-config=CFG\n"
           "                            default text search configuration\n"));
--- 2382,2393 ----
      printf(_("\nOptions:\n"));
      printf(_(" [-D, --pgdata=]DATADIR     location for this database cluster\n"));
      printf(_("  -E, --encoding=ENCODING   set default encoding for new databases\n"));
!     printf(_("  --locale=LOCALE           set default locale for new databases\n"));
      printf(_("  --lc-collate, --lc-ctype, --lc-messages=LOCALE\n"
               "  --lc-monetary, --lc-numeric, --lc-time=LOCALE\n"
!              "                            set default locale in the respective\n"
!              "                            category for new databases (default\n"
!              "                            taken from environment)\n"));
      printf(_("  --no-locale               equivalent to --locale=C\n"));
      printf(_("  -T, --text-search-config=CFG\n"
           "                            default text search configuration\n"));
***************
*** 2806,2815 **** main(int argc, char *argv[])
          strcmp(lc_ctype, lc_numeric) == 0 &&
          strcmp(lc_ctype, lc_monetary) == 0 &&
          strcmp(lc_ctype, lc_messages) == 0)
!         printf(_("The database cluster will be initialized with locale %s.\n"), lc_ctype);
      else
      {
!         printf(_("The database cluster will be initialized with locales\n"
                   "  COLLATE:  %s\n"
                   "  CTYPE:    %s\n"
                   "  MESSAGES: %s\n"
--- 2810,2821 ----
          strcmp(lc_ctype, lc_numeric) == 0 &&
          strcmp(lc_ctype, lc_monetary) == 0 &&
          strcmp(lc_ctype, lc_messages) == 0)
!         printf(_("The template databases will be initialized with locale %s.\n"), lc_ctype);
      else
      {
!         /* XXX only collate and ctype are actually set in stone here, others
!          * are userset gucs */
!         printf(_("The template databases will be initialized with locales\n"
                   "  COLLATE:  %s\n"
                   "  CTYPE:    %s\n"
                   "  MESSAGES: %s\n"
*** src/bin/pg_controldata/pg_controldata.c
--- src/bin/pg_controldata/pg_controldata.c
***************
*** 220,231 **** main(int argc, char *argv[])
             (ControlFile.float4ByVal ? _("by value") : _("by reference")));
      printf(_("Float8 argument passing:              %s\n"),
             (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-     printf(_("Maximum length of locale name:        %u\n"),
-            ControlFile.localeBuflen);
-     printf(_("LC_COLLATE:                           %s\n"),
-            ControlFile.lc_collate);
-     printf(_("LC_CTYPE:                             %s\n"),
-            ControlFile.lc_ctype);
-
      return 0;
  }
--- 220,224 ----
*** src/bin/pg_resetxlog/pg_resetxlog.c
--- src/bin/pg_resetxlog/pg_resetxlog.c
***************
*** 493,514 **** GuessControlValues(void)
  #endif
      ControlFile.float4ByVal = FLOAT4PASSBYVAL;
      ControlFile.float8ByVal = FLOAT8PASSBYVAL;
-     ControlFile.localeBuflen = LOCALE_NAME_BUFLEN;
-
-     localeptr = setlocale(LC_COLLATE, "");
-     if (!localeptr)
-     {
-         fprintf(stderr, _("%s: invalid LC_COLLATE setting\n"), progname);
-         exit(1);
-     }
-     strlcpy(ControlFile.lc_collate, localeptr, sizeof(ControlFile.lc_collate));
-     localeptr = setlocale(LC_CTYPE, "");
-     if (!localeptr)
-     {
-         fprintf(stderr, _("%s: invalid LC_CTYPE setting\n"), progname);
-         exit(1);
-     }
-     strlcpy(ControlFile.lc_ctype, localeptr, sizeof(ControlFile.lc_ctype));

      /*
       * XXX eventually, should try to grovel through old XLOG to develop more
--- 493,498 ----
***************
*** 584,595 **** PrintControlValues(bool guessed)
             (ControlFile.float4ByVal ? _("by value") : _("by reference")));
      printf(_("Float8 argument passing:              %s\n"),
             (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-     printf(_("Maximum length of locale name:        %u\n"),
-            ControlFile.localeBuflen);
-     printf(_("LC_COLLATE:                           %s\n"),
-            ControlFile.lc_collate);
-     printf(_("LC_CTYPE:                             %s\n"),
-            ControlFile.lc_ctype);
  }


--- 568,573 ----
*** src/bin/scripts/createdb.c
--- src/bin/scripts/createdb.c
***************
*** 32,37 **** main(int argc, char *argv[])
--- 32,39 ----
          {"tablespace", required_argument, NULL, 'D'},
          {"template", required_argument, NULL, 'T'},
          {"encoding", required_argument, NULL, 'E'},
+         {"lc-collate", required_argument, NULL, 1},
+         {"lc-ctype", required_argument, NULL, 2},
          {NULL, 0, NULL, 0}
      };

***************
*** 50,55 **** main(int argc, char *argv[])
--- 52,59 ----
      char       *tablespace = NULL;
      char       *template = NULL;
      char       *encoding = NULL;
+     char       *lc_collate = NULL;
+     char       *lc_ctype = NULL;

      PQExpBufferData sql;

***************
*** 95,100 **** main(int argc, char *argv[])
--- 99,110 ----
              case 'E':
                  encoding = optarg;
                  break;
+             case 1:
+                 lc_collate = optarg;
+                 break;
+             case 2:
+                 lc_ctype = optarg;
+                 break;
              default:
                  fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
                  exit(1);
***************
*** 152,157 **** main(int argc, char *argv[])
--- 162,172 ----
          appendPQExpBuffer(&sql, " ENCODING '%s'", encoding);
      if (template)
          appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+     if (lc_collate)
+         appendPQExpBuffer(&sql, " LCCOLLATE %s", fmtId(lc_collate));
+     if (lc_ctype)
+         appendPQExpBuffer(&sql, " LCCTYPE %s", fmtId(lc_ctype));
+
      appendPQExpBuffer(&sql, ";\n");

      conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
***************
*** 209,214 **** help(const char *progname)
--- 224,232 ----
      printf(_("\nOptions:\n"));
      printf(_("  -D, --tablespace=TABLESPACE  default tablespace for the database\n"));
      printf(_("  -E, --encoding=ENCODING      encoding for the database\n"));
+     printf(_("    --lc-collate=LOCALE          LC_COLLATE setting for the database\n"));
+     printf(_("    --lc-ctype=LOCALE            LC_CTYPE setting for the database\n"));
+
      printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
      printf(_("  -T, --template=TEMPLATE      template database to copy\n"));
      printf(_("  -e, --echo                   show the commands being sent to the server\n"));
*** src/include/catalog/pg_control.h
--- src/include/catalog/pg_control.h
***************
*** 144,154 **** typedef struct ControlFileData
      bool        float4ByVal;    /* float4 pass-by-value? */
      bool        float8ByVal;    /* float8, int8, etc pass-by-value? */

-     /* active locales */
-     uint32        localeBuflen;
-     char        lc_collate[LOCALE_NAME_BUFLEN];
-     char        lc_ctype[LOCALE_NAME_BUFLEN];
-
      /* CRC of all above ... MUST BE LAST! */
      pg_crc32    crc;
  } ControlFileData;
--- 144,149 ----
*** src/include/catalog/pg_database.h
--- src/include/catalog/pg_database.h
***************
*** 33,38 **** CATALOG(pg_database,1262) BKI_SHARED_RELATION
--- 33,40 ----
      NameData    datname;        /* database name */
      Oid            datdba;            /* owner of database */
      int4        encoding;        /* character encoding */
+     NameData    collation;        /* LC_COLLATE of database */
+     NameData    ctype;            /* LC_CTYPE of database */
      bool        datistemplate;    /* allowed as CREATE DATABASE template? */
      bool        datallowconn;    /* new connections allowed? */
      int4        datconnlimit;    /* max connections allowed (-1=no limit) */
***************
*** 54,73 **** typedef FormData_pg_database *Form_pg_database;
   *        compiler constants for pg_database
   * ----------------
   */
! #define Natts_pg_database                11
  #define Anum_pg_database_datname        1
  #define Anum_pg_database_datdba            2
  #define Anum_pg_database_encoding        3
! #define Anum_pg_database_datistemplate    4
! #define Anum_pg_database_datallowconn    5
! #define Anum_pg_database_datconnlimit    6
! #define Anum_pg_database_datlastsysoid    7
! #define Anum_pg_database_datfrozenxid    8
! #define Anum_pg_database_dattablespace    9
! #define Anum_pg_database_datconfig        10
! #define Anum_pg_database_datacl            11

! DATA(insert OID = 1 (  template1 PGUID ENCODING t t -1 0 0 1663 _null_ _null_ ));
  SHDESCR("default template database");
  #define TemplateDbOid            1

--- 56,77 ----
   *        compiler constants for pg_database
   * ----------------
   */
! #define Natts_pg_database                13
  #define Anum_pg_database_datname        1
  #define Anum_pg_database_datdba            2
  #define Anum_pg_database_encoding        3
! #define Anum_pg_database_collation        4
! #define Anum_pg_database_ctype            5
! #define Anum_pg_database_datistemplate    6
! #define Anum_pg_database_datallowconn    7
! #define Anum_pg_database_datconnlimit    8
! #define Anum_pg_database_datlastsysoid    9
! #define Anum_pg_database_datfrozenxid    10
! #define Anum_pg_database_dattablespace    11
! #define Anum_pg_database_datconfig        12
! #define Anum_pg_database_datacl            13

! DATA(insert OID = 1 (  template1 PGUID ENCODING "LC_COLLATE" "LC_CTYPE" t t -1 0 0 1663 _null_ _null_));
  SHDESCR("default template database");
  #define TemplateDbOid            1

*** src/include/utils/pg_locale.h
--- src/include/utils/pg_locale.h
***************
*** 39,44 **** extern const char *locale_numeric_assign(const char *value,
--- 39,45 ----
  extern const char *locale_time_assign(const char *value,
                     bool doit, GucSource source);

+ extern bool check_locale(int category, const char *locale);
  extern char *pg_perm_setlocale(int category, const char *locale);

  extern bool lc_collate_is_c(void);
*** src/interfaces/ecpg/preproc/preproc.y
--- src/interfaces/ecpg/preproc/preproc.y
***************
*** 428,434 **** add_typedef(char *name, char * dimension, char * length, enum ECPGttype type_enu
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
      CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
!     CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
      CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
--- 428,434 ----
      CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
      COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
      CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
!     CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
      CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

      DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS

Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Thu, Sep 18, 2008 at 05:41:16PM +0300, Heikki Linnakangas wrote:
> Attached is an updated version of the stripped-down patch. I've cleaned
> it up a bit, and added more sanity checks. Documentation is still
> missing and I haven't test it much.

FWIW, I disagree with the stripping down and think we should aim for
the whole patch as submitted. Historically, the hardest part of getting
collation support into postgres has been the catalog changes and easier
the support for something other than OS locales. I supported the patch as
it was precisely because it finally did that, and the stripping down
takes us back to square one.

Implementing the COLLATE keyword is much easier once you have the underlying
support and woldn't involve any backward incompatabilities. The
stripped down version gets us to a state which will only be supported
(hopefully) for one release.

For anyone counting, Firebird added support for ICU more than three
years ago.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Martijn van Oosterhout wrote:
> FWIW, I disagree with the stripping down and think we should aim for
> the whole patch as submitted. Historically, the hardest part of getting
> collation support into postgres has been the catalog changes and easier
> the support for something other than OS locales. I supported the patch as
> it was precisely because it finally did that, and the stripping down
> takes us back to square one.

The catalog changes are one issue, but hardly the biggest one. 
Database-level collation works just as well without them, and for 
finer-grain control, the proposed patch was nowhere near enough. You'll 
need further catalog changes to track collation for each index column, 
for example, probably something in the operator families or classes etc.

In addition to catalog changes, for finer grained collation you need 
changes in the planner and executor as well. In planner, to provide 
support for the concept of a sort order using a specific collation, and 
track that whenever sort order is handled. In executor, the capability 
to compare and sort using different locales at different times. Those 
changes seem a lot more difficult to me than adding a couple of system 
catalogs, which is pretty straightforward.

It's not like the patch is going to disappear from planet Earth if it 
doesn't get committed for 8.4. It's still valuable and available when 
the new catalogs are needed.

> Implementing the COLLATE keyword is much easier once you have the underlying
> support and woldn't involve any backward incompatabilities. The
> stripped down version gets us to a state which will only be supported
> (hopefully) for one release.

Yes, I hope so too. Implementation in phases is a good thing in my mind, 
but adding a lot of complexity that isn't need until in a future release 
is not. We will need to support the 8.4 CREATE DATABASE syntax for 
backwards-compatibility, but I don't see a problem with that.

> For anyone counting, Firebird added support for ICU more than three
> years ago.

ICU is orthogonal to this patch. This patch didn't provide ICU support, 
and we could start using ICU without the catalog changes.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Here's an updated version of the stripped-down patch, now with
documentation changes, plus a couple of minor bug fixes.

I think this is looking pretty good now, and I would appreciate review
from others before I go ahead committing this. Is there any more places
in the documentation that needs updating, or any grammar mistakes?
Further opinions on whether to allow using a database with different
collation order as template, requiring a manual reindex and facing weird
results if that's neglected, or to error out with a hint suggesting to
use template0 as template (as is in the patch now)?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index f484db8..1e1786a 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -130,23 +130,24 @@ initdb --locale=sv_SE

    <para>
     The nature of some locale categories is that their value has to be
-    fixed for the lifetime of a database cluster.  That is, once
-    <command>initdb</command> has run, you cannot change them anymore.
-    <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal> are
-    those categories.  They affect the sort order of indexes, so they
-    must be kept fixed, or indexes on text columns will become corrupt.
-    <productname>PostgreSQL</productname> enforces this by recording
-    the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</> that are
-    seen by <command>initdb</>.  The server automatically adopts
-    those two values when it is started.
+    fixed when the database is created.  You can use different settings
+    for different databases, but once a database is created, you cannot
+    change them for that database anymore. <literal>LC_COLLATE</literal>
+    and <literal>LC_CTYPE</literal> are those categories.  They affect
+    the sort order of indexes, so they must be kept fixed, or indexes on
+    text columns will become corrupt.  The default values for these
+    categories are defined when <command>initdb</command> is run, and
+    those values are used when new databases are created, unless
+    explicitly specified otherwise in the <command>CREATE
+    DATABASE</command> command.
    </para>

    <para>
     The other locale categories can be changed as desired whenever the
     server is running by setting the run-time configuration variables
     that have the same name as the locale categories (see <xref
-    linkend="runtime-config-client-format"> for details).  The defaults that are
-    chosen by <command>initdb</command> are actually only written into
+    linkend="runtime-config-client-format"> for details).  The defaults
+    that are chosen by <command>initdb</command> are actually only written into
     the configuration file <filename>postgresql.conf</filename> to
     serve as defaults when the server is started.  If you delete these
     assignments from <filename>postgresql.conf</filename> then the
@@ -261,7 +262,7 @@ initdb --locale=sv_SE

    <para>
     Check that <productname>PostgreSQL</> is actually using the locale
-    that you think it is.  <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
+    that you think it is.  The default <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
     settings are determined at <command>initdb</> time and cannot be
     changed without repeating <command>initdb</>.  Other locale
     settings including <envar>LC_MESSAGES</> and <envar>LC_MONETARY</>
@@ -320,16 +321,10 @@ initdb --locale=sv_SE

   <para>
    An important restriction, however, is that each database character set
-   must be compatible with the server's <envar>LC_CTYPE</> setting.
+   must be compatible with the database's <envar>LC_CTYPE</> setting.
    When <envar>LC_CTYPE</> is <literal>C</> or <literal>POSIX</>, any
    character set is allowed, but for other settings of <envar>LC_CTYPE</>
    there is only one character set that will work correctly.
-   Since the <envar>LC_CTYPE</> setting is frozen by <command>initdb</>, the
-   apparent flexibility to use different encodings in different databases
-   of a cluster is more theoretical than real, except when you select
-   <literal>C</> or <literal>POSIX</> locale (thus disabling any real locale
-   awareness).  It is likely that these mechanisms will be revisited in future
-   versions of <productname>PostgreSQL</productname>.
   </para>

    <sect2 id="multibyte-charset-supported">
@@ -734,19 +729,19 @@ initdb -E EUC_JP
     </para>

     <para>
-     If you have selected <literal>C</> or <literal>POSIX</> locale,
-     you can create a database with a different character set:
+     You can specify a non-default encoding at database creation time,
+     provided that the encoding is compatible with the selected locale:

 <screen>
-createdb -E EUC_KR korean
+createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
 </screen>

      This will create a database named <literal>korean</literal> that
-     uses the character set <literal>EUC_KR</literal>.  Another way to
-     accomplish this is to use this SQL command:
+     uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>.
+     Another way to accomplish this is to use this SQL command:

 <programlisting>
-CREATE DATABASE korean WITH ENCODING 'EUC_KR';
+CREATE DATABASE korean WITH ENCODING 'EUC_KR' COLLATE='ko_KR.euckr' CTYPE='ko_KR.euckr' TEMPLATE=template0;
 </programlisting>

      The encoding for a database is stored in the system catalog
@@ -754,6 +749,8 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR';
      <option>-l</option> option or the <command>\l</command> command
      of <command>psql</command>.

+XXX It would be nice to show the "korean" database here. And we should
+modify psql to show locale as well
 <screen>
 $ <userinput>psql -l</userinput>
             List of databases
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 2d4a7bf..f02a957 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -157,7 +157,7 @@ CREATE INDEX test1_id_index ON test1 (id);
    <emphasis>if</emphasis> the pattern is a constant and is anchored to
    the beginning of the string — for example, <literal>col LIKE
    'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
-   <literal>col LIKE '%bar'</literal>. However, if your server does not
+   <literal>col LIKE '%bar'</literal>. However, if your database does not
    use the C locale you will need to create the index with a special
    operator class to support indexing of pattern-matching queries. See
    <xref linkend="indexes-opclass"> below. It is also possible to use
@@ -922,7 +922,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
       according to the locale-specific collation rules.  This makes
       these operator classes suitable for use by queries involving
       pattern matching expressions (<literal>LIKE</literal> or POSIX
-      regular expressions) when the server does not use the standard
+      regular expressions) when the database does not use the standard
       <quote>C</quote> locale.  As an example, you might index a
       <type>varchar</type> column like this:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 95350c4..ed048b0 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -24,6 +24,8 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
     [ [ WITH ] [ OWNER [=] <replaceable class="parameter">dbowner</replaceable> ]
            [ TEMPLATE [=] <replaceable class="parameter">template</replaceable> ]
            [ ENCODING [=] <replaceable class="parameter">encoding</replaceable> ]
+           [ COLLATE [=] <replaceable class="parameter">collation</replaceable> ]
+           [ CTYPE [=] <replaceable class="parameter">ctype</replaceable> ]
            [ TABLESPACE [=] <replaceable class="parameter">tablespace</replaceable> ]
            [ CONNECTION LIMIT [=] <replaceable class="parameter">connlimit</replaceable> ] ]
 </synopsis>
@@ -113,6 +115,29 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
       </listitem>
      </varlistentry>
      <varlistentry>
+      <term><replaceable class="parameter">collation</replaceable></term>
+      <listitem>
+       <para>
+        Collation order (<literal>LC_COLLATE</>) to use in the new database.
+    This affects the sort order applied to strings, e.g in queries with
+        ORDER BY, as well as the order used in indexes on text columns.
+        The default is to use the collation order of the template database.
+        See below for additional restrictions.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><replaceable class="parameter">ctype</replaceable></term>
+      <listitem>
+       <para>
+        Character classification (<literal>LC_CTYPE</>) to use in the new
+        database. This affects the categorization of characters, e.g. lower,
+        upper and digit. The default is to use the character classification of
+        the template database. See below for additional restrictions.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
       <term><replaceable class="parameter">tablespace</replaceable></term>
       <listitem>
        <para>
@@ -181,12 +206,10 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>

   <para>
    Any character set encoding specified for the new database must be
-   compatible with the server's <envar>LC_CTYPE</> locale setting.
+   compatible with the chosen COLLATE and CTYPE settings.
    If <envar>LC_CTYPE</> is <literal>C</> (or equivalently
    <literal>POSIX</>), then all encodings are allowed, but for other
-   locale settings there is only one encoding that will work properly,
-   and so the apparent freedom to specify an encoding is illusory if
-   you didn't initialize the database cluster in <literal>C</> locale.
+   locale settings there is only one encoding that will work properly.
    <command>CREATE DATABASE</> will allow superusers to specify
    <literal>SQL_ASCII</> encoding regardless of the locale setting,
    but this choice is deprecated and may result in misbehavior of
@@ -195,6 +218,16 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
   </para>

   <para>
+   The <literal>COLLATE</> and <literal>CTYPE</> settings must match
+   those of the template database, except when template0 is used as
+   template. This is because <literal>COLLATE</> and <literal>CTYPE</>
+   affects the ordering in indexes, so that any indexes copied from the
+   template database would be invalid in the new database with different
+   settings. <literal>template0</literal>, however, is known to not
+   contain any indexes that would be affected.
+  </para>
+
+  <para>
    The <literal>CONNECTION LIMIT</> option is only enforced approximately;
    if two new sessions start at about the same time when just one
    connection <quote>slot</> remains for the database, it is possible that
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 10a9d1c..b313b1f 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -76,25 +76,30 @@ PostgreSQL documentation

   <para>
    <command>initdb</command> initializes the database cluster's default
-   locale and character set encoding. The collation order
-   (<literal>LC_COLLATE</>) and character set classes
-   (<literal>LC_CTYPE</>, e.g. upper, lower, digit) are fixed for all
-   databases and cannot be changed. Collation orders other than
-   <literal>C</> or <literal>POSIX</> also have a performance penalty.
-   For these reasons it is important to choose the right locale when
-   running <command>initdb</command>. The remaining locale categories
-   can be changed later when the server is started. All server locale
-   values (<literal>lc_*</>) can be displayed via <command>SHOW ALL</>.
+   locale and character set encoding.
+
+   The character set encoding, collation order (<literal>LC_COLLATE</>)
+   and character set classes (<literal>LC_CTYPE</>, e.g. upper, lower,
+   digit) can be set separately for a database when it is created.
+   <command>initdb</command> determines those settings for the
+   <literal>template1</literal> database, which will serve as the
+   default for all other databases.
+
+   To alter the default collation order or character set classes, use the
+   <option>--lc-collate</option> and <option>--lc-ctype</option> options.
+   Collation orders other than <literal>C</> or <literal>POSIX</> also have
+   a performance penalty.  For these reasons it is important to choose the
+   right locale when running <command>initdb</command>.
+
+   The remaining locale categories can be changed later when the server
+   is started.  You can also use <option>--locale</option> to set the
+   default for all locale categories, including collation order and
+   character set classes. All server locale values (<literal>lc_*</>) can
+   be displayed via <command>SHOW ALL</>.
    More details can be found in <xref linkend="locale">.
-  </para>

-  <para>
-   The character set encoding can be set separately for a database when
-   it is created. <command>initdb</command> determines the encoding for
-   the <literal>template1</literal> database, which will serve as the
-   default for all other databases. To alter the default encoding use
-   the <option>--encoding</option> option. More details can be found in
-   <xref linkend="multibyte">.
+   To alter the default encoding, use the <option>--encoding</option>.
+   More details can be found in <xref linkend="multibyte">.
   </para>

  </refsect1>
diff --git a/doc/src/sgml/ref/pg_controldata.sgml b/doc/src/sgml/ref/pg_controldata.sgml
index b379df0..7f50b7b 100644
--- a/doc/src/sgml/ref/pg_controldata.sgml
+++ b/doc/src/sgml/ref/pg_controldata.sgml
@@ -30,7 +30,7 @@ PostgreSQL documentation
   <title>Description</title>
   <para>
    <command>pg_controldata</command> prints information initialized during
-   <command>initdb</>, such as the catalog version and server locale.
+   <command>initdb</>, such as the catalog version.
    It also shows information about write-ahead logging and checkpoint
    processing.  This information is cluster-wide, and not specific to any one
    database.
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 1933455..f7331e4 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -62,12 +62,9 @@ PostgreSQL documentation
    by specifying the <literal>-f</> (force) switch.  In this case plausible
    values will be substituted for the missing data.  Most of the fields can be
    expected to match, but manual assistance might be needed for the next OID,
-   next transaction ID and epoch, next multitransaction ID and offset,
-   WAL starting address, and database locale fields.
-   The first six of these can be set using the switches discussed below.
-   <command>pg_resetxlog</command>'s own environment is the source for its
-   guess at the locale fields; take care that <envar>LANG</> and so forth
-   match the environment that <command>initdb</> was run in.
+   next transaction ID and epoch, next multitransaction ID and offset, and
+   WAL starting address fields.
+   The first five of these can be set using the switches discussed below.
    If you are not able to determine correct values for all these fields,
    <literal>-f</> can still be used, but
    the recovered database must be treated with even more suspicion than
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index e463d15..a1be7cb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -747,8 +747,7 @@ SELECT name FROM distributors ORDER BY code;

    <para>
     Character-string data is sorted according to the locale-specific
-    collation order that was established when the database cluster
-    was initialized.
+    collation order that was established when the database was created.
    </para>
   </refsect2>

diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index 3d238c3..461a9c8 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -82,8 +82,8 @@ SHOW ALL
          <para>
           Shows the database's locale setting for collation (text
           ordering).  At present, this parameter can be shown but not
-          set, because the setting is determined at
-          <command>initdb</> time.
+          set, because the setting is determined at database creation
+          time.
          </para>
         </listitem>
        </varlistentry>
@@ -94,8 +94,8 @@ SHOW ALL
          <para>
           Shows the database's locale setting for character
           classification.  At present, this parameter can be shown but
-          not set, because the setting is determined at
-          <command>initdb</> time.
+          not set, because the setting is determined at database creation
+      time.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 5ce6163..4b7f5eb 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -145,11 +145,12 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
    Normally, it will just take the locale settings in the environment
    and apply them to the initialized database.  It is possible to
    specify a different locale for the database; more information about
-   that can be found in <xref linkend="locale">.  The sort order used
-   within a particular database cluster is set by
-   <command>initdb</command> and cannot be changed later, short of
-   dumping all data, rerunning <command>initdb</command>, and reloading
-   the data. There is also a performance impact for using locales
+   that can be found in <xref linkend="locale">.  The default sort order used
+   within the particular database cluster is set by
+   <command>initdb</command>, and while you can create new databases using
+   different sort order, the order used in the template databases that initdb
+   creates cannot be changed without dropping and recreating them.
+   There is also a performance impact for using locales
    other than <literal>C</> or <literal>POSIX</>. Therefore, it is
    important to make this choice correctly the first time.
   </para>
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index e48f82b..98ac7f8 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1896,7 +1896,7 @@ LIMIT 10;

   <note>
    <para>
-    The parser's notion of a <quote>letter</> is determined by the server's
+    The parser's notion of a <quote>letter</> is determined by the database's
     locale setting, specifically <varname>lc_ctype</>.  Words containing
     only the basic ASCII letters are reported as a separate token type,
     since it is sometimes useful to distinguish them.  In most European
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5645271..edd1d74 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3847,7 +3847,6 @@ WriteControlFile(void)
 {
     int            fd;
     char        buffer[PG_CONTROL_SIZE];        /* need not be aligned */
-    char       *localeptr;

     /*
      * Initialize version and compatibility-check fields
@@ -3876,18 +3875,6 @@ WriteControlFile(void)
     ControlFile->float4ByVal = FLOAT4PASSBYVAL;
     ControlFile->float8ByVal = FLOAT8PASSBYVAL;

-    ControlFile->localeBuflen = LOCALE_NAME_BUFLEN;
-    localeptr = setlocale(LC_COLLATE, NULL);
-    if (!localeptr)
-        ereport(PANIC,
-                (errmsg("invalid LC_COLLATE setting")));
-    StrNCpy(ControlFile->lc_collate, localeptr, LOCALE_NAME_BUFLEN);
-    localeptr = setlocale(LC_CTYPE, NULL);
-    if (!localeptr)
-        ereport(PANIC,
-                (errmsg("invalid LC_CTYPE setting")));
-    StrNCpy(ControlFile->lc_ctype, localeptr, LOCALE_NAME_BUFLEN);
-
     /* Contents are protected with a CRC */
     INIT_CRC32(ControlFile->crc);
     COMP_CRC32(ControlFile->crc,
@@ -4126,34 +4113,6 @@ ReadControlFile(void)
                            " but the server was compiled without USE_FLOAT8_BYVAL."),
                  errhint("It looks like you need to recompile or initdb.")));
 #endif
-
-    if (ControlFile->localeBuflen != LOCALE_NAME_BUFLEN)
-        ereport(FATAL,
-                (errmsg("database files are incompatible with server"),
-                 errdetail("The database cluster was initialized with LOCALE_NAME_BUFLEN %d,"
-                  " but the server was compiled with LOCALE_NAME_BUFLEN %d.",
-                           ControlFile->localeBuflen, LOCALE_NAME_BUFLEN),
-                 errhint("It looks like you need to recompile or initdb.")));
-    if (pg_perm_setlocale(LC_COLLATE, ControlFile->lc_collate) == NULL)
-        ereport(FATAL,
-            (errmsg("database files are incompatible with operating system"),
-             errdetail("The database cluster was initialized with LC_COLLATE \"%s\","
-                       " which is not recognized by setlocale().",
-                       ControlFile->lc_collate),
-             errhint("It looks like you need to initdb or install locale support.")));
-    if (pg_perm_setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
-        ereport(FATAL,
-            (errmsg("database files are incompatible with operating system"),
-        errdetail("The database cluster was initialized with LC_CTYPE \"%s\","
-                  " which is not recognized by setlocale().",
-                  ControlFile->lc_ctype),
-             errhint("It looks like you need to initdb or install locale support.")));
-
-    /* Make the fixed locale settings visible as GUC variables, too */
-    SetConfigOption("lc_collate", ControlFile->lc_collate,
-                    PGC_INTERNAL, PGC_S_OVERRIDE);
-    SetConfigOption("lc_ctype", ControlFile->lc_ctype,
-                    PGC_INTERNAL, PGC_S_OVERRIDE);
 }

 void
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 4dd6262..2d5e27b 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -53,6 +53,7 @@
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
+#include "utils/pg_locale.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"

@@ -69,7 +70,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
             Oid *dbIdP, Oid *ownerIdP,
             int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
             Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
-            Oid *dbTablespace);
+            Oid *dbTablespace, char **dbCollation, char **dbCtype);
 static bool have_createdb_privilege(void);
 static void remove_dbtablespaces(Oid db_id);
 static bool check_db_file_conflict(Oid db_id);
@@ -87,6 +88,8 @@ createdb(const CreatedbStmt *stmt)
     Oid            src_dboid;
     Oid            src_owner;
     int            src_encoding;
+    char       *src_collation;
+    char       *src_ctype;
     bool        src_istemplate;
     bool        src_allowconn;
     Oid            src_lastsysoid;
@@ -104,10 +107,14 @@ createdb(const CreatedbStmt *stmt)
     DefElem    *downer = NULL;
     DefElem    *dtemplate = NULL;
     DefElem    *dencoding = NULL;
+    DefElem    *dcollation = NULL;
+    DefElem    *dctype = NULL;
     DefElem    *dconnlimit = NULL;
     char       *dbname = stmt->dbname;
     char       *dbowner = NULL;
     const char *dbtemplate = NULL;
+    char       *lc_collate = NULL;
+    char       *lc_ctype = NULL;
     int            encoding = -1;
     int            dbconnlimit = -1;
     int            ctype_encoding;
@@ -152,6 +159,22 @@ createdb(const CreatedbStmt *stmt)
                          errmsg("conflicting or redundant options")));
             dencoding = defel;
         }
+        else if (strcmp(defel->defname, "collate") == 0)
+        {
+            if (dcollation)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            dcollation = defel;
+        }
+        else if (strcmp(defel->defname, "ctype") == 0)
+        {
+            if (dctype)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            dctype = defel;
+        }
         else if (strcmp(defel->defname, "connectionlimit") == 0)
         {
             if (dconnlimit)
@@ -205,6 +228,11 @@ createdb(const CreatedbStmt *stmt)
             elog(ERROR, "unrecognized node type: %d",
                  nodeTag(dencoding->arg));
     }
+    if (dcollation && dcollation->arg)
+        lc_collate = strVal(dcollation->arg);
+    if (dctype && dctype->arg)
+        lc_ctype = strVal(dctype->arg);
+
     if (dconnlimit && dconnlimit->arg)
         dbconnlimit = intVal(dconnlimit->arg);

@@ -243,7 +271,8 @@ createdb(const CreatedbStmt *stmt)
     if (!get_db_info(dbtemplate, ShareLock,
                      &src_dboid, &src_owner, &src_encoding,
                      &src_istemplate, &src_allowconn, &src_lastsysoid,
-                     &src_frozenxid, &src_deftablespace))
+                     &src_frozenxid, &src_deftablespace,
+                     &src_collation, &src_ctype))
         ereport(ERROR,
                 (errcode(ERRCODE_UNDEFINED_DATABASE),
                  errmsg("template database \"%s\" does not exist",
@@ -262,9 +291,13 @@ createdb(const CreatedbStmt *stmt)
                             dbtemplate)));
     }

-    /* If encoding is defaulted, use source's encoding */
+    /* If encoding or locales are defaulted, use source's setting */
     if (encoding < 0)
         encoding = src_encoding;
+    if (lc_collate == NULL)
+        lc_collate = src_collation;
+    if (lc_ctype == NULL)
+        lc_ctype = src_ctype;

     /* Some encodings are client only */
     if (!PG_VALID_BE_ENCODING(encoding))
@@ -272,6 +305,16 @@ createdb(const CreatedbStmt *stmt)
                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                  errmsg("invalid server encoding %d", encoding)));

+    /* Check that the chosen locales are valid */
+    if (!check_locale(LC_COLLATE, lc_collate))
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("invalid locale name %s", lc_collate)));
+    if (!check_locale(LC_CTYPE, lc_ctype))
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("invalid locale name %s", lc_ctype)));
+
     /*
      * Check whether encoding matches server locale settings.  We allow
      * mismatch in three cases:
@@ -290,7 +333,7 @@ createdb(const CreatedbStmt *stmt)
      *
      * Note: if you change this policy, fix initdb to match.
      */
-    ctype_encoding = pg_get_encoding_from_locale(NULL);
+    ctype_encoding = pg_get_encoding_from_locale(lc_ctype);

     if (!(ctype_encoding == encoding ||
           ctype_encoding == PG_SQL_ASCII ||
@@ -299,12 +342,32 @@ createdb(const CreatedbStmt *stmt)
 #endif
           (encoding == PG_SQL_ASCII && superuser())))
         ereport(ERROR,
-                (errmsg("encoding %s does not match server's locale %s",
+                (errmsg("encoding %s does not match locale %s",
                         pg_encoding_to_char(encoding),
-                        setlocale(LC_CTYPE, NULL)),
-             errdetail("The server's LC_CTYPE setting requires encoding %s.",
+                        lc_ctype),
+             errdetail("The chosen LC_CTYPE setting requires encoding %s.",
                        pg_encoding_to_char(ctype_encoding))));

+    /*
+     * Check that the new locale is compatible with the source database.
+     *
+     * We know that template0 doesn't contain any indexes that depend on
+     * collation or ctype, so template0 can be used as template for
+     * any locale.
+     */
+    if (strcmp(dbtemplate, "template0") != 0)
+    {
+        if (strcmp(lc_collate, src_collation))
+            ereport(ERROR,
+                    (errmsg("new collation is incompatible with the collation of the template database (%s)",
src_collation),
+                     errhint("Use the same collation as in the template database, or use template0 as template")));
+
+        if (strcmp(lc_ctype, src_ctype))
+            ereport(ERROR,
+                    (errmsg("new ctype is incompatible with the ctype of the template database (%s)", src_ctype),
+                     errhint("Use the same ctype as in the template database, or use template0 as template")));
+    }
+
     /* Resolve default tablespace for new database */
     if (dtablespacename && dtablespacename->arg)
     {
@@ -421,6 +484,8 @@ createdb(const CreatedbStmt *stmt)
         DirectFunctionCall1(namein, CStringGetDatum(dbname));
     new_record[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(datdba);
     new_record[Anum_pg_database_encoding - 1] = Int32GetDatum(encoding);
+    new_record[Anum_pg_database_collation - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_collate));
+    new_record[Anum_pg_database_ctype - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_ctype));
     new_record[Anum_pg_database_datistemplate - 1] = BoolGetDatum(false);
     new_record[Anum_pg_database_datallowconn - 1] = BoolGetDatum(true);
     new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
@@ -629,7 +694,7 @@ dropdb(const char *dbname, bool missing_ok)
     pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock);

     if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-                     &db_istemplate, NULL, NULL, NULL, NULL))
+                     &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL))
     {
         if (!missing_ok)
         {
@@ -781,7 +846,7 @@ RenameDatabase(const char *oldname, const char *newname)
     rel = heap_open(DatabaseRelationId, RowExclusiveLock);

     if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-                     NULL, NULL, NULL, NULL, NULL))
+                     NULL, NULL, NULL, NULL, NULL, NULL, NULL))
         ereport(ERROR,
                 (errcode(ERRCODE_UNDEFINED_DATABASE),
                  errmsg("database \"%s\" does not exist", oldname)));
@@ -1168,7 +1233,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
             Oid *dbIdP, Oid *ownerIdP,
             int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
             Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
-            Oid *dbTablespace)
+            Oid *dbTablespace, char **dbCollation, char **dbCtype)
 {
     bool        result = false;
     Relation    relation;
@@ -1259,6 +1324,11 @@ get_db_info(const char *name, LOCKMODE lockmode,
                 /* default tablespace for this database */
                 if (dbTablespace)
                     *dbTablespace = dbform->dattablespace;
+                 /* default locale settings for this database */
+                 if (dbCollation)
+                     *dbCollation = pstrdup(NameStr(dbform->collation));
+                 if (dbCtype)
+                     *dbCtype = pstrdup(NameStr(dbform->ctype));
                 ReleaseSysCache(tuple);
                 result = true;
                 break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 5ff353d..0831a94 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -398,7 +398,7 @@ static TypeName *TableFuncTypeName(List *columns);
     CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
     COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
     CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
-    CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
+    CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
     CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

     DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
@@ -5458,6 +5458,22 @@ createdb_opt_item:
                 {
                     $$ = makeDefElem("encoding", NULL);
                 }
+            | COLLATE opt_equal Sconst
+                {
+                    $$ = makeDefElem("collate", (Node *)makeString($3));
+                }
+            | COLLATE opt_equal DEFAULT
+                {
+                    $$ = makeDefElem("collate", NULL);
+                }
+            | CTYPE opt_equal Sconst
+                {
+                    $$ = makeDefElem("ctype", (Node *)makeString($3));
+                }
+            | CTYPE opt_equal DEFAULT
+                {
+                    $$ = makeDefElem("ctype", NULL);
+                }
             | CONNECTION LIMIT opt_equal SignedIconst
                 {
                     $$ = makeDefElem("connectionlimit", (Node *)makeInteger($4));
@@ -9216,6 +9232,7 @@ unreserved_keyword:
             | CREATEROLE
             | CREATEUSER
             | CSV
+            | CTYPE
             | CURRENT_P
             | CURSOR
             | CYCLE
diff --git a/src/backend/parser/keywords.c b/src/backend/parser/keywords.c
index b30a478..e2fe4bf 100644
--- a/src/backend/parser/keywords.c
+++ b/src/backend/parser/keywords.c
@@ -114,6 +114,7 @@ const ScanKeyword ScanKeywords[] = {
     {"createuser", CREATEUSER, UNRESERVED_KEYWORD},
     {"cross", CROSS, TYPE_FUNC_NAME_KEYWORD},
     {"csv", CSV, UNRESERVED_KEYWORD},
+    {"ctype", CTYPE, UNRESERVED_KEYWORD},
     {"current", CURRENT_P, UNRESERVED_KEYWORD},
     {"current_date", CURRENT_DATE, RESERVED_KEYWORD},
     {"current_role", CURRENT_ROLE, RESERVED_KEYWORD},
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c0a01ae..4a55be5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -189,6 +189,30 @@ pg_perm_setlocale(int category, const char *locale)
 }


+/*
+ * Is the locale name valid for the locale category?
+ */
+bool
+check_locale(int category, const char *value)
+{
+    char       *save;
+    bool        ret;
+
+    save = setlocale(category, NULL);
+    if (!save)
+        return false;            /* won't happen, we hope */
+
+    /* save may be pointing at a modifiable scratch variable, see above */
+    save = pstrdup(save);
+
+    ret = (setlocale(category, value) != NULL);
+
+    setlocale(category, save);    /* assume this won't fail */
+    pfree(save);
+
+    return ret;
+}
+
 /* GUC assign hooks */

 /*
@@ -203,21 +227,9 @@ pg_perm_setlocale(int category, const char *locale)
 static const char *
 locale_xxx_assign(int category, const char *value, bool doit, GucSource source)
 {
-    char       *save;
-
-    save = setlocale(category, NULL);
-    if (!save)
-        return NULL;            /* won't happen, we hope */
-
-    /* save may be pointing at a modifiable scratch variable, see above */
-    save = pstrdup(save);
-
-    if (!setlocale(category, value))
+    if (!check_locale(category, value))
         value = NULL;            /* set failure return marker */

-    setlocale(category, save);    /* assume this won't fail */
-    pfree(save);
-
     /* need to reload cache next time? */
     if (doit && value != NULL)
     {
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 461edd9..08fbf3b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -159,6 +159,8 @@ CheckMyDatabase(const char *name, bool am_superuser)
 {
     HeapTuple    tup;
     Form_pg_database dbform;
+    char       *collate;
+    char       *ctype;

     /* Fetch our real pg_database row */
     tup = SearchSysCache(DATABASEOID,
@@ -240,6 +242,28 @@ CheckMyDatabase(const char *name, bool am_superuser)
     /* If we have no other source of client_encoding, use server encoding */
     SetConfigOption("client_encoding", GetDatabaseEncodingName(),
                     PGC_BACKEND, PGC_S_DEFAULT);
+
+    /* assign locale variables */
+    collate = NameStr(dbform->collation);
+    ctype = NameStr(dbform->ctype);
+
+    if (setlocale(LC_COLLATE, collate) == NULL)
+        ereport(FATAL,
+            (errmsg("database locale is incompatible with operating system"),
+            errdetail("The database was initialized with LC_COLLATE \"%s\", "
+                        " which is not recognized by setlocale().", collate),
+            errhint("Recreate the database with another locale or install the missing locale.")));
+
+    if (setlocale(LC_CTYPE, ctype) == NULL)
+        ereport(FATAL,
+            (errmsg("database locale is incompatible with operating system"),
+            errdetail("The database was initialized with LC_CTYPE \"%s\", "
+                        " which is not recognized by setlocale().", ctype),
+            errhint("Recreate the database with another locale or install the missing locale.")));
+
+    /* Make the locale settings visible as GUC variables, too */
+    SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DATABASE);
+    SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DATABASE);

     /*
      * Lastly, set up any database-specific configuration variables.
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3effafb..4b85281 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1353,6 +1353,10 @@ bootstrap_template1(char *short_version)

     bki_lines = replace_token(bki_lines, "ENCODING", encodingid);

+    bki_lines = replace_token(bki_lines, "LC_COLLATE", lc_collate);
+
+    bki_lines = replace_token(bki_lines, "LC_CTYPE", lc_ctype);
+
     /*
      * Pass correct LC_xxx environment to bootstrap.
      *
@@ -2378,12 +2382,12 @@ usage(const char *progname)
     printf(_("\nOptions:\n"));
     printf(_(" [-D, --pgdata=]DATADIR     location for this database cluster\n"));
     printf(_("  -E, --encoding=ENCODING   set default encoding for new databases\n"));
-    printf(_("  --locale=LOCALE           initialize database cluster with given locale\n"));
+    printf(_("  --locale=LOCALE           set default locale for new databases\n"));
     printf(_("  --lc-collate, --lc-ctype, --lc-messages=LOCALE\n"
              "  --lc-monetary, --lc-numeric, --lc-time=LOCALE\n"
-             "                            initialize database cluster with given locale\n"
-             "                            in the respective category (default taken from\n"
-             "                            environment)\n"));
+             "                            set default locale in the respective\n"
+             "                            category for new databases (default\n"
+             "                            taken from environment)\n"));
     printf(_("  --no-locale               equivalent to --locale=C\n"));
     printf(_("  -T, --text-search-config=CFG\n"
          "                            default text search configuration\n"));
@@ -2806,10 +2810,12 @@ main(int argc, char *argv[])
         strcmp(lc_ctype, lc_numeric) == 0 &&
         strcmp(lc_ctype, lc_monetary) == 0 &&
         strcmp(lc_ctype, lc_messages) == 0)
-        printf(_("The database cluster will be initialized with locale %s.\n"), lc_ctype);
+        printf(_("The template databases will be initialized with locale %s.\n"), lc_ctype);
     else
     {
-        printf(_("The database cluster will be initialized with locales\n"
+        /* XXX only collate and ctype are actually set in stone here, others
+         * are userset gucs */
+        printf(_("The template databases will be initialized with locales\n"
                  "  COLLATE:  %s\n"
                  "  CTYPE:    %s\n"
                  "  MESSAGES: %s\n"
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 7970c32..08e102f 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -220,12 +220,5 @@ main(int argc, char *argv[])
            (ControlFile.float4ByVal ? _("by value") : _("by reference")));
     printf(_("Float8 argument passing:              %s\n"),
            (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-    printf(_("Maximum length of locale name:        %u\n"),
-           ControlFile.localeBuflen);
-    printf(_("LC_COLLATE:                           %s\n"),
-           ControlFile.lc_collate);
-    printf(_("LC_CTYPE:                             %s\n"),
-           ControlFile.lc_ctype);
-
     return 0;
 }
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 345b89c..0df796b 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -493,22 +493,6 @@ GuessControlValues(void)
 #endif
     ControlFile.float4ByVal = FLOAT4PASSBYVAL;
     ControlFile.float8ByVal = FLOAT8PASSBYVAL;
-    ControlFile.localeBuflen = LOCALE_NAME_BUFLEN;
-
-    localeptr = setlocale(LC_COLLATE, "");
-    if (!localeptr)
-    {
-        fprintf(stderr, _("%s: invalid LC_COLLATE setting\n"), progname);
-        exit(1);
-    }
-    strlcpy(ControlFile.lc_collate, localeptr, sizeof(ControlFile.lc_collate));
-    localeptr = setlocale(LC_CTYPE, "");
-    if (!localeptr)
-    {
-        fprintf(stderr, _("%s: invalid LC_CTYPE setting\n"), progname);
-        exit(1);
-    }
-    strlcpy(ControlFile.lc_ctype, localeptr, sizeof(ControlFile.lc_ctype));

     /*
      * XXX eventually, should try to grovel through old XLOG to develop more
@@ -584,12 +568,6 @@ PrintControlValues(bool guessed)
            (ControlFile.float4ByVal ? _("by value") : _("by reference")));
     printf(_("Float8 argument passing:              %s\n"),
            (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-    printf(_("Maximum length of locale name:        %u\n"),
-           ControlFile.localeBuflen);
-    printf(_("LC_COLLATE:                           %s\n"),
-           ControlFile.lc_collate);
-    printf(_("LC_CTYPE:                             %s\n"),
-           ControlFile.lc_ctype);
 }


diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 286667e..37d41de 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -32,6 +32,8 @@ main(int argc, char *argv[])
         {"tablespace", required_argument, NULL, 'D'},
         {"template", required_argument, NULL, 'T'},
         {"encoding", required_argument, NULL, 'E'},
+        {"lc-collate", required_argument, NULL, 1},
+        {"lc-ctype", required_argument, NULL, 2},
         {NULL, 0, NULL, 0}
     };

@@ -50,6 +52,8 @@ main(int argc, char *argv[])
     char       *tablespace = NULL;
     char       *template = NULL;
     char       *encoding = NULL;
+    char       *lc_collate = NULL;
+    char       *lc_ctype = NULL;

     PQExpBufferData sql;

@@ -95,6 +99,12 @@ main(int argc, char *argv[])
             case 'E':
                 encoding = optarg;
                 break;
+            case 1:
+                lc_collate = optarg;
+                break;
+            case 2:
+                lc_ctype = optarg;
+                break;
             default:
                 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
                 exit(1);
@@ -152,6 +162,11 @@ main(int argc, char *argv[])
         appendPQExpBuffer(&sql, " ENCODING '%s'", encoding);
     if (template)
         appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+    if (lc_collate)
+        appendPQExpBuffer(&sql, " COLLATE '%s'", lc_collate);
+    if (lc_ctype)
+        appendPQExpBuffer(&sql, " CTYPE '%s'", lc_ctype);
+
     appendPQExpBuffer(&sql, ";\n");

     conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
@@ -209,6 +224,9 @@ help(const char *progname)
     printf(_("\nOptions:\n"));
     printf(_("  -D, --tablespace=TABLESPACE  default tablespace for the database\n"));
     printf(_("  -E, --encoding=ENCODING      encoding for the database\n"));
+    printf(_("  --lc-collate=LOCALE          LC_COLLATE setting for the database\n"));
+    printf(_("  --lc-ctype=LOCALE            LC_CTYPE setting for the database\n"));
+
     printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
     printf(_("  -T, --template=TEMPLATE      template database to copy\n"));
     printf(_("  -e, --echo                   show the commands being sent to the server\n"));
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 38b5a84..aee6934 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -144,11 +144,6 @@ typedef struct ControlFileData
     bool        float4ByVal;    /* float4 pass-by-value? */
     bool        float8ByVal;    /* float8, int8, etc pass-by-value? */

-    /* active locales */
-    uint32        localeBuflen;
-    char        lc_collate[LOCALE_NAME_BUFLEN];
-    char        lc_ctype[LOCALE_NAME_BUFLEN];
-
     /* CRC of all above ... MUST BE LAST! */
     pg_crc32    crc;
 } ControlFileData;
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 6e9e5d2..1b10e60 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -33,6 +33,8 @@ CATALOG(pg_database,1262) BKI_SHARED_RELATION
     NameData    datname;        /* database name */
     Oid            datdba;            /* owner of database */
     int4        encoding;        /* character encoding */
+    NameData    collation;        /* LC_COLLATE of database */
+    NameData    ctype;            /* LC_CTYPE of database */
     bool        datistemplate;    /* allowed as CREATE DATABASE template? */
     bool        datallowconn;    /* new connections allowed? */
     int4        datconnlimit;    /* max connections allowed (-1=no limit) */
@@ -54,20 +56,22 @@ typedef FormData_pg_database *Form_pg_database;
  *        compiler constants for pg_database
  * ----------------
  */
-#define Natts_pg_database                11
+#define Natts_pg_database                13
 #define Anum_pg_database_datname        1
 #define Anum_pg_database_datdba            2
 #define Anum_pg_database_encoding        3
-#define Anum_pg_database_datistemplate    4
-#define Anum_pg_database_datallowconn    5
-#define Anum_pg_database_datconnlimit    6
-#define Anum_pg_database_datlastsysoid    7
-#define Anum_pg_database_datfrozenxid    8
-#define Anum_pg_database_dattablespace    9
-#define Anum_pg_database_datconfig        10
-#define Anum_pg_database_datacl            11
+#define Anum_pg_database_collation        4
+#define Anum_pg_database_ctype            5
+#define Anum_pg_database_datistemplate    6
+#define Anum_pg_database_datallowconn    7
+#define Anum_pg_database_datconnlimit    8
+#define Anum_pg_database_datlastsysoid    9
+#define Anum_pg_database_datfrozenxid    10
+#define Anum_pg_database_dattablespace    11
+#define Anum_pg_database_datconfig        12
+#define Anum_pg_database_datacl            13

-DATA(insert OID = 1 (  template1 PGUID ENCODING t t -1 0 0 1663 _null_ _null_ ));
+DATA(insert OID = 1 (  template1 PGUID ENCODING "LC_COLLATE" "LC_CTYPE" t t -1 0 0 1663 _null_ _null_));
 SHDESCR("default template database");
 #define TemplateDbOid            1

diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 5a49823..2b60027 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -39,6 +39,7 @@ extern const char *locale_numeric_assign(const char *value,
 extern const char *locale_time_assign(const char *value,
                    bool doit, GucSource source);

+extern bool check_locale(int category, const char *locale);
 extern char *pg_perm_setlocale(int category, const char *locale);

 extern bool lc_collate_is_c(void);
diff --git a/src/interfaces/ecpg/preproc/preproc.y b/src/interfaces/ecpg/preproc/preproc.y
index 0a8b62b..949e76b 100644
--- a/src/interfaces/ecpg/preproc/preproc.y
+++ b/src/interfaces/ecpg/preproc/preproc.y
@@ -428,7 +428,7 @@ add_typedef(char *name, char * dimension, char * length, enum ECPGttype type_enu
     CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
     COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
     CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
-    CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
+    CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
     CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

     DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS

Re: WIP patch: Collation support

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Here's an updated version of the stripped-down patch, now with 
> documentation changes, plus a couple of minor bug fixes.

> I think this is looking pretty good now, and I would appreciate review 
> from others before I go ahead committing this.

I guess the $64 question is whether this moves us towards where we want to
be?  If the ultimate goal is to get to per-column collations, I think this
patch is likely to be counterproductive, because it will establish syntax
and semantics that we'll have to try to remain compatible with.  If we're
going to be satisfied with per-database collations, then great, let's do
it.


As far as reviewing the patch itself goes:

* It's missing pg_dump (or I guess really pg_dumpall) support.

* Please call the new columns datcollate and datctype.  Just because
someone ignored the naming convention years ago for pg_database.encoding
doesn't mean we should ignore it here.  Also, if you're going to name one
column based on an LC_foo variable's name, do the other one the same way.
I note also that this would match the keywords used in CREATE DATABASE.

* Don't even think of adding system catalog columns without updating
catalogs.sgml.

* You should try to get rid of LOCALE_NAME_BUFLEN altogether.  Definitely
the comment about it in pg_control.h is now obsolete.

* You *must* bump PG_CONTROL_VERSION when you change its layout.
(Don't forget catversion too, since you're also changing pg_database)

* This doc sentence reads a bit awkwardly:
   An important restriction, however, is that each database character set   must be compatible with the database's
<envar>LC_CTYPE</>setting.
 

Maybe
   An important restriction, however, is that each database's character set   must be compatible with the database's
<envar>LC_CTYPE</>setting.
 

Also I wonder whether we shouldn't say that it must be compatible with
LC_CTYPE *and* LC_COLLATE.

* This bit in the CREATE DATABASE ref page is also awkward:
   Any character set encoding specified for the new database must be   compatible with the chosen COLLATE and CTYPE
settings.

Maybe
   The character set encoding used for the new database must be   compatible with the chosen COLLATE and CTYPE
settings.

* This bit doesn't look "ready to commit":

***************
*** 754,759 ****
--- 749,756 ----      <option>-l</option> option or the <command>\l</command> command      of <command>psql</command>.

+ XXX It would be nice to show the "korean" database here. And we should
+ modify psql to show locale as well <screen> $ <userinput>psql -l</userinput>             List of databases

* This makes sense, but then shouldn't we make the identical restriction
for encoding?

+    The <literal>COLLATE</> and <literal>CTYPE</> settings must match
+    those of the template database, except when template0 is used as
+    template. This is because <literal>COLLATE</> and <literal>CTYPE</>

* I can't tell whether the writer of this bit thought that a blank line
would come out as a paragraph break or not.  Either add <para>'s or
remove the blank lines to make it look like one para in the source:

--- 76,105 ----    <para>    <command>initdb</command> initializes the database cluster's default
!    locale and character set encoding. 
! 
!    The character set encoding, collation order (<literal>LC_COLLATE</>)
!    and character set classes (<literal>LC_CTYPE</>, e.g. upper, lower,
!    digit) can be set separately for a database when it is created.
!    <command>initdb</command> determines those settings for the
!    <literal>template1</literal> database, which will serve as the
!    default for all other databases.
! 
!    To alter the default collation order or character set classes, use the
!    <option>--lc-collate</option> and <option>--lc-ctype</option> options.
!    Collation orders other than <literal>C</> or <literal>POSIX</> also have
!    a performance penalty.  For these reasons it is important to choose the
!    right locale when running <command>initdb</command>. 
! 
!    The remaining locale categories can be changed later when the server
!    is started.  You can also use <option>--locale</option> to set the
!    default for all locale categories, including collation order and
!    character set classes. All server locale values (<literal>lc_*</>) can
!    be displayed via <command>SHOW ALL</>.    More details can be found in <xref linkend="locale">. 
!    To alter the default encoding, use the <option>--encoding</option>.
!    More details can be found in <xref linkend="multibyte">.   </para>

* This bit for resetxlog is just wrong now:

--- 62,70 ----    by specifying the <literal>-f</> (force) switch.  In this case plausible    values will be
substitutedfor the missing data.  Most of the fields can be    expected to match, but manual assistance might be needed
forthe next OID,
 
!    next transaction ID and epoch, next multitransaction ID and offset, and
!    WAL starting address fields.
!    The first five of these can be set using the switches discussed below.    If you are not able to determine correct
valuesfor all these fields,    <literal>-f</> can still be used, but    the recovered database must be treated with
evenmore suspicion than
 

I guess it should just say "These fields can be set using the switches
discussed below.", since there aren't any that can't be.

* This is no good:

+     SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DATABASE);
+     SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DATABASE);

The source *must* be PGC_S_OVERRIDE to ensure it can't be overridden.
        regards, tom lane


Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Heikki Linnakangas wrote:
> Here's an updated version of the stripped-down patch, now with
> documentation changes, plus a couple of minor bug fixes.

Another update, marching towards committing. Now with pg_dump/pg_dumpall
support, and collation/ctype is also shown in psql \l output.

I wonder if we should provide a shorthand "LOCALE=<localename>" option,
in addition to separate COLLATE and CTYPE options? Most people will
always set them together. It could also set the rest of the locale
categories, lc_numeric, lc_monetary, and so on, as database-specific
options. Like "ALTER DATABASE <dbname> SET lc_messages=<localename> ..."
would.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index f484db8..5581b10 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -130,23 +130,24 @@ initdb --locale=sv_SE

    <para>
     The nature of some locale categories is that their value has to be
-    fixed for the lifetime of a database cluster.  That is, once
-    <command>initdb</command> has run, you cannot change them anymore.
-    <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal> are
-    those categories.  They affect the sort order of indexes, so they
-    must be kept fixed, or indexes on text columns will become corrupt.
-    <productname>PostgreSQL</productname> enforces this by recording
-    the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</> that are
-    seen by <command>initdb</>.  The server automatically adopts
-    those two values when it is started.
+    fixed when the database is created.  You can use different settings
+    for different databases, but once a database is created, you cannot
+    change them for that database anymore. <literal>LC_COLLATE</literal>
+    and <literal>LC_CTYPE</literal> are those categories.  They affect
+    the sort order of indexes, so they must be kept fixed, or indexes on
+    text columns will become corrupt.  The default values for these
+    categories are defined when <command>initdb</command> is run, and
+    those values are used when new databases are created, unless
+    explicitly specified otherwise in the <command>CREATE
+    DATABASE</command> command.
    </para>

    <para>
     The other locale categories can be changed as desired whenever the
     server is running by setting the run-time configuration variables
     that have the same name as the locale categories (see <xref
-    linkend="runtime-config-client-format"> for details).  The defaults that are
-    chosen by <command>initdb</command> are actually only written into
+    linkend="runtime-config-client-format"> for details).  The defaults
+    that are chosen by <command>initdb</command> are actually only written into
     the configuration file <filename>postgresql.conf</filename> to
     serve as defaults when the server is started.  If you delete these
     assignments from <filename>postgresql.conf</filename> then the
@@ -261,7 +262,7 @@ initdb --locale=sv_SE

    <para>
     Check that <productname>PostgreSQL</> is actually using the locale
-    that you think it is.  <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
+    that you think it is.  The default <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
     settings are determined at <command>initdb</> time and cannot be
     changed without repeating <command>initdb</>.  Other locale
     settings including <envar>LC_MESSAGES</> and <envar>LC_MONETARY</>
@@ -320,16 +321,10 @@ initdb --locale=sv_SE

   <para>
    An important restriction, however, is that each database character set
-   must be compatible with the server's <envar>LC_CTYPE</> setting.
+   must be compatible with the database's <envar>LC_CTYPE</> setting.
    When <envar>LC_CTYPE</> is <literal>C</> or <literal>POSIX</>, any
    character set is allowed, but for other settings of <envar>LC_CTYPE</>
    there is only one character set that will work correctly.
-   Since the <envar>LC_CTYPE</> setting is frozen by <command>initdb</>, the
-   apparent flexibility to use different encodings in different databases
-   of a cluster is more theoretical than real, except when you select
-   <literal>C</> or <literal>POSIX</> locale (thus disabling any real locale
-   awareness).  It is likely that these mechanisms will be revisited in future
-   versions of <productname>PostgreSQL</productname>.
   </para>

    <sect2 id="multibyte-charset-supported">
@@ -734,19 +729,19 @@ initdb -E EUC_JP
     </para>

     <para>
-     If you have selected <literal>C</> or <literal>POSIX</> locale,
-     you can create a database with a different character set:
+     You can specify a non-default encoding at database creation time,
+     provided that the encoding is compatible with the selected locale:

 <screen>
-createdb -E EUC_KR korean
+createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
 </screen>

      This will create a database named <literal>korean</literal> that
-     uses the character set <literal>EUC_KR</literal>.  Another way to
-     accomplish this is to use this SQL command:
+     uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>.
+     Another way to accomplish this is to use this SQL command:

 <programlisting>
-CREATE DATABASE korean WITH ENCODING 'EUC_KR';
+CREATE DATABASE korean WITH ENCODING 'EUC_KR' COLLATE='ko_KR.euckr' CTYPE='ko_KR.euckr' TEMPLATE=template0;
 </programlisting>

      The encoding for a database is stored in the system catalog
@@ -756,20 +751,17 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR';

 <screen>
 $ <userinput>psql -l</userinput>
-            List of databases
-   Database    |  Owner  |   Encoding
----------------+---------+---------------
- euc_cn        | t-ishii | EUC_CN
- euc_jp        | t-ishii | EUC_JP
- euc_kr        | t-ishii | EUC_KR
- euc_tw        | t-ishii | EUC_TW
- mule_internal | t-ishii | MULE_INTERNAL
- postgres      | t-ishii | EUC_JP
- regression    | t-ishii | SQL_ASCII
- template1     | t-ishii | EUC_JP
- test          | t-ishii | EUC_JP
- utf8          | t-ishii | UTF8
-(9 rows)
+                                         List of databases
+   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges
+-----------+----------+-----------+-------------+-------------+-------------------------------------
+ clocaledb | hlinnaka | SQL_ASCII | C           | C           |
+ englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  |
+ japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  |
+ korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr |
+ postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  |
+ template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+ template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+(7 rows)
 </screen>
     </para>

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 2d4a7bf..f02a957 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -157,7 +157,7 @@ CREATE INDEX test1_id_index ON test1 (id);
    <emphasis>if</emphasis> the pattern is a constant and is anchored to
    the beginning of the string — for example, <literal>col LIKE
    'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
-   <literal>col LIKE '%bar'</literal>. However, if your server does not
+   <literal>col LIKE '%bar'</literal>. However, if your database does not
    use the C locale you will need to create the index with a special
    operator class to support indexing of pattern-matching queries. See
    <xref linkend="indexes-opclass"> below. It is also possible to use
@@ -922,7 +922,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
       according to the locale-specific collation rules.  This makes
       these operator classes suitable for use by queries involving
       pattern matching expressions (<literal>LIKE</literal> or POSIX
-      regular expressions) when the server does not use the standard
+      regular expressions) when the database does not use the standard
       <quote>C</quote> locale.  As an example, you might index a
       <type>varchar</type> column like this:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 95350c4..ed048b0 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -24,6 +24,8 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
     [ [ WITH ] [ OWNER [=] <replaceable class="parameter">dbowner</replaceable> ]
            [ TEMPLATE [=] <replaceable class="parameter">template</replaceable> ]
            [ ENCODING [=] <replaceable class="parameter">encoding</replaceable> ]
+           [ COLLATE [=] <replaceable class="parameter">collation</replaceable> ]
+           [ CTYPE [=] <replaceable class="parameter">ctype</replaceable> ]
            [ TABLESPACE [=] <replaceable class="parameter">tablespace</replaceable> ]
            [ CONNECTION LIMIT [=] <replaceable class="parameter">connlimit</replaceable> ] ]
 </synopsis>
@@ -113,6 +115,29 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
       </listitem>
      </varlistentry>
      <varlistentry>
+      <term><replaceable class="parameter">collation</replaceable></term>
+      <listitem>
+       <para>
+        Collation order (<literal>LC_COLLATE</>) to use in the new database.
+    This affects the sort order applied to strings, e.g in queries with
+        ORDER BY, as well as the order used in indexes on text columns.
+        The default is to use the collation order of the template database.
+        See below for additional restrictions.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><replaceable class="parameter">ctype</replaceable></term>
+      <listitem>
+       <para>
+        Character classification (<literal>LC_CTYPE</>) to use in the new
+        database. This affects the categorization of characters, e.g. lower,
+        upper and digit. The default is to use the character classification of
+        the template database. See below for additional restrictions.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
       <term><replaceable class="parameter">tablespace</replaceable></term>
       <listitem>
        <para>
@@ -181,12 +206,10 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>

   <para>
    Any character set encoding specified for the new database must be
-   compatible with the server's <envar>LC_CTYPE</> locale setting.
+   compatible with the chosen COLLATE and CTYPE settings.
    If <envar>LC_CTYPE</> is <literal>C</> (or equivalently
    <literal>POSIX</>), then all encodings are allowed, but for other
-   locale settings there is only one encoding that will work properly,
-   and so the apparent freedom to specify an encoding is illusory if
-   you didn't initialize the database cluster in <literal>C</> locale.
+   locale settings there is only one encoding that will work properly.
    <command>CREATE DATABASE</> will allow superusers to specify
    <literal>SQL_ASCII</> encoding regardless of the locale setting,
    but this choice is deprecated and may result in misbehavior of
@@ -195,6 +218,16 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
   </para>

   <para>
+   The <literal>COLLATE</> and <literal>CTYPE</> settings must match
+   those of the template database, except when template0 is used as
+   template. This is because <literal>COLLATE</> and <literal>CTYPE</>
+   affects the ordering in indexes, so that any indexes copied from the
+   template database would be invalid in the new database with different
+   settings. <literal>template0</literal>, however, is known to not
+   contain any indexes that would be affected.
+  </para>
+
+  <para>
    The <literal>CONNECTION LIMIT</> option is only enforced approximately;
    if two new sessions start at about the same time when just one
    connection <quote>slot</> remains for the database, it is possible that
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 10a9d1c..8eead18 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -76,25 +76,30 @@ PostgreSQL documentation

   <para>
    <command>initdb</command> initializes the database cluster's default
-   locale and character set encoding. The collation order
-   (<literal>LC_COLLATE</>) and character set classes
-   (<literal>LC_CTYPE</>, e.g. upper, lower, digit) are fixed for all
-   databases and cannot be changed. Collation orders other than
-   <literal>C</> or <literal>POSIX</> also have a performance penalty.
-   For these reasons it is important to choose the right locale when
-   running <command>initdb</command>. The remaining locale categories
-   can be changed later when the server is started. All server locale
-   values (<literal>lc_*</>) can be displayed via <command>SHOW ALL</>.
+   locale and character set encoding.
+
+   The character set encoding, collation order (<literal>LC_COLLATE</>)
+   and character set classes (<literal>LC_CTYPE</>, e.g. upper, lower,
+   digit) can be set separately for a database when it is created.
+   <command>initdb</command> determines those settings for the
+   <literal>template1</literal> database, which will serve as the
+   default for all other databases.
+
+   To alter the default collation order or character set classes, use the
+   <option>--lc-collate</option> and <option>--lc-ctype</option> options.
+   Collation orders other than <literal>C</> or <literal>POSIX</> also have
+   a performance penalty.  For these reasons it is important to choose the
+   right locale when running <command>initdb</command>.
+
+   The remaining locale categories can be changed later when the server
+   is started.  You can also use <option>--locale</option> to set the
+   default for all locale categories, including collation order and
+   character set classes. All server locale values (<literal>lc_*</>) can
+   be displayed via <command>SHOW ALL</>.
    More details can be found in <xref linkend="locale">.
-  </para>

-  <para>
-   The character set encoding can be set separately for a database when
-   it is created. <command>initdb</command> determines the encoding for
-   the <literal>template1</literal> database, which will serve as the
-   default for all other databases. To alter the default encoding use
-   the <option>--encoding</option> option. More details can be found in
-   <xref linkend="multibyte">.
+   To alter the default encoding, use the <option>--encoding</option>.
+   More details can be found in <xref linkend="multibyte">.
   </para>

  </refsect1>
diff --git a/doc/src/sgml/ref/pg_controldata.sgml b/doc/src/sgml/ref/pg_controldata.sgml
index b379df0..7f50b7b 100644
--- a/doc/src/sgml/ref/pg_controldata.sgml
+++ b/doc/src/sgml/ref/pg_controldata.sgml
@@ -30,7 +30,7 @@ PostgreSQL documentation
   <title>Description</title>
   <para>
    <command>pg_controldata</command> prints information initialized during
-   <command>initdb</>, such as the catalog version and server locale.
+   <command>initdb</>, such as the catalog version.
    It also shows information about write-ahead logging and checkpoint
    processing.  This information is cluster-wide, and not specific to any one
    database.
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 1933455..f7331e4 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -62,12 +62,9 @@ PostgreSQL documentation
    by specifying the <literal>-f</> (force) switch.  In this case plausible
    values will be substituted for the missing data.  Most of the fields can be
    expected to match, but manual assistance might be needed for the next OID,
-   next transaction ID and epoch, next multitransaction ID and offset,
-   WAL starting address, and database locale fields.
-   The first six of these can be set using the switches discussed below.
-   <command>pg_resetxlog</command>'s own environment is the source for its
-   guess at the locale fields; take care that <envar>LANG</> and so forth
-   match the environment that <command>initdb</> was run in.
+   next transaction ID and epoch, next multitransaction ID and offset, and
+   WAL starting address fields.
+   The first five of these can be set using the switches discussed below.
    If you are not able to determine correct values for all these fields,
    <literal>-f</> can still be used, but
    the recovered database must be treated with even more suspicion than
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index e463d15..a1be7cb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -747,8 +747,7 @@ SELECT name FROM distributors ORDER BY code;

    <para>
     Character-string data is sorted according to the locale-specific
-    collation order that was established when the database cluster
-    was initialized.
+    collation order that was established when the database was created.
    </para>
   </refsect2>

diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index 3d238c3..461a9c8 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -82,8 +82,8 @@ SHOW ALL
          <para>
           Shows the database's locale setting for collation (text
           ordering).  At present, this parameter can be shown but not
-          set, because the setting is determined at
-          <command>initdb</> time.
+          set, because the setting is determined at database creation
+          time.
          </para>
         </listitem>
        </varlistentry>
@@ -94,8 +94,8 @@ SHOW ALL
          <para>
           Shows the database's locale setting for character
           classification.  At present, this parameter can be shown but
-          not set, because the setting is determined at
-          <command>initdb</> time.
+          not set, because the setting is determined at database creation
+      time.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 5ce6163..4b7f5eb 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -145,11 +145,12 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
    Normally, it will just take the locale settings in the environment
    and apply them to the initialized database.  It is possible to
    specify a different locale for the database; more information about
-   that can be found in <xref linkend="locale">.  The sort order used
-   within a particular database cluster is set by
-   <command>initdb</command> and cannot be changed later, short of
-   dumping all data, rerunning <command>initdb</command>, and reloading
-   the data. There is also a performance impact for using locales
+   that can be found in <xref linkend="locale">.  The default sort order used
+   within the particular database cluster is set by
+   <command>initdb</command>, and while you can create new databases using
+   different sort order, the order used in the template databases that initdb
+   creates cannot be changed without dropping and recreating them.
+   There is also a performance impact for using locales
    other than <literal>C</> or <literal>POSIX</>. Therefore, it is
    important to make this choice correctly the first time.
   </para>
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index e48f82b..98ac7f8 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1896,7 +1896,7 @@ LIMIT 10;

   <note>
    <para>
-    The parser's notion of a <quote>letter</> is determined by the server's
+    The parser's notion of a <quote>letter</> is determined by the database's
     locale setting, specifically <varname>lc_ctype</>.  Words containing
     only the basic ASCII letters are reported as a separate token type,
     since it is sometimes useful to distinguish them.  In most European
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5645271..edd1d74 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3847,7 +3847,6 @@ WriteControlFile(void)
 {
     int            fd;
     char        buffer[PG_CONTROL_SIZE];        /* need not be aligned */
-    char       *localeptr;

     /*
      * Initialize version and compatibility-check fields
@@ -3876,18 +3875,6 @@ WriteControlFile(void)
     ControlFile->float4ByVal = FLOAT4PASSBYVAL;
     ControlFile->float8ByVal = FLOAT8PASSBYVAL;

-    ControlFile->localeBuflen = LOCALE_NAME_BUFLEN;
-    localeptr = setlocale(LC_COLLATE, NULL);
-    if (!localeptr)
-        ereport(PANIC,
-                (errmsg("invalid LC_COLLATE setting")));
-    StrNCpy(ControlFile->lc_collate, localeptr, LOCALE_NAME_BUFLEN);
-    localeptr = setlocale(LC_CTYPE, NULL);
-    if (!localeptr)
-        ereport(PANIC,
-                (errmsg("invalid LC_CTYPE setting")));
-    StrNCpy(ControlFile->lc_ctype, localeptr, LOCALE_NAME_BUFLEN);
-
     /* Contents are protected with a CRC */
     INIT_CRC32(ControlFile->crc);
     COMP_CRC32(ControlFile->crc,
@@ -4126,34 +4113,6 @@ ReadControlFile(void)
                            " but the server was compiled without USE_FLOAT8_BYVAL."),
                  errhint("It looks like you need to recompile or initdb.")));
 #endif
-
-    if (ControlFile->localeBuflen != LOCALE_NAME_BUFLEN)
-        ereport(FATAL,
-                (errmsg("database files are incompatible with server"),
-                 errdetail("The database cluster was initialized with LOCALE_NAME_BUFLEN %d,"
-                  " but the server was compiled with LOCALE_NAME_BUFLEN %d.",
-                           ControlFile->localeBuflen, LOCALE_NAME_BUFLEN),
-                 errhint("It looks like you need to recompile or initdb.")));
-    if (pg_perm_setlocale(LC_COLLATE, ControlFile->lc_collate) == NULL)
-        ereport(FATAL,
-            (errmsg("database files are incompatible with operating system"),
-             errdetail("The database cluster was initialized with LC_COLLATE \"%s\","
-                       " which is not recognized by setlocale().",
-                       ControlFile->lc_collate),
-             errhint("It looks like you need to initdb or install locale support.")));
-    if (pg_perm_setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
-        ereport(FATAL,
-            (errmsg("database files are incompatible with operating system"),
-        errdetail("The database cluster was initialized with LC_CTYPE \"%s\","
-                  " which is not recognized by setlocale().",
-                  ControlFile->lc_ctype),
-             errhint("It looks like you need to initdb or install locale support.")));
-
-    /* Make the fixed locale settings visible as GUC variables, too */
-    SetConfigOption("lc_collate", ControlFile->lc_collate,
-                    PGC_INTERNAL, PGC_S_OVERRIDE);
-    SetConfigOption("lc_ctype", ControlFile->lc_ctype,
-                    PGC_INTERNAL, PGC_S_OVERRIDE);
 }

 void
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 4dd6262..2d5e27b 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -53,6 +53,7 @@
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
+#include "utils/pg_locale.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"

@@ -69,7 +70,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
             Oid *dbIdP, Oid *ownerIdP,
             int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
             Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
-            Oid *dbTablespace);
+            Oid *dbTablespace, char **dbCollation, char **dbCtype);
 static bool have_createdb_privilege(void);
 static void remove_dbtablespaces(Oid db_id);
 static bool check_db_file_conflict(Oid db_id);
@@ -87,6 +88,8 @@ createdb(const CreatedbStmt *stmt)
     Oid            src_dboid;
     Oid            src_owner;
     int            src_encoding;
+    char       *src_collation;
+    char       *src_ctype;
     bool        src_istemplate;
     bool        src_allowconn;
     Oid            src_lastsysoid;
@@ -104,10 +107,14 @@ createdb(const CreatedbStmt *stmt)
     DefElem    *downer = NULL;
     DefElem    *dtemplate = NULL;
     DefElem    *dencoding = NULL;
+    DefElem    *dcollation = NULL;
+    DefElem    *dctype = NULL;
     DefElem    *dconnlimit = NULL;
     char       *dbname = stmt->dbname;
     char       *dbowner = NULL;
     const char *dbtemplate = NULL;
+    char       *lc_collate = NULL;
+    char       *lc_ctype = NULL;
     int            encoding = -1;
     int            dbconnlimit = -1;
     int            ctype_encoding;
@@ -152,6 +159,22 @@ createdb(const CreatedbStmt *stmt)
                          errmsg("conflicting or redundant options")));
             dencoding = defel;
         }
+        else if (strcmp(defel->defname, "collate") == 0)
+        {
+            if (dcollation)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            dcollation = defel;
+        }
+        else if (strcmp(defel->defname, "ctype") == 0)
+        {
+            if (dctype)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            dctype = defel;
+        }
         else if (strcmp(defel->defname, "connectionlimit") == 0)
         {
             if (dconnlimit)
@@ -205,6 +228,11 @@ createdb(const CreatedbStmt *stmt)
             elog(ERROR, "unrecognized node type: %d",
                  nodeTag(dencoding->arg));
     }
+    if (dcollation && dcollation->arg)
+        lc_collate = strVal(dcollation->arg);
+    if (dctype && dctype->arg)
+        lc_ctype = strVal(dctype->arg);
+
     if (dconnlimit && dconnlimit->arg)
         dbconnlimit = intVal(dconnlimit->arg);

@@ -243,7 +271,8 @@ createdb(const CreatedbStmt *stmt)
     if (!get_db_info(dbtemplate, ShareLock,
                      &src_dboid, &src_owner, &src_encoding,
                      &src_istemplate, &src_allowconn, &src_lastsysoid,
-                     &src_frozenxid, &src_deftablespace))
+                     &src_frozenxid, &src_deftablespace,
+                     &src_collation, &src_ctype))
         ereport(ERROR,
                 (errcode(ERRCODE_UNDEFINED_DATABASE),
                  errmsg("template database \"%s\" does not exist",
@@ -262,9 +291,13 @@ createdb(const CreatedbStmt *stmt)
                             dbtemplate)));
     }

-    /* If encoding is defaulted, use source's encoding */
+    /* If encoding or locales are defaulted, use source's setting */
     if (encoding < 0)
         encoding = src_encoding;
+    if (lc_collate == NULL)
+        lc_collate = src_collation;
+    if (lc_ctype == NULL)
+        lc_ctype = src_ctype;

     /* Some encodings are client only */
     if (!PG_VALID_BE_ENCODING(encoding))
@@ -272,6 +305,16 @@ createdb(const CreatedbStmt *stmt)
                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                  errmsg("invalid server encoding %d", encoding)));

+    /* Check that the chosen locales are valid */
+    if (!check_locale(LC_COLLATE, lc_collate))
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("invalid locale name %s", lc_collate)));
+    if (!check_locale(LC_CTYPE, lc_ctype))
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("invalid locale name %s", lc_ctype)));
+
     /*
      * Check whether encoding matches server locale settings.  We allow
      * mismatch in three cases:
@@ -290,7 +333,7 @@ createdb(const CreatedbStmt *stmt)
      *
      * Note: if you change this policy, fix initdb to match.
      */
-    ctype_encoding = pg_get_encoding_from_locale(NULL);
+    ctype_encoding = pg_get_encoding_from_locale(lc_ctype);

     if (!(ctype_encoding == encoding ||
           ctype_encoding == PG_SQL_ASCII ||
@@ -299,12 +342,32 @@ createdb(const CreatedbStmt *stmt)
 #endif
           (encoding == PG_SQL_ASCII && superuser())))
         ereport(ERROR,
-                (errmsg("encoding %s does not match server's locale %s",
+                (errmsg("encoding %s does not match locale %s",
                         pg_encoding_to_char(encoding),
-                        setlocale(LC_CTYPE, NULL)),
-             errdetail("The server's LC_CTYPE setting requires encoding %s.",
+                        lc_ctype),
+             errdetail("The chosen LC_CTYPE setting requires encoding %s.",
                        pg_encoding_to_char(ctype_encoding))));

+    /*
+     * Check that the new locale is compatible with the source database.
+     *
+     * We know that template0 doesn't contain any indexes that depend on
+     * collation or ctype, so template0 can be used as template for
+     * any locale.
+     */
+    if (strcmp(dbtemplate, "template0") != 0)
+    {
+        if (strcmp(lc_collate, src_collation))
+            ereport(ERROR,
+                    (errmsg("new collation is incompatible with the collation of the template database (%s)",
src_collation),
+                     errhint("Use the same collation as in the template database, or use template0 as template")));
+
+        if (strcmp(lc_ctype, src_ctype))
+            ereport(ERROR,
+                    (errmsg("new ctype is incompatible with the ctype of the template database (%s)", src_ctype),
+                     errhint("Use the same ctype as in the template database, or use template0 as template")));
+    }
+
     /* Resolve default tablespace for new database */
     if (dtablespacename && dtablespacename->arg)
     {
@@ -421,6 +484,8 @@ createdb(const CreatedbStmt *stmt)
         DirectFunctionCall1(namein, CStringGetDatum(dbname));
     new_record[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(datdba);
     new_record[Anum_pg_database_encoding - 1] = Int32GetDatum(encoding);
+    new_record[Anum_pg_database_collation - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_collate));
+    new_record[Anum_pg_database_ctype - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_ctype));
     new_record[Anum_pg_database_datistemplate - 1] = BoolGetDatum(false);
     new_record[Anum_pg_database_datallowconn - 1] = BoolGetDatum(true);
     new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
@@ -629,7 +694,7 @@ dropdb(const char *dbname, bool missing_ok)
     pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock);

     if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-                     &db_istemplate, NULL, NULL, NULL, NULL))
+                     &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL))
     {
         if (!missing_ok)
         {
@@ -781,7 +846,7 @@ RenameDatabase(const char *oldname, const char *newname)
     rel = heap_open(DatabaseRelationId, RowExclusiveLock);

     if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-                     NULL, NULL, NULL, NULL, NULL))
+                     NULL, NULL, NULL, NULL, NULL, NULL, NULL))
         ereport(ERROR,
                 (errcode(ERRCODE_UNDEFINED_DATABASE),
                  errmsg("database \"%s\" does not exist", oldname)));
@@ -1168,7 +1233,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
             Oid *dbIdP, Oid *ownerIdP,
             int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
             Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
-            Oid *dbTablespace)
+            Oid *dbTablespace, char **dbCollation, char **dbCtype)
 {
     bool        result = false;
     Relation    relation;
@@ -1259,6 +1324,11 @@ get_db_info(const char *name, LOCKMODE lockmode,
                 /* default tablespace for this database */
                 if (dbTablespace)
                     *dbTablespace = dbform->dattablespace;
+                 /* default locale settings for this database */
+                 if (dbCollation)
+                     *dbCollation = pstrdup(NameStr(dbform->collation));
+                 if (dbCtype)
+                     *dbCtype = pstrdup(NameStr(dbform->ctype));
                 ReleaseSysCache(tuple);
                 result = true;
                 break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 5ff353d..0831a94 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -398,7 +398,7 @@ static TypeName *TableFuncTypeName(List *columns);
     CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
     COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
     CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
-    CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
+    CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
     CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

     DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
@@ -5458,6 +5458,22 @@ createdb_opt_item:
                 {
                     $$ = makeDefElem("encoding", NULL);
                 }
+            | COLLATE opt_equal Sconst
+                {
+                    $$ = makeDefElem("collate", (Node *)makeString($3));
+                }
+            | COLLATE opt_equal DEFAULT
+                {
+                    $$ = makeDefElem("collate", NULL);
+                }
+            | CTYPE opt_equal Sconst
+                {
+                    $$ = makeDefElem("ctype", (Node *)makeString($3));
+                }
+            | CTYPE opt_equal DEFAULT
+                {
+                    $$ = makeDefElem("ctype", NULL);
+                }
             | CONNECTION LIMIT opt_equal SignedIconst
                 {
                     $$ = makeDefElem("connectionlimit", (Node *)makeInteger($4));
@@ -9216,6 +9232,7 @@ unreserved_keyword:
             | CREATEROLE
             | CREATEUSER
             | CSV
+            | CTYPE
             | CURRENT_P
             | CURSOR
             | CYCLE
diff --git a/src/backend/parser/keywords.c b/src/backend/parser/keywords.c
index b30a478..e2fe4bf 100644
--- a/src/backend/parser/keywords.c
+++ b/src/backend/parser/keywords.c
@@ -114,6 +114,7 @@ const ScanKeyword ScanKeywords[] = {
     {"createuser", CREATEUSER, UNRESERVED_KEYWORD},
     {"cross", CROSS, TYPE_FUNC_NAME_KEYWORD},
     {"csv", CSV, UNRESERVED_KEYWORD},
+    {"ctype", CTYPE, UNRESERVED_KEYWORD},
     {"current", CURRENT_P, UNRESERVED_KEYWORD},
     {"current_date", CURRENT_DATE, RESERVED_KEYWORD},
     {"current_role", CURRENT_ROLE, RESERVED_KEYWORD},
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c0a01ae..4a55be5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -189,6 +189,30 @@ pg_perm_setlocale(int category, const char *locale)
 }


+/*
+ * Is the locale name valid for the locale category?
+ */
+bool
+check_locale(int category, const char *value)
+{
+    char       *save;
+    bool        ret;
+
+    save = setlocale(category, NULL);
+    if (!save)
+        return false;            /* won't happen, we hope */
+
+    /* save may be pointing at a modifiable scratch variable, see above */
+    save = pstrdup(save);
+
+    ret = (setlocale(category, value) != NULL);
+
+    setlocale(category, save);    /* assume this won't fail */
+    pfree(save);
+
+    return ret;
+}
+
 /* GUC assign hooks */

 /*
@@ -203,21 +227,9 @@ pg_perm_setlocale(int category, const char *locale)
 static const char *
 locale_xxx_assign(int category, const char *value, bool doit, GucSource source)
 {
-    char       *save;
-
-    save = setlocale(category, NULL);
-    if (!save)
-        return NULL;            /* won't happen, we hope */
-
-    /* save may be pointing at a modifiable scratch variable, see above */
-    save = pstrdup(save);
-
-    if (!setlocale(category, value))
+    if (!check_locale(category, value))
         value = NULL;            /* set failure return marker */

-    setlocale(category, save);    /* assume this won't fail */
-    pfree(save);
-
     /* need to reload cache next time? */
     if (doit && value != NULL)
     {
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 461edd9..08fbf3b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -159,6 +159,8 @@ CheckMyDatabase(const char *name, bool am_superuser)
 {
     HeapTuple    tup;
     Form_pg_database dbform;
+    char       *collate;
+    char       *ctype;

     /* Fetch our real pg_database row */
     tup = SearchSysCache(DATABASEOID,
@@ -240,6 +242,28 @@ CheckMyDatabase(const char *name, bool am_superuser)
     /* If we have no other source of client_encoding, use server encoding */
     SetConfigOption("client_encoding", GetDatabaseEncodingName(),
                     PGC_BACKEND, PGC_S_DEFAULT);
+
+    /* assign locale variables */
+    collate = NameStr(dbform->collation);
+    ctype = NameStr(dbform->ctype);
+
+    if (setlocale(LC_COLLATE, collate) == NULL)
+        ereport(FATAL,
+            (errmsg("database locale is incompatible with operating system"),
+            errdetail("The database was initialized with LC_COLLATE \"%s\", "
+                        " which is not recognized by setlocale().", collate),
+            errhint("Recreate the database with another locale or install the missing locale.")));
+
+    if (setlocale(LC_CTYPE, ctype) == NULL)
+        ereport(FATAL,
+            (errmsg("database locale is incompatible with operating system"),
+            errdetail("The database was initialized with LC_CTYPE \"%s\", "
+                        " which is not recognized by setlocale().", ctype),
+            errhint("Recreate the database with another locale or install the missing locale.")));
+
+    /* Make the locale settings visible as GUC variables, too */
+    SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DATABASE);
+    SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DATABASE);

     /*
      * Lastly, set up any database-specific configuration variables.
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3effafb..a8cb246 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1353,6 +1353,10 @@ bootstrap_template1(char *short_version)

     bki_lines = replace_token(bki_lines, "ENCODING", encodingid);

+    bki_lines = replace_token(bki_lines, "LC_COLLATE", lc_collate);
+
+    bki_lines = replace_token(bki_lines, "LC_CTYPE", lc_ctype);
+
     /*
      * Pass correct LC_xxx environment to bootstrap.
      *
@@ -2378,12 +2382,12 @@ usage(const char *progname)
     printf(_("\nOptions:\n"));
     printf(_(" [-D, --pgdata=]DATADIR     location for this database cluster\n"));
     printf(_("  -E, --encoding=ENCODING   set default encoding for new databases\n"));
-    printf(_("  --locale=LOCALE           initialize database cluster with given locale\n"));
+    printf(_("  --locale=LOCALE           set default locale for new databases\n"));
     printf(_("  --lc-collate, --lc-ctype, --lc-messages=LOCALE\n"
              "  --lc-monetary, --lc-numeric, --lc-time=LOCALE\n"
-             "                            initialize database cluster with given locale\n"
-             "                            in the respective category (default taken from\n"
-             "                            environment)\n"));
+             "                            set default locale in the respective\n"
+             "                            category for new databases (default\n"
+             "                            taken from environment)\n"));
     printf(_("  --no-locale               equivalent to --locale=C\n"));
     printf(_("  -T, --text-search-config=CFG\n"
          "                            default text search configuration\n"));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 7970c32..08e102f 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -220,12 +220,5 @@ main(int argc, char *argv[])
            (ControlFile.float4ByVal ? _("by value") : _("by reference")));
     printf(_("Float8 argument passing:              %s\n"),
            (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-    printf(_("Maximum length of locale name:        %u\n"),
-           ControlFile.localeBuflen);
-    printf(_("LC_COLLATE:                           %s\n"),
-           ControlFile.lc_collate);
-    printf(_("LC_CTYPE:                             %s\n"),
-           ControlFile.lc_ctype);
-
     return 0;
 }
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 52e5611..7bec68a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1542,12 +1542,16 @@ dumpDatabase(Archive *AH)
                 i_oid,
                 i_dba,
                 i_encoding,
+                i_collation,
+                i_ctype,
                 i_tablespace;
     CatalogId    dbCatId;
     DumpId        dbDumpId;
     const char *datname,
                *dba,
                *encoding,
+               *collation,
+               *ctype,
                *tablespace;

     datname = PQdb(g_conn);
@@ -1559,11 +1563,26 @@ dumpDatabase(Archive *AH)
     selectSourceSchema("pg_catalog");

     /* Get the database owner and parameters from pg_database */
-    if (g_fout->remoteVersion >= 80200)
+    if (g_fout->remoteVersion >= 80400)
+    {
+        appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
+                          "(%s datdba) as dba, "
+                          "pg_encoding_to_char(encoding) as encoding, "
+                          "collation, ctype, "
+                          "(SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) as tablespace, "
+                      "shobj_description(oid, 'pg_database') as description "
+
+                          "FROM pg_database "
+                          "WHERE datname = ",
+                          username_subquery);
+        appendStringLiteralAH(dbQry, datname, AH);
+    }
+    else if (g_fout->remoteVersion >= 80200)
     {
         appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
                           "(%s datdba) as dba, "
                           "pg_encoding_to_char(encoding) as encoding, "
+                          "NULL as collation, NULL as ctype, "
                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) as tablespace, "
                       "shobj_description(oid, 'pg_database') as description "

@@ -1577,6 +1596,7 @@ dumpDatabase(Archive *AH)
         appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
                           "(%s datdba) as dba, "
                           "pg_encoding_to_char(encoding) as encoding, "
+                          "NULL as collation, NULL as ctype, "
                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) as tablespace "
                           "FROM pg_database "
                           "WHERE datname = ",
@@ -1588,6 +1608,7 @@ dumpDatabase(Archive *AH)
         appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
                           "(%s datdba) as dba, "
                           "pg_encoding_to_char(encoding) as encoding, "
+                          "NULL as collation, NULL as ctype, "
                           "NULL as tablespace "
                           "FROM pg_database "
                           "WHERE datname = ",
@@ -1601,6 +1622,7 @@ dumpDatabase(Archive *AH)
                           "oid, "
                           "(%s datdba) as dba, "
                           "pg_encoding_to_char(encoding) as encoding, "
+                          "NULL as collation, NULL as ctype, "
                           "NULL as tablespace "
                           "FROM pg_database "
                           "WHERE datname = ",
@@ -1631,12 +1653,16 @@ dumpDatabase(Archive *AH)
     i_oid = PQfnumber(res, "oid");
     i_dba = PQfnumber(res, "dba");
     i_encoding = PQfnumber(res, "encoding");
+    i_collation = PQfnumber(res, "collation");
+    i_ctype = PQfnumber(res, "ctype");
     i_tablespace = PQfnumber(res, "tablespace");

     dbCatId.tableoid = atooid(PQgetvalue(res, 0, i_tableoid));
     dbCatId.oid = atooid(PQgetvalue(res, 0, i_oid));
     dba = PQgetvalue(res, 0, i_dba);
     encoding = PQgetvalue(res, 0, i_encoding);
+    collation = PQgetvalue(res, 0, i_collation);
+    ctype = PQgetvalue(res, 0, i_ctype);
     tablespace = PQgetvalue(res, 0, i_tablespace);

     appendPQExpBuffer(creaQry, "CREATE DATABASE %s WITH TEMPLATE = template0",
@@ -1646,6 +1672,16 @@ dumpDatabase(Archive *AH)
         appendPQExpBuffer(creaQry, " ENCODING = ");
         appendStringLiteralAH(creaQry, encoding, AH);
     }
+    if (strlen(collation) > 0)
+    {
+        appendPQExpBuffer(creaQry, " COLLATE = ");
+        appendStringLiteralAH(creaQry, collation, AH);
+    }
+    if (strlen(ctype) > 0)
+    {
+        appendPQExpBuffer(creaQry, " CTYPE = ");
+        appendStringLiteralAH(creaQry, ctype, AH);
+    }
     if (strlen(tablespace) > 0 && strcmp(tablespace, "pg_default") != 0)
         appendPQExpBuffer(creaQry, " TABLESPACE = %s",
                           fmtId(tablespace));
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index fa51af0..9ff59ae 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -925,11 +925,22 @@ dumpCreateDB(PGconn *conn)

     fprintf(OPF, "--\n-- Database creation\n--\n\n");

-    if (server_version >= 80100)
+    if (server_version >= 80400)
         res = executeQuery(conn,
                            "SELECT datname, "
                            "coalesce(rolname, (select rolname from pg_authid where oid=(select datdba from pg_database
wheredatname='template0'))), " 
                            "pg_encoding_to_char(d.encoding), "
+                           "collation, ctype, "
+                           "datistemplate, datacl, datconnlimit, "
+                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = d.dattablespace) AS dattablespace "
+              "FROM pg_database d LEFT JOIN pg_authid u ON (datdba = u.oid) "
+                           "WHERE datallowconn ORDER BY 1");
+    else if (server_version >= 80100)
+        res = executeQuery(conn,
+                           "SELECT datname, "
+                           "coalesce(rolname, (select rolname from pg_authid where oid=(select datdba from pg_database
wheredatname='template0'))), " 
+                           "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "datistemplate, datacl, datconnlimit, "
                            "(SELECT spcname FROM pg_tablespace t WHERE t.oid = d.dattablespace) AS dattablespace "
               "FROM pg_database d LEFT JOIN pg_authid u ON (datdba = u.oid) "
@@ -939,6 +950,7 @@ dumpCreateDB(PGconn *conn)
                            "SELECT datname, "
                            "coalesce(usename, (select usename from pg_shadow where usesysid=(select datdba from
pg_databasewhere datname='template0'))), " 
                            "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "datistemplate, datacl, -1 as datconnlimit, "
                            "(SELECT spcname FROM pg_tablespace t WHERE t.oid = d.dattablespace) AS dattablespace "
            "FROM pg_database d LEFT JOIN pg_shadow u ON (datdba = usesysid) "
@@ -948,6 +960,7 @@ dumpCreateDB(PGconn *conn)
                            "SELECT datname, "
                            "coalesce(usename, (select usename from pg_shadow where usesysid=(select datdba from
pg_databasewhere datname='template0'))), " 
                            "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "datistemplate, datacl, -1 as datconnlimit, "
                            "'pg_default' AS dattablespace "
            "FROM pg_database d LEFT JOIN pg_shadow u ON (datdba = usesysid) "
@@ -959,6 +972,7 @@ dumpCreateDB(PGconn *conn)
                     "(select usename from pg_shadow where usesysid=datdba), "
                            "(select usename from pg_shadow where usesysid=(select datdba from pg_database where
datname='template0')))," 
                            "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "datistemplate, '' as datacl, -1 as datconnlimit, "
                            "'pg_default' AS dattablespace "
                            "FROM pg_database d "
@@ -973,6 +987,7 @@ dumpCreateDB(PGconn *conn)
                            "SELECT datname, "
                     "(select usename from pg_shadow where usesysid=datdba), "
                            "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "'f' as datistemplate, "
                            "'' as datacl, -1 as datconnlimit, "
                            "'pg_default' AS dattablespace "
@@ -985,10 +1000,12 @@ dumpCreateDB(PGconn *conn)
         char       *dbname = PQgetvalue(res, i, 0);
         char       *dbowner = PQgetvalue(res, i, 1);
         char       *dbencoding = PQgetvalue(res, i, 2);
-        char       *dbistemplate = PQgetvalue(res, i, 3);
-        char       *dbacl = PQgetvalue(res, i, 4);
-        char       *dbconnlimit = PQgetvalue(res, i, 5);
-        char       *dbtablespace = PQgetvalue(res, i, 6);
+        char       *dbcollation = PQgetvalue(res, i, 3);
+        char       *dbctype = PQgetvalue(res, i, 4);
+        char       *dbistemplate = PQgetvalue(res, i, 5);
+        char       *dbacl = PQgetvalue(res, i, 6);
+        char       *dbconnlimit = PQgetvalue(res, i, 7);
+        char       *dbtablespace = PQgetvalue(res, i, 8);
         char       *fdbname;

         fdbname = strdup(fmtId(dbname));
@@ -1016,6 +1033,18 @@ dumpCreateDB(PGconn *conn)
             appendPQExpBuffer(buf, " ENCODING = ");
             appendStringLiteralConn(buf, dbencoding, conn);

+            if (strlen(dbcollation) != 0)
+            {
+                appendPQExpBuffer(buf, " COLLATE = ");
+                appendStringLiteralConn(buf, dbcollation, conn);
+            }
+
+            if (strlen(dbctype) != 0)
+            {
+                appendPQExpBuffer(buf, " CTYPE = ");
+                appendStringLiteralConn(buf, dbctype, conn);
+            }
+
             /*
              * Output tablespace if it isn't the default.  For default, it
              * uses the default from the template database.  If tablespace is
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 345b89c..0df796b 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -493,22 +493,6 @@ GuessControlValues(void)
 #endif
     ControlFile.float4ByVal = FLOAT4PASSBYVAL;
     ControlFile.float8ByVal = FLOAT8PASSBYVAL;
-    ControlFile.localeBuflen = LOCALE_NAME_BUFLEN;
-
-    localeptr = setlocale(LC_COLLATE, "");
-    if (!localeptr)
-    {
-        fprintf(stderr, _("%s: invalid LC_COLLATE setting\n"), progname);
-        exit(1);
-    }
-    strlcpy(ControlFile.lc_collate, localeptr, sizeof(ControlFile.lc_collate));
-    localeptr = setlocale(LC_CTYPE, "");
-    if (!localeptr)
-    {
-        fprintf(stderr, _("%s: invalid LC_CTYPE setting\n"), progname);
-        exit(1);
-    }
-    strlcpy(ControlFile.lc_ctype, localeptr, sizeof(ControlFile.lc_ctype));

     /*
      * XXX eventually, should try to grovel through old XLOG to develop more
@@ -584,12 +568,6 @@ PrintControlValues(bool guessed)
            (ControlFile.float4ByVal ? _("by value") : _("by reference")));
     printf(_("Float8 argument passing:              %s\n"),
            (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-    printf(_("Maximum length of locale name:        %u\n"),
-           ControlFile.localeBuflen);
-    printf(_("LC_COLLATE:                           %s\n"),
-           ControlFile.lc_collate);
-    printf(_("LC_CTYPE:                             %s\n"),
-           ControlFile.lc_ctype);
 }


diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 900130d..2cf3172 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -454,11 +454,18 @@ listAllDbs(bool verbose)
     printfPQExpBuffer(&buf,
                       "SELECT d.datname as \"%s\",\n"
                       "       pg_catalog.pg_get_userbyid(d.datdba) as \"%s\",\n"
-                      "       pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n"
-                      "       d.datacl as \"%s\"",
+                      "       pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n",
                       gettext_noop("Name"),
                       gettext_noop("Owner"),
-                      gettext_noop("Encoding"),
+                      gettext_noop("Encoding"));
+    if (pset.sversion >= 80400)
+        appendPQExpBuffer(&buf,
+                          "       d.collation as \"%s\",\n"
+                          "       d.ctype as \"%s\",\n",
+                          gettext_noop("Collation"),
+                          gettext_noop("Ctype"));
+    appendPQExpBuffer(&buf,
+                      "       d.datacl as \"%s\"",
                       gettext_noop("Access Privileges"));
     if (verbose && pset.sversion >= 80200)
         appendPQExpBuffer(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 286667e..37d41de 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -32,6 +32,8 @@ main(int argc, char *argv[])
         {"tablespace", required_argument, NULL, 'D'},
         {"template", required_argument, NULL, 'T'},
         {"encoding", required_argument, NULL, 'E'},
+        {"lc-collate", required_argument, NULL, 1},
+        {"lc-ctype", required_argument, NULL, 2},
         {NULL, 0, NULL, 0}
     };

@@ -50,6 +52,8 @@ main(int argc, char *argv[])
     char       *tablespace = NULL;
     char       *template = NULL;
     char       *encoding = NULL;
+    char       *lc_collate = NULL;
+    char       *lc_ctype = NULL;

     PQExpBufferData sql;

@@ -95,6 +99,12 @@ main(int argc, char *argv[])
             case 'E':
                 encoding = optarg;
                 break;
+            case 1:
+                lc_collate = optarg;
+                break;
+            case 2:
+                lc_ctype = optarg;
+                break;
             default:
                 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
                 exit(1);
@@ -152,6 +162,11 @@ main(int argc, char *argv[])
         appendPQExpBuffer(&sql, " ENCODING '%s'", encoding);
     if (template)
         appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+    if (lc_collate)
+        appendPQExpBuffer(&sql, " COLLATE '%s'", lc_collate);
+    if (lc_ctype)
+        appendPQExpBuffer(&sql, " CTYPE '%s'", lc_ctype);
+
     appendPQExpBuffer(&sql, ";\n");

     conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
@@ -209,6 +224,9 @@ help(const char *progname)
     printf(_("\nOptions:\n"));
     printf(_("  -D, --tablespace=TABLESPACE  default tablespace for the database\n"));
     printf(_("  -E, --encoding=ENCODING      encoding for the database\n"));
+    printf(_("  --lc-collate=LOCALE          LC_COLLATE setting for the database\n"));
+    printf(_("  --lc-ctype=LOCALE            LC_CTYPE setting for the database\n"));
+
     printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
     printf(_("  -T, --template=TEMPLATE      template database to copy\n"));
     printf(_("  -e, --echo                   show the commands being sent to the server\n"));
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 38b5a84..aee6934 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -144,11 +144,6 @@ typedef struct ControlFileData
     bool        float4ByVal;    /* float4 pass-by-value? */
     bool        float8ByVal;    /* float8, int8, etc pass-by-value? */

-    /* active locales */
-    uint32        localeBuflen;
-    char        lc_collate[LOCALE_NAME_BUFLEN];
-    char        lc_ctype[LOCALE_NAME_BUFLEN];
-
     /* CRC of all above ... MUST BE LAST! */
     pg_crc32    crc;
 } ControlFileData;
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 6e9e5d2..1b10e60 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -33,6 +33,8 @@ CATALOG(pg_database,1262) BKI_SHARED_RELATION
     NameData    datname;        /* database name */
     Oid            datdba;            /* owner of database */
     int4        encoding;        /* character encoding */
+    NameData    collation;        /* LC_COLLATE of database */
+    NameData    ctype;            /* LC_CTYPE of database */
     bool        datistemplate;    /* allowed as CREATE DATABASE template? */
     bool        datallowconn;    /* new connections allowed? */
     int4        datconnlimit;    /* max connections allowed (-1=no limit) */
@@ -54,20 +56,22 @@ typedef FormData_pg_database *Form_pg_database;
  *        compiler constants for pg_database
  * ----------------
  */
-#define Natts_pg_database                11
+#define Natts_pg_database                13
 #define Anum_pg_database_datname        1
 #define Anum_pg_database_datdba            2
 #define Anum_pg_database_encoding        3
-#define Anum_pg_database_datistemplate    4
-#define Anum_pg_database_datallowconn    5
-#define Anum_pg_database_datconnlimit    6
-#define Anum_pg_database_datlastsysoid    7
-#define Anum_pg_database_datfrozenxid    8
-#define Anum_pg_database_dattablespace    9
-#define Anum_pg_database_datconfig        10
-#define Anum_pg_database_datacl            11
+#define Anum_pg_database_collation        4
+#define Anum_pg_database_ctype            5
+#define Anum_pg_database_datistemplate    6
+#define Anum_pg_database_datallowconn    7
+#define Anum_pg_database_datconnlimit    8
+#define Anum_pg_database_datlastsysoid    9
+#define Anum_pg_database_datfrozenxid    10
+#define Anum_pg_database_dattablespace    11
+#define Anum_pg_database_datconfig        12
+#define Anum_pg_database_datacl            13

-DATA(insert OID = 1 (  template1 PGUID ENCODING t t -1 0 0 1663 _null_ _null_ ));
+DATA(insert OID = 1 (  template1 PGUID ENCODING "LC_COLLATE" "LC_CTYPE" t t -1 0 0 1663 _null_ _null_));
 SHDESCR("default template database");
 #define TemplateDbOid            1

diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 5a49823..2b60027 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -39,6 +39,7 @@ extern const char *locale_numeric_assign(const char *value,
 extern const char *locale_time_assign(const char *value,
                    bool doit, GucSource source);

+extern bool check_locale(int category, const char *locale);
 extern char *pg_perm_setlocale(int category, const char *locale);

 extern bool lc_collate_is_c(void);
diff --git a/src/interfaces/ecpg/preproc/preproc.y b/src/interfaces/ecpg/preproc/preproc.y
index 0a8b62b..949e76b 100644
--- a/src/interfaces/ecpg/preproc/preproc.y
+++ b/src/interfaces/ecpg/preproc/preproc.y
@@ -428,7 +428,7 @@ add_typedef(char *name, char * dimension, char * length, enum ECPGttype type_enu
     CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
     COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
     CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
-    CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
+    CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
     CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

     DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS

Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Fri, Sep 19, 2008 at 10:13:43AM +0300, Heikki Linnakangas wrote:
> In addition to catalog changes, for finer grained collation you need
> changes in the planner and executor as well. In planner, to provide
> support for the concept of a sort order using a specific collation, and
> track that whenever sort order is handled. In executor, the capability
> to compare and sort using different locales at different times. Those
> changes seem a lot more difficult to me than adding a couple of system
> catalogs, which is pretty straightforward.

To be honest, I think that's the easy part. When I submitted the
collate patch a few years back, getting the sort order working was the
easy part, precisely because COLLATE clauses can affect the generated
plan in precisely one way: can you use index FOO or not. Once you're
past that point you can throw almost all the collation information
away.

As you point out though, the shear volume of catalog changes required
obscure that fact somewhat.

> It's not like the patch is going to disappear from planet Earth if it
> doesn't get committed for 8.4. It's still valuable and available when
> the new catalogs are needed.

I just prefer it as it was because it takes care of a useful subset of
the features people want in a way that is compatable for the future.
Whereas the stripped down version, I'm not sure it gets us anywhere.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Martijn van Oosterhout wrote:
> On Fri, Sep 19, 2008 at 10:13:43AM +0300, Heikki Linnakangas wrote:
>> It's not like the patch is going to disappear from planet Earth if it 
>> doesn't get committed for 8.4. It's still valuable and available when 
>> the new catalogs are needed.
> 
> I just prefer it as it was because it takes care of a useful subset of
> the features people want in a way that is compatable for the future.
> Whereas the stripped down version, I'm not sure it gets us anywhere.

It gives the capability to have different collations in different 
databases within the same cluster. IOW, the same feature as the original 
patch. Finer-grained collation would be even better, of course, but 
database-level collations is a valuable feature on its own.

The critical question is how much compatibility trouble we're going to 
get by having to support the extension to CREATE DATABASE in the 
stripped-down patch, when the pg_collation catalog is introduced in a 
later version in one form or another. So let's investigate that a bit 
further:

In the stripped down version, the CREATE DATABASE syntax is:

CREATE DATABASE <name> WITH COLLATE=<locale name> CTYPE=<locale name>

In the original patch, the CREATE DATABASE syntax is:

CREATE DATABASE <name> WITH COLLATE=<collation name>

The first thing that we see is that the COLLATE keyword means different 
things, so it's probably best to change that into:

CREATE DATABASE <name> WITH LC_COLLATE=<locale name> LC_CTYPE=<locale name>

in the stripped-down version. Then we need a way to map the 
stripped-down syntax into the one in the original patch. That's just a 
matter of looking up the collation in the pg_collation catalog with the 
right LC_COLLATE and LC_CTYPE.

Things get slightly more complicated if there is no such collation in 
the pg_collation catalog. One option is to simply create it at that point.

BTW, the original patch didn't have any provision for creating rows in 
pg_collation reflecting the locales available in the OS, but I think 
we'd need that. Otherwise the DBA would need to manually run CREATE 
COLLATION for every collation they want users to be able to use. 
Assuming we do that, the situation that we can't find a row with given 
LC_COLLATE and LC_CTYPE should not arise in practice.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> The first thing that we see is that the COLLATE keyword means different 
> things, so it's probably best to change that into:

> CREATE DATABASE <name> WITH LC_COLLATE=<locale name> LC_CTYPE=<locale name>

> in the stripped-down version. Then we need a way to map the 
> stripped-down syntax into the one in the original patch. That's just a 
> matter of looking up the collation in the pg_collation catalog with the 
> right LC_COLLATE and LC_CTYPE.

It seems to me that in an installation using libc-based collation
support, the collation names are likely to be the same as allowed values
of LC_COLLATE anyway.  So inventing different keywords doesn't really
seem necessary.

What might be sensible to ask is whether it is ever actually reasonable
for LC_COLLATE and LC_CTYPE to have different settings.  If we were
willing to enforce that they be the same, we could reduce this to just
the standard syntax COLLATE=something and be done with it.  Not being
much of a user of anything except C locale, I might be the wrong person
to opine on this; but it seems to me that having them different is far
more likely to be a mistake than desirable.
        regards, tom lane


Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> What might be sensible to ask is whether it is ever actually reasonable
> for LC_COLLATE and LC_CTYPE to have different settings.  If we were
> willing to enforce that they be the same, we could reduce this to just
> the standard syntax COLLATE=something and be done with it.  Not being
> much of a user of anything except C locale, I might be the wrong person
> to opine on this; but it seems to me that having them different is far
> more likely to be a mistake than desirable.

Agreed, it doesn't make much sense. I find it hard to imagine anyone 
doing that on purpose, but we have supported it at initdb time for ages.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Zdenek Kotala
Date:
Heikki Linnakangas napsal(a):

>> For anyone counting, Firebird added support for ICU more than three
>> years ago.
> 
> ICU is orthogonal to this patch. This patch didn't provide ICU support, 
> and we could start using ICU without the catalog changes.

This patch should allow to use both system catalog and ICU. pg_collate catalog 
contains comparing function which is called for string comparing and if somebody 
creates function which will use ICU then ICU will be supported. It is advantage 
of pg_catalog. Without them you can have system or ICU but probably not both.
    Zdenek




-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> Heikki Linnakangas napsal(a):
> 
>>> For anyone counting, Firebird added support for ICU more than three
>>> years ago.
>>
>> ICU is orthogonal to this patch. This patch didn't provide ICU 
>> support, and we could start using ICU without the catalog changes.
> 
> This patch should allow to use both system catalog and ICU. 

Not without another patch that actually introduces ICU support. What 
that would look like, how that would be stored in the catalogs, and 
whether we want that is whole another topic. Without that, the STRCOLFN 
part of the original patch is pointless, and I would've ripped that out 
anyway even if we decided to add the pg_collation catalog in this release.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Committed.

Tom Lane wrote:
> * You should try to get rid of LOCALE_NAME_BUFLEN altogether.  Definitely
> the comment about it in pg_control.h is now obsolete.

Yep. I removed LOCALE_NAME_BUFLEN. The real max length of a locale name 
is now NAMEDATALEN, because it's stored in a name field in pg_database. 
NAMEDATALEN is only 64 bytes, whereas LOCALE_NAME_BUFLEN was 128. 64 
bytes should be enough for "en_GB.UTF8" style locale names, but I wonder 
if it's enough for the longer names used on Windows? Could someone 
confirm that, please?

>     An important restriction, however, is that each database's character set
>     must be compatible with the database's <envar>LC_CTYPE</> setting.
> 
> Also I wonder whether we shouldn't say that it must be compatible with
> LC_CTYPE *and* LC_COLLATE.

I think we should, but that's in fact not what is tested. Before the 
patch as well, we only tested that the encoding matches LC_CTYPE, but 
you could set LC_COLLATE to anything. I'll work on a subsequent patch to 
tighten that.

> * This makes sense, but then shouldn't we make the identical restriction
> for encoding?
> 
> +    The <literal>COLLATE</> and <literal>CTYPE</> settings must match
> +    those of the template database, except when template0 is used as
> +    template. This is because <literal>COLLATE</> and <literal>CTYPE</>

It wouldn't be as bullet-proof for encoding, because we'd still have the 
problem that the encoding in the shared system tables would be 
ill-defined. That's a pre-existing problem, though. We could simply 
remove support for per-database encodings altogether and fix it at 
initdb time, as Martijn suggest earlier, but now that we have 
per-database locales, per-database encodings is a lot more useful as well.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
"Dave Page"
Date:
On Tue, Sep 23, 2008 at 10:20 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Committed.

*adds yet another item to his pgAdmin todo list* :-(

> Tom Lane wrote:
>>
>> * You should try to get rid of LOCALE_NAME_BUFLEN altogether.  Definitely
>> the comment about it in pg_control.h is now obsolete.
>
> Yep. I removed LOCALE_NAME_BUFLEN. The real max length of a locale name is
> now NAMEDATALEN, because it's stored in a name field in pg_database.
> NAMEDATALEN is only 64 bytes, whereas LOCALE_NAME_BUFLEN was 128. 64 bytes
> should be enough for "en_GB.UTF8" style locale names, but I wonder if it's
> enough for the longer names used on Windows? Could someone confirm that,
> please?

The longest I can find is:

Serbian (Cyrillic)_Bosnia and Herzegovina

at 42 characters.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Mon, Sep 22, 2008 at 10:22:35AM +0300, Heikki Linnakangas wrote:
> BTW, the original patch didn't have any provision for creating rows in
> pg_collation reflecting the locales available in the OS, but I think
> we'd need that. Otherwise the DBA would need to manually run CREATE
> COLLATION for every collation they want users to be able to use.
> Assuming we do that, the situation that we can't find a row with given
> LC_COLLATE and LC_CTYPE should not arise in practice.

You're assuming collations are denumerable. They're not. There is no way
to find the list of available collations/locales. You may be able to
guess a few but certainly not all of them.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Mon, Sep 22, 2008 at 06:11:04PM +0300, Heikki Linnakangas wrote:
> >This patch should allow to use both system catalog and ICU.
>
> Not without another patch that actually introduces ICU support. What
> that would look like, how that would be stored in the catalogs, and
> whether we want that is whole another topic. Without that, the STRCOLFN
> part of the original patch is pointless, and I would've ripped that out
> anyway even if we decided to add the pg_collation catalog in this release.

Eh? How you store collations names is easy. Every collation has a
textual name which is what we store in the catalog. I'm not sure why
you'd think it'd be any more complicated than that. And it has
precisely nothing to do with ICU and everything to do with being able
to support multiple source of collation information. We already have
two: builtin (strcmp/memcmp) and C library (strcoll). Letting peope add
more is the goal.

I'm sure once the catalog exists the existing ICU-for-Postgres patch
will be adjusted to use it.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Martijn van Oosterhout wrote:
> On Mon, Sep 22, 2008 at 10:22:35AM +0300, Heikki Linnakangas wrote:
>> BTW, the original patch didn't have any provision for creating rows in 
>> pg_collation reflecting the locales available in the OS, but I think 
>> we'd need that. Otherwise the DBA would need to manually run CREATE 
>> COLLATION for every collation they want users to be able to use. 
>> Assuming we do that, the situation that we can't find a row with given 
>> LC_COLLATE and LC_CTYPE should not arise in practice.
> 
> You're assuming collations are denumerable. They're not. There is no way
> to find the list of available collations/locales. You may be able to
> guess a few but certainly not all of them.

"locale -a" manages to do it somehow...

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Martijn van Oosterhout
Date:
On Tue, Sep 23, 2008 at 01:32:38PM +0300, Heikki Linnakangas wrote:
> Martijn van Oosterhout wrote:
> >You're assuming collations are denumerable. They're not. There is no way
> >to find the list of available collations/locales. You may be able to
> >guess a few but certainly not all of them.
>
> "locale -a" manages to do it somehow...

Sure, by (on glibc) opening the binary archive and parsing it and then
trying to reverse lookup the alias list. We could ofcourse program
something for each platform to determine a list but who is going to
maintain that? How do you handle the list changing?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: WIP patch: Collation support

From
Zdenek Kotala
Date:
Martijn van Oosterhout napsal(a):
> On Mon, Sep 22, 2008 at 06:11:04PM +0300, Heikki Linnakangas wrote:
>>> This patch should allow to use both system catalog and ICU. 
>> Not without another patch that actually introduces ICU support. What 
>> that would look like, how that would be stored in the catalogs, and 
>> whether we want that is whole another topic. Without that, the STRCOLFN 
>> part of the original patch is pointless, and I would've ripped that out 
>> anyway even if we decided to add the pg_collation catalog in this release.
> 
> Eh? How you store collations names is easy. Every collation has a
> textual name which is what we store in the catalog. I'm not sure why
> you'd think it'd be any more complicated than that. And it has
> precisely nothing to do with ICU and everything to do with being able
> to support multiple source of collation information. We already have
> two: builtin (strcmp/memcmp) and C library (strcoll). Letting peope add
> more is the goal.

pg_collation catalog is also important for pg_dump, because system 
collation names are not compatible over OS and pg_dump output should be 
portable. pg_collation adds abstract layer which solve this problem.

> I'm sure once the catalog exists the existing ICU-for-Postgres patch
> will be adjusted to use it.

Yes, I agree with Martijn.
    Zdenek


Re: WIP patch: Collation support

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> pg_collation catalog is also important for pg_dump, because system 
> collation names are not compatible over OS and pg_dump output should be 
> portable. pg_collation adds abstract layer which solve this problem.

That's a valid point. We'll still need a way to map OS locale to 
whatever internal names we invent for them, though, so I'm not sure if 
the pg_collation catalog helps much, but just moves the problem 
elsewhere. The pg_dump output of the CREATE COLLATION statements still 
wouldn't be portable from one OS to another.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WIP patch: Collation support

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Tue, Sep 23, 2008 at 01:32:38PM +0300, Heikki Linnakangas wrote:
>> "locale -a" manages to do it somehow...

> Sure, by (on glibc) opening the binary archive and parsing it and then
> trying to reverse lookup the alias list. We could ofcourse program
> something for each platform to determine a list but who is going to
> maintain that? How do you handle the list changing?

exec("locale -a") ...

I suppose we'd need something else for Windows, but I'm sure there's
a way.
        regards, tom lane


Re: WIP patch: Collation support

From
Magnus Hagander
Date:
Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> On Tue, Sep 23, 2008 at 01:32:38PM +0300, Heikki Linnakangas wrote:
>>> "locale -a" manages to do it somehow...
> 
>> Sure, by (on glibc) opening the binary archive and parsing it and then
>> trying to reverse lookup the alias list. We could ofcourse program
>> something for each platform to determine a list but who is going to
>> maintain that? How do you handle the list changing?
> 
> exec("locale -a") ...
> 
> I suppose we'd need something else for Windows, but I'm sure there's
> a way.

IIRC, the data is in the registry. Should be enumerable somehow - we'll
have to do it platform specific of course, but it's not the first time
we'd do that for windows...

//Magnus


Re: WIP patch: Collation support

From
Zdenek Kotala
Date:
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> pg_collation catalog is also important for pg_dump, because system 
>> collation names are not compatible over OS and pg_dump output should 
>> be portable. pg_collation adds abstract layer which solve this problem.
> 
> That's a valid point. We'll still need a way to map OS locale to 
> whatever internal names we invent for them, though, so I'm not sure if 
> the pg_collation catalog helps much, but just moves the problem 
> elsewhere. 

It is true. For names we can for example use RFC479 0IANA register) 
http://tools.ietf.org/html/rfc4790#section-7
or use UNICODE terminology CLDR.

> The pg_dump output of the CREATE COLLATION statements still 
> wouldn't be portable from one OS to another.
> 

I don't think so that pg_collation catalog should be dumped (maybe only 
with extra switch).

        Zdenek


Re: WIP patch: Collation support

From
Petr Jelinek
Date:
Magnus Hagander wrote:
>> exec("locale -a") ...
>>
>> I suppose we'd need something else for Windows, but I'm sure there's
>> a way.
> 
> IIRC, the data is in the registry. Should be enumerable somehow - we'll
> have to do it platform specific of course, but it's not the first time
> we'd do that for windows...
> 

There is EnumSystemLocales API function in Windows.

-- 
Regards
Petr Jelinek (PJMODOS)


Re: WIP patch: Collation support

From
Magnus Hagander
Date:
Petr Jelinek wrote:
> Magnus Hagander wrote:
>>> exec("locale -a") ...
>>>
>>> I suppose we'd need something else for Windows, but I'm sure there's
>>> a way.
>>
>> IIRC, the data is in the registry. Should be enumerable somehow - we'll
>> have to do it platform specific of course, but it's not the first time
>> we'd do that for windows...
>>
> 
> There is EnumSystemLocales API function in Windows.

Ha, right. We even use it in the installer :-)
Bottom line remains: we can easily do this in a Windows-specific way if
we need to.

//Magnus