Re: WIP patch: Collation support - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WIP patch: Collation support
Date
Msg-id 48D41B5A.7050705@enterprisedb.com
Whole thread Raw
In response to Re: WIP patch: Collation support  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Heikki Linnakangas wrote:
> Here's an updated version of the stripped-down patch, now with
> documentation changes, plus a couple of minor bug fixes.

Another update, marching towards committing. Now with pg_dump/pg_dumpall
support, and collation/ctype is also shown in psql \l output.

I wonder if we should provide a shorthand "LOCALE=<localename>" option,
in addition to separate COLLATE and CTYPE options? Most people will
always set them together. It could also set the rest of the locale
categories, lc_numeric, lc_monetary, and so on, as database-specific
options. Like "ALTER DATABASE <dbname> SET lc_messages=<localename> ..."
would.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index f484db8..5581b10 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -130,23 +130,24 @@ initdb --locale=sv_SE

    <para>
     The nature of some locale categories is that their value has to be
-    fixed for the lifetime of a database cluster.  That is, once
-    <command>initdb</command> has run, you cannot change them anymore.
-    <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal> are
-    those categories.  They affect the sort order of indexes, so they
-    must be kept fixed, or indexes on text columns will become corrupt.
-    <productname>PostgreSQL</productname> enforces this by recording
-    the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</> that are
-    seen by <command>initdb</>.  The server automatically adopts
-    those two values when it is started.
+    fixed when the database is created.  You can use different settings
+    for different databases, but once a database is created, you cannot
+    change them for that database anymore. <literal>LC_COLLATE</literal>
+    and <literal>LC_CTYPE</literal> are those categories.  They affect
+    the sort order of indexes, so they must be kept fixed, or indexes on
+    text columns will become corrupt.  The default values for these
+    categories are defined when <command>initdb</command> is run, and
+    those values are used when new databases are created, unless
+    explicitly specified otherwise in the <command>CREATE
+    DATABASE</command> command.
    </para>

    <para>
     The other locale categories can be changed as desired whenever the
     server is running by setting the run-time configuration variables
     that have the same name as the locale categories (see <xref
-    linkend="runtime-config-client-format"> for details).  The defaults that are
-    chosen by <command>initdb</command> are actually only written into
+    linkend="runtime-config-client-format"> for details).  The defaults
+    that are chosen by <command>initdb</command> are actually only written into
     the configuration file <filename>postgresql.conf</filename> to
     serve as defaults when the server is started.  If you delete these
     assignments from <filename>postgresql.conf</filename> then the
@@ -261,7 +262,7 @@ initdb --locale=sv_SE

    <para>
     Check that <productname>PostgreSQL</> is actually using the locale
-    that you think it is.  <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
+    that you think it is.  The default <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
     settings are determined at <command>initdb</> time and cannot be
     changed without repeating <command>initdb</>.  Other locale
     settings including <envar>LC_MESSAGES</> and <envar>LC_MONETARY</>
@@ -320,16 +321,10 @@ initdb --locale=sv_SE

   <para>
    An important restriction, however, is that each database character set
-   must be compatible with the server's <envar>LC_CTYPE</> setting.
+   must be compatible with the database's <envar>LC_CTYPE</> setting.
    When <envar>LC_CTYPE</> is <literal>C</> or <literal>POSIX</>, any
    character set is allowed, but for other settings of <envar>LC_CTYPE</>
    there is only one character set that will work correctly.
-   Since the <envar>LC_CTYPE</> setting is frozen by <command>initdb</>, the
-   apparent flexibility to use different encodings in different databases
-   of a cluster is more theoretical than real, except when you select
-   <literal>C</> or <literal>POSIX</> locale (thus disabling any real locale
-   awareness).  It is likely that these mechanisms will be revisited in future
-   versions of <productname>PostgreSQL</productname>.
   </para>

    <sect2 id="multibyte-charset-supported">
@@ -734,19 +729,19 @@ initdb -E EUC_JP
     </para>

     <para>
-     If you have selected <literal>C</> or <literal>POSIX</> locale,
-     you can create a database with a different character set:
+     You can specify a non-default encoding at database creation time,
+     provided that the encoding is compatible with the selected locale:

 <screen>
-createdb -E EUC_KR korean
+createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
 </screen>

      This will create a database named <literal>korean</literal> that
-     uses the character set <literal>EUC_KR</literal>.  Another way to
-     accomplish this is to use this SQL command:
+     uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>.
+     Another way to accomplish this is to use this SQL command:

 <programlisting>
-CREATE DATABASE korean WITH ENCODING 'EUC_KR';
+CREATE DATABASE korean WITH ENCODING 'EUC_KR' COLLATE='ko_KR.euckr' CTYPE='ko_KR.euckr' TEMPLATE=template0;
 </programlisting>

      The encoding for a database is stored in the system catalog
@@ -756,20 +751,17 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR';

 <screen>
 $ <userinput>psql -l</userinput>
-            List of databases
-   Database    |  Owner  |   Encoding
----------------+---------+---------------
- euc_cn        | t-ishii | EUC_CN
- euc_jp        | t-ishii | EUC_JP
- euc_kr        | t-ishii | EUC_KR
- euc_tw        | t-ishii | EUC_TW
- mule_internal | t-ishii | MULE_INTERNAL
- postgres      | t-ishii | EUC_JP
- regression    | t-ishii | SQL_ASCII
- template1     | t-ishii | EUC_JP
- test          | t-ishii | EUC_JP
- utf8          | t-ishii | UTF8
-(9 rows)
+                                         List of databases
+   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges
+-----------+----------+-----------+-------------+-------------+-------------------------------------
+ clocaledb | hlinnaka | SQL_ASCII | C           | C           |
+ englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  |
+ japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  |
+ korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr |
+ postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  |
+ template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+ template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+(7 rows)
 </screen>
     </para>

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 2d4a7bf..f02a957 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -157,7 +157,7 @@ CREATE INDEX test1_id_index ON test1 (id);
    <emphasis>if</emphasis> the pattern is a constant and is anchored to
    the beginning of the string — for example, <literal>col LIKE
    'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
-   <literal>col LIKE '%bar'</literal>. However, if your server does not
+   <literal>col LIKE '%bar'</literal>. However, if your database does not
    use the C locale you will need to create the index with a special
    operator class to support indexing of pattern-matching queries. See
    <xref linkend="indexes-opclass"> below. It is also possible to use
@@ -922,7 +922,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
       according to the locale-specific collation rules.  This makes
       these operator classes suitable for use by queries involving
       pattern matching expressions (<literal>LIKE</literal> or POSIX
-      regular expressions) when the server does not use the standard
+      regular expressions) when the database does not use the standard
       <quote>C</quote> locale.  As an example, you might index a
       <type>varchar</type> column like this:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 95350c4..ed048b0 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -24,6 +24,8 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
     [ [ WITH ] [ OWNER [=] <replaceable class="parameter">dbowner</replaceable> ]
            [ TEMPLATE [=] <replaceable class="parameter">template</replaceable> ]
            [ ENCODING [=] <replaceable class="parameter">encoding</replaceable> ]
+           [ COLLATE [=] <replaceable class="parameter">collation</replaceable> ]
+           [ CTYPE [=] <replaceable class="parameter">ctype</replaceable> ]
            [ TABLESPACE [=] <replaceable class="parameter">tablespace</replaceable> ]
            [ CONNECTION LIMIT [=] <replaceable class="parameter">connlimit</replaceable> ] ]
 </synopsis>
@@ -113,6 +115,29 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
       </listitem>
      </varlistentry>
      <varlistentry>
+      <term><replaceable class="parameter">collation</replaceable></term>
+      <listitem>
+       <para>
+        Collation order (<literal>LC_COLLATE</>) to use in the new database.
+    This affects the sort order applied to strings, e.g in queries with
+        ORDER BY, as well as the order used in indexes on text columns.
+        The default is to use the collation order of the template database.
+        See below for additional restrictions.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><replaceable class="parameter">ctype</replaceable></term>
+      <listitem>
+       <para>
+        Character classification (<literal>LC_CTYPE</>) to use in the new
+        database. This affects the categorization of characters, e.g. lower,
+        upper and digit. The default is to use the character classification of
+        the template database. See below for additional restrictions.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
       <term><replaceable class="parameter">tablespace</replaceable></term>
       <listitem>
        <para>
@@ -181,12 +206,10 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>

   <para>
    Any character set encoding specified for the new database must be
-   compatible with the server's <envar>LC_CTYPE</> locale setting.
+   compatible with the chosen COLLATE and CTYPE settings.
    If <envar>LC_CTYPE</> is <literal>C</> (or equivalently
    <literal>POSIX</>), then all encodings are allowed, but for other
-   locale settings there is only one encoding that will work properly,
-   and so the apparent freedom to specify an encoding is illusory if
-   you didn't initialize the database cluster in <literal>C</> locale.
+   locale settings there is only one encoding that will work properly.
    <command>CREATE DATABASE</> will allow superusers to specify
    <literal>SQL_ASCII</> encoding regardless of the locale setting,
    but this choice is deprecated and may result in misbehavior of
@@ -195,6 +218,16 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
   </para>

   <para>
+   The <literal>COLLATE</> and <literal>CTYPE</> settings must match
+   those of the template database, except when template0 is used as
+   template. This is because <literal>COLLATE</> and <literal>CTYPE</>
+   affects the ordering in indexes, so that any indexes copied from the
+   template database would be invalid in the new database with different
+   settings. <literal>template0</literal>, however, is known to not
+   contain any indexes that would be affected.
+  </para>
+
+  <para>
    The <literal>CONNECTION LIMIT</> option is only enforced approximately;
    if two new sessions start at about the same time when just one
    connection <quote>slot</> remains for the database, it is possible that
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 10a9d1c..8eead18 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -76,25 +76,30 @@ PostgreSQL documentation

   <para>
    <command>initdb</command> initializes the database cluster's default
-   locale and character set encoding. The collation order
-   (<literal>LC_COLLATE</>) and character set classes
-   (<literal>LC_CTYPE</>, e.g. upper, lower, digit) are fixed for all
-   databases and cannot be changed. Collation orders other than
-   <literal>C</> or <literal>POSIX</> also have a performance penalty.
-   For these reasons it is important to choose the right locale when
-   running <command>initdb</command>. The remaining locale categories
-   can be changed later when the server is started. All server locale
-   values (<literal>lc_*</>) can be displayed via <command>SHOW ALL</>.
+   locale and character set encoding.
+
+   The character set encoding, collation order (<literal>LC_COLLATE</>)
+   and character set classes (<literal>LC_CTYPE</>, e.g. upper, lower,
+   digit) can be set separately for a database when it is created.
+   <command>initdb</command> determines those settings for the
+   <literal>template1</literal> database, which will serve as the
+   default for all other databases.
+
+   To alter the default collation order or character set classes, use the
+   <option>--lc-collate</option> and <option>--lc-ctype</option> options.
+   Collation orders other than <literal>C</> or <literal>POSIX</> also have
+   a performance penalty.  For these reasons it is important to choose the
+   right locale when running <command>initdb</command>.
+
+   The remaining locale categories can be changed later when the server
+   is started.  You can also use <option>--locale</option> to set the
+   default for all locale categories, including collation order and
+   character set classes. All server locale values (<literal>lc_*</>) can
+   be displayed via <command>SHOW ALL</>.
    More details can be found in <xref linkend="locale">.
-  </para>

-  <para>
-   The character set encoding can be set separately for a database when
-   it is created. <command>initdb</command> determines the encoding for
-   the <literal>template1</literal> database, which will serve as the
-   default for all other databases. To alter the default encoding use
-   the <option>--encoding</option> option. More details can be found in
-   <xref linkend="multibyte">.
+   To alter the default encoding, use the <option>--encoding</option>.
+   More details can be found in <xref linkend="multibyte">.
   </para>

  </refsect1>
diff --git a/doc/src/sgml/ref/pg_controldata.sgml b/doc/src/sgml/ref/pg_controldata.sgml
index b379df0..7f50b7b 100644
--- a/doc/src/sgml/ref/pg_controldata.sgml
+++ b/doc/src/sgml/ref/pg_controldata.sgml
@@ -30,7 +30,7 @@ PostgreSQL documentation
   <title>Description</title>
   <para>
    <command>pg_controldata</command> prints information initialized during
-   <command>initdb</>, such as the catalog version and server locale.
+   <command>initdb</>, such as the catalog version.
    It also shows information about write-ahead logging and checkpoint
    processing.  This information is cluster-wide, and not specific to any one
    database.
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 1933455..f7331e4 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -62,12 +62,9 @@ PostgreSQL documentation
    by specifying the <literal>-f</> (force) switch.  In this case plausible
    values will be substituted for the missing data.  Most of the fields can be
    expected to match, but manual assistance might be needed for the next OID,
-   next transaction ID and epoch, next multitransaction ID and offset,
-   WAL starting address, and database locale fields.
-   The first six of these can be set using the switches discussed below.
-   <command>pg_resetxlog</command>'s own environment is the source for its
-   guess at the locale fields; take care that <envar>LANG</> and so forth
-   match the environment that <command>initdb</> was run in.
+   next transaction ID and epoch, next multitransaction ID and offset, and
+   WAL starting address fields.
+   The first five of these can be set using the switches discussed below.
    If you are not able to determine correct values for all these fields,
    <literal>-f</> can still be used, but
    the recovered database must be treated with even more suspicion than
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index e463d15..a1be7cb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -747,8 +747,7 @@ SELECT name FROM distributors ORDER BY code;

    <para>
     Character-string data is sorted according to the locale-specific
-    collation order that was established when the database cluster
-    was initialized.
+    collation order that was established when the database was created.
    </para>
   </refsect2>

diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index 3d238c3..461a9c8 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -82,8 +82,8 @@ SHOW ALL
          <para>
           Shows the database's locale setting for collation (text
           ordering).  At present, this parameter can be shown but not
-          set, because the setting is determined at
-          <command>initdb</> time.
+          set, because the setting is determined at database creation
+          time.
          </para>
         </listitem>
        </varlistentry>
@@ -94,8 +94,8 @@ SHOW ALL
          <para>
           Shows the database's locale setting for character
           classification.  At present, this parameter can be shown but
-          not set, because the setting is determined at
-          <command>initdb</> time.
+          not set, because the setting is determined at database creation
+      time.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 5ce6163..4b7f5eb 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -145,11 +145,12 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
    Normally, it will just take the locale settings in the environment
    and apply them to the initialized database.  It is possible to
    specify a different locale for the database; more information about
-   that can be found in <xref linkend="locale">.  The sort order used
-   within a particular database cluster is set by
-   <command>initdb</command> and cannot be changed later, short of
-   dumping all data, rerunning <command>initdb</command>, and reloading
-   the data. There is also a performance impact for using locales
+   that can be found in <xref linkend="locale">.  The default sort order used
+   within the particular database cluster is set by
+   <command>initdb</command>, and while you can create new databases using
+   different sort order, the order used in the template databases that initdb
+   creates cannot be changed without dropping and recreating them.
+   There is also a performance impact for using locales
    other than <literal>C</> or <literal>POSIX</>. Therefore, it is
    important to make this choice correctly the first time.
   </para>
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index e48f82b..98ac7f8 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1896,7 +1896,7 @@ LIMIT 10;

   <note>
    <para>
-    The parser's notion of a <quote>letter</> is determined by the server's
+    The parser's notion of a <quote>letter</> is determined by the database's
     locale setting, specifically <varname>lc_ctype</>.  Words containing
     only the basic ASCII letters are reported as a separate token type,
     since it is sometimes useful to distinguish them.  In most European
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5645271..edd1d74 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3847,7 +3847,6 @@ WriteControlFile(void)
 {
     int            fd;
     char        buffer[PG_CONTROL_SIZE];        /* need not be aligned */
-    char       *localeptr;

     /*
      * Initialize version and compatibility-check fields
@@ -3876,18 +3875,6 @@ WriteControlFile(void)
     ControlFile->float4ByVal = FLOAT4PASSBYVAL;
     ControlFile->float8ByVal = FLOAT8PASSBYVAL;

-    ControlFile->localeBuflen = LOCALE_NAME_BUFLEN;
-    localeptr = setlocale(LC_COLLATE, NULL);
-    if (!localeptr)
-        ereport(PANIC,
-                (errmsg("invalid LC_COLLATE setting")));
-    StrNCpy(ControlFile->lc_collate, localeptr, LOCALE_NAME_BUFLEN);
-    localeptr = setlocale(LC_CTYPE, NULL);
-    if (!localeptr)
-        ereport(PANIC,
-                (errmsg("invalid LC_CTYPE setting")));
-    StrNCpy(ControlFile->lc_ctype, localeptr, LOCALE_NAME_BUFLEN);
-
     /* Contents are protected with a CRC */
     INIT_CRC32(ControlFile->crc);
     COMP_CRC32(ControlFile->crc,
@@ -4126,34 +4113,6 @@ ReadControlFile(void)
                            " but the server was compiled without USE_FLOAT8_BYVAL."),
                  errhint("It looks like you need to recompile or initdb.")));
 #endif
-
-    if (ControlFile->localeBuflen != LOCALE_NAME_BUFLEN)
-        ereport(FATAL,
-                (errmsg("database files are incompatible with server"),
-                 errdetail("The database cluster was initialized with LOCALE_NAME_BUFLEN %d,"
-                  " but the server was compiled with LOCALE_NAME_BUFLEN %d.",
-                           ControlFile->localeBuflen, LOCALE_NAME_BUFLEN),
-                 errhint("It looks like you need to recompile or initdb.")));
-    if (pg_perm_setlocale(LC_COLLATE, ControlFile->lc_collate) == NULL)
-        ereport(FATAL,
-            (errmsg("database files are incompatible with operating system"),
-             errdetail("The database cluster was initialized with LC_COLLATE \"%s\","
-                       " which is not recognized by setlocale().",
-                       ControlFile->lc_collate),
-             errhint("It looks like you need to initdb or install locale support.")));
-    if (pg_perm_setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
-        ereport(FATAL,
-            (errmsg("database files are incompatible with operating system"),
-        errdetail("The database cluster was initialized with LC_CTYPE \"%s\","
-                  " which is not recognized by setlocale().",
-                  ControlFile->lc_ctype),
-             errhint("It looks like you need to initdb or install locale support.")));
-
-    /* Make the fixed locale settings visible as GUC variables, too */
-    SetConfigOption("lc_collate", ControlFile->lc_collate,
-                    PGC_INTERNAL, PGC_S_OVERRIDE);
-    SetConfigOption("lc_ctype", ControlFile->lc_ctype,
-                    PGC_INTERNAL, PGC_S_OVERRIDE);
 }

 void
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 4dd6262..2d5e27b 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -53,6 +53,7 @@
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
+#include "utils/pg_locale.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"

@@ -69,7 +70,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
             Oid *dbIdP, Oid *ownerIdP,
             int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
             Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
-            Oid *dbTablespace);
+            Oid *dbTablespace, char **dbCollation, char **dbCtype);
 static bool have_createdb_privilege(void);
 static void remove_dbtablespaces(Oid db_id);
 static bool check_db_file_conflict(Oid db_id);
@@ -87,6 +88,8 @@ createdb(const CreatedbStmt *stmt)
     Oid            src_dboid;
     Oid            src_owner;
     int            src_encoding;
+    char       *src_collation;
+    char       *src_ctype;
     bool        src_istemplate;
     bool        src_allowconn;
     Oid            src_lastsysoid;
@@ -104,10 +107,14 @@ createdb(const CreatedbStmt *stmt)
     DefElem    *downer = NULL;
     DefElem    *dtemplate = NULL;
     DefElem    *dencoding = NULL;
+    DefElem    *dcollation = NULL;
+    DefElem    *dctype = NULL;
     DefElem    *dconnlimit = NULL;
     char       *dbname = stmt->dbname;
     char       *dbowner = NULL;
     const char *dbtemplate = NULL;
+    char       *lc_collate = NULL;
+    char       *lc_ctype = NULL;
     int            encoding = -1;
     int            dbconnlimit = -1;
     int            ctype_encoding;
@@ -152,6 +159,22 @@ createdb(const CreatedbStmt *stmt)
                          errmsg("conflicting or redundant options")));
             dencoding = defel;
         }
+        else if (strcmp(defel->defname, "collate") == 0)
+        {
+            if (dcollation)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            dcollation = defel;
+        }
+        else if (strcmp(defel->defname, "ctype") == 0)
+        {
+            if (dctype)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            dctype = defel;
+        }
         else if (strcmp(defel->defname, "connectionlimit") == 0)
         {
             if (dconnlimit)
@@ -205,6 +228,11 @@ createdb(const CreatedbStmt *stmt)
             elog(ERROR, "unrecognized node type: %d",
                  nodeTag(dencoding->arg));
     }
+    if (dcollation && dcollation->arg)
+        lc_collate = strVal(dcollation->arg);
+    if (dctype && dctype->arg)
+        lc_ctype = strVal(dctype->arg);
+
     if (dconnlimit && dconnlimit->arg)
         dbconnlimit = intVal(dconnlimit->arg);

@@ -243,7 +271,8 @@ createdb(const CreatedbStmt *stmt)
     if (!get_db_info(dbtemplate, ShareLock,
                      &src_dboid, &src_owner, &src_encoding,
                      &src_istemplate, &src_allowconn, &src_lastsysoid,
-                     &src_frozenxid, &src_deftablespace))
+                     &src_frozenxid, &src_deftablespace,
+                     &src_collation, &src_ctype))
         ereport(ERROR,
                 (errcode(ERRCODE_UNDEFINED_DATABASE),
                  errmsg("template database \"%s\" does not exist",
@@ -262,9 +291,13 @@ createdb(const CreatedbStmt *stmt)
                             dbtemplate)));
     }

-    /* If encoding is defaulted, use source's encoding */
+    /* If encoding or locales are defaulted, use source's setting */
     if (encoding < 0)
         encoding = src_encoding;
+    if (lc_collate == NULL)
+        lc_collate = src_collation;
+    if (lc_ctype == NULL)
+        lc_ctype = src_ctype;

     /* Some encodings are client only */
     if (!PG_VALID_BE_ENCODING(encoding))
@@ -272,6 +305,16 @@ createdb(const CreatedbStmt *stmt)
                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                  errmsg("invalid server encoding %d", encoding)));

+    /* Check that the chosen locales are valid */
+    if (!check_locale(LC_COLLATE, lc_collate))
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("invalid locale name %s", lc_collate)));
+    if (!check_locale(LC_CTYPE, lc_ctype))
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("invalid locale name %s", lc_ctype)));
+
     /*
      * Check whether encoding matches server locale settings.  We allow
      * mismatch in three cases:
@@ -290,7 +333,7 @@ createdb(const CreatedbStmt *stmt)
      *
      * Note: if you change this policy, fix initdb to match.
      */
-    ctype_encoding = pg_get_encoding_from_locale(NULL);
+    ctype_encoding = pg_get_encoding_from_locale(lc_ctype);

     if (!(ctype_encoding == encoding ||
           ctype_encoding == PG_SQL_ASCII ||
@@ -299,12 +342,32 @@ createdb(const CreatedbStmt *stmt)
 #endif
           (encoding == PG_SQL_ASCII && superuser())))
         ereport(ERROR,
-                (errmsg("encoding %s does not match server's locale %s",
+                (errmsg("encoding %s does not match locale %s",
                         pg_encoding_to_char(encoding),
-                        setlocale(LC_CTYPE, NULL)),
-             errdetail("The server's LC_CTYPE setting requires encoding %s.",
+                        lc_ctype),
+             errdetail("The chosen LC_CTYPE setting requires encoding %s.",
                        pg_encoding_to_char(ctype_encoding))));

+    /*
+     * Check that the new locale is compatible with the source database.
+     *
+     * We know that template0 doesn't contain any indexes that depend on
+     * collation or ctype, so template0 can be used as template for
+     * any locale.
+     */
+    if (strcmp(dbtemplate, "template0") != 0)
+    {
+        if (strcmp(lc_collate, src_collation))
+            ereport(ERROR,
+                    (errmsg("new collation is incompatible with the collation of the template database (%s)",
src_collation),
+                     errhint("Use the same collation as in the template database, or use template0 as template")));
+
+        if (strcmp(lc_ctype, src_ctype))
+            ereport(ERROR,
+                    (errmsg("new ctype is incompatible with the ctype of the template database (%s)", src_ctype),
+                     errhint("Use the same ctype as in the template database, or use template0 as template")));
+    }
+
     /* Resolve default tablespace for new database */
     if (dtablespacename && dtablespacename->arg)
     {
@@ -421,6 +484,8 @@ createdb(const CreatedbStmt *stmt)
         DirectFunctionCall1(namein, CStringGetDatum(dbname));
     new_record[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(datdba);
     new_record[Anum_pg_database_encoding - 1] = Int32GetDatum(encoding);
+    new_record[Anum_pg_database_collation - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_collate));
+    new_record[Anum_pg_database_ctype - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_ctype));
     new_record[Anum_pg_database_datistemplate - 1] = BoolGetDatum(false);
     new_record[Anum_pg_database_datallowconn - 1] = BoolGetDatum(true);
     new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit);
@@ -629,7 +694,7 @@ dropdb(const char *dbname, bool missing_ok)
     pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock);

     if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-                     &db_istemplate, NULL, NULL, NULL, NULL))
+                     &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL))
     {
         if (!missing_ok)
         {
@@ -781,7 +846,7 @@ RenameDatabase(const char *oldname, const char *newname)
     rel = heap_open(DatabaseRelationId, RowExclusiveLock);

     if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-                     NULL, NULL, NULL, NULL, NULL))
+                     NULL, NULL, NULL, NULL, NULL, NULL, NULL))
         ereport(ERROR,
                 (errcode(ERRCODE_UNDEFINED_DATABASE),
                  errmsg("database \"%s\" does not exist", oldname)));
@@ -1168,7 +1233,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
             Oid *dbIdP, Oid *ownerIdP,
             int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
             Oid *dbLastSysOidP, TransactionId *dbFrozenXidP,
-            Oid *dbTablespace)
+            Oid *dbTablespace, char **dbCollation, char **dbCtype)
 {
     bool        result = false;
     Relation    relation;
@@ -1259,6 +1324,11 @@ get_db_info(const char *name, LOCKMODE lockmode,
                 /* default tablespace for this database */
                 if (dbTablespace)
                     *dbTablespace = dbform->dattablespace;
+                 /* default locale settings for this database */
+                 if (dbCollation)
+                     *dbCollation = pstrdup(NameStr(dbform->collation));
+                 if (dbCtype)
+                     *dbCtype = pstrdup(NameStr(dbform->ctype));
                 ReleaseSysCache(tuple);
                 result = true;
                 break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 5ff353d..0831a94 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -398,7 +398,7 @@ static TypeName *TableFuncTypeName(List *columns);
     CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
     COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
     CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
-    CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
+    CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
     CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

     DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
@@ -5458,6 +5458,22 @@ createdb_opt_item:
                 {
                     $$ = makeDefElem("encoding", NULL);
                 }
+            | COLLATE opt_equal Sconst
+                {
+                    $$ = makeDefElem("collate", (Node *)makeString($3));
+                }
+            | COLLATE opt_equal DEFAULT
+                {
+                    $$ = makeDefElem("collate", NULL);
+                }
+            | CTYPE opt_equal Sconst
+                {
+                    $$ = makeDefElem("ctype", (Node *)makeString($3));
+                }
+            | CTYPE opt_equal DEFAULT
+                {
+                    $$ = makeDefElem("ctype", NULL);
+                }
             | CONNECTION LIMIT opt_equal SignedIconst
                 {
                     $$ = makeDefElem("connectionlimit", (Node *)makeInteger($4));
@@ -9216,6 +9232,7 @@ unreserved_keyword:
             | CREATEROLE
             | CREATEUSER
             | CSV
+            | CTYPE
             | CURRENT_P
             | CURSOR
             | CYCLE
diff --git a/src/backend/parser/keywords.c b/src/backend/parser/keywords.c
index b30a478..e2fe4bf 100644
--- a/src/backend/parser/keywords.c
+++ b/src/backend/parser/keywords.c
@@ -114,6 +114,7 @@ const ScanKeyword ScanKeywords[] = {
     {"createuser", CREATEUSER, UNRESERVED_KEYWORD},
     {"cross", CROSS, TYPE_FUNC_NAME_KEYWORD},
     {"csv", CSV, UNRESERVED_KEYWORD},
+    {"ctype", CTYPE, UNRESERVED_KEYWORD},
     {"current", CURRENT_P, UNRESERVED_KEYWORD},
     {"current_date", CURRENT_DATE, RESERVED_KEYWORD},
     {"current_role", CURRENT_ROLE, RESERVED_KEYWORD},
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c0a01ae..4a55be5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -189,6 +189,30 @@ pg_perm_setlocale(int category, const char *locale)
 }


+/*
+ * Is the locale name valid for the locale category?
+ */
+bool
+check_locale(int category, const char *value)
+{
+    char       *save;
+    bool        ret;
+
+    save = setlocale(category, NULL);
+    if (!save)
+        return false;            /* won't happen, we hope */
+
+    /* save may be pointing at a modifiable scratch variable, see above */
+    save = pstrdup(save);
+
+    ret = (setlocale(category, value) != NULL);
+
+    setlocale(category, save);    /* assume this won't fail */
+    pfree(save);
+
+    return ret;
+}
+
 /* GUC assign hooks */

 /*
@@ -203,21 +227,9 @@ pg_perm_setlocale(int category, const char *locale)
 static const char *
 locale_xxx_assign(int category, const char *value, bool doit, GucSource source)
 {
-    char       *save;
-
-    save = setlocale(category, NULL);
-    if (!save)
-        return NULL;            /* won't happen, we hope */
-
-    /* save may be pointing at a modifiable scratch variable, see above */
-    save = pstrdup(save);
-
-    if (!setlocale(category, value))
+    if (!check_locale(category, value))
         value = NULL;            /* set failure return marker */

-    setlocale(category, save);    /* assume this won't fail */
-    pfree(save);
-
     /* need to reload cache next time? */
     if (doit && value != NULL)
     {
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 461edd9..08fbf3b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -159,6 +159,8 @@ CheckMyDatabase(const char *name, bool am_superuser)
 {
     HeapTuple    tup;
     Form_pg_database dbform;
+    char       *collate;
+    char       *ctype;

     /* Fetch our real pg_database row */
     tup = SearchSysCache(DATABASEOID,
@@ -240,6 +242,28 @@ CheckMyDatabase(const char *name, bool am_superuser)
     /* If we have no other source of client_encoding, use server encoding */
     SetConfigOption("client_encoding", GetDatabaseEncodingName(),
                     PGC_BACKEND, PGC_S_DEFAULT);
+
+    /* assign locale variables */
+    collate = NameStr(dbform->collation);
+    ctype = NameStr(dbform->ctype);
+
+    if (setlocale(LC_COLLATE, collate) == NULL)
+        ereport(FATAL,
+            (errmsg("database locale is incompatible with operating system"),
+            errdetail("The database was initialized with LC_COLLATE \"%s\", "
+                        " which is not recognized by setlocale().", collate),
+            errhint("Recreate the database with another locale or install the missing locale.")));
+
+    if (setlocale(LC_CTYPE, ctype) == NULL)
+        ereport(FATAL,
+            (errmsg("database locale is incompatible with operating system"),
+            errdetail("The database was initialized with LC_CTYPE \"%s\", "
+                        " which is not recognized by setlocale().", ctype),
+            errhint("Recreate the database with another locale or install the missing locale.")));
+
+    /* Make the locale settings visible as GUC variables, too */
+    SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DATABASE);
+    SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DATABASE);

     /*
      * Lastly, set up any database-specific configuration variables.
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3effafb..a8cb246 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1353,6 +1353,10 @@ bootstrap_template1(char *short_version)

     bki_lines = replace_token(bki_lines, "ENCODING", encodingid);

+    bki_lines = replace_token(bki_lines, "LC_COLLATE", lc_collate);
+
+    bki_lines = replace_token(bki_lines, "LC_CTYPE", lc_ctype);
+
     /*
      * Pass correct LC_xxx environment to bootstrap.
      *
@@ -2378,12 +2382,12 @@ usage(const char *progname)
     printf(_("\nOptions:\n"));
     printf(_(" [-D, --pgdata=]DATADIR     location for this database cluster\n"));
     printf(_("  -E, --encoding=ENCODING   set default encoding for new databases\n"));
-    printf(_("  --locale=LOCALE           initialize database cluster with given locale\n"));
+    printf(_("  --locale=LOCALE           set default locale for new databases\n"));
     printf(_("  --lc-collate, --lc-ctype, --lc-messages=LOCALE\n"
              "  --lc-monetary, --lc-numeric, --lc-time=LOCALE\n"
-             "                            initialize database cluster with given locale\n"
-             "                            in the respective category (default taken from\n"
-             "                            environment)\n"));
+             "                            set default locale in the respective\n"
+             "                            category for new databases (default\n"
+             "                            taken from environment)\n"));
     printf(_("  --no-locale               equivalent to --locale=C\n"));
     printf(_("  -T, --text-search-config=CFG\n"
          "                            default text search configuration\n"));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 7970c32..08e102f 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -220,12 +220,5 @@ main(int argc, char *argv[])
            (ControlFile.float4ByVal ? _("by value") : _("by reference")));
     printf(_("Float8 argument passing:              %s\n"),
            (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-    printf(_("Maximum length of locale name:        %u\n"),
-           ControlFile.localeBuflen);
-    printf(_("LC_COLLATE:                           %s\n"),
-           ControlFile.lc_collate);
-    printf(_("LC_CTYPE:                             %s\n"),
-           ControlFile.lc_ctype);
-
     return 0;
 }
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 52e5611..7bec68a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1542,12 +1542,16 @@ dumpDatabase(Archive *AH)
                 i_oid,
                 i_dba,
                 i_encoding,
+                i_collation,
+                i_ctype,
                 i_tablespace;
     CatalogId    dbCatId;
     DumpId        dbDumpId;
     const char *datname,
                *dba,
                *encoding,
+               *collation,
+               *ctype,
                *tablespace;

     datname = PQdb(g_conn);
@@ -1559,11 +1563,26 @@ dumpDatabase(Archive *AH)
     selectSourceSchema("pg_catalog");

     /* Get the database owner and parameters from pg_database */
-    if (g_fout->remoteVersion >= 80200)
+    if (g_fout->remoteVersion >= 80400)
+    {
+        appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
+                          "(%s datdba) as dba, "
+                          "pg_encoding_to_char(encoding) as encoding, "
+                          "collation, ctype, "
+                          "(SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) as tablespace, "
+                      "shobj_description(oid, 'pg_database') as description "
+
+                          "FROM pg_database "
+                          "WHERE datname = ",
+                          username_subquery);
+        appendStringLiteralAH(dbQry, datname, AH);
+    }
+    else if (g_fout->remoteVersion >= 80200)
     {
         appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
                           "(%s datdba) as dba, "
                           "pg_encoding_to_char(encoding) as encoding, "
+                          "NULL as collation, NULL as ctype, "
                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) as tablespace, "
                       "shobj_description(oid, 'pg_database') as description "

@@ -1577,6 +1596,7 @@ dumpDatabase(Archive *AH)
         appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
                           "(%s datdba) as dba, "
                           "pg_encoding_to_char(encoding) as encoding, "
+                          "NULL as collation, NULL as ctype, "
                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) as tablespace "
                           "FROM pg_database "
                           "WHERE datname = ",
@@ -1588,6 +1608,7 @@ dumpDatabase(Archive *AH)
         appendPQExpBuffer(dbQry, "SELECT tableoid, oid, "
                           "(%s datdba) as dba, "
                           "pg_encoding_to_char(encoding) as encoding, "
+                          "NULL as collation, NULL as ctype, "
                           "NULL as tablespace "
                           "FROM pg_database "
                           "WHERE datname = ",
@@ -1601,6 +1622,7 @@ dumpDatabase(Archive *AH)
                           "oid, "
                           "(%s datdba) as dba, "
                           "pg_encoding_to_char(encoding) as encoding, "
+                          "NULL as collation, NULL as ctype, "
                           "NULL as tablespace "
                           "FROM pg_database "
                           "WHERE datname = ",
@@ -1631,12 +1653,16 @@ dumpDatabase(Archive *AH)
     i_oid = PQfnumber(res, "oid");
     i_dba = PQfnumber(res, "dba");
     i_encoding = PQfnumber(res, "encoding");
+    i_collation = PQfnumber(res, "collation");
+    i_ctype = PQfnumber(res, "ctype");
     i_tablespace = PQfnumber(res, "tablespace");

     dbCatId.tableoid = atooid(PQgetvalue(res, 0, i_tableoid));
     dbCatId.oid = atooid(PQgetvalue(res, 0, i_oid));
     dba = PQgetvalue(res, 0, i_dba);
     encoding = PQgetvalue(res, 0, i_encoding);
+    collation = PQgetvalue(res, 0, i_collation);
+    ctype = PQgetvalue(res, 0, i_ctype);
     tablespace = PQgetvalue(res, 0, i_tablespace);

     appendPQExpBuffer(creaQry, "CREATE DATABASE %s WITH TEMPLATE = template0",
@@ -1646,6 +1672,16 @@ dumpDatabase(Archive *AH)
         appendPQExpBuffer(creaQry, " ENCODING = ");
         appendStringLiteralAH(creaQry, encoding, AH);
     }
+    if (strlen(collation) > 0)
+    {
+        appendPQExpBuffer(creaQry, " COLLATE = ");
+        appendStringLiteralAH(creaQry, collation, AH);
+    }
+    if (strlen(ctype) > 0)
+    {
+        appendPQExpBuffer(creaQry, " CTYPE = ");
+        appendStringLiteralAH(creaQry, ctype, AH);
+    }
     if (strlen(tablespace) > 0 && strcmp(tablespace, "pg_default") != 0)
         appendPQExpBuffer(creaQry, " TABLESPACE = %s",
                           fmtId(tablespace));
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index fa51af0..9ff59ae 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -925,11 +925,22 @@ dumpCreateDB(PGconn *conn)

     fprintf(OPF, "--\n-- Database creation\n--\n\n");

-    if (server_version >= 80100)
+    if (server_version >= 80400)
         res = executeQuery(conn,
                            "SELECT datname, "
                            "coalesce(rolname, (select rolname from pg_authid where oid=(select datdba from pg_database
wheredatname='template0'))), " 
                            "pg_encoding_to_char(d.encoding), "
+                           "collation, ctype, "
+                           "datistemplate, datacl, datconnlimit, "
+                           "(SELECT spcname FROM pg_tablespace t WHERE t.oid = d.dattablespace) AS dattablespace "
+              "FROM pg_database d LEFT JOIN pg_authid u ON (datdba = u.oid) "
+                           "WHERE datallowconn ORDER BY 1");
+    else if (server_version >= 80100)
+        res = executeQuery(conn,
+                           "SELECT datname, "
+                           "coalesce(rolname, (select rolname from pg_authid where oid=(select datdba from pg_database
wheredatname='template0'))), " 
+                           "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "datistemplate, datacl, datconnlimit, "
                            "(SELECT spcname FROM pg_tablespace t WHERE t.oid = d.dattablespace) AS dattablespace "
               "FROM pg_database d LEFT JOIN pg_authid u ON (datdba = u.oid) "
@@ -939,6 +950,7 @@ dumpCreateDB(PGconn *conn)
                            "SELECT datname, "
                            "coalesce(usename, (select usename from pg_shadow where usesysid=(select datdba from
pg_databasewhere datname='template0'))), " 
                            "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "datistemplate, datacl, -1 as datconnlimit, "
                            "(SELECT spcname FROM pg_tablespace t WHERE t.oid = d.dattablespace) AS dattablespace "
            "FROM pg_database d LEFT JOIN pg_shadow u ON (datdba = usesysid) "
@@ -948,6 +960,7 @@ dumpCreateDB(PGconn *conn)
                            "SELECT datname, "
                            "coalesce(usename, (select usename from pg_shadow where usesysid=(select datdba from
pg_databasewhere datname='template0'))), " 
                            "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "datistemplate, datacl, -1 as datconnlimit, "
                            "'pg_default' AS dattablespace "
            "FROM pg_database d LEFT JOIN pg_shadow u ON (datdba = usesysid) "
@@ -959,6 +972,7 @@ dumpCreateDB(PGconn *conn)
                     "(select usename from pg_shadow where usesysid=datdba), "
                            "(select usename from pg_shadow where usesysid=(select datdba from pg_database where
datname='template0')))," 
                            "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "datistemplate, '' as datacl, -1 as datconnlimit, "
                            "'pg_default' AS dattablespace "
                            "FROM pg_database d "
@@ -973,6 +987,7 @@ dumpCreateDB(PGconn *conn)
                            "SELECT datname, "
                     "(select usename from pg_shadow where usesysid=datdba), "
                            "pg_encoding_to_char(d.encoding), "
+                           "null::text AS collation, null::text AS ctype, "
                            "'f' as datistemplate, "
                            "'' as datacl, -1 as datconnlimit, "
                            "'pg_default' AS dattablespace "
@@ -985,10 +1000,12 @@ dumpCreateDB(PGconn *conn)
         char       *dbname = PQgetvalue(res, i, 0);
         char       *dbowner = PQgetvalue(res, i, 1);
         char       *dbencoding = PQgetvalue(res, i, 2);
-        char       *dbistemplate = PQgetvalue(res, i, 3);
-        char       *dbacl = PQgetvalue(res, i, 4);
-        char       *dbconnlimit = PQgetvalue(res, i, 5);
-        char       *dbtablespace = PQgetvalue(res, i, 6);
+        char       *dbcollation = PQgetvalue(res, i, 3);
+        char       *dbctype = PQgetvalue(res, i, 4);
+        char       *dbistemplate = PQgetvalue(res, i, 5);
+        char       *dbacl = PQgetvalue(res, i, 6);
+        char       *dbconnlimit = PQgetvalue(res, i, 7);
+        char       *dbtablespace = PQgetvalue(res, i, 8);
         char       *fdbname;

         fdbname = strdup(fmtId(dbname));
@@ -1016,6 +1033,18 @@ dumpCreateDB(PGconn *conn)
             appendPQExpBuffer(buf, " ENCODING = ");
             appendStringLiteralConn(buf, dbencoding, conn);

+            if (strlen(dbcollation) != 0)
+            {
+                appendPQExpBuffer(buf, " COLLATE = ");
+                appendStringLiteralConn(buf, dbcollation, conn);
+            }
+
+            if (strlen(dbctype) != 0)
+            {
+                appendPQExpBuffer(buf, " CTYPE = ");
+                appendStringLiteralConn(buf, dbctype, conn);
+            }
+
             /*
              * Output tablespace if it isn't the default.  For default, it
              * uses the default from the template database.  If tablespace is
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 345b89c..0df796b 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -493,22 +493,6 @@ GuessControlValues(void)
 #endif
     ControlFile.float4ByVal = FLOAT4PASSBYVAL;
     ControlFile.float8ByVal = FLOAT8PASSBYVAL;
-    ControlFile.localeBuflen = LOCALE_NAME_BUFLEN;
-
-    localeptr = setlocale(LC_COLLATE, "");
-    if (!localeptr)
-    {
-        fprintf(stderr, _("%s: invalid LC_COLLATE setting\n"), progname);
-        exit(1);
-    }
-    strlcpy(ControlFile.lc_collate, localeptr, sizeof(ControlFile.lc_collate));
-    localeptr = setlocale(LC_CTYPE, "");
-    if (!localeptr)
-    {
-        fprintf(stderr, _("%s: invalid LC_CTYPE setting\n"), progname);
-        exit(1);
-    }
-    strlcpy(ControlFile.lc_ctype, localeptr, sizeof(ControlFile.lc_ctype));

     /*
      * XXX eventually, should try to grovel through old XLOG to develop more
@@ -584,12 +568,6 @@ PrintControlValues(bool guessed)
            (ControlFile.float4ByVal ? _("by value") : _("by reference")));
     printf(_("Float8 argument passing:              %s\n"),
            (ControlFile.float8ByVal ? _("by value") : _("by reference")));
-    printf(_("Maximum length of locale name:        %u\n"),
-           ControlFile.localeBuflen);
-    printf(_("LC_COLLATE:                           %s\n"),
-           ControlFile.lc_collate);
-    printf(_("LC_CTYPE:                             %s\n"),
-           ControlFile.lc_ctype);
 }


diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 900130d..2cf3172 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -454,11 +454,18 @@ listAllDbs(bool verbose)
     printfPQExpBuffer(&buf,
                       "SELECT d.datname as \"%s\",\n"
                       "       pg_catalog.pg_get_userbyid(d.datdba) as \"%s\",\n"
-                      "       pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n"
-                      "       d.datacl as \"%s\"",
+                      "       pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n",
                       gettext_noop("Name"),
                       gettext_noop("Owner"),
-                      gettext_noop("Encoding"),
+                      gettext_noop("Encoding"));
+    if (pset.sversion >= 80400)
+        appendPQExpBuffer(&buf,
+                          "       d.collation as \"%s\",\n"
+                          "       d.ctype as \"%s\",\n",
+                          gettext_noop("Collation"),
+                          gettext_noop("Ctype"));
+    appendPQExpBuffer(&buf,
+                      "       d.datacl as \"%s\"",
                       gettext_noop("Access Privileges"));
     if (verbose && pset.sversion >= 80200)
         appendPQExpBuffer(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 286667e..37d41de 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -32,6 +32,8 @@ main(int argc, char *argv[])
         {"tablespace", required_argument, NULL, 'D'},
         {"template", required_argument, NULL, 'T'},
         {"encoding", required_argument, NULL, 'E'},
+        {"lc-collate", required_argument, NULL, 1},
+        {"lc-ctype", required_argument, NULL, 2},
         {NULL, 0, NULL, 0}
     };

@@ -50,6 +52,8 @@ main(int argc, char *argv[])
     char       *tablespace = NULL;
     char       *template = NULL;
     char       *encoding = NULL;
+    char       *lc_collate = NULL;
+    char       *lc_ctype = NULL;

     PQExpBufferData sql;

@@ -95,6 +99,12 @@ main(int argc, char *argv[])
             case 'E':
                 encoding = optarg;
                 break;
+            case 1:
+                lc_collate = optarg;
+                break;
+            case 2:
+                lc_ctype = optarg;
+                break;
             default:
                 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
                 exit(1);
@@ -152,6 +162,11 @@ main(int argc, char *argv[])
         appendPQExpBuffer(&sql, " ENCODING '%s'", encoding);
     if (template)
         appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+    if (lc_collate)
+        appendPQExpBuffer(&sql, " COLLATE '%s'", lc_collate);
+    if (lc_ctype)
+        appendPQExpBuffer(&sql, " CTYPE '%s'", lc_ctype);
+
     appendPQExpBuffer(&sql, ";\n");

     conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres",
@@ -209,6 +224,9 @@ help(const char *progname)
     printf(_("\nOptions:\n"));
     printf(_("  -D, --tablespace=TABLESPACE  default tablespace for the database\n"));
     printf(_("  -E, --encoding=ENCODING      encoding for the database\n"));
+    printf(_("  --lc-collate=LOCALE          LC_COLLATE setting for the database\n"));
+    printf(_("  --lc-ctype=LOCALE            LC_CTYPE setting for the database\n"));
+
     printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
     printf(_("  -T, --template=TEMPLATE      template database to copy\n"));
     printf(_("  -e, --echo                   show the commands being sent to the server\n"));
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 38b5a84..aee6934 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -144,11 +144,6 @@ typedef struct ControlFileData
     bool        float4ByVal;    /* float4 pass-by-value? */
     bool        float8ByVal;    /* float8, int8, etc pass-by-value? */

-    /* active locales */
-    uint32        localeBuflen;
-    char        lc_collate[LOCALE_NAME_BUFLEN];
-    char        lc_ctype[LOCALE_NAME_BUFLEN];
-
     /* CRC of all above ... MUST BE LAST! */
     pg_crc32    crc;
 } ControlFileData;
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 6e9e5d2..1b10e60 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -33,6 +33,8 @@ CATALOG(pg_database,1262) BKI_SHARED_RELATION
     NameData    datname;        /* database name */
     Oid            datdba;            /* owner of database */
     int4        encoding;        /* character encoding */
+    NameData    collation;        /* LC_COLLATE of database */
+    NameData    ctype;            /* LC_CTYPE of database */
     bool        datistemplate;    /* allowed as CREATE DATABASE template? */
     bool        datallowconn;    /* new connections allowed? */
     int4        datconnlimit;    /* max connections allowed (-1=no limit) */
@@ -54,20 +56,22 @@ typedef FormData_pg_database *Form_pg_database;
  *        compiler constants for pg_database
  * ----------------
  */
-#define Natts_pg_database                11
+#define Natts_pg_database                13
 #define Anum_pg_database_datname        1
 #define Anum_pg_database_datdba            2
 #define Anum_pg_database_encoding        3
-#define Anum_pg_database_datistemplate    4
-#define Anum_pg_database_datallowconn    5
-#define Anum_pg_database_datconnlimit    6
-#define Anum_pg_database_datlastsysoid    7
-#define Anum_pg_database_datfrozenxid    8
-#define Anum_pg_database_dattablespace    9
-#define Anum_pg_database_datconfig        10
-#define Anum_pg_database_datacl            11
+#define Anum_pg_database_collation        4
+#define Anum_pg_database_ctype            5
+#define Anum_pg_database_datistemplate    6
+#define Anum_pg_database_datallowconn    7
+#define Anum_pg_database_datconnlimit    8
+#define Anum_pg_database_datlastsysoid    9
+#define Anum_pg_database_datfrozenxid    10
+#define Anum_pg_database_dattablespace    11
+#define Anum_pg_database_datconfig        12
+#define Anum_pg_database_datacl            13

-DATA(insert OID = 1 (  template1 PGUID ENCODING t t -1 0 0 1663 _null_ _null_ ));
+DATA(insert OID = 1 (  template1 PGUID ENCODING "LC_COLLATE" "LC_CTYPE" t t -1 0 0 1663 _null_ _null_));
 SHDESCR("default template database");
 #define TemplateDbOid            1

diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 5a49823..2b60027 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -39,6 +39,7 @@ extern const char *locale_numeric_assign(const char *value,
 extern const char *locale_time_assign(const char *value,
                    bool doit, GucSource source);

+extern bool check_locale(int category, const char *locale);
 extern char *pg_perm_setlocale(int category, const char *locale);

 extern bool lc_collate_is_c(void);
diff --git a/src/interfaces/ecpg/preproc/preproc.y b/src/interfaces/ecpg/preproc/preproc.y
index 0a8b62b..949e76b 100644
--- a/src/interfaces/ecpg/preproc/preproc.y
+++ b/src/interfaces/ecpg/preproc/preproc.y
@@ -428,7 +428,7 @@ add_typedef(char *name, char * dimension, char * length, enum ECPGttype type_enu
     CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
     COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
     CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB
-    CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE
+    CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE
     CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE

     DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS

pgsql-hackers by date:

Previous
From: Greg Smith
Date:
Subject: Re: Assert Levels
Next
From: Tom Lane
Date:
Subject: Re: Assert Levels