Thread: [HACKERS] initdb initalization failure for collation "ja_JP"

[HACKERS] initdb initalization failure for collation "ja_JP"

From
Marco Atzeri
Date:
Building on Cygwin latest 10  beta1 or head sourece,
make check fails as:

---------------------initdb.log -----------------------------
The database cluster will be initialized with locales  COLLATE:  en_US.UTF-8  CTYPE:    en_US.UTF-8  MESSAGES: C
MONETARY:en_US.UTF-8  NUMERIC:  en_US.UTF-8  TIME:     en_US.UTF-8
 
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory 
/cygdrive/e/cyg_pub/devel/postgresql/prova/postgresql-10.0-0.1.x86_64/build/src/test/regress/./tmp_check/data 
... ok
creating subdirectories ... ok
selecting default max_connections ... 30
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... sysv
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... 2017-05-31 23:23:22.214 
CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists
-------------------------------------

This does not happen on 9.6.x.
Any idea what to look ?

Regards
Marco



Re: [HACKERS] initdb initalization failure for collation "ja_JP"

From
Tom Lane
Date:
Marco Atzeri <marco.atzeri@gmail.com> writes:
> Building on Cygwin latest 10  beta1 or head sourece,
> make check fails as:
> ...
> performing post-bootstrap initialization ... 2017-05-31 23:23:22.214 
> CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists

Hmph.  Could we see the results of "locale -a | grep ja_JP" ?
        regards, tom lane



Re: [HACKERS] initdb initalization failure for collation "ja_JP"

From
Tom Lane
Date:
I wrote:
> Marco Atzeri <marco.atzeri@gmail.com> writes:
>> Building on Cygwin latest 10  beta1 or head sourece,
>> make check fails as:
>> ...
>> performing post-bootstrap initialization ... 2017-05-31 23:23:22.214 
>> CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists

> Hmph.  Could we see the results of "locale -a | grep ja_JP" ?

Despite the lack of followup from the OP, I'm pretty troubled by this
report.  It shows that the reimplementation of OS collation data import
as pg_import_system_collations() is a whole lot more fragile than the
original coding.  We have never before trusted "locale -a" to not produce
duplicate outputs, not since the very beginning in 414c5a2e.  AFAICS,
the current coding has also lost the protections we added very shortly
after that in 853c1750f; and it has also lost the admittedly rather
arbitrary, but at least deterministic, preference order for conflicting
short aliases that was in the original initdb code.

I suppose the idea was to see whether we actually needed those defenses,
but since we have here a failure report after less than a month of beta,
it seems clear to me that we do.  I think we need to upgrade
pg_import_system_collations to have all the same logic that was there
before.

Now the hard part of that is that because pg_import_system_collations
isn't using a temporary staging table, but is just inserting directly
into pg_collation, there isn't any way for it to eliminate duplicates
unless it uses if_not_exists behavior all the time.  So there seem to
be two ways to proceed:

1. Drop pg_import_system_collations' if_not_exists argument and just
define it as adding any collations not already known in pg_collation.

2. Significantly rewrite it so that it de-dups the collation set by
hand before trying to insert into pg_collation.

#2 seems like a lot more work, but on the other hand, we might need
most of that logic anyway to get back deterministic alias handling.
However, since I cannot see any real-world use case at all for
if_not_exists = false, I figure we might as well do #1 and take
whatever simplification we can get that way.

I'm willing to do the legwork on this, but before I start, does
anyone have any ideas or objections?
        regards, tom lane



Re: [HACKERS] initdb initalization failure for collation "ja_JP"

From
Marco Atzeri
Date:
On 18/06/2017 16:48, Tom Lane wrote:
> Marco Atzeri <marco.atzeri@gmail.com> writes:
>> Building on Cygwin latest 10  beta1 or head sourece,
>> make check fails as:
>> ...
>> performing post-bootstrap initialization ... 2017-05-31 23:23:22.214
>> CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists
>
> Hmph.  Could we see the results of "locale -a | grep ja_JP" ?
>
>             regards, tom lane
>

$ locale -a |grep -i ja
ja_JP
ja_JP
ja_JP.utf8
ja_JP.ujis
ja_JP@cjknarrow
ja_JP.utf8@cjknarrow
japanese
japanese.euc
japanese.sjis

$ locale -a |grep -i "euc"
japanese.euc
korean.euc

Regards
Marco



Re: [HACKERS] initdb initalization failure for collation "ja_JP"

From
Marco Atzeri
Date:
On 20/06/2017 17:37, Tom Lane wrote:
> I wrote:
>> Marco Atzeri <marco.atzeri@gmail.com> writes:
>>> Building on Cygwin latest 10  beta1 or head sourece,
>>> make check fails as:
>>> ...
>>> performing post-bootstrap initialization ... 2017-05-31 23:23:22.214
>>> CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists
>
>> Hmph.  Could we see the results of "locale -a | grep ja_JP" ?
>
> Despite the lack of followup from the OP, I'm pretty troubled by this
> report.  It shows that the reimplementation of OS collation data import
> as pg_import_system_collations() is a whole lot more fragile than the
> original coding.  We have never before trusted "locale -a" to not produce
> duplicate outputs, not since the very beginning in 414c5a2e.  AFAICS,
> the current coding has also lost the protections we added very shortly
> after that in 853c1750f; and it has also lost the admittedly rather
> arbitrary, but at least deterministic, preference order for conflicting
> short aliases that was in the original initdb code.

Hi Tom,
I raised the duplication issue on the cygwin mailing list,
and one of the core developer reports that
they saw the same issues on Linux in the past.

https://cygwin.com/ml/cygwin/2017-06/msg00253.html


>             regards, tom lane

Regards
Marco





Re: [HACKERS] initdb initalization failure for collation "ja_JP"

From
Tom Lane
Date:
I wrote:
> Now the hard part of that is that because pg_import_system_collations
> isn't using a temporary staging table, but is just inserting directly
> into pg_collation, there isn't any way for it to eliminate duplicates
> unless it uses if_not_exists behavior all the time.  So there seem to
> be two ways to proceed:
> 1. Drop pg_import_system_collations' if_not_exists argument and just
> define it as adding any collations not already known in pg_collation.
> 2. Significantly rewrite it so that it de-dups the collation set by
> hand before trying to insert into pg_collation.

After further thought, I realized that there's another argument for
making pg_import_system_collations() always do if-not-exists, which
is that as it stands it's inconsistent anyway because it silently
uses if-not-exists behavior for aliases.

So attached is a draft patch that drops if_not_exists and tweaks the
alias insertion code to guarantee deterministic behavior.  I also
corrected failure to use CommandCounterIncrement() in the ICU part
of the code, which would cause problems if ICU can ever report
duplicate names; something I don't especially care to assume is
impossible.  Also, I fixed things so that pg_import_system_collations()
doesn't emit any chatter about duplicate existing names, because it
didn't take long to realize that that behavior was useless and
irritating.  (Perhaps we should change the function to return the
number of entries successfully added?  But I didn't do that here.)

I've tested this with a faked version of "locale -a" that generates
duplicated entries, and it does what I think it should.

One question that I've got is why the ICU portion refuses to load
any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
Surely this is completely wrong?  I should think that what we load into
pg_collation ought to be independent of template1's encoding, the same
as it is for libc collations, and the right place to be making a test
like that is where somebody attempts to use an ICU collation.  But
I've not tried to test it.

Also, I'm a bit tempted to remove setup_collation()'s manual insertion of
'ucs_basic' in favor of adding a DATA() line for that to pg_collation.h.
As things stand, if for some reason "locale -a" were to report a locale
named that, initdb would fail.  In the old code, it would not have failed
--- it's uncertain whether you would have gotten the forced UTF8/C
definition or libc's definition, but you would have gotten only one.
However, that approach would result in 'ucs_basic' being treated as
pinned, which perhaps we don't want.  If we don't want it to be pinned,
another idea is just to make setup_collation() insert it first not last,
thereby ensuring that the SQL definition wins over any libc entries.

Comments?

            regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e073f7b..bd2eaaa 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*************** postgres=# SELECT * FROM pg_walfile_name
*** 19711,19717 ****
        <row>
         <entry>
          <indexterm><primary>pg_import_system_collations</primary></indexterm>
!         <literal><function>pg_import_system_collations(<parameter>if_not_exists</> <type>boolean</>,
<parameter>schema</><type>regnamespace</>)</function></literal> 
         </entry>
         <entry><type>void</type></entry>
         <entry>Import operating system collations</entry>
--- 19711,19717 ----
        <row>
         <entry>
          <indexterm><primary>pg_import_system_collations</primary></indexterm>
!         <literal><function>pg_import_system_collations(<parameter>schema</>
<type>regnamespace</>)</function></literal>
         </entry>
         <entry><type>void</type></entry>
         <entry>Import operating system collations</entry>
*************** postgres=# SELECT * FROM pg_walfile_name
*** 19730,19747 ****
     </para>

     <para>
!     <function>pg_import_system_collations</> populates the system
!     catalog <literal>pg_collation</literal> with collations based on all the
!     locales it finds on the operating system.  This is
      what <command>initdb</command> uses;
      see <xref linkend="collation-managing"> for more details.  If additional
      locales are installed into the operating system later on, this function
!     can be run again to add collations for the new locales.  In that case, the
!     parameter <parameter>if_not_exists</parameter> should be set to true to
!     skip over existing collations.  The <parameter>schema</parameter>
!     parameter would typically be <literal>pg_catalog</literal>, but that is
!     not a requirement.  (Collation objects based on locales that are no longer
!     present on the operating system are never removed by this function.)
     </para>

    </sect2>
--- 19730,19748 ----
     </para>

     <para>
!     <function>pg_import_system_collations</> adds collations to the system
!     catalog <literal>pg_collation</literal> based on all the
!     locales it finds in the operating system.  This is
      what <command>initdb</command> uses;
      see <xref linkend="collation-managing"> for more details.  If additional
      locales are installed into the operating system later on, this function
!     can be run again to add collations for the new locales.  Locales that
!     match existing entries in <literal>pg_collation</literal> will be skipped.
!     (But collation objects based on locales that are no longer
!     present in the operating system are not removed by this function.)
!     The <parameter>schema</parameter> parameter would typically
!     be <literal>pg_catalog</literal>, but that is not a requirement;
!     the collations could be installed into some other schema as well.
     </para>

    </sect2>
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 6b1b72f..bf66bde 100644
*** a/src/backend/catalog/pg_collation.c
--- b/src/backend/catalog/pg_collation.c
***************
*** 37,42 ****
--- 37,47 ----
   * CollationCreate
   *
   * Add a new tuple to pg_collation.
+  *
+  * if_not_exists: if true, don't fail on duplicate name, just print a notice
+  * and return InvalidOid.
+  * quiet: if true, don't fail on duplicate name, just silently return
+  * InvalidOid (overrides if_not_exists).
   */
  Oid
  CollationCreate(const char *collname, Oid collnamespace,
*************** CollationCreate(const char *collname, Oi
*** 45,51 ****
                  int32 collencoding,
                  const char *collcollate, const char *collctype,
                  const char *collversion,
!                 bool if_not_exists)
  {
      Relation    rel;
      TupleDesc    tupDesc;
--- 50,57 ----
                  int32 collencoding,
                  const char *collcollate, const char *collctype,
                  const char *collversion,
!                 bool if_not_exists,
!                 bool quiet)
  {
      Relation    rel;
      TupleDesc    tupDesc;
*************** CollationCreate(const char *collname, Oi
*** 77,83 ****
                                Int32GetDatum(collencoding),
                                ObjectIdGetDatum(collnamespace)))
      {
!         if (if_not_exists)
          {
              ereport(NOTICE,
                      (errcode(ERRCODE_DUPLICATE_OBJECT),
--- 83,91 ----
                                Int32GetDatum(collencoding),
                                ObjectIdGetDatum(collnamespace)))
      {
!         if (quiet)
!             return InvalidOid;
!         else if (if_not_exists)
          {
              ereport(NOTICE,
                      (errcode(ERRCODE_DUPLICATE_OBJECT),
*************** CollationCreate(const char *collname, Oi
*** 119,125 ****
                                 Int32GetDatum(-1),
                                 ObjectIdGetDatum(collnamespace))))
      {
!         if (if_not_exists)
          {
              heap_close(rel, NoLock);
              ereport(NOTICE,
--- 127,135 ----
                                 Int32GetDatum(-1),
                                 ObjectIdGetDatum(collnamespace))))
      {
!         if (quiet)
!             return InvalidOid;
!         else if (if_not_exists)
          {
              heap_close(rel, NoLock);
              ereport(NOTICE,
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index bdfd739..ec56320 100644
*** a/src/backend/commands/collationcmds.c
--- b/src/backend/commands/collationcmds.c
***************
*** 37,42 ****
--- 37,50 ----
  #include "utils/syscache.h"


+ typedef struct
+ {
+     char       *localename;        /* name of locale, as per "locale -a" */
+     char       *alias;            /* shortened alias for same */
+     int            enc;            /* encoding */
+ } CollAliasData;
+
+
  /*
   * CREATE COLLATION
   */
*************** DefineCollation(ParseState *pstate, List
*** 196,202 ****
                               collcollate,
                               collctype,
                               collversion,
!                              if_not_exists);

      if (!OidIsValid(newoid))
          return InvalidObjectAddress;
--- 204,211 ----
                               collcollate,
                               collctype,
                               collversion,
!                              if_not_exists,
!                              false);    /* not quiet */

      if (!OidIsValid(newoid))
          return InvalidObjectAddress;
*************** pg_collation_actual_version(PG_FUNCTION_
*** 344,356 ****
  }


  /*
   * "Normalize" a libc locale name, stripping off encoding tags such as
   * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
   * -> "br_FR@euro").  Return true if a new, different name was
   * generated.
   */
- pg_attribute_unused()
  static bool
  normalize_libc_locale_name(char *new, const char *old)
  {
--- 353,370 ----
  }


+ /* will we use "locale -a" in pg_import_system_collations? */
+ #if defined(HAVE_LOCALE_T) && !defined(WIN32)
+ #define READ_LOCALE_A_OUTPUT
+ #endif
+
+ #ifdef READ_LOCALE_A_OUTPUT
  /*
   * "Normalize" a libc locale name, stripping off encoding tags such as
   * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
   * -> "br_FR@euro").  Return true if a new, different name was
   * generated.
   */
  static bool
  normalize_libc_locale_name(char *new, const char *old)
  {
*************** normalize_libc_locale_name(char *new, co
*** 379,384 ****
--- 393,412 ----
      return changed;
  }

+ /*
+  * qsort comparator for CollAliasData items
+  */
+ static int
+ cmpaliases(const void *a, const void *b)
+ {
+     const CollAliasData *ca = (const CollAliasData *) a;
+     const CollAliasData *cb = (const CollAliasData *) b;
+
+     /* comparing localename is enough because other fields are derived */
+     return strcmp(ca->localename, cb->localename);
+ }
+ #endif                            /* READ_LOCALE_A_OUTPUT */
+

  #ifdef USE_ICU
  static char *
*************** get_icu_locale_comment(const char *local
*** 420,455 ****
  #endif                            /* USE_ICU */


  Datum
  pg_import_system_collations(PG_FUNCTION_ARGS)
  {
!     bool        if_not_exists = PG_GETARG_BOOL(0);
!     Oid            nspid = PG_GETARG_OID(1);

! #if defined(HAVE_LOCALE_T) && !defined(WIN32)
      FILE       *locale_a_handle;
      char        localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
      int            count = 0;
!     List       *aliaslist = NIL;
!     List       *localelist = NIL;
!     List       *enclist = NIL;
!     ListCell   *lca,
!                *lcl,
!                *lce;
  #endif

      if (!superuser())
          ereport(ERROR,
                  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                   (errmsg("must be superuser to import system collations"))));

! #if !(defined(HAVE_LOCALE_T) && !defined(WIN32)) && !defined(USE_ICU)
!     /* silence compiler warnings */
!     (void) if_not_exists;
!     (void) nspid;
! #endif

- #if defined(HAVE_LOCALE_T) && !defined(WIN32)
      locale_a_handle = OpenPipeStream("locale -a", "r");
      if (locale_a_handle == NULL)
          ereport(ERROR,
--- 448,486 ----
  #endif                            /* USE_ICU */


+ /*
+  * pg_import_system_collations: add known system collations to pg_collation
+  */
  Datum
  pg_import_system_collations(PG_FUNCTION_ARGS)
  {
!     Oid            nspid = PG_GETARG_OID(0);

! #ifdef READ_LOCALE_A_OUTPUT
      FILE       *locale_a_handle;
      char        localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
      int            count = 0;
!     CollAliasData *aliases;
!     int            naliases,
!                 maxaliases,
!                 i;
  #endif

+     /* silence compiler warning if we have no locale implementation at all */
+     (void) nspid;
+
      if (!superuser())
          ereport(ERROR,
                  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                   (errmsg("must be superuser to import system collations"))));

!     /* Load collations known to libc, using "locale -a" to enumerate them */
! #ifdef READ_LOCALE_A_OUTPUT
!     /* expansible array of aliases */
!     maxaliases = 100;
!     aliases = (CollAliasData *) palloc(maxaliases * sizeof(CollAliasData));
!     naliases = 0;

      locale_a_handle = OpenPipeStream("locale -a", "r");
      if (locale_a_handle == NULL)
          ereport(ERROR,
*************** pg_import_system_collations(PG_FUNCTION_
*** 459,465 ****

      while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
      {
-         int            i;
          size_t        len;
          int            enc;
          bool        skip;
--- 490,495 ----
*************** pg_import_system_collations(PG_FUNCTION_
*** 507,519 ****
          if (enc == PG_SQL_ASCII)
              continue;            /* C/POSIX are already in the catalog */

          count++;

          CollationCreate(localebuf, nspid, GetUserId(), COLLPROVIDER_LIBC, enc,
                          localebuf, localebuf,
                          get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
!                         if_not_exists);

          CommandCounterIncrement();

          /*
--- 537,559 ----
          if (enc == PG_SQL_ASCII)
              continue;            /* C/POSIX are already in the catalog */

+         /* count valid locales found in operating system */
          count++;

+         /*
+          * Create a collation named the same as the locale, but quietly doing
+          * nothing if it already exists.  This is the behavior we need even at
+          * initdb time, because some implementations of "locale -a" can report
+          * the same locale name more than once.  And it's convenient for later
+          * import runs, too, since you just about always want to add on new
+          * locales without a lot of chatter about existing ones.
+          */
          CollationCreate(localebuf, nspid, GetUserId(), COLLPROVIDER_LIBC, enc,
                          localebuf, localebuf,
                          get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
!                         true, true);

+         /* Must do CCI between inserts to handle duplicates correctly */
          CommandCounterIncrement();

          /*
*************** pg_import_system_collations(PG_FUNCTION_
*** 527,559 ****
           */
          if (normalize_libc_locale_name(alias, localebuf))
          {
!             aliaslist = lappend(aliaslist, pstrdup(alias));
!             localelist = lappend(localelist, pstrdup(localebuf));
!             enclist = lappend_int(enclist, enc);
          }
      }

      ClosePipeStream(locale_a_handle);

!     /* Now try to add any aliases we created */
!     forthree(lca, aliaslist, lcl, localelist, lce, enclist)
      {
!         char       *alias = (char *) lfirst(lca);
!         char       *locale = (char *) lfirst(lcl);
!         int            enc = lfirst_int(lce);

          CollationCreate(alias, nspid, GetUserId(), COLLPROVIDER_LIBC, enc,
                          locale, locale,
                          get_collation_actual_version(COLLPROVIDER_LIBC, locale),
!                         true);
          CommandCounterIncrement();
      }

      if (count == 0)
          ereport(WARNING,
                  (errmsg("no usable system locales were found")));
  #endif                            /* not HAVE_LOCALE_T && not WIN32 */

  #ifdef USE_ICU
      if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
      {
--- 567,622 ----
           */
          if (normalize_libc_locale_name(alias, localebuf))
          {
!             if (naliases >= maxaliases)
!             {
!                 maxaliases *= 2;
!                 aliases = (CollAliasData *)
!                     repalloc(aliases, maxaliases * sizeof(CollAliasData));
!             }
!             aliases[naliases].localename = pstrdup(localebuf);
!             aliases[naliases].alias = pstrdup(alias);
!             aliases[naliases].enc = enc;
!             naliases++;
          }
      }

      ClosePipeStream(locale_a_handle);

!     /*
!      * Before processing the aliases, sort them by locale name.  The point
!      * here is that if "locale -a" gives us multiple locale names with the
!      * same encoding and base name, say "en_US.utf8" and "en_US.utf-8", we
!      * want to pick a deterministic one of them.  First in ASCII sort order is
!      * a good enough rule.  (Before PG 10, the code corresponding to this
!      * logic in initdb.c had an additional ordering rule, to prefer the locale
!      * name exactly matching the alias, if any.  We don't need to consider
!      * that here, because we would have already created such a pg_collation
!      * entry above, and that one will win.)
!      */
!     if (naliases > 1)
!         qsort((void *) aliases, naliases, sizeof(CollAliasData), cmpaliases);
!
!     /* Now add aliases, ignoring any that match pre-existing entries */
!     for (i = 0; i < naliases; i++)
      {
!         char       *locale = aliases[i].localename;
!         char       *alias = aliases[i].alias;
!         int            enc = aliases[i].enc;

          CollationCreate(alias, nspid, GetUserId(), COLLPROVIDER_LIBC, enc,
                          locale, locale,
                          get_collation_actual_version(COLLPROVIDER_LIBC, locale),
!                         true, true);
          CommandCounterIncrement();
      }

+     /* Give a warning if "locale -a" seems to be malfunctioning */
      if (count == 0)
          ereport(WARNING,
                  (errmsg("no usable system locales were found")));
  #endif                            /* not HAVE_LOCALE_T && not WIN32 */

+     /* Load collations known to ICU */
  #ifdef USE_ICU
      if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
      {
*************** pg_import_system_collations(PG_FUNCTION_
*** 591,597 ****
                                       nspid, GetUserId(), COLLPROVIDER_ICU, -1,
                                       collcollate, collcollate,
                                       get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
!                                      if_not_exists);

              CreateComments(collid, CollationRelationId, 0,
                             get_icu_locale_comment(name));
--- 654,661 ----
                                       nspid, GetUserId(), COLLPROVIDER_ICU, -1,
                                       collcollate, collcollate,
                                       get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
!                                      true, true);
!             CommandCounterIncrement();

              CreateComments(collid, CollationRelationId, 0,
                             get_icu_locale_comment(name));
*************** pg_import_system_collations(PG_FUNCTION_
*** 618,624 ****
                                           nspid, GetUserId(), COLLPROVIDER_ICU, -1,
                                           collcollate, collcollate,
                                           get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
!                                          if_not_exists);
                  CreateComments(collid, CollationRelationId, 0,
                                 get_icu_locale_comment(localeid));
              }
--- 682,690 ----
                                           nspid, GetUserId(), COLLPROVIDER_ICU, -1,
                                           collcollate, collcollate,
                                           get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
!                                          true, true);
!                 CommandCounterIncrement();
!
                  CreateComments(collid, CollationRelationId, 0,
                                 get_icu_locale_comment(localeid));
              }
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 0f22e6d..20c7d07 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*************** setup_description(FILE *cmdfd)
*** 1627,1633 ****
  static void
  setup_collation(FILE *cmdfd)
  {
!     PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n");

      /* Add an SQL-standard name */
      PG_CMD_PRINTF3("INSERT INTO pg_collation (collname, collnamespace, collowner, collprovider, collencoding,
collcollate,collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', %d, 'C', 'C');\n\n",
BOOTSTRAP_SUPERUSERID,COLLPROVIDER_LIBC, PG_UTF8); 
--- 1627,1633 ----
  static void
  setup_collation(FILE *cmdfd)
  {
!     PG_CMD_PUTS("SELECT pg_import_system_collations('pg_catalog');\n\n");

      /* Add an SQL-standard name */
      PG_CMD_PRINTF3("INSERT INTO pg_collation (collname, collnamespace, collowner, collprovider, collencoding,
collcollate,collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', %d, 'C', 'C');\n\n",
BOOTSTRAP_SUPERUSERID,COLLPROVIDER_LIBC, PG_UTF8); 
diff --git a/src/include/catalog/pg_collation_fn.h b/src/include/catalog/pg_collation_fn.h
index 26b8f0d..0ef3138 100644
*** a/src/include/catalog/pg_collation_fn.h
--- b/src/include/catalog/pg_collation_fn.h
*************** extern Oid CollationCreate(const char *c
*** 20,26 ****
                  int32 collencoding,
                  const char *collcollate, const char *collctype,
                  const char *collversion,
!                 bool if_not_exists);
  extern void RemoveCollationById(Oid collationOid);

  #endif                            /* PG_COLLATION_FN_H */
--- 20,27 ----
                  int32 collencoding,
                  const char *collcollate, const char *collctype,
                  const char *collversion,
!                 bool if_not_exists,
!                 bool quiet);
  extern void RemoveCollationById(Oid collationOid);

  #endif                            /* PG_COLLATION_FN_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 1191b4a..282a2ac 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("pg_controldata recovery state inf
*** 5462,5472 ****
  DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 ""
"{23,23,23,23,23,23,23,23,23,16,16,23}""{o,o,o,o,o,o,o,o,o,o,o,o}"
"{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}"
_null__null_ pg_control_init _null_ _null_ _null_ )); 
  DESCR("pg_controldata init state information as a function");

! DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "16 4089"
_null__null_ "{if_not_exists,schema}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ )); 
  DESCR("import collations from operating system");

  DATA(insert OID = 3448 ( pg_collation_actual_version PGNSP PGUID 12 100 0 0 0 f f f f t f v s 1 0 25 "26" _null_
_null__null_ _null_ _null_ pg_collation_actual_version _null_ _null_ _null_ )); 
! DESCR("import collations from operating system");

  /* system management/monitoring related functions */
  DATA(insert OID = 3353 (  pg_ls_logdir                 PGNSP PGUID 12 10 20 0 0 f f f f t t v s 0 0 2249 ""
"{25,20,1184}""{o,o,o}" "{name,size,modification}" _null_ _null_ pg_ls_logdir _null_ _null_ _null_ )); 
--- 5462,5473 ----
  DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 ""
"{23,23,23,23,23,23,23,23,23,16,16,23}""{o,o,o,o,o,o,o,o,o,o,o,o}"
"{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}"
_null__null_ pg_control_init _null_ _null_ _null_ )); 
  DESCR("pg_controldata init state information as a function");

! /* collation management functions */
! DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 1 0 2278 "4089" _null_
_null__null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ )); 
  DESCR("import collations from operating system");

  DATA(insert OID = 3448 ( pg_collation_actual_version PGNSP PGUID 12 100 0 0 0 f f f f t f v s 1 0 25 "26" _null_
_null__null_ _null_ _null_ pg_collation_actual_version _null_ _null_ _null_ )); 
! DESCR("get actual version of collation from operating system");

  /* system management/monitoring related functions */
  DATA(insert OID = 3353 (  pg_ls_logdir                 PGNSP PGUID 12 10 20 0 0 f f f f t t v s 0 0 2249 ""
"{25,20,1184}""{o,o,o}" "{name,size,modification}" _null_ _null_ pg_ls_logdir _null_ _null_ _null_ )); 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] initdb initalization failure for collation "ja_JP"

From
Tom Lane
Date:
I wrote:
> One question that I've got is why the ICU portion refuses to load
> any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
> Surely this is completely wrong?  I should think that what we load into
> pg_collation ought to be independent of template1's encoding, the same
> as it is for libc collations, and the right place to be making a test
> like that is where somebody attempts to use an ICU collation.  But
> I've not tried to test it.

So I did test that, and found out the presumable reason why that's there:
icu_from_uchar() falls over if the database encoding is unsupported, and
we use that to convert ICU "display names" for use as comments for the
ICU collations.  But that's not very much less wrongheaded, because it will
allow non-ASCII characters into the initial database contents, which is
absolutely not acceptable.  We assume we can bit-copy the contents of
template0 and it will be valid in any encoding.

Therefore, I think the right thing to do is remove that test and change
get_icu_locale_comment() so that it rejects non-ASCII text, making the
encoding conversion trivial, as in the attached patch.

On my Fedora 25 laptop, the only collations that go without a comment
in this approach are the "nb" ones (Norwegian Bokmål).  As I recall,
that locale is a second-class citizen for other reasons already,
precisely because of its loony insistence on a non-ASCII name even
when we're asking for an Anglicized version.

I'm inclined to add a test to reject non-ASCII in the ICU locale names as
well as the comments.  We've had to do that for libc locale names, and
this experience shows that the ICU locale maintainers don't have their
heads screwed on any straighter.  But this patch doesn't do that.

            regards, tom lane

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 1c43f0b..6febe63 100644
*** a/src/backend/commands/collationcmds.c
--- b/src/backend/commands/collationcmds.c
*************** get_icu_language_tag(const char *localen
*** 431,437 ****

  /*
   * Get a comment (specifically, the display name) for an ICU locale.
!  * The result is a palloc'd string.
   */
  static char *
  get_icu_locale_comment(const char *localename)
--- 431,439 ----

  /*
   * Get a comment (specifically, the display name) for an ICU locale.
!  * The result is a palloc'd string, or NULL if we can't get a comment
!  * or find that it's not all ASCII.  (We can *not* accept non-ASCII
!  * comments, because the contents of template0 must be encoding-agnostic.)
   */
  static char *
  get_icu_locale_comment(const char *localename)
*************** get_icu_locale_comment(const char *local
*** 439,444 ****
--- 441,447 ----
      UErrorCode    status;
      UChar        displayname[128];
      int32        len_uchar;
+     int32        i;
      char       *result;

      status = U_ZERO_ERROR;
*************** get_icu_locale_comment(const char *local
*** 446,456 ****
                                      displayname, lengthof(displayname),
                                      &status);
      if (U_FAILURE(status))
!         ereport(ERROR,
!                 (errmsg("could not get display name for locale \"%s\": %s",
!                         localename, u_errorName(status))));

!     icu_from_uchar(&result, displayname, len_uchar);

      return result;
  }
--- 449,468 ----
                                      displayname, lengthof(displayname),
                                      &status);
      if (U_FAILURE(status))
!         return NULL;            /* no good reason to raise an error */

!     /* Check for non-ASCII comment */
!     for (i = 0; i < len_uchar; i++)
!     {
!         if (displayname[i] > 127)
!             return NULL;
!     }
!
!     /* OK, transcribe */
!     result = palloc(len_uchar + 1);
!     for (i = 0; i < len_uchar; i++)
!         result[i] = displayname[i];
!     result[len_uchar] = '\0';

      return result;
  }
*************** pg_import_system_collations(PG_FUNCTION_
*** 642,655 ****

      /* Load collations known to ICU */
  #ifdef USE_ICU
-     if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
-     {
-         ereport(NOTICE,
-                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                  errmsg("encoding \"%s\" not supported by ICU",
-                         pg_encoding_to_char(GetDatabaseEncoding()))));
-     }
-     else
      {
          int            i;

--- 654,659 ----
*************** pg_import_system_collations(PG_FUNCTION_
*** 661,666 ****
--- 665,671 ----
          {
              const char *name;
              char       *langtag;
+             char       *icucomment;
              const char *collcollate;
              UEnumeration *en;
              UErrorCode    status;
*************** pg_import_system_collations(PG_FUNCTION_
*** 686,693 ****

                  CommandCounterIncrement();

!                 CreateComments(collid, CollationRelationId, 0,
!                                get_icu_locale_comment(name));
              }

              /*
--- 691,700 ----

                  CommandCounterIncrement();

!                 icucomment = get_icu_locale_comment(name);
!                 if (icucomment)
!                     CreateComments(collid, CollationRelationId, 0,
!                                    icucomment);
              }

              /*
*************** pg_import_system_collations(PG_FUNCTION_
*** 720,727 ****

                      CommandCounterIncrement();

!                     CreateComments(collid, CollationRelationId, 0,
!                                    get_icu_locale_comment(localeid));
                  }
              }
              if (U_FAILURE(status))
--- 727,736 ----

                      CommandCounterIncrement();

!                     icucomment = get_icu_locale_comment(name);
!                     if (icucomment)
!                         CreateComments(collid, CollationRelationId, 0,
!                                        icucomment);
                  }
              }
              if (U_FAILURE(status))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] initdb initalization failure for collation "ja_JP"

From
Tom Lane
Date:
I wrote:
>> One question that I've got is why the ICU portion refuses to load
>> any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
>> Surely this is completely wrong?  I should think that what we load into
>> pg_collation ought to be independent of template1's encoding, the same
>> as it is for libc collations, and the right place to be making a test
>> like that is where somebody attempts to use an ICU collation.

Pursuant to the second part of that: I checked on what happens if you
try to use an ICU collation in a database with a not-supported-by-ICU
encoding.  We have to cope with that scenario even with the current
(broken IMO) initdb behavior, because even if template1 has a supported
encoding, it's possible to create another database that doesn't.
It does fail more or less cleanly; you get an "encoding "foo" not
supported by ICU" message at runtime (out of get_encoding_name_for_icu).
But that's quite a bit unlike the behavior for libc collations: with
those, you get an error in collation name lookup, along the lines of

collation "en_DK.utf8" for encoding "SQL_ASCII" does not exist

The attached proposed patch makes the behavior for ICU collations the
same, by dint of injecting the is_encoding_supported_by_icu() check
into collation name lookup.

            regards, tom lane

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 64f6fee..029a132 100644
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
*************** OpfamilyIsVisible(Oid opfid)
*** 1915,1923 ****
--- 1915,1974 ----
  }

  /*
+  * lookup_collation
+  *        If there's a collation of the given name/namespace, and it works
+  *        with the given encoding, return its OID.  Else return InvalidOid.
+  */
+ static Oid
+ lookup_collation(const char *collname, Oid collnamespace, int32 encoding)
+ {
+     Oid            collid;
+     HeapTuple    colltup;
+     Form_pg_collation collform;
+
+     /* Check for encoding-specific entry (exact match) */
+     collid = GetSysCacheOid3(COLLNAMEENCNSP,
+                              PointerGetDatum(collname),
+                              Int32GetDatum(encoding),
+                              ObjectIdGetDatum(collnamespace));
+     if (OidIsValid(collid))
+         return collid;
+
+     /*
+      * Check for any-encoding entry.  This takes a bit more work: while libc
+      * collations with collencoding = -1 do work with all encodings, ICU
+      * collations only work with certain encodings, so we have to check that
+      * aspect before deciding it's a match.
+      */
+     colltup = SearchSysCache3(COLLNAMEENCNSP,
+                               PointerGetDatum(collname),
+                               Int32GetDatum(-1),
+                               ObjectIdGetDatum(collnamespace));
+     if (!HeapTupleIsValid(colltup))
+         return InvalidOid;
+     collform = (Form_pg_collation) GETSTRUCT(colltup);
+     if (collform->collprovider == COLLPROVIDER_ICU)
+     {
+         if (is_encoding_supported_by_icu(encoding))
+             collid = HeapTupleGetOid(colltup);
+         else
+             collid = InvalidOid;
+     }
+     else
+     {
+         collid = HeapTupleGetOid(colltup);
+     }
+     ReleaseSysCache(colltup);
+     return collid;
+ }
+
+ /*
   * CollationGetCollid
   *        Try to resolve an unqualified collation name.
   *        Returns OID if collation found in search path, else InvalidOid.
+  *
+  * Note that this will only find collations that work with the current
+  * database's encoding.
   */
  Oid
  CollationGetCollid(const char *collname)
*************** CollationGetCollid(const char *collname)
*** 1935,1953 ****
          if (namespaceId == myTempNamespace)
              continue;            /* do not look in temp namespace */

!         /* Check for database-encoding-specific entry */
!         collid = GetSysCacheOid3(COLLNAMEENCNSP,
!                                  PointerGetDatum(collname),
!                                  Int32GetDatum(dbencoding),
!                                  ObjectIdGetDatum(namespaceId));
!         if (OidIsValid(collid))
!             return collid;
!
!         /* Check for any-encoding entry */
!         collid = GetSysCacheOid3(COLLNAMEENCNSP,
!                                  PointerGetDatum(collname),
!                                  Int32GetDatum(-1),
!                                  ObjectIdGetDatum(namespaceId));
          if (OidIsValid(collid))
              return collid;
      }
--- 1986,1992 ----
          if (namespaceId == myTempNamespace)
              continue;            /* do not look in temp namespace */

!         collid = lookup_collation(collname, namespaceId, dbencoding);
          if (OidIsValid(collid))
              return collid;
      }
*************** CollationGetCollid(const char *collname)
*** 1961,1966 ****
--- 2000,2008 ----
   *        Determine whether a collation (identified by OID) is visible in the
   *        current search path.  Visible means "would be found by searching
   *        for the unqualified collation name".
+  *
+  * Note that only collations that work with the current database's encoding
+  * will be considered visible.
   */
  bool
  CollationIsVisible(Oid collid)
*************** CollationIsVisible(Oid collid)
*** 1990,1998 ****
      {
          /*
           * If it is in the path, it might still not be visible; it could be
!          * hidden by another conversion of the same name earlier in the path.
!          * So we must do a slow check to see if this conversion would be found
!          * by CollationGetCollid.
           */
          char       *collname = NameStr(collform->collname);

--- 2032,2041 ----
      {
          /*
           * If it is in the path, it might still not be visible; it could be
!          * hidden by another collation of the same name earlier in the path,
!          * or it might not work with the current DB encoding.  So we must do a
!          * slow check to see if this collation would be found by
!          * CollationGetCollid.
           */
          char       *collname = NameStr(collform->collname);

*************** PopOverrideSearchPath(void)
*** 3442,3447 ****
--- 3485,3493 ----

  /*
   * get_collation_oid - find a collation by possibly qualified name
+  *
+  * Note that this will only find collations that work with the current
+  * database's encoding.
   */
  Oid
  get_collation_oid(List *name, bool missing_ok)
*************** get_collation_oid(List *name, bool missi
*** 3463,3479 ****
          if (missing_ok && !OidIsValid(namespaceId))
              return InvalidOid;

!         /* first try for encoding-specific entry, then any-encoding */
!         colloid = GetSysCacheOid3(COLLNAMEENCNSP,
!                                   PointerGetDatum(collation_name),
!                                   Int32GetDatum(dbencoding),
!                                   ObjectIdGetDatum(namespaceId));
!         if (OidIsValid(colloid))
!             return colloid;
!         colloid = GetSysCacheOid3(COLLNAMEENCNSP,
!                                   PointerGetDatum(collation_name),
!                                   Int32GetDatum(-1),
!                                   ObjectIdGetDatum(namespaceId));
          if (OidIsValid(colloid))
              return colloid;
      }
--- 3509,3515 ----
          if (missing_ok && !OidIsValid(namespaceId))
              return InvalidOid;

!         colloid = lookup_collation(collation_name, namespaceId, dbencoding);
          if (OidIsValid(colloid))
              return colloid;
      }
*************** get_collation_oid(List *name, bool missi
*** 3489,3504 ****
              if (namespaceId == myTempNamespace)
                  continue;        /* do not look in temp namespace */

!             colloid = GetSysCacheOid3(COLLNAMEENCNSP,
!                                       PointerGetDatum(collation_name),
!                                       Int32GetDatum(dbencoding),
!                                       ObjectIdGetDatum(namespaceId));
!             if (OidIsValid(colloid))
!                 return colloid;
!             colloid = GetSysCacheOid3(COLLNAMEENCNSP,
!                                       PointerGetDatum(collation_name),
!                                       Int32GetDatum(-1),
!                                       ObjectIdGetDatum(namespaceId));
              if (OidIsValid(colloid))
                  return colloid;
          }
--- 3525,3531 ----
              if (namespaceId == myTempNamespace)
                  continue;        /* do not look in temp namespace */

!             colloid = lookup_collation(collation_name, namespaceId, dbencoding);
              if (OidIsValid(colloid))
                  return colloid;
          }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers