I wrote:
> Now the hard part of that is that because pg_import_system_collations
> isn't using a temporary staging table, but is just inserting directly
> into pg_collation, there isn't any way for it to eliminate duplicates
> unless it uses if_not_exists behavior all the time. So there seem to
> be two ways to proceed:
> 1. Drop pg_import_system_collations' if_not_exists argument and just
> define it as adding any collations not already known in pg_collation.
> 2. Significantly rewrite it so that it de-dups the collation set by
> hand before trying to insert into pg_collation.
After further thought, I realized that there's another argument for
making pg_import_system_collations() always do if-not-exists, which
is that as it stands it's inconsistent anyway because it silently
uses if-not-exists behavior for aliases.
So attached is a draft patch that drops if_not_exists and tweaks the
alias insertion code to guarantee deterministic behavior. I also
corrected failure to use CommandCounterIncrement() in the ICU part
of the code, which would cause problems if ICU can ever report
duplicate names; something I don't especially care to assume is
impossible. Also, I fixed things so that pg_import_system_collations()
doesn't emit any chatter about duplicate existing names, because it
didn't take long to realize that that behavior was useless and
irritating. (Perhaps we should change the function to return the
number of entries successfully added? But I didn't do that here.)
I've tested this with a faked version of "locale -a" that generates
duplicated entries, and it does what I think it should.
One question that I've got is why the ICU portion refuses to load
any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
Surely this is completely wrong? I should think that what we load into
pg_collation ought to be independent of template1's encoding, the same
as it is for libc collations, and the right place to be making a test
like that is where somebody attempts to use an ICU collation. But
I've not tried to test it.
Also, I'm a bit tempted to remove setup_collation()'s manual insertion of
'ucs_basic' in favor of adding a DATA() line for that to pg_collation.h.
As things stand, if for some reason "locale -a" were to report a locale
named that, initdb would fail. In the old code, it would not have failed
--- it's uncertain whether you would have gotten the forced UTF8/C
definition or libc's definition, but you would have gotten only one.
However, that approach would result in 'ucs_basic' being treated as
pinned, which perhaps we don't want. If we don't want it to be pinned,
another idea is just to make setup_collation() insert it first not last,
thereby ensuring that the SQL definition wins over any libc entries.
Comments?
regards, tom lane
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e073f7b..bd2eaaa 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*************** postgres=# SELECT * FROM pg_walfile_name
*** 19711,19717 ****
<row>
<entry>
<indexterm><primary>pg_import_system_collations</primary></indexterm>
! <literal><function>pg_import_system_collations(<parameter>if_not_exists</> <type>boolean</>,
<parameter>schema</><type>regnamespace</>)</function></literal>
</entry>
<entry><type>void</type></entry>
<entry>Import operating system collations</entry>
--- 19711,19717 ----
<row>
<entry>
<indexterm><primary>pg_import_system_collations</primary></indexterm>
! <literal><function>pg_import_system_collations(<parameter>schema</>
<type>regnamespace</>)</function></literal>
</entry>
<entry><type>void</type></entry>
<entry>Import operating system collations</entry>
*************** postgres=# SELECT * FROM pg_walfile_name
*** 19730,19747 ****
</para>
<para>
! <function>pg_import_system_collations</> populates the system
! catalog <literal>pg_collation</literal> with collations based on all the
! locales it finds on the operating system. This is
what <command>initdb</command> uses;
see <xref linkend="collation-managing"> for more details. If additional
locales are installed into the operating system later on, this function
! can be run again to add collations for the new locales. In that case, the
! parameter <parameter>if_not_exists</parameter> should be set to true to
! skip over existing collations. The <parameter>schema</parameter>
! parameter would typically be <literal>pg_catalog</literal>, but that is
! not a requirement. (Collation objects based on locales that are no longer
! present on the operating system are never removed by this function.)
</para>
</sect2>
--- 19730,19748 ----
</para>
<para>
! <function>pg_import_system_collations</> adds collations to the system
! catalog <literal>pg_collation</literal> based on all the
! locales it finds in the operating system. This is
what <command>initdb</command> uses;
see <xref linkend="collation-managing"> for more details. If additional
locales are installed into the operating system later on, this function
! can be run again to add collations for the new locales. Locales that
! match existing entries in <literal>pg_collation</literal> will be skipped.
! (But collation objects based on locales that are no longer
! present in the operating system are not removed by this function.)
! The <parameter>schema</parameter> parameter would typically
! be <literal>pg_catalog</literal>, but that is not a requirement;
! the collations could be installed into some other schema as well.
</para>
</sect2>
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 6b1b72f..bf66bde 100644
*** a/src/backend/catalog/pg_collation.c
--- b/src/backend/catalog/pg_collation.c
***************
*** 37,42 ****
--- 37,47 ----
* CollationCreate
*
* Add a new tuple to pg_collation.
+ *
+ * if_not_exists: if true, don't fail on duplicate name, just print a notice
+ * and return InvalidOid.
+ * quiet: if true, don't fail on duplicate name, just silently return
+ * InvalidOid (overrides if_not_exists).
*/
Oid
CollationCreate(const char *collname, Oid collnamespace,
*************** CollationCreate(const char *collname, Oi
*** 45,51 ****
int32 collencoding,
const char *collcollate, const char *collctype,
const char *collversion,
! bool if_not_exists)
{
Relation rel;
TupleDesc tupDesc;
--- 50,57 ----
int32 collencoding,
const char *collcollate, const char *collctype,
const char *collversion,
! bool if_not_exists,
! bool quiet)
{
Relation rel;
TupleDesc tupDesc;
*************** CollationCreate(const char *collname, Oi
*** 77,83 ****
Int32GetDatum(collencoding),
ObjectIdGetDatum(collnamespace)))
{
! if (if_not_exists)
{
ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
--- 83,91 ----
Int32GetDatum(collencoding),
ObjectIdGetDatum(collnamespace)))
{
! if (quiet)
! return InvalidOid;
! else if (if_not_exists)
{
ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
*************** CollationCreate(const char *collname, Oi
*** 119,125 ****
Int32GetDatum(-1),
ObjectIdGetDatum(collnamespace))))
{
! if (if_not_exists)
{
heap_close(rel, NoLock);
ereport(NOTICE,
--- 127,135 ----
Int32GetDatum(-1),
ObjectIdGetDatum(collnamespace))))
{
! if (quiet)
! return InvalidOid;
! else if (if_not_exists)
{
heap_close(rel, NoLock);
ereport(NOTICE,
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index bdfd739..ec56320 100644
*** a/src/backend/commands/collationcmds.c
--- b/src/backend/commands/collationcmds.c
***************
*** 37,42 ****
--- 37,50 ----
#include "utils/syscache.h"
+ typedef struct
+ {
+ char *localename; /* name of locale, as per "locale -a" */
+ char *alias; /* shortened alias for same */
+ int enc; /* encoding */
+ } CollAliasData;
+
+
/*
* CREATE COLLATION
*/
*************** DefineCollation(ParseState *pstate, List
*** 196,202 ****
collcollate,
collctype,
collversion,
! if_not_exists);
if (!OidIsValid(newoid))
return InvalidObjectAddress;
--- 204,211 ----
collcollate,
collctype,
collversion,
! if_not_exists,
! false); /* not quiet */
if (!OidIsValid(newoid))
return InvalidObjectAddress;
*************** pg_collation_actual_version(PG_FUNCTION_
*** 344,356 ****
}
/*
* "Normalize" a libc locale name, stripping off encoding tags such as
* ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
* -> "br_FR@euro"). Return true if a new, different name was
* generated.
*/
- pg_attribute_unused()
static bool
normalize_libc_locale_name(char *new, const char *old)
{
--- 353,370 ----
}
+ /* will we use "locale -a" in pg_import_system_collations? */
+ #if defined(HAVE_LOCALE_T) && !defined(WIN32)
+ #define READ_LOCALE_A_OUTPUT
+ #endif
+
+ #ifdef READ_LOCALE_A_OUTPUT
/*
* "Normalize" a libc locale name, stripping off encoding tags such as
* ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
* -> "br_FR@euro"). Return true if a new, different name was
* generated.
*/
static bool
normalize_libc_locale_name(char *new, const char *old)
{
*************** normalize_libc_locale_name(char *new, co
*** 379,384 ****
--- 393,412 ----
return changed;
}
+ /*
+ * qsort comparator for CollAliasData items
+ */
+ static int
+ cmpaliases(const void *a, const void *b)
+ {
+ const CollAliasData *ca = (const CollAliasData *) a;
+ const CollAliasData *cb = (const CollAliasData *) b;
+
+ /* comparing localename is enough because other fields are derived */
+ return strcmp(ca->localename, cb->localename);
+ }
+ #endif /* READ_LOCALE_A_OUTPUT */
+
#ifdef USE_ICU
static char *
*************** get_icu_locale_comment(const char *local
*** 420,455 ****
#endif /* USE_ICU */
Datum
pg_import_system_collations(PG_FUNCTION_ARGS)
{
! bool if_not_exists = PG_GETARG_BOOL(0);
! Oid nspid = PG_GETARG_OID(1);
! #if defined(HAVE_LOCALE_T) && !defined(WIN32)
FILE *locale_a_handle;
char localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
int count = 0;
! List *aliaslist = NIL;
! List *localelist = NIL;
! List *enclist = NIL;
! ListCell *lca,
! *lcl,
! *lce;
#endif
if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
(errmsg("must be superuser to import system collations"))));
! #if !(defined(HAVE_LOCALE_T) && !defined(WIN32)) && !defined(USE_ICU)
! /* silence compiler warnings */
! (void) if_not_exists;
! (void) nspid;
! #endif
- #if defined(HAVE_LOCALE_T) && !defined(WIN32)
locale_a_handle = OpenPipeStream("locale -a", "r");
if (locale_a_handle == NULL)
ereport(ERROR,
--- 448,486 ----
#endif /* USE_ICU */
+ /*
+ * pg_import_system_collations: add known system collations to pg_collation
+ */
Datum
pg_import_system_collations(PG_FUNCTION_ARGS)
{
! Oid nspid = PG_GETARG_OID(0);
! #ifdef READ_LOCALE_A_OUTPUT
FILE *locale_a_handle;
char localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
int count = 0;
! CollAliasData *aliases;
! int naliases,
! maxaliases,
! i;
#endif
+ /* silence compiler warning if we have no locale implementation at all */
+ (void) nspid;
+
if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
(errmsg("must be superuser to import system collations"))));
! /* Load collations known to libc, using "locale -a" to enumerate them */
! #ifdef READ_LOCALE_A_OUTPUT
! /* expansible array of aliases */
! maxaliases = 100;
! aliases = (CollAliasData *) palloc(maxaliases * sizeof(CollAliasData));
! naliases = 0;
locale_a_handle = OpenPipeStream("locale -a", "r");
if (locale_a_handle == NULL)
ereport(ERROR,
*************** pg_import_system_collations(PG_FUNCTION_
*** 459,465 ****
while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
{
- int i;
size_t len;
int enc;
bool skip;
--- 490,495 ----
*************** pg_import_system_collations(PG_FUNCTION_
*** 507,519 ****
if (enc == PG_SQL_ASCII)
continue; /* C/POSIX are already in the catalog */
count++;
CollationCreate(localebuf, nspid, GetUserId(), COLLPROVIDER_LIBC, enc,
localebuf, localebuf,
get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
! if_not_exists);
CommandCounterIncrement();
/*
--- 537,559 ----
if (enc == PG_SQL_ASCII)
continue; /* C/POSIX are already in the catalog */
+ /* count valid locales found in operating system */
count++;
+ /*
+ * Create a collation named the same as the locale, but quietly doing
+ * nothing if it already exists. This is the behavior we need even at
+ * initdb time, because some implementations of "locale -a" can report
+ * the same locale name more than once. And it's convenient for later
+ * import runs, too, since you just about always want to add on new
+ * locales without a lot of chatter about existing ones.
+ */
CollationCreate(localebuf, nspid, GetUserId(), COLLPROVIDER_LIBC, enc,
localebuf, localebuf,
get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
! true, true);
+ /* Must do CCI between inserts to handle duplicates correctly */
CommandCounterIncrement();
/*
*************** pg_import_system_collations(PG_FUNCTION_
*** 527,559 ****
*/
if (normalize_libc_locale_name(alias, localebuf))
{
! aliaslist = lappend(aliaslist, pstrdup(alias));
! localelist = lappend(localelist, pstrdup(localebuf));
! enclist = lappend_int(enclist, enc);
}
}
ClosePipeStream(locale_a_handle);
! /* Now try to add any aliases we created */
! forthree(lca, aliaslist, lcl, localelist, lce, enclist)
{
! char *alias = (char *) lfirst(lca);
! char *locale = (char *) lfirst(lcl);
! int enc = lfirst_int(lce);
CollationCreate(alias, nspid, GetUserId(), COLLPROVIDER_LIBC, enc,
locale, locale,
get_collation_actual_version(COLLPROVIDER_LIBC, locale),
! true);
CommandCounterIncrement();
}
if (count == 0)
ereport(WARNING,
(errmsg("no usable system locales were found")));
#endif /* not HAVE_LOCALE_T && not WIN32 */
#ifdef USE_ICU
if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
{
--- 567,622 ----
*/
if (normalize_libc_locale_name(alias, localebuf))
{
! if (naliases >= maxaliases)
! {
! maxaliases *= 2;
! aliases = (CollAliasData *)
! repalloc(aliases, maxaliases * sizeof(CollAliasData));
! }
! aliases[naliases].localename = pstrdup(localebuf);
! aliases[naliases].alias = pstrdup(alias);
! aliases[naliases].enc = enc;
! naliases++;
}
}
ClosePipeStream(locale_a_handle);
! /*
! * Before processing the aliases, sort them by locale name. The point
! * here is that if "locale -a" gives us multiple locale names with the
! * same encoding and base name, say "en_US.utf8" and "en_US.utf-8", we
! * want to pick a deterministic one of them. First in ASCII sort order is
! * a good enough rule. (Before PG 10, the code corresponding to this
! * logic in initdb.c had an additional ordering rule, to prefer the locale
! * name exactly matching the alias, if any. We don't need to consider
! * that here, because we would have already created such a pg_collation
! * entry above, and that one will win.)
! */
! if (naliases > 1)
! qsort((void *) aliases, naliases, sizeof(CollAliasData), cmpaliases);
!
! /* Now add aliases, ignoring any that match pre-existing entries */
! for (i = 0; i < naliases; i++)
{
! char *locale = aliases[i].localename;
! char *alias = aliases[i].alias;
! int enc = aliases[i].enc;
CollationCreate(alias, nspid, GetUserId(), COLLPROVIDER_LIBC, enc,
locale, locale,
get_collation_actual_version(COLLPROVIDER_LIBC, locale),
! true, true);
CommandCounterIncrement();
}
+ /* Give a warning if "locale -a" seems to be malfunctioning */
if (count == 0)
ereport(WARNING,
(errmsg("no usable system locales were found")));
#endif /* not HAVE_LOCALE_T && not WIN32 */
+ /* Load collations known to ICU */
#ifdef USE_ICU
if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
{
*************** pg_import_system_collations(PG_FUNCTION_
*** 591,597 ****
nspid, GetUserId(), COLLPROVIDER_ICU, -1,
collcollate, collcollate,
get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
! if_not_exists);
CreateComments(collid, CollationRelationId, 0,
get_icu_locale_comment(name));
--- 654,661 ----
nspid, GetUserId(), COLLPROVIDER_ICU, -1,
collcollate, collcollate,
get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
! true, true);
! CommandCounterIncrement();
CreateComments(collid, CollationRelationId, 0,
get_icu_locale_comment(name));
*************** pg_import_system_collations(PG_FUNCTION_
*** 618,624 ****
nspid, GetUserId(), COLLPROVIDER_ICU, -1,
collcollate, collcollate,
get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
! if_not_exists);
CreateComments(collid, CollationRelationId, 0,
get_icu_locale_comment(localeid));
}
--- 682,690 ----
nspid, GetUserId(), COLLPROVIDER_ICU, -1,
collcollate, collcollate,
get_collation_actual_version(COLLPROVIDER_ICU, collcollate),
! true, true);
! CommandCounterIncrement();
!
CreateComments(collid, CollationRelationId, 0,
get_icu_locale_comment(localeid));
}
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 0f22e6d..20c7d07 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*************** setup_description(FILE *cmdfd)
*** 1627,1633 ****
static void
setup_collation(FILE *cmdfd)
{
! PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n");
/* Add an SQL-standard name */
PG_CMD_PRINTF3("INSERT INTO pg_collation (collname, collnamespace, collowner, collprovider, collencoding,
collcollate,collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', %d, 'C', 'C');\n\n",
BOOTSTRAP_SUPERUSERID,COLLPROVIDER_LIBC, PG_UTF8);
--- 1627,1633 ----
static void
setup_collation(FILE *cmdfd)
{
! PG_CMD_PUTS("SELECT pg_import_system_collations('pg_catalog');\n\n");
/* Add an SQL-standard name */
PG_CMD_PRINTF3("INSERT INTO pg_collation (collname, collnamespace, collowner, collprovider, collencoding,
collcollate,collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', %d, 'C', 'C');\n\n",
BOOTSTRAP_SUPERUSERID,COLLPROVIDER_LIBC, PG_UTF8);
diff --git a/src/include/catalog/pg_collation_fn.h b/src/include/catalog/pg_collation_fn.h
index 26b8f0d..0ef3138 100644
*** a/src/include/catalog/pg_collation_fn.h
--- b/src/include/catalog/pg_collation_fn.h
*************** extern Oid CollationCreate(const char *c
*** 20,26 ****
int32 collencoding,
const char *collcollate, const char *collctype,
const char *collversion,
! bool if_not_exists);
extern void RemoveCollationById(Oid collationOid);
#endif /* PG_COLLATION_FN_H */
--- 20,27 ----
int32 collencoding,
const char *collcollate, const char *collctype,
const char *collversion,
! bool if_not_exists,
! bool quiet);
extern void RemoveCollationById(Oid collationOid);
#endif /* PG_COLLATION_FN_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 1191b4a..282a2ac 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("pg_controldata recovery state inf
*** 5462,5472 ****
DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 ""
"{23,23,23,23,23,23,23,23,23,16,16,23}""{o,o,o,o,o,o,o,o,o,o,o,o}"
"{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}"
_null__null_ pg_control_init _null_ _null_ _null_ ));
DESCR("pg_controldata init state information as a function");
! DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "16 4089"
_null__null_ "{if_not_exists,schema}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
DESCR("import collations from operating system");
DATA(insert OID = 3448 ( pg_collation_actual_version PGNSP PGUID 12 100 0 0 0 f f f f t f v s 1 0 25 "26" _null_
_null__null_ _null_ _null_ pg_collation_actual_version _null_ _null_ _null_ ));
! DESCR("import collations from operating system");
/* system management/monitoring related functions */
DATA(insert OID = 3353 ( pg_ls_logdir PGNSP PGUID 12 10 20 0 0 f f f f t t v s 0 0 2249 ""
"{25,20,1184}""{o,o,o}" "{name,size,modification}" _null_ _null_ pg_ls_logdir _null_ _null_ _null_ ));
--- 5462,5473 ----
DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 ""
"{23,23,23,23,23,23,23,23,23,16,16,23}""{o,o,o,o,o,o,o,o,o,o,o,o}"
"{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}"
_null__null_ pg_control_init _null_ _null_ _null_ ));
DESCR("pg_controldata init state information as a function");
! /* collation management functions */
! DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 1 0 2278 "4089" _null_
_null__null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
DESCR("import collations from operating system");
DATA(insert OID = 3448 ( pg_collation_actual_version PGNSP PGUID 12 100 0 0 0 f f f f t f v s 1 0 25 "26" _null_
_null__null_ _null_ _null_ pg_collation_actual_version _null_ _null_ _null_ ));
! DESCR("get actual version of collation from operating system");
/* system management/monitoring related functions */
DATA(insert OID = 3353 ( pg_ls_logdir PGNSP PGUID 12 10 20 0 0 f f f f t t v s 0 0 2249 ""
"{25,20,1184}""{o,o,o}" "{name,size,modification}" _null_ _null_ pg_ls_logdir _null_ _null_ _null_ ));
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers