I wrote:
> Marco Atzeri <marco.atzeri@gmail.com> writes:
>> Building on Cygwin latest 10 beta1 or head source,
>> make check fails as:
>> ...
>> performing post-bootstrap initialization ...
>> 2017-05-31 23:23:22.214 CEST [16860] FATAL: collation "ja_JP" for encoding "EUC_JP" already exists
> Hmph. Could we see the results of "locale -a | grep ja_JP" ?

Despite the lack of followup from the OP, I'm pretty troubled by this
report. It shows that the reimplementation of OS collation data import
as pg_import_system_collations() is a whole lot more fragile than the
original coding. We have never before trusted "locale -a" to not produce
duplicate outputs, not since the very beginning in 414c5a2e. AFAICS,
the current coding has also lost the protections we added very shortly
after that in 853c1750f; and it has also lost the admittedly rather
arbitrary, but at least deterministic, preference order for conflicting
short aliases that was in the original initdb code.

I suppose the idea was to see whether we actually needed those defenses,
but since we have here a failure report after less than a month of beta,
it seems clear to me that we do. I think we need to upgrade
pg_import_system_collations to have all the same logic that was there
before.

Now the hard part of that is that because pg_import_system_collations
isn't using a temporary staging table, but is just inserting directly
into pg_collation, there isn't any way for it to eliminate duplicates
unless it uses if_not_exists behavior all the time. So there seem to
be two ways to proceed:

1. Drop pg_import_system_collations' if_not_exists argument and just
define it as adding any collations not already known in pg_collation.

2. Significantly rewrite it so that it de-dups the collation set by
hand before trying to insert into pg_collation.

#2 seems like a lot more work, but on the other hand, we might need
most of that logic anyway to get back deterministic alias handling.
However, since I cannot see any real-world use case at all for
if_not_exists = false, I figure we might as well do #1 and take
whatever simplification we can get that way.
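To make the de-dup step in #2 concrete, here is a minimal sketch, in
Python rather than C purely for brevity. The function name
dedup_collations, the (locale name, encoding) input pairs, and the rule
of deriving a short alias by stripping everything after the first "."
are all assumptions for illustration, not the existing initdb or
pg_import_system_collations logic.

```python
def dedup_collations(locales):
    """Reduce (locale_name, encoding) pairs to unique collation entries.

    Hypothetical helper: returns a sorted list of
    (collname, encoding, locale_name) tuples with at most one entry
    per (collname, encoding) key.
    """
    chosen = {}
    # Drop exact repeats, then sort so the result does not depend on
    # the order in which the OS happened to list the locales.
    for locale_name, encoding in sorted(set(locales)):
        # Candidate collation names: the full locale name, plus a short
        # alias with any ".codeset" suffix stripped (assumed rule).
        candidates = [locale_name]
        short = locale_name.split(".", 1)[0]
        if short != locale_name:
            candidates.append(short)
        for collname in candidates:
            key = (collname, encoding)
            # An exact-name match always wins over an alias; among
            # competing aliases, the first in sorted order is kept.
            if key not in chosen or locale_name == collname:
                chosen[key] = locale_name
    return sorted((name, enc, loc) for (name, enc), loc in chosen.items())
```

Fed a made-up list in which ("ja_JP", "EUC_JP") appears twice alongside
"ja_JP.eucjp" and "ja_JP.ujis", this yields exactly one "ja_JP" entry
for EUC_JP, and the same answer regardless of input order.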
I'm willing to do the legwork on this, but before I start, does
anyone have any ideas or objections?

regards, tom lane