Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE andwork_mem values - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE andwork_mem values
Date
Msg-id CAH2-WzmeMSgjO-TocH7+v7vuwzv8C77eqGcTCTGSRattOK2ZLw@mail.gmail.com
Whole thread Raw
In response to Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE andwork_mem values  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-bugs
On Mon, Aug 14, 2017 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
>> Do we really need to pass "if_not_exists = true", anyway? Why
>> shouldn't initdb fail if there are apparent duplicate ICU collations?
>
> Because the villagers will come after you; see commit
> 0b13b2a7712b6f91590b7589a314240a14520c2f.

Fair enough. Changing it is actually unnecessary anyway.
pg_import_system_collations() should simply use uloc_countAvailable()
+ uloc_getAvailable, rather than ucol_countAvailable() +
ucol_getAvailable(). Just like my test program.

FYI, without the keyword variants, but with all base locales, ICU 55
adds 684 base locales to pg_collation. Should I write a real patch to
make the changes I deem necessary, or do you prefer to Peter E.?

The comments that pg_import_system_collations() adds to each locale
have an annoying tendency to exclude anything that isn't ascii-safe,
which is much more noticeable when locales populate pg_collation
instead of collations:

...foo    │ fr-BE-x-icu            │ fr-BE            │ fr-BE
│ icu      │ French (Belgium)foo    │ fr-BF-x-icu            │ fr-BF            │ fr-BF
│ icu      │ French (Burkina Faso)foo    │ fr-BI-x-icu            │ fr-BI            │ fr-BI
│ icu      │ French (Burundi)foo    │ fr-BJ-x-icu            │ fr-BJ            │ fr-BJ
│ icu      │ French (Benin)foo    │ fr-BL-x-icu            │ fr-BL            │ fr-BL
│ icu      │foo    │ fr-CA-x-icu            │ fr-CA            │ fr-CA
│ icu      │ French (Canada)foo    │ fr-CD-x-icu            │ fr-CD            │ fr-CD
│ icu      │ French (Congo - Kinshasa)foo    │ fr-CF-x-icu            │ fr-CF            │ fr-CF
│ icu      │ French (Central African Republic)foo    │ fr-CG-x-icu            │ fr-CG            │ fr-CG
│ icu      │ French (Congo - Brazzaville)foo    │ fr-CH-x-icu            │ fr-CH            │ fr-CH
│ icu      │ French (Switzerland)foo    │ fr-CI-x-icu            │ fr-CI            │ fr-CI
│ icu      │foo    │ fr-CM-x-icu            │ fr-CM            │ fr-CM
│ icu      │ French (Cameroon)foo    │ fr-DJ-x-icu            │ fr-DJ            │ fr-DJ
│ icu      │ French (Djibouti)
...

Notice the lack of descriptions here, just because they weren't
ascii-safe. I now wonder if we should even keep the comment thing at
all. We can eventually add an SQL function, that is called by \dOS+,
that interrogates ICU for a description on the fly. No need to store
anything.

Another thing that I dislike about get_icu_locale_comment() is that
the descriptions we show are always English display names, regardless
of pg_database collation or anything else. We shouldn't hard-code the
use of English for display name, on general principle.

> It's possible that there's an argument for not de-duplicating ICU
> collation names even though we must do so for libc collation names.
> But I'm not exactly sure what that would be.  Having initdb fail if
> the ICU configuration is funny isn't going to endear us to anyone.

BTW, get_icu_language_tag() has a buglet. ULOC_FULLNAME_CAPACITY does
not bound the size of a language tag; those can be almost arbitrarily
long, unlike locale identifiers. That should probably be fixed, if
only to avoid setting a bad example.

--
Peter Geoghegan


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

pgsql-bugs by date:

Previous
From: Noah Misch
Date:
Subject: Re: [BUGS] [postgresql 10 beta3] unrecognized node type: 90
Next
From: Tom Lane
Date:
Subject: Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values