Re: ICU locale validation / canonicalization - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: ICU locale validation / canonicalization
Date
Msg-id b29cdc9b-1176-4456-e909-8fd64af03af8@enterprisedb.com
Whole thread Raw
In response to Re: ICU locale validation / canonicalization  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: ICU locale validation / canonicalization  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 17.03.23 18:55, Jeff Davis wrote:
> On Wed, 2023-03-15 at 15:18 -0700, Jeff Davis wrote:
>> I left out the validation patch for now, and I'm evaluating a
>> different
>> approach that will attempt to match to the locales retrieved with
>> uloc_countAvailable()/uloc_getAvailable().
> 
> I like this approach, attached new patch series with that included as
> 0006.

I have looked at the first three patches.  I think we want what those 
patches do.


[PATCH v6 1/6] Support language tags in older ICU versions (53 and
  earlier).

In pg_import_system_collations(), this is now redundant and can be 
simplified:

-        if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
+        if (!pg_is_ascii(langtag) || !pg_is_ascii(langtag))

icu_set_collation_attributes() needs more commenting about what is going 
on.  My guess is that uloc_canonicalize() converts from language tag to 
ICU locale ID, and then the existing logic to parse that apart would 
apply.  Is that how it works?


[PATCH v6 2/6] Wrap ICU ucol_open().

It makes sense to try to unify some of this.  But I find the naming 
confusing.  If I see pg_ucol_open(), then I would expect that all calls 
to ucol_open() would be replaced by this.  But here it's only a few, 
without explanation.  (pg_ucol_open() has no explanation at all AFAICT.)

I have in my notes that check_icu_locale() and make_icu_collator() 
should be combined into a single function.  I think that would be a 
better way to slice it.

Btw., I had intentionally not written code like this

+#if U_ICU_VERSION_MAJOR_NUM < 54
+    icu_set_collation_attributes(collator, loc_str);
+#endif

The disadvantage of doing it that way is that you then need to dig out 
an old version of ICU in order to check whether the code compiles at 
all.  With the current code, you can be sure that that code compiles if 
you make changes elsewhere.


[PATCH v6 3/6] Handle the "und" locale in ICU versions 54 and older.

This makes sense, but the same comment about not #if'ing out code for 
old ICU versions applies here.

The

+#ifdef USE_ICU
+

before pg_ucol_open() probably belongs in patch 2.



pgsql-hackers by date:

Previous
From: Melih Mutlu
Date:
Subject: Re: Allow logical replication to copy tables in binary format
Next
From: Etsuro Fujita
Date:
Subject: Comment in preptlist.c