Re: [PATCH] Completed unaccent dictionary with many missing characters - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [PATCH] Completed unaccent dictionary with many missing characters
Date
Msg-id YrPufpsLPpnr8YY5@paquier.xyz
In response to Re: [PATCH] Completed unaccent dictionary with many missing characters  (Przemysław Sztoch <przemyslaw@sztoch.pl>)
Responses Re: [PATCH] Completed unaccent dictionary with many missing characters
List pgsql-hackers
On Tue, Jun 21, 2022 at 03:41:48PM +0200, Przemysław Sztoch wrote:
> Thomas Munro wrote on 21.06.2022 02:53:
>> Oh, we're using CLDR 41, which reminds me: CLDR 36 added SOUND
>> RECORDING COPYRIGHT[1] so we could drop it from special_cases().

Indeed.

>> Hmm, is it possible to get rid of CYRILLIC CAPITAL LETTER IO and
>> CYRILLIC SMALL LETTER IO by adding Cyrillic to PLAIN_LETTER_RANGES?

That's a good point.  There are quite a few Cyrillic characters
missing a conversion, apparently.
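
As a rough illustration of why a wider letter range would cover the two
IO characters: both decompose canonically into a plain Cyrillic letter
plus a combining diaeresis.  The sketch below is not the code of the
rule generator; PLAIN_RANGES, is_plain_letter() and base_letter() are
hypothetical names, and the Cyrillic range chosen (U+0410..U+044F) is
an assumption made only for this example.

```python
import unicodedata

# Hypothetical stand-in for a table like PLAIN_LETTER_RANGES; the real
# table in the rule generator may contain different ranges.
PLAIN_RANGES = (
    (ord("a"), ord("z")),   # Latin lower case
    (ord("A"), ord("Z")),   # Latin upper case
    (0x0410, 0x044F),       # CYRILLIC CAPITAL LETTER A .. SMALL LETTER YA (assumed)
)


def is_plain_letter(ch):
    """True if ch falls inside one of the plain-letter ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PLAIN_RANGES)


def base_letter(ch):
    """Return the plain base letter of ch, or None if it has none.

    Only canonical decompositions of the form "plain letter followed
    by combining marks" are accepted; compatibility decompositions
    (tagged like <compat>) are ignored here.
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0].startswith("<"):
        return None
    first = chr(int(fields[0], 16))
    marks = [chr(int(cp, 16)) for cp in fields[1:]]
    if is_plain_letter(first) and all(unicodedata.combining(m) for m in marks):
        return first
    return None


if __name__ == "__main__":
    for ch in ("\u0401", "\u0451"):   # CYRILLIC CAPITAL/SMALL LETTER IO
        print(unicodedata.name(ch), "->", base_letter(ch))
```

Without the Cyrillic entry in PLAIN_RANGES, base_letter() returns None
for both characters, which is the situation a special case would have
to paper over.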

>> That'd leave just DEGREE CELSIUS and DEGREE FAHRENHEIT.  Not sure how
>> to kill those last two special cases -- they should be directly
>> replaced by their decomposition.
>>
>> [1] https://unicode-org.atlassian.net/browse/CLDR-11383
>
> In patch v3, support for Cyrillic is added.
> The special-case function has been purged.
> Added support for category So (Other Symbol). This category includes
> the characters from special_cases().
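
For reference, the three characters discussed above can be inspected
with Python's unicodedata module.  This is only a sketch of what the
Unicode database itself reports, not of how the patch handles them;
compat_replacement() is a hypothetical helper written for this example.

```python
import unicodedata


def compat_replacement(ch):
    """Return the compatibility decomposition of ch as a string.

    Returns None when ch has no decomposition, or only a canonical
    (untagged) one.
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields or not fields[0].startswith("<"):
        return None
    # fields[0] is the tag, e.g. "<compat>"; the rest are code points.
    return "".join(chr(int(cp, 16)) for cp in fields[1:])


if __name__ == "__main__":
    # DEGREE CELSIUS, DEGREE FAHRENHEIT, SOUND RECORDING COPYRIGHT
    for ch in ("\u2103", "\u2109", "\u2117"):
        print(unicodedata.name(ch),
              unicodedata.category(ch),
              repr(compat_replacement(ch)))
```

All three are in category So.  The two degree signs carry a
compatibility decomposition (DEGREE SIGN followed by a Latin capital
letter), while SOUND RECORDING COPYRIGHT has no decomposition in the
Unicode database, so its replacement has to come from another source
such as the CLDR transliteration data mentioned upthread.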

I think that we'd better split v3 into several patches to keep each
improvement isolated.  The addition of Cyrillic characters to the
range of plain letters and the removal of SOUND RECORDING COPYRIGHT
from the special cases can each be done on their own, before
considering the original case tackled by this thread.
--
Michael
