Hi Thomas,
I found this file:
https://www.unicode.org/Public/15.1.0/ucd/CompositionExclusions.txt
It might be useful for pinning down the characters we are looking for.
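
For context, here is a small sketch of why the two versions of these characters differ. It assumes Python's bundled unicodedata module, whose Unicode version may not be 15.1, and the variable names are only for illustration. The oxia characters have a one-code-point canonical decomposition, so normalization replaces them with the tonos form:

```python
import unicodedata

oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA
tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS

# Distinct code points, so a plain comparison treats them as different.
different_raw = oxia != tonos

# The oxia form has a one-code-point canonical decomposition ...
decomp = unicodedata.decomposition(oxia)

# ... so NFC maps it to the tonos form rather than keeping it.
same_after_nfc = unicodedata.normalize("NFC", oxia) == tonos

print(different_raw, decomp, same_after_nfc)
```
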
Hope this helps.
Cees
On 01/03/2024 02:53, Thomas Munro wrote:
> On Tue, Feb 27, 2024 at 1:33 AM Cees van Zeeland
> <cees.van.zeeland@freedom.nl> wrote:
>> I'm not an expert, but obviously computers distinguish between the two versions of the characters.
>> We are talking about this series:
>> U+1F70 - U+1F7D: ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ
>> Is it possible to restrict the redirection in the script to this range in some way?
> Right, so to get this in we either need to decide that we're OK with
> adding that many characters, or figure out some systematic way to
> select just the ones we want. One hint that might be helpful if
> someone wants to investigate: I suspect that a lot of those mappings
> might be marked with <font>, which seems to be for code points for
> alternative renderings ("mathematical" bold, italic, fraktur etc), so
> perhaps we could filter them out that way without losing the
> oxia-marked characters if that's the way it has to be.
>
> I think all the relevant parts of the character database file are described here:
>
> https://unicode.org/reports/tr44/#Property_Values
>
> The file we're currently using is 15.1:
>
> https://www.unicode.org/Public/15.1.0/ucd/UnicodeData.txt
>
> I registered this thread as https://commitfest.postgresql.org/47/4873/ .
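
On the <font> hint above, a rough sketch of how such a filter might look. This is not the existing script: it uses Python's bundled unicodedata module (possibly a different Unicode version than 15.1) instead of parsing UnicodeData.txt, and decomposition_tag and keep_mapping are hypothetical names.

```python
import unicodedata

def decomposition_tag(ch):
    """Return the compatibility tag of ch's decomposition (e.g. '<font>'),
    '' for a canonical (untagged) decomposition, or None if ch has none."""
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        return None
    first = decomp.split()[0]
    return first if first.startswith("<") else ""

def keep_mapping(ch):
    """Hypothetical filter: keep any decomposable character unless its
    decomposition is tagged <font>."""
    tag = decomposition_tag(ch)
    return tag is not None and tag != "<font>"

# The range from this thread: U+1F70 - U+1F7D.
greek = [chr(cp) for cp in range(0x1F70, 0x1F7E)]
kept = [ch for ch in greek if keep_mapping(ch)]

# U+1D400 MATHEMATICAL BOLD CAPITAL A is a <font> mapping and is dropped.
math_bold_a_kept = keep_mapping("\U0001D400")

print(len(kept), math_bold_a_kept)
```

With a filter like this, the varia- and oxia-marked characters survive because their decompositions carry no tag, while the mathematical bold, italic and fraktur variants are skipped.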