Re: BUG #18362: unaccent rules and Old Greek text - Mailing list pgsql-bugs

From	Thomas Munro
Subject	Re: BUG #18362: unaccent rules and Old Greek text
Date	March 1 04:53:06
Msg-id	CA+hUKGK7OKZcCpvD92RyJtu6m_b6XRuZRNqSu_5Y3vHDn7KDpA@mail.gmail.com
In response to	Re: BUG #18362: unaccent rules and Old Greek text (Cees van Zeeland <cees.van.zeeland@freedom.nl>)
Responses	Re: BUG #18362: unaccent rules and Old Greek text (Cees van Zeeland <cees.van.zeeland@freedom.nl>) Re: BUG #18362: unaccent rules and Old Greek text (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-bugs

On Tue, Feb 27, 2024 at 1:33 AM Cees van Zeeland
<cees.van.zeeland@freedom.nl> wrote:
> I'm not an expert, but obviously computers make a difference between the two versions of the characters.
> We are talking about this series:
> U+1F70 - U+1F7D:    ὰ     ά     ὲ     έ     ὴ     ή     ὶ     ί     ὸ     ό     ὺ     ύ     ὼ     ώ
> Is it possible to filter / limit in some way the redirection in the script to this range?

Right, so to get this in we either need to decide that we're OK with
adding that many characters, or figure out some systematic way to
select just the ones we want.  One hint that might be helpful if
someone wants to investigate: I suspect that a lot of those mappings
might be marked with <font>, which seems to be for code points for
alternative renderings ("mathematical" bold, italic, fraktur etc), so
perhaps we could filter them out that way without losing the
oxia-marked characters if that's the way it has to be.
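
A rough sketch of that filtering idea, in Python. It assumes the decomposition mapping is the sixth semicolon-separated field of UnicodeData.txt, optionally prefixed with a tag such as <font>, as described in TR44. The function names and the three sample lines are illustrative only (check the lines against the real file), and this is not the code in the existing unaccent rule generator.

```python
import re

# Illustrative lines in UnicodeData.txt format; verify against the real file.
SAMPLE = """\
1F70;GREEK SMALL LETTER ALPHA WITH VARIA;Ll;0;L;03B1 0300;;;;N;;;1FBA;;1FBA
1F71;GREEK SMALL LETTER ALPHA WITH OXIA;Ll;0;L;03AC;;;;N;;;1FBB;;1FBB
1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
"""


def parse_decomposition(field):
    """Split the decomposition field into (tag or None, [code points])."""
    m = re.match(r"^(?:<(\w+)>\s*)?(.*)$", field)
    tag, rest = m.group(1), m.group(2)
    return tag, [int(cp, 16) for cp in rest.split()]


def mappings(lines, skip_tags=frozenset({"font"})):
    """Yield (code point, tag, targets), skipping the unwanted tags."""
    for line in lines:
        if not line.strip():
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        tag, targets = parse_decomposition(fields[5])
        # No decomposition at all, or one marked with a tag we filter out.
        if not targets or tag in skip_tags:
            continue
        yield code, tag, targets


if __name__ == "__main__":
    for code, tag, targets in mappings(SAMPLE.splitlines()):
        print(f"U+{code:04X}", tag, [f"U+{t:04X}" for t in targets])
```

With this kind of filter the untagged oxia mapping (U+1F71 to U+03AC) is kept, while the <font> mapping for the mathematical bold letter is dropped. Whether <font> alone removes enough of the unwanted characters is exactly what would need investigating.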

I think all the relevant parts of the character database file are described here:

https://unicode.org/reports/tr44/#Property_Values

The file we're currently using is 15.1:

https://www.unicode.org/Public/15.1.0/ucd/UnicodeData.txt

I registered this thread as https://commitfest.postgresql.org/47/4873/ .
