Re: updating unaccent.rules for Arabic letters - Mailing list pgsql-hackers

From Tom Lane
Subject Re: updating unaccent.rules for Arabic letters
Date
Msg-id 5527.1572797535@sss.pgh.pa.us
In response to updating unaccent.rules for Arabic letters  (kerbrose khaled <kerbrose@hotmail.com>)
List pgsql-hackers
kerbrose khaled <kerbrose@hotmail.com> writes:
> I would like to update unaccent.rules file to support Arabic letters. so
> could someone help me or tell me how could I add such contribution. I
> attached the file including the modifications, only the last 4 lines.

Hi!  I've got no objection to including Arabic in the set of covered
languages, but handing us a new unaccent.rules file isn't the way to
do it, because that's a generated file.  The adjacent script
generate_unaccent_rules.py generates it from the official Unicode
source data (see comments in that script).  What we need, ultimately,
is a patch to that script so it will emit these additional translations.
Past commits that might be useful sources of inspiration include

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=456e3718e7b72efe4d2639437fcbca2e4ad83099
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=5e8d670c313531c0dca245943fb84c94a477ddc4
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=ec0a69e49bf41a37b5c2d6f6be66d8abae00ee05

If you're not good with Python, maybe you could just explain to us
how to recognize these characters from Unicode character properties.
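
As a rough illustration of what "recognizing these characters from Unicode
character properties" could look like, here is a minimal sketch using only
Python's standard unicodedata module.  It is not taken from
generate_unaccent_rules.py and does not claim to match how that script works;
the helper names strip_marks and arabic_rules, and the choice of the
U+0600..U+06FF block as the search range, are assumptions made for the
example.

```python
import unicodedata


def strip_marks(ch):
    """Canonically decompose ch (NFD) and drop all combining marks (M* categories)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(
        c for c in decomposed if not unicodedata.category(c).startswith("M")
    )


def arabic_rules(start=0x0600, end=0x06FF):
    """Yield (source, base) pairs for letters in the given range.

    A pair is produced only when the letter has a canonical decomposition
    into a base letter plus combining marks.  The default range is the
    basic Arabic block (an assumption for this sketch).
    """
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Only consider letters (general category L*), not digits or marks.
        if not unicodedata.category(ch).startswith("L"):
            continue
        base = strip_marks(ch)
        if base and base != ch:
            yield ch, base


if __name__ == "__main__":
    # Print code points rather than raw characters to keep the output ASCII.
    for src, dst in arabic_rules():
        print(f"U+{ord(src):04X} -> U+{ord(dst):04X}  # {unicodedata.name(src)}")
```

A list like this only covers letters whose decomposition is recorded in the
Unicode data, for example ALEF WITH MADDA ABOVE (U+0622) mapping to ALEF
(U+0627).  Whether that matches the four lines in the attached file would
still need to be checked by hand.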

            regards, tom lane


