Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From Hugh Ranalli
Subject Re: BUG #15548: Unaccent does not remove combining diacritical characters
Msg-id CAAhbUMOX4QLj6c0O3GnjZYtR2dpAowss832Bq1n7oJyByeR7kQ@mail.gmail.com
In response to Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: BUG #15548: Unaccent does not remove combining diacritical characters
List pgsql-bugs

On Sat, 15 Dec 2018 at 21:26, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> +1 for updating to the latest file from time to time.  After
> http://unicode.org/cldr/trac/ticket/11383 makes it into a new release,
> our special_cases() function will have just the two Cyrillic
> characters, which should almost certainly be handled by adding
> Cyrillic to the ranges we handle via the usual code path, and DEGREE
> CELSIUS and DEGREE FAHRENHEIT.  Those degree signs could possibly be
> extracted from Unicode.txt (or we could just forget about them), and
> then we could drop special_cases().

Well, when I modified the code to handle the new version of the transliteration file, I discovered that was sufficient to handle the old version as well. That's not the way things usually go, but I'll take it. ;-)
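
For readers following the thread, here is a minimal sketch of the general idea behind removing combining diacritical characters. It is not the code in generate_unaccent_rules.py or in the attached patches; it only uses Python's standard unicodedata module, and the function name strip_combining_marks is hypothetical. It assumes that decomposing to NFKD and dropping nonspacing marks (category Mn) is an acceptable approximation of "unaccenting".

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Illustrative only: decompose text and drop nonspacing marks.

    Hypothetical helper, not part of the unaccent extension.
    """
    # NFKD splits precomposed characters into a base character followed by
    # combining marks, and expands compatibility characters.
    decomposed = unicodedata.normalize("NFKD", text)
    # Category "Mn" (Mark, nonspacing) covers the combining diacritical
    # marks block, U+0300 through U+036F.
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )


if __name__ == "__main__":
    samples = [
        "e\u0301",   # "e" followed by COMBINING ACUTE ACCENT
        "\u00e9",    # precomposed LATIN SMALL LETTER E WITH ACUTE
        "\u0451",    # a Cyrillic letter with a canonical decomposition
        "\u2103",    # DEGREE CELSIUS, which has a compatibility decomposition
    ]
    for sample in samples:
        print(repr(sample), "->", repr(strip_combining_marks(sample)))
```

The first two samples show why a bare combining character needs its own handling: the precomposed and decomposed forms of the same letter both reduce to the plain base letter. The degree sign sample shows that compatibility decomposition alone expands DEGREE CELSIUS into a degree sign plus "C" without any special case.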

I've attached two patches, one to update generate_unaccent_rules.py, and another that updates unaccent.rules from the v34 transliteration file. I'll be happy to add these to the CF. Does anyone need to review them and give me approval before I do so?

Best wishes,
Hugh 
