Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-hackers

From	Hugh Ranalli
Subject	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date	February 11, 2019 22:20:42
Msg-id	CAAhbUMODj1cCHjCpZ-=kxJxnVWyTsqu6ZnWe8+gCsb5SGnv=zA@mail.gmail.com
In response to	Re: BUG #15548: Unaccent does not remove combining diacritical characters (raam narayana <raam.soft@gmail.com>)
Responses	Re: BUG #15548: Unaccent does not remove combining diacritical characters (Ramanarayana <raam.soft@gmail.com>)
List	pgsql-hackers

On Sun, 10 Feb 2019 at 15:07, raam narayana <raam.soft@gmail.com> wrote:

Hi,

After the latest commit in the master branch, I tried to test the Python script. Ironically, I still see that the output from the script is completely different from the content of the unaccent.rules file. Am I missing anything? My testing included the following:

Downloaded the following files

http://unicode.org/Public/8.0.0/ucd/UnicodeData.txt

http://unicode.org/cldr/trac/export/14746/tags/release-34/common/transforms/Latin-ASCII.xml

Executed the Python script as follows

python generate_unaccent_rules.py --unicode-data-file UnicodeData.txt --latin-ascii-file Latin-ASCII.xml > unaccent.rules

I am using Python 3.7.1 on Windows 10.

The new status of this patch is: Needs review

Hi Raam,

I just ran generate_unaccent_rules.py under two environments, using the data files given above:

- Python 3.4.3 on Linux Mint 17.3 (equivalent to Ubuntu 14.04)

- Python 3.6.7 on Ubuntu 18.04

In both cases, the output was identical to that generated by the program under Python 2.7. So yes, more information would help. Unfortunately I don't have a Windows Python environment readily available, but could set one up if I had to.
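One way to gather that information is to compare the generated file with the reference unaccent.rules after decoding both and normalizing line endings, so that encoding or CRLF differences are separated from real differences in the rules. The following is only a minimal sketch under that assumption, not part of the patch; the helper names decode_rules and diff_rules are hypothetical, and it assumes both files are either UTF-8 or carry a byte-order mark.

```python
import codecs
import difflib
import sys


def decode_rules(raw):
    """Decode file bytes, honoring a UTF-16 or UTF-8 byte-order mark if present.

    Assumption: a file without a BOM is UTF-8. Anything else raises
    UnicodeDecodeError, which is itself useful information to report.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    return raw.decode("utf-8-sig")


def diff_rules(generated_raw, reference_raw):
    """Return unified-diff lines between two rules files, ignoring CRLF vs LF."""
    gen = decode_rules(generated_raw).replace("\r\n", "\n").split("\n")
    ref = decode_rules(reference_raw).replace("\r\n", "\n").split("\n")
    return list(difflib.unified_diff(ref, gen, "reference", "generated", lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   python compare_rules.py generated.rules unaccent.rules
    with open(sys.argv[1], "rb") as g, open(sys.argv[2], "rb") as r:
        for line in diff_rules(g.read(), r.read()):
            # ascii() escapes non-ASCII characters, so combining marks stay
            # visible and the console encoding cannot get in the way.
            print(ascii(line))
```

If this prints nothing, the two files hold the same rules and the mismatch lies only in encoding or line endings. If it prints a diff or fails to decode, that output would be worth posting.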

Thanks,

Hugh
