Re: BUG #13440: unaccent does not remove all diacritics - Mailing list pgsql-bugs

From Léonard Benedetti
Subject Re: BUG #13440: unaccent does not remove all diacritics
Msg-id 56E84A19.4010901@mlpo.fr
In response to Re: BUG #13440: unaccent does not remove all diacritics  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: BUG #13440: unaccent does not remove all diacritics
List pgsql-bugs
15/03/2016 18:01, Teodor Sigaev wrote:
>> So I think we can keep just a version for Python 2 for now. If everyone
>> agrees, I'll update the files and patch.
>
> Attached patch is my attempt to make the script work with both Python 2
> and 3. At least it produces the same result for 2.7 and 3.4. Please,
> could you check? I'm not a Python developer at all.
>
> BTW, I removed Unicode characters from the code and left them only in
> comments.
>
Unfortunately, this script does not work: the characters handled by
“parse_cldr_latin_ascii_transliterator” are missing from the output. This
is probably a regex compatibility problem (the two versions of the
language are not fully compatible, and it is not always possible to write
code that works with both).

Given the various pieces of feedback, the easiest option seems to be a
single Python 2 script, since: the PostgreSQL source uses only Python 2;
support for that version will not end soon; and, above all, this script
needs to be run only very rarely (when the Unicode Standard or the
transliterator is updated; it is not part of the build process).

So, attached is a new patch: the same script, compatible with Python 2,
*with only ASCII characters*.
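For anyone unfamiliar with the technique, here is a minimal sketch (not taken from the patch) of how a script can keep its source ASCII-only while still matching non-ASCII text such as the "→" arrow in transliterator rules: non-ASCII characters are written as \uXXXX escapes inside u'' literals, which Python 2 and Python 3.3+ both accept. The rule shape "X → Y ;" and the names ARROW, RULE_PATTERN and parse_rule are hypothetical, chosen only for illustration.

```python
# Minimal sketch (hypothetical, not from the patch): an ASCII-only source
# file that still matches non-ASCII text. Non-ASCII characters are written
# as \uXXXX escapes inside u'' literals, which both Python 2 and
# Python 3.3+ (PEP 414) accept and decode when the file is compiled.
import re

# RIGHTWARDS ARROW (U+2192), written as an escape rather than a literal.
ARROW = u'\u2192'

# Assumed rule shape: "<one source char> <arrow> <replacement> ;",
# optionally followed by a comment. The first " ;" ends the replacement.
RULE_PATTERN = re.compile(u'^(.) ' + ARROW + u' (.+?) ;')


def parse_rule(line):
    """Return (source, replacement) for a rule line, or None if no match."""
    match = RULE_PATTERN.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

With this, parse_rule(u'\u00c6 \u2192 AE ;') returns the pair (u'\u00c6', u'AE'), i.e. Æ mapped to AE, while the source file itself contains no non-ASCII byte.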

Regards.

Léonard Benedetti

