I tested the script in Python 2.7 and it works perfectly. The problem is in Python 3.7 (and maybe only on Windows, since you were not seeing the issue), where I was getting the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u0100' in position 0: character maps to <undefined>
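For reference, the same error can be reproduced without the script. This is a minimal sketch that assumes the output stream uses cp1252, a common Windows code page (the actual code page on the affected machine is not stated). Python implements such single-byte code pages with its 'charmap' codec, which has no mapping for U+0100 (Ā), whereas UTF-8 encodes it without trouble. `encoding_error` is a hypothetical helper written only for this illustration.

```python
def encoding_error(text, encoding):
    """Return the UnicodeEncodeError message for `text` in `encoding`, or None if it encodes."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# cp1252 is an assumed stand-in for the Windows code page in use.
print(encoding_error("\u0100", "cp1252"))
# 'charmap' codec can't encode character '\u0100' in position 0: character maps to <undefined>
print(encoding_error("\u0100", "utf-8"))
# None
```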
I went through the Python script and found that the stdout encoding is set to UTF-8 only if the Python version is <= 2.
I have made the same change for Python 3 as well; please find the patch attached. Let me know if it makes sense.
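The patch itself is not reproduced here. As a hedged sketch of the idea, assuming the script replaces sys.stdout with a UTF-8 writer from the codecs module under Python 2 (the exact code in generate_unaccent_rules.py is not shown in this thread), the same setting could be applied under Python 3 as below. `utf8_text_stream` is a hypothetical helper, not a function from the script. When stdout is redirected to a file on Windows, Python 3 typically encodes with the ANSI code page (for example cp1252), which is where the UnicodeEncodeError above comes from.

```python
import codecs
import io
import sys


def utf8_text_stream(stream):
    """Return a text stream that encodes to UTF-8 and writes where `stream` writes.

    Hypothetical helper for illustration; not taken from generate_unaccent_rules.py.
    """
    if sys.version_info[0] <= 2:
        # Python 2: stdout is byte-oriented, so wrap it in a UTF-8 writer.
        return codecs.getwriter("utf8")(stream)
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: switch the existing text stream's encoding in place.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Python 3.0-3.6: wrap the underlying binary buffer instead.
    return codecs.getwriter("utf8")(stream.buffer)


# In a script (Python 3): sys.stdout = utf8_text_stream(sys.stdout)

# Demo (Python 3): a text stream that would otherwise encode as cp1252.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1252")
out = utf8_text_stream(console)
out.write(u"\u0100")
out.flush()
print(raw.getvalue())  # b'\xc4\x80'
```

On Python 3.7 and later, `reconfigure` keeps the original stream object (and its newline handling); wrapping `stream.buffer` is a fallback for earlier 3.x releases.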
Regards,
Ram.
On Tue, 12 Feb 2019 at 00:50, Hugh Ranalli <hugh@whtc.ca> wrote:
After the latest commit in the master branch, I tried to test the Python script. Ironically, I still see that the output of the script is completely different from the contents of the unaccent.rules file. Am I missing anything? My testing setup includes the following:
I am using Python 3.7.1, running on the Windows 10 platform.
The new status of this patch is: Needs review
Hi Raam,
I just ran generate_unaccent_rules.py in two environments, using the data files given above:
- Python 3.4.3 on Linux Mint 17.3 (equivalent to Ubuntu 14.04)
- Python 3.6.7 on Ubuntu 18.04
In both cases, the output was identical to that generated by the program under Python 2.7. So yes, more information would help. Unfortunately, I don't have a Windows Python environment readily available, but I could set one up if I had to.
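When comparing the script's output with unaccent.rules across environments, a unified diff shows exactly which lines differ rather than just "completely different". A minimal sketch using the standard difflib module; `compare_rules` is a hypothetical helper, and the sample lines below are made up.

```python
import difflib


def compare_rules(expected_text, actual_text):
    """Return unified-diff lines between two versions of a rules file (empty list if identical)."""
    return list(difflib.unified_diff(
        expected_text.splitlines(),
        actual_text.splitlines(),
        fromfile="unaccent.rules",
        tofile="generated",
        lineterm="",
    ))


# Made-up sample content, only to show the shape of the output.
print("\n".join(compare_rules("line one\nline two\n", "line one\nline 2\n")))
```

Both inputs would normally be read with an explicit encoding, e.g. open(path, encoding="utf-8").read(). Because splitlines() treats "\r\n" and "\n" alike, line-ending differences are ignored here; comparing the raw bytes would be needed if those matter.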