Re: Unaccent extension python script Issue in Windows - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Unaccent extension python script Issue in Windows
Msg-id 20190318.152755.73288474.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Unaccent extension python script Issue in Windows  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Hello.

At Mon, 18 Mar 2019 14:13:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20190318.141334.186469242.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello.
> 
> At Sun, 17 Mar 2019 20:23:05 -0400, Hugh Ranalli <hugh@whtc.ca> wrote in
> <CAAhbUMNoBLu7jAbyK5MK0LXEyt03PzNQt_Apkg0z9bsAjcLV4g@mail.gmail.com>
> > Hi Ram,
> > Thanks for doing this; I've been overestimating my ability to get to things
> > over the last couple of weeks.
> > 
> > I've looked at the patch and have made one minor change. I had moved all
> > the imports up to the top, to keep them in one place (and I think some had
> > originally been used only by the Python 2 code). You added them there, but
> > didn't remove them from their original positions. So I've incorporated that
> > into your patch, attached as v2. I've tested this under Python 2 and 3 on
> > Linux, not Windows.
> 
> Though I'm not sure it is necessary to run the script on
> Windows, the problem is not specific to Windows; it is a general
> one that just had not happened to be found on non-Windows environments.
> 
> On CentOS7:
> > export LANG="ja_JP.EUCJP"
> > python <..snipped..>
> ..
> > UnicodeEncodeError: 'euc_jp' codec can't encode character '\xab' in position 0: illegal multibyte sequence
> 
> So this is not an issue with Windows but with python3.
> 
> The script generates identical files with both versions of
> Python with the patch, on Linux and on Windows 7. Python 3 on Windows
> emits CRLF as the newline, but that doesn't seem to do any harm. (I
> haven't confirmed that yet, because the build is extremely slow right
> now for unknown reasons.)

I have confirmed that CRLF indeed does no harm and unaccent works
correctly (t_isspace() classifies CR and LF as white space, so they
are excluded).
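
For reference, the two Python 3 behaviours discussed above (output
encoding taken from the locale, and CRLF newlines on Windows) can be
sketched as follows. This is only an illustration and not the attached
patch; make_utf8_writer is a hypothetical helper name, and wrapping the
stream in io.TextIOWrapper is an assumption about one way to pin both
the encoding and the newline.

```python
import io

# U+00AB is the character named in the traceback quoted above.
# EUC-JP cannot represent it, so encoding fails regardless of platform.
try:
    "\xab".encode("euc_jp")
except UnicodeEncodeError as exc:
    print("euc_jp cannot encode U+00AB:", exc.reason)


def make_utf8_writer(binary_stream):
    """Hypothetical helper: wrap a binary stream so that text is always
    written as UTF-8 with LF newlines, independent of LANG and platform."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


# Demonstrate on an in-memory buffer instead of the real stdout.
buf = io.BytesIO()
out = make_utf8_writer(buf)
out.write("\xab\t<<\n")
out.flush()
print(buf.getvalue())
```

On Python 3 the same wrapper could presumably be applied to
sys.stdout.buffer; Python 2's sys.stdout has no such attribute, so that
case would need separate handling.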

> This patch contains irrelevant changes. The minimal required
> change would be the attached. If you want to refactor the
> UnicodeData reader or rearrange the imports, those should be
> separate patches.
> 
> It would be better to use IOBase for Python 3, especially for the
> stdout replacement, but I didn't since it *is* working.
> 
> > Everything else looks correct. I apologise for not having replied to your
> > question in the original bug report. I had intended to, but as I said,
> > there's been an increase in the things I need to juggle at the moment.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


