Re: Mac OS: invalid byte sequence for encoding "UTF8" - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date
Msg-id CACACo5TvwOJ_7xbsyf8MPF2kkTfQ6knXerRcJN3DYKpEruX9Vw@mail.gmail.com
In response to Mac OS: invalid byte sequence for encoding "UTF8"  (Artur Zakirov <a.zakirov@postgrespro.ru>)
Responses Re: Mac OS: invalid byte sequence for encoding "UTF8"  (Artur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-hackers
On Wed, Jan 27, 2016 at 10:59 AM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
Hello.

When a user tries to create a text search dictionary for the Russian language on Mac OS, the following error is raised:

  CREATE EXTENSION hunspell_ru_ru;
+ ERROR:  invalid byte sequence for encoding "UTF8": 0xd1
+ CONTEXT:  line 341 of configuration file "/Users/stas/code/postgrespro2/tmp_install/Users/stas/code/postgrespro2/install/share/tsearch_data/ru_ru.affix": "SFX Y   хаться шутся        хаться

The Russian dictionary was downloaded from http://extensions.openoffice.org/en/project/slovari-dlya-russkogo-yazyka-dictionaries-russian
The affix and dictionary files were extracted from the archive and converted to UTF-8. A converted dictionary can also be downloaded from https://github.com/select-artur/hunspell_dicts/tree/master/ru_ru

Not sure why the file uses the "SET KOI8-R" directive then?
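
Both points (the declared SET encoding and whether the bytes really are UTF-8) can be checked outside the server. The following is a minimal Python sketch, not how PostgreSQL itself parses the file: the function name inspect_affix and the sample data are hypothetical and chosen only for illustration. It reports the encoding named by the first SET directive and the numbers of any lines that fail to decode as UTF-8. For reference, 0xd1 by itself is a valid lead byte of a two-byte UTF-8 sequence (for example "х" is 0xd1 0x85), so a file that passes this check would suggest the problem lies in how the line is read rather than in the file.

```python
import re


def inspect_affix(data: bytes):
    """Return (declared_encoding, bad_lines) for the raw bytes of an affix file.

    declared_encoding is the argument of the first SET directive, or None.
    bad_lines lists 1-based numbers of lines that are not valid UTF-8.
    """
    declared = None
    bad_lines = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        # The SET directive is plain ASCII, so it can be matched on raw bytes.
        match = re.match(rb"\s*SET\s+(\S+)", raw)
        if match and declared is None:
            declared = match.group(1).decode("ascii", errors="replace")
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(lineno)
    return declared, bad_lines


# Hypothetical two-line sample: a SET directive plus one suffix rule.
sample_text = "SET KOI8-R\nSFX Y   хаться шутся        хаться\n"

# Same text stored as UTF-8 and as KOI8-R, to show both outcomes.
utf8_result = inspect_affix(sample_text.encode("utf-8"))
koi8_result = inspect_affix(sample_text.encode("koi8-r"))
print(utf8_result)
print(koi8_result)
```

To run it against a real file, pass the result of open(path, "rb").read(), where path points at the installed ru_ru.affix. Reading in binary mode matters here, because a text-mode read would apply the platform's default encoding before the check runs.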

This behavior occurs on:
- Mac OS X 10.10 Yosemite and Mac OS X 10.11 El Capitan.
- latest PostgreSQL version from git and PostgreSQL 9.5 (probably also on 9.4.5).

A test that reproduces this bug is attached.

What error message do you get with this test program?  (I don't get any, but I'm not on Mac OS.)
 
--
Alex

