
Re: Mac OS: invalid byte sequence for encoding "UTF8" - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date	February 10, 2016 16:58:09
Msg-id	28139.1455123480@sss.pgh.pa.us
In response to	Re: Mac OS: invalid byte sequence for encoding "UTF8" (Artur Zakirov <a.zakirov@postgrespro.ru>)
Responses	Re: Mac OS: invalid byte sequence for encoding "UTF8"
List	pgsql-hackers


Artur Zakirov <a.zakirov@postgrespro.ru> writes:
> I agree that previous patch is wrong. Instead of using new 
> parse_ooaffentry() function maybe better to use sscanf() with %ls 
> format. The %ls format is used to read a wide character string.

No, that way is going to give you worse portability problems than what
we have now.  Older implementations won't have %ls, and even if they
do, they might not have wcstombs() which is the only way you'd get from
libc's idea of wide characters to an encoding we recognize.

> I think this is not a bug. It is a normal behavior. In Mac OS sscanf() 
> with the %s format reads the string one character at a time. The size of 
> letter 'х' is 2. And sscanf() separate it into two wrong characters.

That argument might be convincing if OSX behaved that way for all
multibyte characters, but it doesn't seem to be doing that.  Why is
only 'х' affected?
        regards, tom lane
