Re: Mac OS: invalid byte sequence for encoding "UTF8" - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Mac OS: invalid byte sequence for encoding "UTF8"
Msg-id: 28139.1455123480@sss.pgh.pa.us
In response to: Re: Mac OS: invalid byte sequence for encoding "UTF8" (Artur Zakirov <a.zakirov@postgrespro.ru>)
Responses: Re: Mac OS: invalid byte sequence for encoding "UTF8"
List: pgsql-hackers

Artur Zakirov <a.zakirov@postgrespro.ru> writes:
> I agree that previous patch is wrong. Instead of using new 
> parse_ooaffentry() function maybe better to use sscanf() with %ls 
> format. The %ls format is used to read a wide character string.

No, that way is going to give you worse portability problems than what
we have now.  Older implementations won't have %ls, and even if they
do, they might not have wcstombs() which is the only way you'd get from
libc's idea of wide characters to an encoding we recognize.

> I think this is not a bug. It is a normal behavior. In Mac OS sscanf() 
> with the %s format reads the string one character at a time. The size of 
> letter 'х' is 2. And sscanf() separate it into two wrong characters.

That argument might be convincing if OSX behaved that way for all
multibyte characters, but it doesn't seem to be doing that.  Why is
only 'х' affected?
        regards, tom lane
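
For readers following the thread: in UTF-8, the letter 'х' (U+0445) is the two bytes 0xD1 0x85. A minimal sketch of a tokenizer that avoids both sscanf("%s") and the wide-character route is shown below. It splits only on ASCII whitespace and never consults the locale, so the bytes of a multibyte character cannot be separated. The names is_ascii_space() and next_token() are hypothetical and are not taken from any patch in this thread; the "SFX" sample line is likewise made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* True only for the six ASCII whitespace bytes; locale-independent. */
static int is_ascii_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/*
 * Copy the next whitespace-delimited token from *src into dst
 * (truncated to dstlen - 1 bytes) and advance *src past it.
 * Returns the number of bytes stored, or 0 if no token remains.
 */
static size_t next_token(const char **src, char *dst, size_t dstlen)
{
    const char *p = *src;
    size_t n = 0;

    /* Skip leading ASCII whitespace. */
    while (*p && is_ascii_space((unsigned char) *p))
        p++;

    /* Copy bytes until the next ASCII whitespace or end of string. */
    while (*p && !is_ascii_space((unsigned char) *p))
    {
        if (n + 1 < dstlen)
            dst[n++] = *p;
        p++;
    }

    if (dstlen > 0)
        dst[n] = '\0';
    *src = p;
    return n;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* Hypothetical affix-style line; "\xd1\x85" is 'х' in UTF-8. */
    const char *line = "SFX \xd1\x85 Y";
    char buf[16];
    size_t len;

    while ((len = next_token(&line, buf, sizeof buf)) > 0)
    {
        printf("token (%zu bytes):", len);
        for (size_t i = 0; i < len; i++)
            printf(" %02X", (unsigned char) buf[i]);
        printf("\n");
    }
    return 0;
}
#endif
```

Compiling with -DDEMO_MAIN prints the bytes of each token, which makes it easy to confirm that both bytes of 'х' stay in a single token. Comparing this against sscanf("%s") on the same input under different LC_CTYPE settings would be one way to test whether the platform's whitespace classification is involved; that is a hypothesis to check, not something established in this message.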