On Fri, Sep 19, 2014 at 03:15:53PM -0700, Alon wrote:
> The pg_dump file contains this command:
> CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
> LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE = 'Norwegian
> (Bokmål)_Norway.1252';
>
> The UTF16 encoding for ål) [a-ring l parenthesis] is
> 00e5 006c 0029
>
> In UTF8 this set of characters is encoded as:
> c3 a5 6c 29
>
> The a-ring is converted to two bytes while the others are one.
>
> Based on the ERROR:
> invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x29
>
> It appears the set of characters is getting passed as:
> e5 6c 29
>
> In UTF8, e5 is always the start of a three-byte character. Possibly
> pg_restore, createdb, or something else tries to read these bytes as a
> single character.
> However, 6c and 29 can only be single-byte characters; they can't be the
> next two bytes in a three-byte character. Hence the failure.
> It seems that somewhere in the code, 0x00e5 is converted to e5 instead of
> 'c3 a5' when passing the LC_COLLATE and LC_CTYPE values.
In WIN1252, "e5 6c 29" is "ål)". We're likely failing to set client_encoding
at some essential point in the process.
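
To make the byte-level reasoning concrete, here is a small Python sketch that
reproduces the sequences discussed above. It is only an illustration, not
PostgreSQL code: the helper name is_valid_utf8 is made up for this example, and
Python's "cp1252" codec is assumed to stand in for WIN1252. The last step shows
the kind of conversion one would expect once the bytes are correctly labeled as
WIN1252.

```python
# Illustration only: reproduce the byte sequences from the report.
text = "ål)"

# All three characters are in the BMP, so each is one UTF-16 code unit.
utf16_units = [f"{ord(ch):04x}" for ch in text]

utf8_bytes = text.encode("utf-8")        # what a UTF8 server expects
win1252_bytes = text.encode("cp1252")    # what the dump appears to contain


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes correctly labeled as WIN1252 can be converted to UTF-8.
converted = win1252_bytes.decode("cp1252").encode("utf-8")

print(utf16_units)                   # UTF-16 code units
print(utf8_bytes.hex(" "))           # UTF-8 bytes
print(win1252_bytes.hex(" "))        # WIN1252 bytes
print(is_valid_utf8(win1252_bytes))  # are the WIN1252 bytes valid UTF-8?
print(converted == utf8_bytes)       # does the conversion match?
```

The WIN1252 bytes are rejected as UTF-8 because e5 announces a three-byte
sequence and 6c is not a continuation byte, which matches the reported error.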