Re: pg_dump, pg_restore and UTF8: invalid byte sequence - Mailing list pgsql-novice

Subject Re: pg_dump, pg_restore and UTF8: invalid byte sequence
Msg-id 054401c6f19b$8e5c8210$6501a8c0@iwing
In response to pg_dump, pg_restore and UTF8: invalid byte sequence  (<me@alternize.com>)
List pgsql-novice
> shouldn't pg_dump encode the utf8 bytesequences?

at least i found out why the invalid unicode sequences appear in the first
place: tsearch2 in 8.1 doesn't properly handle utf8 characters: the
character's 2-byte representation is converted to lowercase byte for byte.
for example: "ä", which is encoded in utf8 as the bytes 0xC3 0xA4 ("Ã¤" when
read as latin1), is written to the db by tsearch2 as 0xE3 0xA4 ("ã¤"), which
is an invalid utf8 byte sequence.
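
the effect can be reproduced outside the database. below is a minimal python
sketch that mimics the byte-for-byte lowercasing described above; it is not
tsearch2's actual code, and the helper names (bytewise_lower_latin1,
is_valid_utf8) are made up for illustration. it assumes each byte is
lowercased as if it were a single latin1 character.

```python
def bytewise_lower_latin1(data: bytes) -> bytes:
    """Lowercase every byte on its own, treating it as a Latin-1 character.

    Hypothetical helper: it imitates a lowercasing routine that is unaware
    of multi-byte encodings, which is the behaviour described in this mail.
    """
    return data.decode("latin-1").lower().encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+00E4 is the letter a-umlaut; its UTF-8 encoding is the two bytes C3 A4.
original = "\u00e4".encode("utf-8")
mangled = bytewise_lower_latin1(original)

print(original.hex(), is_valid_utf8(original))  # c3a4 True
print(mangled.hex(), is_valid_utf8(mangled))    # e3a4 False
```

the lead byte 0xC3 (a 2-byte sequence start) becomes 0xE3, which announces a
3-byte sequence, so the decoder runs out of continuation bytes and rejects it.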

stripping the ts2 index column before dumping fixes the encoding problems. i
guess the 8.2 -> 8.1.5 backport should fix it as well, i'll try asap.

> also, regarding pg_restore, its quite troubling it has the same
> parameter-set as pg_dump

never mind this, it is too late in the evening 8-)

- thomas


