Re: invalid byte sequence - Mailing list pgsql-general

From Craig Ringer
Subject Re: invalid byte sequence
Date
Msg-id 4D70B85E.3020005@postnewspapers.com.au
In response to invalid byte sequence  (Maximilian Tyrtania <lists@contactking.de>)
Responses Re: invalid byte sequence  (Maximilian Tyrtania <lists@contactking.de>)
List pgsql-general
On 04/03/11 00:02, Maximilian Tyrtania wrote:
> After upgrading to pg 9.0.3 (from 8.4.2) on my Mac OS 10.6.2 machine I find this in my log file (a lot):
>
> <postgres%192.168.254.210%2011-03-03 16:37:30 CET%22021>STATEMENT:  SELECT pg_file_read('pg_log/postgresql-2011-03-03_000000.log',250000, $
> <postgres%192.168.254.210%2011-03-03 16:37:32 CET%22021>ERROR:  invalid byte sequence for encoding "UTF8": 0xe3bc74

The "0xe3bc74" looks like gibberish in any encoding I can think of.
What's the input file? Is it sanely encoded? Do you know what encoding
it is in?
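
For what it's worth, here is a rough way to look at those three bytes
yourself. This is only a sketch and assumes you run it in a UTF8
database; decode() and convert_from() are built-in functions, and
LATIN1 is just one guess at a source encoding, not a diagnosis. In
UTF-8, 0xe3 starts a three-byte sequence, so the next two bytes must
both fall in the range 0x80-0xbf. 0xbc does, but 0x74 is plain ASCII
"t", so the sequence is invalid.

```sql
-- Interpret the bytes as UTF-8: in a UTF8 database this should fail
-- with an "invalid byte sequence" error, because 0x74 is not a valid
-- continuation byte after the 0xe3 lead byte.
SELECT convert_from(decode('e3bc74', 'hex'), 'UTF8');

-- Interpret the same bytes as LATIN1 (an assumption, purely for
-- illustration): every byte is valid there, so this succeeds, but the
-- three resulting characters do not look like meaningful text.
SELECT convert_from(decode('e3bc74', 'hex'), 'LATIN1');
```

If the second query produced something readable for your data, that
would be a hint about the real encoding of the file. Here it doesn't.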

If you really want to be encoding-agnostic, and you do not care that you
may end up with garbage data in your database that makes no sense and
never can, then your database must use the SQL_ASCII encoding with the
"C" locale for LC_CTYPE and LC_COLLATE, and you must SET client_encoding
= 'SQL_ASCII' when loading the data.

A suitable CREATE DATABASE command might be:

CREATE DATABASE garbage
  TEMPLATE template0
  ENCODING 'SQL_ASCII' LC_COLLATE 'C' LC_CTYPE 'C';

but I really don't think that's a good idea in general. Storing random
crap in text fields will cause you pain later. It is better to convert
the text to a sane encoding, to store it as bytea if you want the raw
bytes, or to reject it.
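
A minimal sketch of the bytea route, in case it helps. The table and
column names (raw_lines, payload) are made up for illustration, and
LATIN1 again stands in for whatever the real source encoding turns out
to be:

```sql
-- Hypothetical table that keeps the bytes exactly as they arrived,
-- with no encoding validation applied.
CREATE TABLE raw_lines (
    id      serial PRIMARY KEY,
    payload bytea NOT NULL
);

-- Store the problem bytes as-is; decode() turns a hex string into bytea.
INSERT INTO raw_lines (payload)
VALUES (decode('e3bc74', 'hex'));

-- Once the source encoding is known, convert on the way out.
-- convert_from() raises an error if the bytes are not valid in the
-- named encoding, which also gives you a way to reject bad input.
SELECT id, convert_from(payload, 'LATIN1') AS line
FROM raw_lines;
```

That way the database itself can stay UTF8 and only the columns that
really hold unknown bytes are exempt from validation.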

--
Craig Ringer
