Re: invalid byte sequence - Mailing list pgsql-general

From Maximilian Tyrtania
Subject Re: invalid byte sequence
Msg-id 39AF8376-3E92-4E65-8686-DFE8D0E706B4@contactking.de
In response to Re: invalid byte sequence  (Craig Ringer <craig@postnewspapers.com.au>)
Responses Re: invalid byte sequence  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-general
On 04.03.2011 at 11:01, Craig Ringer wrote:

> On 04/03/11 00:02, Maximilian Tyrtania wrote:
>> After upgrading to pg 9.0.3 (from 8.4.2) on my Mac OS 10.6.2 machine i find this in my log file (a lot):
>>
>> <postgres%192.168.254.210%2011-03-03 16:37:30 CET%22021>STATEMENT:  SELECT
>> pg_file_read('pg_log/postgresql-2011-03-03_000000.log',250000, $
>> <postgres%192.168.254.210%2011-03-03 16:37:32 CET%22021>ERROR:  invalid byte sequence for encoding "UTF8": 0xe3bc74
>
> The "0xe3bc74" looks like gibberish in any encoding I can think of.
> What's the input file?

We are talking about pg's own log file here. I thought that was clear; look at the file's name. Apparently someone on
the French pgAdmin list has the very same problem. I have no idea how "0xe3bc74" made it into the log file.
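
Why the server rejects these three bytes as UTF-8 can be seen with a small sketch in Python (the helper names are made up for illustration): 0xe3 announces a three-byte sequence, 0xbc is a valid continuation byte, but 0x74 is plain ASCII "t" and cannot continue the sequence.

```python
def utf8_byte_role(b: int) -> str:
    """Classify a single byte by the role it can play in UTF-8."""
    if b < 0x80:
        return "ascii"
    if b <= 0xBF:
        return "continuation"
    if 0xC2 <= b <= 0xDF:
        return "lead-2"
    if 0xE0 <= b <= 0xEF:
        return "lead-3"
    if 0xF0 <= b <= 0xF4:
        return "lead-4"
    return "invalid"  # 0xC0, 0xC1 and 0xF5..0xFF never occur in UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The byte sequence from the error message in the log.
suspect = bytes.fromhex("e3bc74")
for b in suspect:
    print(f"0x{b:02x}: {utf8_byte_role(b)}")
print("valid UTF-8:", is_valid_utf8(suspect))
```

This only shows that the bytes are not UTF-8; it says nothing about which encoding they were originally written in.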

> Is it sanely encoded? Do you know what encoding
> it is in?

As I said, I initially set lc_messages to 'de_DE-UTF8', so I assume that's what the log file was in. I have changed it to
'C' now.

> If you really want to be encoding-agnostic and you do not care if you
> get garbage data in your database that makes no sense and can never make
> any sense, then you must ensure that your database is in the "C" locale
> for LC_CTYPE and LC_COLLATE, and you must SET client_encoding =
> "SQL_ASCII" when reading the data.
>
> A suitable CREATE DATABASE command might be:
>
> CREATE DATABASE garbage
>  TEMPLATE template0
>  ENCODING 'SQL_ASCII' LC_COLLATE 'C' LC_CTYPE 'C';
>
> but I really don't think that's generally a good idea. Storing random
> crap in text fields will cause you pain later. Better to either convert
> the text to a sane encoding, store it as bytea if you want the raw
> bytes, or reject it.

I certainly don't want to be encoding-agnostic. I would just like to be able to read my log file using pgAdmin, which I
can't right now, because pgAdmin 1.12 cuts off the content after the first character that doesn't match the encoding.
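
Until the source of the bad bytes is found, one possible workaround is to inspect the log file outside pgAdmin with a decoder that tolerates invalid bytes. A minimal Python sketch, assuming the file is mostly UTF-8; the function names are hypothetical and the sample bytes are constructed for illustration, not taken from the real log:

```python
from typing import Optional


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def decode_lossy(data: bytes) -> str:
    """Decode as UTF-8, replacing undecodable bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


def read_log_lossy(path: str) -> str:
    """Read a whole log file in binary mode and decode it leniently."""
    with open(path, "rb") as fh:
        return decode_lossy(fh.read())


# Made-up log fragment containing the byte sequence from the error message.
sample = b"LOG: ok\nERROR: \xe3\xbct rest\n"
print("first bad offset:", first_invalid_offset(sample))
print(decode_lossy(sample))
```

The offset reported by `first_invalid_offset` points at the spot where a strict reader would stop, so the surrounding lines show which log entry introduced the bad bytes.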

Best wishes,
Max

Maximilian Tyrtania Software-Entwicklung
Dessauer Str. 6-7
10969 Berlin
http://www.contactking.de


