Thread: "Invalid byte sequence" message

"Invalid byte sequence" message

From
Maximilian Tyrtania
Date:
Hi again,

I upgraded my pg installation to 9.0.3 (from 8.4.2) and now I'm having trouble looking at my log files with pgAdmin
1.12.2 (on Mac OS 10.6): on every refresh I get a message box saying:

2011-02-15 14:28:12 ERROR  : ERROR:  invalid byte sequence for encoding "UTF8": 0xe3b66c

The server runs on Mac OS 10.6, the data is UTF8, and the client connection settings are as well.
Any pointers?

Maximilian Tyrtania Software-Entwicklung
Dessauer Str. 6-7
10969 Berlin
http://www.contactking.de



Re: "Invalid byte sequence" message

From
Maximilian Tyrtania
Date:
Just found this in my log file:

<postgres%2011-02-16 13:55:32 CET22021>ERROR:  invalid byte sequence for encoding "UTF8": 0xe3bc64
<postgres%2011-02-16 13:55:32 CET22021>STATEMENT:  SELECT pg_file_read('pg_log/postgresql-2011-02-16_000000.log',
100000,50000) 

Still not sure what's going on there. Apparently the contents of the log file are not valid UTF8. Also, after
I clicked the message boxes away, the log file's contents appear incomplete in the log viewer (a couple of hours' worth of
entries are simply missing).
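
A minimal Python sketch of why the server rejects the two byte sequences reported in this thread. In UTF-8 the lead byte 0xE3 announces a three-byte character, so two continuation bytes in the range 0x80..0xBF must follow; in both reported sequences the third byte (0x6c and 0x64) is plain ASCII. The helper name `explain_invalid_utf8` is made up for illustration, and the sketch says nothing about how those bytes ended up in the log file.

```python
def explain_invalid_utf8(hex_bytes: str) -> str:
    """Describe whether the given hex-encoded bytes decode as UTF-8."""
    raw = bytes.fromhex(hex_bytes)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first offending byte.
        return f"invalid at byte offset {exc.start}: {exc.reason}"
    return "valid UTF-8"


# The two sequences quoted in the error messages above.
for seq in ("e3b66c", "e3bc64"):
    print(seq, "->", explain_invalid_utf8(seq))
```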

Maximilian Tyrtania Software-Entwicklung
Dessauer Str. 6-7
10969 Berlin
http://www.contactking.de

Am 15.02.2011 um 16:07 schrieb Maximilian Tyrtania:

> Hi again,
>
> I upgraded my pg installation to 9.0.3 (from 8.4.2) and now I'm having trouble looking at my log files with pgAdmin
> 1.12.2 (on Mac OS 10.6): on every refresh I get a message box saying:
>
> 2011-02-15 14:28:12 ERROR  : ERROR:  invalid byte sequence for encoding "UTF8": 0xe3b66c
>
> The server runs on Mac OS 10.6, the data is UTF8, and the client connection settings are as well.
> Any pointers?


Re: "Invalid byte sequence" message

From
Guillaume Lelarge
Date:
Le 16/02/2011 14:21, Maximilian Tyrtania a écrit :
> Just found this in my log file:
>
> <postgres%2011-02-16 13:55:32 CET22021>ERROR:  invalid byte sequence for encoding "UTF8": 0xe3bc64
> <postgres%2011-02-16 13:55:32 CET22021>STATEMENT:  SELECT pg_file_read('pg_log/postgresql-2011-02-16_000000.log',
> 100000,50000)
>
> Still not sure what's going on there. Apparently the contents of the logfile are not valid UTF8 characters. Also,
> after i clicked the message boxes away, the log files contents appear incomplete in the log viewer (a couple hours worth
> of entries are simply missing).
>

I suppose it stopped processing the rest of the file once it found an
invalid UTF8 character. There's not much we can do about this.
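
As an illustration only, a Python sketch of a tolerant decode that keeps going past bad bytes instead of stopping at the first one. This is not how pgAdmin works: in this thread the error is raised by the server inside pg_file_read, so the sketch assumes a viewer that already has the raw bytes of the log. The function name `decode_log_chunk` and the sample log lines are hypothetical.

```python
def decode_log_chunk(raw: bytes) -> str:
    """Decode log bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Hypothetical chunk: a valid entry, the bytes from the error message, another valid entry.
chunk = b"LOG:  first entry\n" + bytes.fromhex("e3bc64") + b"\nLOG:  later entry\n"
text = decode_log_chunk(chunk)
print(text)
```

With `errors="replace"` the entries after the bad bytes remain visible, at the cost of showing a replacement character where the original text could not be decoded.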


-- 
Guillaume
http://www.postgresql.fr
http://dalibo.com


Re: "Invalid byte sequence" message

From
Guillaume Lelarge
Date:
Le 22/02/2011 21:58, Guillaume Lelarge a écrit :
> Le 16/02/2011 14:21, Maximilian Tyrtania a écrit :
>> Just found this in my log file:
>>
>> <postgres%2011-02-16 13:55:32 CET22021>ERROR:  invalid byte sequence for encoding "UTF8": 0xe3bc64
>> <postgres%2011-02-16 13:55:32 CET22021>STATEMENT:  SELECT pg_file_read('pg_log/postgresql-2011-02-16_000000.log',
>> 100000,50000)
>>
>> Still not sure what's going on there. Apparently the contents of the logfile are not valid UTF8 characters. Also,
>> after i clicked the message boxes away, the log files contents appear incomplete in the log viewer (a couple hours worth
>> of entries are simply missing).
>>
> 
> I suppose it stopped to process the rest of the file once it found an
> invalid UTF8 character. There's not much we can do about this.
> 
> 

Someone on a French web forum has the same issue as you. Can you tell
me the value of your lc_messages parameter?


-- 
Guillaume
http://www.postgresql.fr
http://dalibo.com


Re: "Invalid byte sequence" message

From
"Little, Douglas"
Date:
We see this a lot with web applications where users cut/paste from MS-Word.
In our case, the web app and db (oracle) are the same character set, so no translation or validation is done.
Oracle will store the values, even though they aren't valid UTF8 characters.

We run into problems when the values are imported to our Greenplum/postgres dw.
We don't have a workaround.

Doug


-----Original Message-----
From: pgadmin-support-owner@postgresql.org [mailto:pgadmin-support-owner@postgresql.org] On Behalf Of Guillaume Lelarge
Sent: Wednesday, February 23, 2011 2:41 PM
To: Maximilian Tyrtania
Cc: pgadmin-support@postgresql.org
Subject: Re: [pgadmin-support] "Invalid byte sequence" message

Le 22/02/2011 21:58, Guillaume Lelarge a écrit :
> Le 16/02/2011 14:21, Maximilian Tyrtania a écrit :
>> Just found this in my log file:
>>
>> <postgres%2011-02-16 13:55:32 CET22021>ERROR:  invalid byte sequence for encoding "UTF8": 0xe3bc64
>> <postgres%2011-02-16 13:55:32 CET22021>STATEMENT:  SELECT pg_file_read('pg_log/postgresql-2011-02-16_000000.log',
>> 100000,50000)
>>
>> Still not sure what's going on there. Apparently the contents of the logfile are not valid UTF8 characters. Also,
>> after i clicked the message boxes away, the log files contents appear incomplete in the log viewer (a couple hours worth
>> of entries are simply missing).
>>
>
> I suppose it stopped to process the rest of the file once it found an
> invalid UTF8 character. There's not much we can do about this.
>
>

One guy on a french web forum has the same issue than you. Can you tell
me the value of your lc_messages parameter?


--
Guillaume
http://www.postgresql.fr
http://dalibo.com



Re: "Invalid byte sequence" message

From
Vik Reykja
Date:
On Wed, Feb 23, 2011 at 21:41, Guillaume Lelarge <guillaume@lelarge.info> wrote:
> Le 22/02/2011 21:58, Guillaume Lelarge a écrit :
>> Le 16/02/2011 14:21, Maximilian Tyrtania a écrit :
>>> Just found this in my log file:
>>>
>>> <postgres%2011-02-16 13:55:32 CET22021>ERROR:  invalid byte sequence for encoding "UTF8": 0xe3bc64
>>> <postgres%2011-02-16 13:55:32 CET22021>STATEMENT:  SELECT
>>> pg_file_read('pg_log/postgresql-2011-02-16_000000.log', 100000, 50000)
>>>
>>> Still not sure what's going on there. Apparently the contents of the logfile are not valid UTF8 characters. Also,
>>> after i clicked the message boxes away, the log files contents appear incomplete in the log viewer (a couple hours
>>> worth of entries are simply missing).
>>>
>>
>> I suppose it stopped to process the rest of the file once it found an
>> invalid UTF8 character. There's not much we can do about this.
>>
>
> One guy on a french web forum has the same issue than you. Can you tell
> me the value of your lc_messages parameter?

I get it quite easily with LC_MESSAGES = 'French, France' (the installer's default) on a
French Windows.

See this unresolved thread for more info:
http://archives.postgresql.org/pgsql-bugs/2010-09/msg00138.php

Re: "Invalid byte sequence" message

From
Guillaume Lelarge
Date:
Le 23/02/2011 22:51, Vik Reykja a écrit :
> On Wed, Feb 23, 2011 at 21:41, Guillaume Lelarge <guillaume@lelarge.info>wrote:
> 
>> Le 22/02/2011 21:58, Guillaume Lelarge a écrit :
>>> Le 16/02/2011 14:21, Maximilian Tyrtania a écrit :
>>>> Just found this in my log file:
>>>>
>>>> <postgres%2011-02-16 13:55:32 CET22021>ERROR:  invalid byte sequence for
>> encoding "UTF8": 0xe3bc64
>>>> <postgres%2011-02-16 13:55:32 CET22021>STATEMENT:  SELECT
>> pg_file_read('pg_log/postgresql-2011-02-16_000000.log', 100000, 50000)
>>>>
>>>> Still not sure what's going on there. Apparently the contents of the
>> logfile are not valid UTF8 characters. Also, after i clicked the message
>> boxes away, the log files contents appear incomplete in the log viewer (a
>> couple hours worth of entries are simply missing).
>>>
>>> I suppose it stopped to process the rest of the file once it found an
>>> invalid UTF8 character. There's not much we can do about this.
>>>
>>
>> One guy on a french web forum has the same issue than you. Can you tell
>> me the value of your lc_messages parameter?
>>
> 
> I get it quite easily with LC_MESSAGES = 'French, France' (the installer's
> default) on a French Windows.
> 
> See this unresolved thread for more info:
> http://archives.postgresql.org/pgsql-bugs/2010-09/msg00138.php
> 

That's what the guy on the forum has (see
http://forums.postgresql.fr/viewtopic.php?pid=8214#p8214 if you read
French). I assume it would work well with lc_messages set to C. Any
production server should have lc_messages set to C.


-- 
Guillaume
http://www.postgresql.fr
http://dalibo.com


Re: "Invalid byte sequence" message

From
Vik Reykja
Date:
On Wed, Feb 23, 2011 at 23:11, Guillaume Lelarge <guillaume@lelarge.info> wrote:
>> I get it quite easily with LC_MESSAGES = 'French, France' (the installer's
>> default) on a French Windows.
>>
>> See this unresolved thread for more info:
>> http://archives.postgresql.org/pgsql-bugs/2010-09/msg00138.php
>
> That's what the guy has (see
> http://forums.postgresql.fr/viewtopic.php?pid=8214#p8214 if you read
> french). I assume it would work well with lc_messages set to C.

I set it to 'en_US' which has the same effect.

> Any production server should have lc_messages set to C.

I disagree.  Any system that offers to write my messages in French should do so correctly, *especially* if it's the default.

In any case, it's a core PostgreSQL bug, not a PGAdmin bug.  It could be dealt with a little more gracefully, though.

Re: "Invalid byte sequence" message

From
Guillaume Lelarge
Date:
Le 23/02/2011 23:20, Vik Reykja a écrit :
> On Wed, Feb 23, 2011 at 23:11, Guillaume Lelarge <guillaume@lelarge.info> wrote:
>
>>> I get it quite easily with LC_MESSAGES = 'French, France' (the installer's
>>> default) on a French Windows.
>>>
>>> See this unresolved thread for more info:
>>> http://archives.postgresql.org/pgsql-bugs/2010-09/msg00138.php
>>
>> That's what the guy has (see
>> http://forums.postgresql.fr/viewtopic.php?pid=8214#p8214 if you read
>> french). I assume it would work well with lc_messages set to C.
>
> I set it to 'en_US' which has the same effect.
>

Yeah, that's right. I set it to C because it needs less typing :)

>> Any production server should have lc_messages set to C.
>>
> 
> I disagree.  Any system that offers to write my messages in French should do
> so correctly, *especially* if it's the default.
> 

Yeah. There are three main issues with translated messages:

 * Try searching for anything on Google with French messages. You'll be
   lucky if you find something, and you'll get billions of results in
   English.
 * Try asking something on the mailing lists with French messages. The
   first answer will be: get us the English messages.
 * Try using any log parser (like pgFouine) with French messages. It
   won't work (even with this tool, written by a French guy).

I said French, but I suppose these issues also apply to other
languages, meaning every language but English.

> In any case, it's a core PostgreSQL bug, not a PGAdmin bug.  It could be
> dealt with a little more gracefully, though.
> 

Well, pgAdmin could read the lc_messages value to guess whether it can
read the log file. That's probably all we can do.
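
A rough Python sketch of the kind of guess described above, assuming the client has already fetched the server's lc_messages value as a string. The function name and the matching rules are hypothetical, not pgAdmin code, and an English or C setting does not guarantee that every byte in the log file is valid UTF-8.

```python
def messages_probably_untranslated(lc_messages: str) -> bool:
    """Guess from lc_messages whether server messages are untranslated (English)."""
    name = lc_messages.strip().lower()
    if name in ("c", "posix"):
        return True
    # Assumed naming patterns: 'en_US', 'en_US.UTF-8', 'English_United States.1252'.
    return name.startswith("en_") or name.startswith("english")


# Values mentioned in this thread.
for value in ("C", "en_US", "de_DE.UTF8", "French, France"):
    print(value, "->", messages_probably_untranslated(value))
```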


-- 
Guillaume
http://www.postgresql.fr
http://dalibo.com


Re: "Invalid byte sequence" message

From
Vik Reykja
Date:
On Wed, Feb 23, 2011 at 23:37, Guillaume Lelarge <guillaume@lelarge.info> wrote:
>>> Any production server should have lc_messages set to C.
>>
>> I disagree.  Any system that offers to write my messages in French should do
>> so correctly, *especially* if it's the default.
>
> Yeah. There are three main issues with translated messages:
>
>  * Try searching anything on Google with french messages. You'll be
>    lucky if you find something, and you'll get billions of results in
>    english.

Chicken and egg.  The more we discourage localized messages, the less information there will be about them.  Why even bother translating?

>  * Try asking something in the mailing lists with french messages. The
>    first answer will be: get us the english messages.

Even on the French mailing list?  Again, I see this as a problem to be solved, not avoided.

>  * Try using any log parser (like pgfouine) with french messages. It
>    won't work (even with this tool, written by a french guy).

So you've succeeded in making an argument for improving that tool :-)

Re: "Invalid byte sequence" message

From
Guillaume Lelarge
Date:
Le 23/02/2011 23:57, Vik Reykja a écrit :
> On Wed, Feb 23, 2011 at 23:37, Guillaume Lelarge <guillaume@lelarge.info> wrote:
>
>>>> Any production server should have lc_messages set to C.
>>>
>>> I disagree.  Any system that offers to write my messages in French should
>>> do so correctly, *especially* if it's the default.
>>
>> Yeah. There are three main issues with translated messages:
>>
>>  * Try searching anything on Google with french messages. You'll be
>>   lucky if you find something, and you'll get billions of results in
>>   english.
>>
> 
> Chicken and egg.  The more we discourage localized messages, the less
> information there will be about them.  Why even bother translating?
> 

For new users who want to try PostgreSQL without having to deal with English messages.

>>  * Try asking something in the mailing lists with french messages. The
>>   first answer will be: get us the english messages.
>>
> 
> Even on the French mailing list?  Again, I see this as a problem to be
> solved, not avoided.
> 

No. But you don't have many hackers on the French mailing lists :)

>>  * Try using any log parser (like pgfouine) with french messages. It
>>   won't work (even with this tool, written by a french guy).
>>
> 
> So you've succeeded in making an argument for improving that tool :-)
> 

Actually, no. The English messages don't change between minor releases.
The French messages do, and a lot. So there is no easy way to add such a
feature to pgFouine.


-- 
Guillaume
http://www.postgresql.fr
http://dalibo.com


Re: "Invalid byte sequence" message

From
Maximilian Tyrtania
Date:
Am 23.02.2011 um 21:41 schrieb Guillaume Lelarge:

> Le 22/02/2011 21:58, Guillaume Lelarge a écrit :
>> Le 16/02/2011 14:21, Maximilian Tyrtania a écrit :
>>> Just found this in my log file:
>>>
>>> <postgres%2011-02-16 13:55:32 CET22021>ERROR:  invalid byte sequence for encoding "UTF8": 0xe3bc64
>>> <postgres%2011-02-16 13:55:32 CET22021>STATEMENT:  SELECT pg_file_read('pg_log/postgresql-2011-02-16_000000.log',
>>> 100000,50000)
>>>
>>> Still not sure what's going on there. Apparently the contents of the logfile are not valid UTF8 characters. Also,
>>> after i clicked the message boxes away, the log files contents appear incomplete in the log viewer (a couple hours worth
>>> of entries are simply missing).

>> I suppose it stopped to process the rest of the file once it found an
>> invalid UTF8 character. There's not much we can do about this.
>
> One guy on a french web forum has the same issue than you. Can you tell
> me the value of your lc_messages parameter?

It was set to "de_DE.UTF8". Changed it to C, which is fine with me. Seems to have fixed the problem.

Thanks,

Maximilian Tyrtania Software-Entwicklung
Dessauer Str. 6-7
10969 Berlin
http://www.contactking.de