Re: Encoding-related errors when moving from 7.3 to 8.0.1 - Mailing list pgsql-general

From Carlos Moreno
Subject Re: Encoding-related errors when moving from 7.3 to 8.0.1
Date
Msg-id 423D9080.2050605@mochima.com
In response to Re: Encoding-related errors when moving from 7.3 to 8.0.1  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
Responses Re: Encoding-related errors when moving from 7.3 to 8.0.1
Re: Encoding-related errors when moving from 7.3 to 8.0.1
List pgsql-general
Hi Alvaro, thanks for your reply!

Alvaro Herrera wrote:
>>psql:db_backup.sql:1548: ERROR:  invalid byte sequence for encoding
>>"UNICODE": 0xe12020
>>CONTEXT:  COPY country, line 5, column namespanish:
>>"Canadá                        "
>
> Hmm.  The sequence looks like latin1 interpreted as utf8.  This seems
> the inverse of the problem reported (and solved) here
>
> http://archives.postgresql.org/pgsql-es-ayuda/2005-03/msg00491.php
>
> Maybe you should try sticking a
>
> SET client_encoding TO latin1;
>
> at the beginning of the dump file.
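
A minimal sketch of what the error message is reporting, assuming
Python 3 and assuming the rejected byte 0xe1 was a Latin-1 "á"
followed by two CHAR(n) padding spaces (the names raw and
decodes_as are made up for illustration):

```python
# The three bytes from the error message: 0xe1 followed by two spaces (0x20).
raw = bytes([0xE1, 0x20, 0x20])


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# In UTF-8, 0xe1 opens a three-byte sequence and must be followed by two
# continuation bytes (0x80-0xbf). A space (0x20) is not one, so this fails.
utf8_ok = decodes_as(raw, "utf-8")

# In Latin-1 every byte maps to a character, so the same bytes are valid.
latin1_ok = decodes_as(raw, "latin-1")
text = raw.decode("latin-1")

# Transcoding Latin-1 input to UTF-8 turns the single byte 0xe1 into the
# two-byte sequence 0xc3 0xa1; the spaces are unchanged.
as_utf8 = text.encode("utf-8")
```

With client_encoding set to latin1, the server is told to perform
this kind of Latin-1 to UTF-8 conversion on incoming data instead
of treating the bytes as if they were already UTF-8.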

One thing worries me, though.  With all of the previous versions
of PostgreSQL (I think we were on version 7.1 when we started
using it in our system), I have never had to worry about encoding
issues.  Our users are mostly Spanish-speaking, and they register
with our system through web-based interfaces;  virtually 100% of
them use Windows (most of them probably the Spanish version, with
a Spanish keyboard).

So, our system (CGIs written in C++ running on a Linux server)
simply takes whatever the user gives (properly validated and
escaped) and throws it in the database.  We've never encountered
any problem (well, or perhaps it's the opposite?  Perhaps we've
always been living with the problem without realizing it?)

My worry now is this: if I need a SET client_encoding statement
to make the INSERT or COPY statements work, does that mean that
I should modify each and every program that I have that
interacts with the database, and add a SET client_encoding
statement before whatever other statement(s) we execute?

Or is this client_encoding setting something that gets attached
to the database (or the tables) itself?
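
For reference, client_encoding is a per-session setting rather
than a property of the tables.  Besides an explicit SET, libpq
reads the PGCLIENTENCODING environment variable when it connects,
and a default can be configured on the server side (for example
with ALTER DATABASE ... SET client_encoding).  A minimal sketch of
the environment-variable route, assuming Python 3; the helper name
env_with_client_encoding and the PATH value are hypothetical:

```python
import os


def env_with_client_encoding(encoding, base=None):
    """Return a copy of an environment with PGCLIENTENCODING set.

    base defaults to the current process environment; the input
    mapping is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    # libpq-based clients pick this up when opening a connection.
    env["PGCLIENTENCODING"] = encoding
    return env


# Hypothetical usage: pass the result as env= when launching psql or
# another libpq-based program, so the program itself needs no changes.
env = env_with_client_encoding("LATIN1", base={"PATH": "/usr/bin"})
```

Whether this is preferable to a SET statement in each program
depends on how the programs are launched; it is one option, not
the only one.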

Where can I find more documentation on these issues?  I'd like
to get a deeper understanding, to avoid any future problems.

> Why are you using CHAR(n) fields anyway?  It should probably be better
> if you used VARCHAR(n) ...

One of those things that happen even in the best of families  ;-)

(I was also surprised when I noticed all the spaces at the
end -- I would have thought that we were using varchars in
fields like that one)

Thanks again!

Cheers,

Carlos
--
PS: I have a strict white-list anti-spam filter in place, which
     is why a direct e-mail would be rejected -- let me know if
     you want to write directly through e-mail, so that I can
     add you to the white list file.
