Thread: Question about encoding

Question about encoding

From

Denis BUCHER

Date:

24 August 2009, 16:43:01

Hello,

I'm sure you already replied tons of questions like this, but I don't
understand something.

This is the situation :
* I have COPY commands from a dump, encoded as UTF8 (special characters
are encoded as 2-bytes).
* My file contains SET client_encoding = 'UTF8';
* But when I do psql <myfile.sql I get an error for 40 characters long
strings that have UTF8 characters : ERROR:  value too long for type
character varying(40)
* I think I understand why, psql -l says that my database is SQL_ASCII
* Postgresql 7.4

Question 1 :
Is it the expected behavior ? These characters have a SQL_ASCII
equivalent because I already have them stored in another table of the
same database

Question 2 :
If yes, then I suppose I should have the database as LATIN1 or UTF8.
Can I change/convert the encoding of the database ? Or at least of the
schema (which would be even better)

Thanks a lot for any help :-)))

Denis

Re: Question about encoding

From

Peter Eisentraut

Date:

24 August 2009, 22:42:41

On mån, 2009-08-24 at 18:42 +0200, Denis BUCHER wrote:
> Question 1 :
> Is it the expected behavior ? These characters have a SQL_ASCII
> equivalent because I already have them stored in another table of the
> same database

SQL_ASCII is not the same as ASCII.  SQL_ASCII means, take the bytes as
they come.  So a 40-character UTF-8 string might indeed be longer than
40 bytes, which is what SQL_ASCII will look at.

The best bet is to avoid SQL_ASCII altogether.  It's pretty bogus and
inconsistent.

> Question 2 :
> If yes, then I suppose I should have the database as LATIN1 or UTF8.
> Can I change/convert the encoding of the database ? Or at least of the
> schema (which would be even better)

Dump, recreate database with right encoding, restore.

Re: Question about encoding

From

Denis BUCHER

Date:

25 August 2009, 21:45:06

Hello,

Peter Eisentraut a écrit :
>> Question 1 :
>> Is it the expected behavior ? These characters have a SQL_ASCII
>> equivalent because I already have them stored in another table of the
>> same database
> 
> SQL_ASCII is not the same as ASCII.  SQL_ASCII means, take the bytes as
> they come.  So a 40-character UTF-8 string might indeed be longer than
> 40 bytes, which is what SQL_ASCII will look at.
> 
> The best bet is to avoid SQL_ASCII altogether.  It's pretty bogus and
> inconsistent.

OK that's a good point it explains everything.

>> Question 2 :
>> If yes, then I suppose I should have the database as LATIN1 or UTF8.
>> Can I change/convert the encoding of the database ? Or at least of the
>> schema (which would be even better)
> 
> Dump, recreate database with right encoding, restore.

Ok I did it but postgresql 7.4 was not very nice, the following failed :
1. dump
2. DROP DATABASE; CREATE DATABASE WITH ENCODING UTF8
3. restore

And it bugged. In fact the dump was writtent with "SET CLIENT
ENCODING=SQL_ASCII but when restoring on the UTF8 databse it failed, I
had to change it to SET client_encoding = LATIN1.

Anyway, finally everything is working again, I just had to add a SET
NAMES LATIN1 to my application because it was ISO and not UTF8 aware.

Therefore thanks a lot for your help, it solved all my problems !

Denis