Thread: Charset and encoding

Charset and encoding

From: Rosa Maria Carro Salas

Hello,

    I have defined a database with the encoding SQL_ASCII and fill it
using embedded SQL. When I insert the value "Señales", it is stored
correctly: I get the correct value back when I type the query in psql.
However, when I access the database from a Java program via JDBC, I get
the value "Seqales". The LANG variable is set to "es_ES" (Spanish),
which I suppose is OK.
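
The thread never explains the "Seqales" value, but the byte arithmetic suggests one possible cause. In LATIN1, 'ñ' is the single byte 0xF1; clearing its high bit gives 0x71, which is ASCII 'q'. The sketch below assumes, purely as a hypothesis, that the client stored LATIN1 bytes and that some layer between the database and the Java program treated them as 7-bit ASCII. `strip_high_bit` is a hypothetical helper, not part of PostgreSQL or JDBC.

```python
def strip_high_bit(raw: bytes) -> str:
    """Hypothetical helper: keep only the low 7 bits of each byte, as a
    strict 7-bit ASCII reader might, and decode the result as ASCII."""
    return bytes(b & 0x7F for b in raw).decode("ascii")


# Assumption: the client stored "Señales" as LATIN1 (ISO-8859-1) bytes.
stored = "Señales".encode("latin-1")
print(stored)                  # b'Se\xf1ales'
print(strip_high_bit(stored))  # Seqales
```

This only shows that the observed string is consistent with a stripped high bit; it does not identify which component did the stripping.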

    I then changed the database encoding to LATIN1 and set the client
character set to LATIN1 by typing \encoding LATIN1.

    Now when I access the data from the psql interface I get
"Se(82f1)ales", and when I get the value through the Java program I get
"Se ñales".

    Does anybody know what is happening? Where can I find information
about this?
    Thanks in advance,
    Rosa M. Carro



Re: Charset and encoding

From: Tatsuo Ishii


(82f1) is 0x82 (the leading character for LATIN2) followed by 0xf1 (the
Spanish 'ñ' in LATIN1). That is the intermediate representation used in
the backend when encoding translation is necessary. My guess is that you
set the database encoding to LATIN2, not LATIN1. Can you show me the
result of the query:

select * from pg_database;
--
Tatsuo Ishii
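
To make the (82f1) explanation concrete, the sketch below splits such a two-byte value into its leading character and character byte, then decodes the second byte with the matching ISO-8859 codec. Only the 0x82 = LATIN2 mapping comes from the thread; 0x81 = LATIN1 is an assumption based on the same numbering scheme, and `decode_intermediate` is a hypothetical helper, not backend code. It shows why the leading character matters: the same byte 0xF1 is 'ñ' in LATIN1 but 'ń' in LATIN2.

```python
# Leading characters of the backend's intermediate representation.
# 0x82 -> LATIN2 is stated in the thread; 0x81 -> LATIN1 is an assumption.
LEADING_BYTES = {
    0x81: ("LATIN1", "latin-1"),
    0x82: ("LATIN2", "iso8859_2"),
}


def decode_intermediate(pair: bytes) -> tuple[str, str]:
    """Hypothetical helper: split a (leading byte, character byte) pair and
    decode the character byte with the codec the leading byte names."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    lead, code = pair[0], pair[1]
    if lead not in LEADING_BYTES:
        raise ValueError(f"unknown leading byte 0x{lead:02x}")
    name, codec = LEADING_BYTES[lead]
    return name, bytes([code]).decode(codec)


print(decode_intermediate(b"\x82\xf1"))  # ('LATIN2', 'ń')
print(decode_intermediate(b"\x81\xf1"))  # ('LATIN1', 'ñ')
```

A LATIN2 tag on a byte that was meant as LATIN1 'ñ' is what makes the database encoding the first suspect.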

Re: Charset and encoding

From: Rosa Maria Carro Salas

 datname | datdba | encoding | datpath
---------+--------+----------+---------
 courses |     26 |        7 | courses

I have tested with LATIN1 through LATIN5 and none of them worked.
I need the Spanish 'ñ' (that is what I insert). Maybe I need to insert
this character in a special way? The client is set to the same encoding
as the database.

Thanks,
Rosa M. Carro

P.S. Could it be related to multibyte (or something similar)?





Re: Charset and encoding

From: Tatsuo Ishii

>  datname | datdba | encoding | datpath
> ---------+--------+----------+---------
>  courses |     26 |        7 | courses

Hmm. Looks OK to me.

> I have tested with LATIN1 through LATIN5 and none of them worked.
> I need the Spanish 'ñ' (that is what I insert). Maybe I need to insert
> this character in a special way? The client is set to the same encoding
> as the database.

Could you send me a physical copy of your database so that I can dig
into the problem? I mean a tar archive of the whole contents of
$PGDATA.
--
Tatsuo Ishii
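
The request above is for a tar archive of the whole $PGDATA directory. As a minimal sketch, assuming the server has been shut down first (a file-level copy of a running cluster is not reliably consistent), Python's standard `tarfile` module can produce such an archive. `archive_pgdata` and the output name `pgdata.tar.gz` are hypothetical choices for illustration.

```python
import os
import tarfile


def archive_pgdata(pgdata: str, out_path: str) -> list[str]:
    """Hypothetical helper: write a gzip-compressed tar archive of the whole
    data directory and return the member names it contains.

    Assumes the server has been stopped so the files are consistent."""
    if not os.path.isdir(pgdata):
        raise FileNotFoundError(f"not a directory: {pgdata}")
    with tarfile.open(out_path, "w:gz") as tar:
        # tar.add recurses into the directory; arcname keeps paths
        # relative, e.g. "data/PG_VERSION".
        tar.add(pgdata, arcname="data")
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    # $PGDATA is read from the environment, as in the thread.
    pgdata = os.environ.get("PGDATA")
    if pgdata:
        names = archive_pgdata(pgdata, "pgdata.tar.gz")
        print(f"archived {len(names)} entries")
    else:
        print("PGDATA is not set")
```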


Re: Charset and encoding

From: Tatsuo Ishii


I got the data from Rosa and tested it with PostgreSQL 7.0.3/Linux/x86
configured with --enable-multibyte=LATIN1 --enable-locale.

test=# select * from test;
 name |    descr
------+--------------
 hola | Señalizacion
 hola | Señalizacion
 hola | Señalizacion
 hola | Señalizacion
(4 rows)

Seems OK to me. Maybe there is a problem with your PostgreSQL
installation?
--
Tatsuo Ishii