Thread: CP1250 to and from Unicode conversion, how?

CP1250 to and from Unicode conversion, how?

From
"Nikola Milutinovic"
Date:
Hi all.

I have a database with text fields containing text with Windows CP-1250 encoding. How can I convert it to Unicode? I
have built the database with

--enable-recode              enable character set recode support
--enable-multibyte           enable multibyte character support
--enable-unicode-conversion  enable unicode conversion support

Also, how can I enter a string containing Unicode chars from "psql"? What is the Unicode escape sequence?

I mean, if all else fails, I'll dump the database, run the dump through a script or a Java/C program to convert all
CP-1250 chars to their Unicode equivalents and import it again.
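
The fallback described above could look roughly like the following. This is only a sketch: it assumes the dump is a
plain-text file in which every byte is CP-1250, and the names recode_bytes and recode_file are made up for
illustration. Python is used here for brevity; the same approach works in Java or C.

```python
def recode_bytes(data: bytes, source: str = "cp1250", target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target encoding.

    Decoding is strict, so a byte that CP-1250 leaves undefined raises
    UnicodeDecodeError instead of being silently altered.
    """
    return data.decode(source).encode(target)


def recode_file(src_path: str, dst_path: str,
                source: str = "cp1250", target: str = "utf-8") -> None:
    """Copy a text file line by line, changing only its character encoding.

    newline="" keeps the original line endings untouched on both sides.
    """
    with open(src_path, "r", encoding=source, newline="") as src, \
         open(dst_path, "w", encoding=target, newline="") as dst:
        for line in src:
            dst.write(line)
```

Calling recode_file("dump-cp1250.sql", "dump-utf8.sql") would write a UTF-8 copy of the dump; both file names are
hypothetical.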

Hope someone will answer my question.

Nix.

Re: CP1250 to and from Unicode conversion, how?

From
Tatsuo Ishii
Date:
> I have a database with text fields containing text with Windows CP-1250 encoding. How can I convert it to Unicode? I
> have built the database with

Sorry, the conversion between CP-1250 and Unicode is not currently
supported, nor will it be in 7.2. Actually, adding that would be pretty
easy, but we are in the beta freeze phase and cannot add new functionality.

BTW, is CP-1250 equivalent to ISO-8859-2? If so, you could use the
encoding name "LATIN2" instead of WIN1250, and it supports
conversion to/from UNICODE.
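
Whether the two encodings are really equivalent can be checked byte by byte. The sketch below uses Python's built-in
"cp1250" and "iso8859_2" codecs; the helper name differing_bytes is made up for illustration. As far as I know, the
two tables share many positions (for example ć) but place some letters, such as š and ž, at different byte values, so
treating CP-1250 data as LATIN2 is only safe for text that avoids those bytes.

```python
def differing_bytes(first: str = "cp1250", second: str = "iso8859_2") -> dict:
    """Map each byte value in 0x80-0xFF to (char_in_first, char_in_second)
    for every byte that the two single-byte encodings decode differently.

    errors="replace" turns bytes an encoding leaves undefined into U+FFFD
    instead of raising, so the comparison covers the whole upper half.
    """
    diffs = {}
    for value in range(0x80, 0x100):
        raw = bytes([value])
        in_first = raw.decode(first, errors="replace")
        in_second = raw.decode(second, errors="replace")
        if in_first != in_second:
            diffs[value] = (in_first, in_second)
    return diffs


def report(diffs: dict) -> str:
    """Format the differences as one 'byte: first -> second' line per entry."""
    return "\n".join(
        f"0x{value:02X}: {a!r} -> {b!r}" for value, (a, b) in sorted(diffs.items())
    )
```

Running print(report(differing_bytes())) lists every position where the two encodings disagree.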

> Also, how can I enter a string containing Unicode chars from "psql"? What is the Unicode escape sequence?

No idea. Why not use a Unicode-aware terminal? I use emacs + mule-ucs.
--
Tatsuo Ishii

Re: CP1250 to and from Unicode conversion, how?

From
Tatsuo Ishii
Date:
It might be a JDBC driver issue. Ask the JDBC gurus.

If you believe the problem is in the backend, please give me a
reproducible example using psql.
--
Tatsuo Ishii

> Hi.
>
> Problems again.
>
> I have created a DB with encoding set to LATIN2, created tables.
> Connected to the database with psql, set encoding to WIN1250, imported data ("\copy ...")
> The data is there, definitely. Encoding is different from WIN1250, so I guess the encoding is really Latin-2.
>
> Now comes my creeping horror. I have a test Java application which connects to the database, taking one argument:
> ENCODING.
>
> This is what comes out:
>
> <NO ENCODING>
> ---------------------------------------------------------------------
> Connecting with: jdbc:postgresql://legba.ev.co.yu/mercury
>
> ID: 39 NAME: Anica SURNAME: Ivkovi?
> ID: 87 NAME: Sa?a SURNAME: Ivkovi?
> ID: 130 NAME: Ljubica SURNAME: Ivkovi?
> ---------------------------------------------------------------------
>
> <LATIN-1>
> ---------------------------------------------------------------------
> Connecting with: jdbc:postgresql://legba.ev.co.yu/mercury?charSet=LATIN1
>
> ID: 39 NAME: Anica SURNAME: Ivkovic
> ID: 87 NAME: Saaa SURNAME: Ivkovic
> ID: 130 NAME: Ljubica SURNAME: Ivkovic
> ---------------------------------------------------------------------
>
> <LATIN-2>
> ---------------------------------------------------------------------
> Connecting with: jdbc:postgresql://legba.ev.co.yu/mercury?charSet=LATIN2
>
> ID: 39 NAME: Anica SURNAME: Ivkovi?
> ID: 87 NAME: Sa?a SURNAME: Ivkovi?
> ID: 130 NAME: Ljubica SURNAME: Ivkovi?
> ---------------------------------------------------------------------
>
> <UTF-8>
> ---------------------------------------------------------------------
> Connecting with: jdbc:postgresql://legba.ev.co.yu/mercury?charSet=UNICODE
>
> Exception in thread "main" java.sql.SQLException:
>         at org.postgresql.Connection.ExecSQL(Connection.java, Compiled Code)
>         at org.postgresql.jdbc2.Statement.execute(Statement.java, Compiled Code)
>         at org.postgresql.jdbc2.Statement.executeQuery(Statement.java, Compiled Code)
>         at test2PostgreSQL.main(test2PostgreSQL.java, Compiled Code)
> ---------------------------------------------------------------------
>
> So, <No encoding> and <Latin-2> give me "?", <Latin-1> gives me what looks like Latin-2 output and <Unicode> crashes
> the JDBC connection.
>
> >:-(
>
> Looks like I'm in for some serious learning...
>
> If it is of any help, on the "Legba.ev.co.yu", for <Unicode> case, which crashed JDBC, PostMaster is spitting out:
>
> ERROR:  parser: parse error at or near "t?"
> FATAL 1:  Socket command type S unknown
>
> I'm taking my "mining helmet" out, getting an axe from the closet and preparing to dig into the source. Before I
> commit such an act, could you enlighten me? What is going on?
>
> Nix.
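
For reference, a small sketch of two general ways output like the above can arise. It does not diagnose the JDBC driver
or the backend; it only assumes that the names are stored correctly and that something between the database and the
screen uses a different charset. The helper names show_as and misdecode are made up for illustration.

```python
def show_as(text: str, console_charset: str) -> str:
    """Simulate printing text on an output that only understands console_charset.

    Characters the charset cannot represent are replaced with "?", which is
    what many runtimes do when encoding for a terminal.
    """
    return text.encode(console_charset, errors="replace").decode(console_charset)


def misdecode(text: str, stored_as: str, read_as: str) -> str:
    """Simulate bytes written in one encoding being read back as another."""
    return text.encode(stored_as).decode(read_as, errors="replace")


# A Latin-1 output cannot represent ć or š, so they degrade to "?".
lossy = show_as("Saša Ivković", "latin-1")

# Latin-2 bytes read as Latin-1 produce a different letter, not a "?".
wrong_letter = misdecode("Ivković", "iso8859_2", "latin-1")
```

Here lossy is "Sa?a Ivkovi?" and wrong_letter is "Ivkoviæ", so a "?" hints at a lossy encode step rather than a
wrong decode.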