Thread: JDBC to load UTF8@psql to latin1@mysql

JDBC to load UTF8@psql to latin1@mysql

From
Emi Lu
Date:
Good morning,

Is there a simple way to load UTF8 data in psql to mysql(with latin1
encoding) through JDBC?

Thanks a lot!
Emi


Re: JDBC to load UTF8@psql to latin1@mysql

From
Emi Lu
Date:
> Is there a simple way to load UTF8 data in psql to mysql(with latin1
> encoding) through JDBC?
>
JAVA codes work for most of characters, but not "-È". Someone knows why
the following codes cannot load "-È" to mysql@latin1?

Thanks a lot!

--
public static String utf8_to_latin1(String str)
    throws Exception
    {
       try
       {
          String stringToConvert = str;
          byte[] convertStringToByte = stringToConvert.getBytes("UTF-8");
          return new String(convertStringToByte, "ISO-8859-1");
       }catch(Exception e)
       {
          log.error("utf8_to_latin1 Error: " + e.getMessage());
          log.error(e);
          throw e;
       }
    }


Re: JDBC to load UTF8@psql to latin1@mysql

From
Paul Jungwirth
Date:
I don't think your Java code does what you think it does. You should read some more about how Java handles string encodings. Here is a method I wrote some years ago that might also help you. It converts streams, not strings, but what you need should be pretty close (and simpler):

  /**
   * Interprets in according to encIn, and converts it to encOut,
   * writing to out. Allocates buffer for the buffer size.
   * 
   * @param encIn The input encoding.
   * @param encOut The output encoding.
   * @param in The data to convert.
   * @param out Where to send the converted data.
   * @param buffer The size of the buffer or 0 for the default.
   *
   * @throws IOException
   */
  public void run(String encIn, String encOut, InputStream in, OutputStream out, int buffer) throws IOException {
    Reader r = null;
    Writer w = null;
    int len;
    char[]  b;
   
    try {
      if (buffer > 0) {
        r = new BufferedReader(new InputStreamReader(in, encIn), buffer);
        w = new BufferedWriter(new OutputStreamWriter(out, encOut), buffer);
      } else {
        r = new BufferedReader(new InputStreamReader(in, encIn));
        w = new BufferedWriter(new OutputStreamWriter(out, encOut));
        buffer = DEFAULT_BUFFER_SIZE;
      }
      b = new char[buffer];

      while ((len = r.read(b, 0, buffer)) != -1) {
        w.write(b, 0, len);
      }
    } finally {
      try {
        if (r != null) r.close();
      } finally {
        if (w != null) w.close();
      }
    }
  }

Btw, none of this has anything to do with Postgres. :-)

Paul



On Wed, Dec 12, 2012 at 10:19 AM, Emi Lu <emilu@encs.concordia.ca> wrote:

Is there a simple way to load UTF8 data in psql to mysql(with latin1
encoding) through JDBC?

JAVA codes work for most of characters, but not "-È". Someone knows why the following codes cannot load "-È" to mysql@latin1?

Thanks a lot!

--
public static String utf8_to_latin1(String str)
   throws Exception
   {
      try
      {
         String stringToConvert = str;
         byte[] convertStringToByte = stringToConvert.getBytes("UTF-8");
         return new String(convertStringToByte, "ISO-8859-1");
      }catch(Exception e)
      {
         log.error("utf8_to_latin1 Error: " + e.getMessage());
         log.error(e);
         throw e;

      }
   }


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
_________________________________
Pulchritudo splendor veritatis.

Re: JDBC to load UTF8@psql to latin1@mysql

From
Thomas Kellerer
Date:
Emi Lu wrote on 12.12.2012 17:17:
> Good morning,
>
> Is there a simple way to load UTF8 data in psql to mysql(with latin1 encoding) through JDBC?

All you need to do is to query the source database, then use ResultSet.getString() to obtain the data and use a
PreparedStatementand PreparedStatement.setString() to insert/update  the data on the target database. 

The JDBC drivers will handle all the conversion.
Do NOT manually convert the data.

getString() and setString() will do everything correctly.

Thomas



Re: JDBC to load UTF8@psql to latin1@mysql

From
Emi Lu
Date:
>> Is there a simple way to load UTF8 data in psql to mysql(with latin1
>> encoding) through JDBC?
>
> All you need to do is to query the source database, then use
> ResultSet.getString() to obtain the data and use a PreparedStatement and
> PreparedStatement.setString() to insert/update  the data on the target
> database.
>
> The JDBC drivers will handle all the conversion.
> Do NOT manually convert the data.
>
> getString() and setString() will do everything correctly.

I am not using stmt directly but through Mybatis for all db transactions.

So, this approach will not work.

Thanks.
--
Emi




Re: JDBC to load UTF8@psql to latin1@mysql

From
Edson Richter
Date:
Em 13/12/2012 12:00, Emi Lu escreveu:
>
>>> Is there a simple way to load UTF8 data in psql to mysql(with latin1
>>> encoding) through JDBC?
>>
>> All you need to do is to query the source database, then use
>> ResultSet.getString() to obtain the data and use a PreparedStatement and
>> PreparedStatement.setString() to insert/update  the data on the target
>> database.
>>
>> The JDBC drivers will handle all the conversion.
>> Do NOT manually convert the data.
>>
>> getString() and setString() will do everything correctly.
>
> I am not using stmt directly but through Mybatis for all db transactions.

Should not this a Mybatis problem instead?
As stated, JDBC drivers does all the conversion needed automatically,
but if you have a middleware messing with your enconding, then the
problem is the middleware, not databases or drivers.

Edson

>
> So, this approach will not work.
>
> Thanks.
> --
> Emi
>
>
>
>



Re: JDBC to load UTF8@psql to latin1@mysql

From
Emi Lu
Date:
> I don't think your Java code does what you think it does. You should
> read some more about how Java handles string encodings. Here is a method
> I wrote some years ago that might also help you. It converts streams,
> not strings, but what you need should be pretty close (and simpler):
>    /**
>     * Interprets in according to encIn, and converts it to encOut,
>     * writing to out. Allocates buffer for the buffer size.
>     * @param encIn The input encoding.
>     * @param encOut The output encoding.
>     * @param in The data to convert.
>     * @param out Where to send the converted data.
>     * @param buffer The size of the buffer or 0 for the default.
>     * @throws IOException
>     */
>    public void run(String encIn, String encOut, InputStream in,
> OutputStream out, int buffer) throws IOException {
>      Reader r = null;
>      Writer w = null;
>      int len;
>      char[]  b;
>      try {
>        if (buffer > 0) {
>          r = new BufferedReader(new InputStreamReader(in, encIn), buffer);
>          w = new BufferedWriter(new OutputStreamWriter(out, encOut),
> buffer);
>        } else {
>          r = new BufferedReader(new InputStreamReader(in, encIn));
>          w = new BufferedWriter(new OutputStreamWriter(out, encOut));
>          buffer = DEFAULT_BUFFER_SIZE;
>        }
>        b = new char[buffer];
>
>        while ((len = r.read(b, 0, buffer)) != -1) {
>          w.write(b, 0, len);
>        }
>      } finally {
>        try {
>          if (r != null) r.close();
>        } finally {
>          if (w != null) w.close();
>        }
>      }
>    }
> Btw, none of this has anything to do with Postgres. :-)
Thank you for the code first. I will try it later. The problem I had as
mentioned in the subject is:
(1) psql@utf8
(2) mysql@latin1

When I load data from (1) to (2) through Mybatis, french characters
could not be mapped correctly in (2). I was thinking that psql may have
methods could help this. But it seems that I have to try from java
coding side :-(

--
Emi


Re: JDBC to load UTF8@psql to latin1@mysql

From
Paul Jungwirth
Date:
> The JDBC drivers will handle all the conversion.
> Do NOT manually convert the data.

Yeah, I agree this is the right answer here, since you're using JDBC. By the time you get a String from the MySQL driver, it's already in Java's 2-bytes-per-char format. And the Postgres driver will deal with the encoding on the output side. So the code I provided won't help you. I'm afraid I don't know about Mybatis, but if it's built on JDBC I'd think you've just got a configuration problem with what encoding the client expects at either end.

Paul


--
_________________________________
Pulchritudo splendor veritatis.

Re: JDBC to load UTF8@psql to latin1@mysql

From
Thomas Kellerer
Date:
Emi Lu wrote on 13.12.2012 15:00:
>
>>> Is there a simple way to load UTF8 data in psql to mysql(with latin1
>>> encoding) through JDBC?
>>
>> All you need to do is to query the source database, then use
>> ResultSet.getString() to obtain the data and use a PreparedStatement and
>> PreparedStatement.setString() to insert/update  the data on the target
>> database.
>>
>> The JDBC drivers will handle all the conversion.
>> Do NOT manually convert the data.
>>
>> getString() and setString() will do everything correctly.
>
> I am not using stmt directly but through Mybatis for all db transactions.
>
> So, this approach will not work.

Then it's a problem of that MyBatis thing.

Thomas



Re: JDBC to load UTF8@psql to latin1@mysql

From
Tom Lane
Date:
Paul Jungwirth <pj@illuminatedcomputing.com> writes:
> Yeah, I agree this is the right answer here, since you're using JDBC. By
> the time you get a String from the MySQL driver, it's already in Java's
> 2-bytes-per-char format. And the Postgres driver will deal with the
> encoding on the output side. So the code I provided won't help you. I'm
> afraid I don't know about Mybatis, but if it's built on JDBC I'd think
> you've just got a configuration problem with what encoding the client
> expects at either end.

I was wondering if the problem wasn't lots simpler than that.  Is the
character the OP is trying to convert actually part of LATIN1?

            regards, tom lane


Re: JDBC to load UTF8@psql to latin1@mysql

From
Emi Lu
Date:
>> Yeah, I agree this is the right answer here, since you're using JDBC. By
>> the time you get a String from the MySQL driver, it's already in Java's
>> 2-bytes-per-char format. And the Postgres driver will deal with the
>> encoding on the output side. So the code I provided won't help you. I'm
>> afraid I don't know about Mybatis, but if it's built on JDBC I'd think
>> you've just got a configuration problem with what encoding the client
>> expects at either end.
> From: Tom lane
> I was wondering if the problem wasn't lots simpler than that. Is the
> character the OP is trying to convert actually part of LATIN1?

First, the data loading is from psql(unix) to mysql(Unix). Second, DB
transactions are through JAVA+MyBatis.

Steps:
(1) Read utf8.data@psql from psql.xml into java.ArrayList<Bean>

(2) For each list.rec, save into mysql@latin through mysql.xml

Tried "jdbc:mysql://.../mysql_db?...unicode...encoding...=ISO..." No.
This does not work.

For now, through the following method, all letters are correctly
transformed except "È".

What does OP stand for?

Emi
--
public static String utf8_to_latin1(String str)
throws Exception
{
    try
    {
       if(str.indexOf("È")>=0)
       {
          str = str.replaceAll("È", "E");
       }
       byte[] convertStringToByte = str.getBytes("UTF-8");
       str = new String(convertStringToByte, "ISO-8859-1");
       return str;
    }catch(Exception e)
    {
       log.error("utf8_to_latin1 Error: " + e.getMessage());
       log.error(e);
       throw e;
    }
}


Re: JDBC to load UTF8@psql to latin1@mysql

From
Adrian Klaver
Date:
On 12/14/2012 06:06 AM, Emi Lu wrote:


>
> What does OP stand for?

Original Poster.

>
> Emi
> --
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: JDBC to load UTF8@psql to latin1@mysql

From
Emi Lu
Date:
On 12/14/2012 09:49 AM, Adrian Klaver wrote:
> Original Poster
Thanks. And more info:
Mysql
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | latin1 |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+
SHOW VARIABLES LIKE "character\_set\_database";
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| character_set_database | latin1 |
+------------------------+--------+

Psql everywhere is utf8.

Where could be the problem located?

Also by using the java encoding methods, all characters except "È" are
transformed correctly.

Thanks alot!
Emi
--
public static String utf8_to_latin1(String str)
    throws Exception
    {
       try
       {
          if(str.indexOf("È")>=0)
          {
             str = str.replaceAll("È", "E");
          }
          byte[] convertStringToByte = str.getBytes("UTF-8");
          str                        = new String(convertStringToByte,
"ISO-8859-1");
          return str;
       }catch(Exception e)
       {
          log.error("utf8_to_latin1 Error: " + e.getMessage());
          log.error(e);
          throw e;
       }
    }



Re: JDBC to load UTF8@psql to latin1@mysql

From
Tom Lane
Date:
Emi Lu <emilu@encs.concordia.ca> writes:
> For now, through the following method, all letters are correctly
> transformed except "�".

Meh.  That character renders as \310 in your mail, which is not an
assigned code in ISO 8859-1.  The numerically corresponding Unicode
value would be U+0090, which is an unspecified control character.

I surmise that your source data is not actually either Unicode or
ISO 8859-1, but one of the random "extended" character sets that
Microsoft has loosed upon the world, perhaps windows-1252
http://en.wikipedia.org/wiki/Windows-1252

The conversion code that you're using is quite right to reject the
character as not being valid LATIN1.  What you need to do is figure out
what the data actually is and correct its encoding.  It's evidently
stored wrong in the UTF8 data, if you believe that this code is a
letter.

            regards, tom lane


Re: JDBC to load UTF8@psql to latin1@mysql

From
Tom Lane
Date:
I wrote:
> Meh.  That character renders as \310 in your mail, which is not an
> assigned code in ISO 8859-1.  The numerically corresponding Unicode
> value would be U+0090, which is an unspecified control character.

Oh, scratch that, apparently I can't do hex/octal arithmetic in my
head first thing in the morning.  It's really U+00C8 which is perfectly
valid.  I can't see a reason why that character and only that character
would be problematic --- have you done systematic testing to confirm
that that's the only should-be-LATIN1 character that fails?

            regards, tom lane


Re: JDBC to load UTF8@psql to latin1@mysql

From
Adrian Klaver
Date:
On 12/14/2012 07:35 AM, Tom Lane wrote:
> I wrote:
>> Meh.  That character renders as \310 in your mail, which is not an
>> assigned code in ISO 8859-1.  The numerically corresponding Unicode
>> value would be U+0090, which is an unspecified control character.
>
> Oh, scratch that, apparently I can't do hex/octal arithmetic in my
> head first thing in the morning.  It's really U+00C8 which is perfectly
> valid.  I can't see a reason why that character and only that character
> would be problematic --- have you done systematic testing to confirm
> that that's the only should-be-LATIN1 character that fails?

This is where I am confused, in one of the original posts the OP said:

"JAVA codes work for most of characters, but not "-È"."

>
>             regards, tom lane
>
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: JDBC to load UTF8@psql to latin1@mysql

From
Emi Lu
Date:
Hello All,
>> Meh.  That character renders as \310 in your mail, which is not an
>> assigned code in ISO 8859-1.  The numerically corresponding Unicode
>> value would be U+0090, which is an unspecified control character.
>
> Oh, scratch that, apparently I can't do hex/octal arithmetic in my
> head first thing in the morning.  It's really U+00C8 which is perfectly
> valid.  I can't see a reason why that character and only that character
> would be problematic --- have you done systematic testing to confirm
> that that's the only should-be-LATIN1 character that fails?

Finally, the problem is resolved:

SHOW VARIABLES LIKE "character\_set\_%";
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | latin1 |
| character_set_server     | latin1 |
| character_set_system     | utf8   | -- here mysql uses utf8 for
character_set_system.

Change my java code to:
========================
public static String utf8_to_mysql(String str)
    throws Exception
    {
       try
       {
          byte[] convertStringToByte = str.getBytes("UTF-8");
          str                        = new String(convertStringToByte,
"UTF-8");
          return str;
       }catch(Exception e)
       {
          log.error("utf8_to_latin1 Error: " + e.getMessage());
          log.error(e);
          throw e;
       }

Have to explicitly specify "UTF-8", but cannot leave as empty.

Larry's comments(from MyBatis mailing list) and I tried both "from/to"
by "UTF8". It works. This is still little bit strange to me. But it works!

 >> My guess is that it's correct but the client you're using is messing
 >> it up. If not, then you need to look at your connection strings to
 >> the 2 databases to make sure they are handling the encodings
 >> correctly.Unless you set them specifically, I suspect they are using
 >> your default system encoding - so both may be using utf8 or iso8859.

Thank you very much for all of your help for this!
Emi



Re: JDBC to load UTF8@psql to latin1@mysql

From
Emi Lu
Date:
On 12/14/2012 01:37 PM, Emi Lu wrote:
> Hello All,
>>> Meh.  That character renders as \310 in your mail, which is not an
>>> assigned code in ISO 8859-1.  The numerically corresponding Unicode
>>> value would be U+0090, which is an unspecified control character.
>>
>> Oh, scratch that, apparently I can't do hex/octal arithmetic in my
>> head first thing in the morning.  It's really U+00C8 which is perfectly
>> valid.  I can't see a reason why that character and only that character
>> would be problematic --- have you done systematic testing to confirm
>> that that's the only should-be-LATIN1 character that fails?
>
> Finally, the problem is resolved:
>
> SHOW VARIABLES LIKE "character\_set\_%";
> +--------------------------+--------+
> | Variable_name            | Value  |
> +--------------------------+--------+
> | character_set_client     | latin1 |
> | character_set_connection | latin1 |
> | character_set_database   | latin1 |
> | character_set_filesystem | binary |
> | character_set_results    | latin1 |
> | character_set_server     | latin1 |
> | character_set_system     | utf8   | -- here mysql uses utf8 for
> character_set_system.
Another try is that if I change my client tool encoding set, I do not
even need my java transition. All right, good to learn from this.

Emi