Thread: ASCII to UNICODE conversion problems

From
"Ronald Gallagher"

Hi all,

I am using the pg74.215.jdbc3.jar JDBC driver to connect to a PostgreSQL 7.4 database. I am getting the common error:

Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.

I have read a lot online about how I need to convert my ASCII database into UNICODE but I haven’t been able to successfully do that. I have dumped the DB and then added SET CLIENT_ENCODING TO 'UNICODE'; to the top of the dump file and then tried to restore it to a UNICODE encoded database. When I tried this only some of the tables (I assume the ones without special characters) got imported, but many tables had zero rows.

Can someone please tell me how to convert an ASCII DB into a UNICODE DB so hopefully the JDBC driver will actually work? I also read about a patch for the JDBC driver. Where can I get this patch and does it work?

Thanks in advance for all your help,

Ron

Re: ASCII to UNICODE conversion problems

From
Kris Jurka

On Wed, 11 May 2005, Ronald Gallagher wrote:

> I have read a lot online about how I need to convert my ASCII database
> into UNICODE but I haven't been able to successfully do that.  I have
> dumped the DB and then added SET CLIENT_ENCODING TO 'UNICODE'; to the
> top of the dump file and then tried to restore it to a UNICODE encoded
> database.  When I tried this only some of the tables (I assume the ones
> without special characters) got imported, but many tables had zero rows.

Basically the JDBC driver was detecting the fact that your data is not
unicode.  You've put this into a file and labeled it unicode (via SET
client_encoding), so now the server is complaining that your data is not
unicode.
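
To see which lines of the dump the server would object to, one option is to scan it for lines that are not valid UTF-8. The following is only a minimal Python sketch and not part of the original reply; the function name find_non_utf8_lines and the sample bytes are made up for illustration.

```python
def find_non_utf8_lines(raw: bytes) -> list:
    """Return the 1-based numbers of the lines in raw that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(raw.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Hypothetical sample: line 2 ends in a lone 0xE9 byte (e-acute in Latin-1),
# which is not a valid UTF-8 sequence on its own.
sample = b"plain ascii\ncaf\xe9\nmore ascii\n"
bad_lines = find_non_utf8_lines(sample)
```

For a real dump the bytes would come from reading the dump file in binary mode. Pure 7-bit ASCII lines always pass this check, which would be consistent with the tables without special characters importing fine.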

You need to determine what encoding your data is actually in and convert
it to unicode with something like iconv.
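
A rough Python equivalent of the iconv step might look like the sketch below. It assumes, purely for illustration, that the data turns out to be Latin-1 (ISO-8859-1); that encoding is a guess and is not stated anywhere in this thread, and reencode_dump is a hypothetical helper name. On the command line the same idea would be along the lines of "iconv -f ISO-8859-1 -t UTF-8".

```python
def reencode_dump(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw with the assumed source encoding and re-encode it as UTF-8."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Assumption: the stored bytes are Latin-1, where 0xE9 is e-acute.
latin1_bytes = b"caf\xe9"
utf8_bytes = reencode_dump(latin1_bytes, "latin-1")
```

Note that a Latin-1 decode never fails, because every byte value maps to some character in that encoding. A wrong guess therefore produces wrong characters silently rather than an error, so it is worth checking a few converted rows by eye.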

> I also read about a patch for the JDBC driver.  Where can I get this
> patch and does it work?

Any patch dealing with encodings is going to require you to know what
encoding your data really is, so while I don't know specifically which
patch you are referring to, I do know it is not going to magically solve
your problems.

Kris Jurka


Re: ASCII to UNICODE conversion problems

From
Markus Schaber

Hi, Ronald,

Kris Jurka wrote:

>>I have read a lot online about how I need to convert my ASCII database
>>into UNICODE but I haven't been able to successfully do that.  I have
>>dumped the DB and then added SET CLIENT_ENCODING TO 'UNICODE'; to the
>>top of the dump file and then tried to restore it to a UNICODE encoded
>>database.  When I tried this only some of the tables (I assume the ones
>>without special characters) got imported, but many tables had zero rows.
>
> Basically the JDBC driver was detecting the fact that your data is not
> unicode.  You've put this into a file and labeled it unicode (via SET
> client_encoding), so now the server is complaining that your data is not
> unicode.
>
> You need to determine what encoding your data is actually in and convert
> it to unicode with something like iconv.

Another possibility is to create the database in unicode (or any
encoding that can store your data), and put "SET CLIENT_ENCODING TO
'your-real-encoding';" at the top of your dump file. This way PostgreSQL
will convert your data from your-real-encoding to the database encoding
on dump load, and then to unicode with JDBC access.
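
As a sketch of that approach in Python: the helper below prepends the SET line to the raw dump bytes without touching the data itself. LATIN1 stands in for your-real-encoding purely as an example, and prepend_client_encoding is a made-up name.

```python
def prepend_client_encoding(dump: bytes, pg_encoding: str) -> bytes:
    """Return the dump with a SET client_encoding statement as its first line."""
    header = "SET client_encoding TO '{}';\n".format(pg_encoding).encode("ascii")
    return header + dump


# Hypothetical one-line dump; LATIN1 is an assumed example encoding.
original = b"INSERT INTO t VALUES ('caf\xe9');\n"
patched = prepend_client_encoding(original, "LATIN1")
```

If the dump already contains its own SET client_encoding statement further down, that later statement takes effect for everything after it, so it would need to be edited as well.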


HTH,
Markus