Thread: ASCII to UNICODE conversion problems
Hi all,
I am using the pg74.215.jdbc3.jar JDBC driver to connect to a PostgreSQL 7.4 database. I am getting the common error:
Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.
I have read a lot online about how I need to convert my ASCII database into UNICODE, but I haven’t been able to do that successfully. I dumped the DB, added SET CLIENT_ENCODING TO 'UNICODE'; to the top of the dump file, and then tried to restore it into a UNICODE-encoded database. When I tried this, only some of the tables (I assume the ones without special characters) got imported; many tables had zero rows.
Can someone please tell me how to convert an ASCII DB into a UNICODE DB so hopefully the JDBC driver will actually work? I also read about a patch for the JDBC driver. Where can I get this patch and does it work?
Thanks in advance for all your help,
Ron
On Wed, 11 May 2005, Ronald Gallagher wrote:

> I have read a lot online about how I need to convert my ASCII database
> into UNICODE but I haven't been able to successfully do that. I have
> dumped the DB and then added SET CLIENT_ENCODING TO 'UNICODE'; to the
> top of the dump file and then tried to restore it to a UNICODE encoded
> database. When I tried this only some of the tables (I assume the ones
> without special characters) got imported, but many tables had zero rows.

Basically the JDBC driver was detecting the fact that your data is not unicode. You've put this into a file and labeled it unicode (via SET client_encoding), so now the server is complaining that your data is not unicode.

You need to determine what encoding your data is actually in and convert it to unicode with something like iconv.

> I also read about a patch for the JDBC driver. Where can I get this
> patch and does it work?

Any patch dealing with encodings is going to require you to know what encoding your data really is, so I don't specifically know what you are referring to, but I do know it is not going to magically solve your problems.

Kris Jurka
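The two steps Kris describes (finding the data that is not valid unicode, then converting it from its real encoding) can be sketched in a few lines. This is only an illustration: the function names are made up for this example, the sample statement is invented, and `latin-1` is an assumed source encoding, not something the thread establishes. Note that `latin-1` accepts every possible byte, so a wrong guess produces garbled characters rather than an error.

```python
def find_non_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((lineno, line))
    return bad


def transcode(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8, similar in spirit to iconv."""
    return data.decode(source_encoding).encode("utf-8")


# Invented sample: the first statement contains byte 0xE9 ("e" with acute in Latin-1),
# which is not a valid UTF-8 sequence on its own.
sample = b"INSERT INTO t VALUES ('caf\xe9');\nINSERT INTO t VALUES ('plain');\n"

bad = find_non_utf8_lines(sample)        # lines the server would reject as unicode
converted = transcode(sample, "latin-1")  # ASSUMPTION: the data is really Latin-1
```

Running the first function over a real dump (read with `open(path, "rb").read()`) gives a list of offending lines to inspect by eye, which is usually the quickest way to work out what encoding the data is really in.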
Hi, Ronald,

Kris Jurka wrote:

>> I have read a lot online about how I need to convert my ASCII database
>> into UNICODE but I haven't been able to successfully do that. I have
>> dumped the DB and then added SET CLIENT_ENCODING TO 'UNICODE'; to the
>> top of the dump file and then tried to restore it to a UNICODE encoded
>> database. When I tried this only some of the tables (I assume the ones
>> without special characters) got imported, but many tables had zero rows.
>
> Basically the JDBC driver was detecting the fact that your data is not
> unicode. You've put this into a file and labeled it unicode (via SET
> client_encoding), so now the server is complaining that your data is not
> unicode.
>
> You need to determine what encoding your data is actually in and convert
> it to unicode with something like iconv.

Another possibility is to create the database in unicode (or any encoding that can store your data), and put "SET CLIENT_ENCODING TO 'your-real-encoding';" at the top of your dump file. This way PostgreSQL will convert your data from your-real-encoding to the database encoding on dump load, and then to unicode with JDBC access.

HTH,
Markus
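Markus's alternative leaves the dump bytes untouched and only labels them correctly. A minimal sketch of that labeling step follows; the helper name is hypothetical and `LATIN1` is an assumed encoding standing in for "your-real-encoding". The dump is handled as raw bytes so that nothing is transcoded on the client side. If the dump already contains its own SET client_encoding statement further down, that later statement would presumably take effect instead and would need to be edited.

```python
def with_client_encoding(dump: bytes, real_encoding: str) -> bytes:
    """Prepend a SET client_encoding statement, leaving the dump bytes unchanged."""
    header = f"SET client_encoding TO '{real_encoding}';\n".encode("ascii")
    return header + dump


# Invented one-line dump containing a Latin-1 byte (0xE9).
dump = b"INSERT INTO t VALUES ('caf\xe9');\n"

# ASSUMPTION: the stored data is really Latin-1.
labeled = with_client_encoding(dump, "LATIN1")
```

The labeled output would then be restored into a database created with a unicode encoding, letting the server do the conversion during the load.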