Thread: JDBC driver, client_encoding and a SQL_ASCII database in production
Hello All,

I am using Pentaho Data Integration (PDI) to transform a database in production into another database, structured correctly for my business intelligence analysis. PDI uses the JDBC driver to perform the transformation. The issue is that the JDBC driver only works with UTF-8 as client_encoding, while the charset of the production database is SQL_ASCII and it was filled with ISO-8859-1 characters (it is full of characters such as é, è, ô, ...). The only way I can get the correct strings back is to use client_encoding='LATIN1'.

Changing the character set of the original database is not an option, as it is in production.

Does anyone have a good idea on how I could retrieve the content correctly?

Regards,
- Emmanuel

--
Emmanuel GUITON
Development Engineer, Intrinsec
Standard: +33 1 41 91 77 77 | Fax: +33 1 41 91 77 78
215, avenue Georges Clemenceau, 92024 Nanterre
http://www.intrinsec.com/
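[Editor's note: to make the failure mode concrete (not part of the original mail): a LATIN1 "é" is the single byte 0xE9, which is not a valid UTF-8 sequence on its own, so a driver that insists on decoding the wire data as UTF-8 mangles it, while decoding as ISO-8859-1 recovers it. A minimal sketch:]

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    public static void main(String[] args) {
        // "été" as written by an ISO-8859-1 client into a SQL_ASCII database:
        // the server stores the bytes verbatim, with no validation.
        byte[] raw = {(byte) 0xE9, 0x74, (byte) 0xE9};

        // Decoded with the encoding the data was actually written in:
        String latin1 = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(latin1); // "été"

        // Decoded as UTF-8, the way the driver reads it: 0xE9 opens a
        // three-byte UTF-8 sequence that never completes, so the accented
        // bytes come back as U+FFFD replacement characters.
        String utf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println(utf8);
    }
}
```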
Emmanuel Guiton wrote:
> Hello All,
>
> I am using Pentaho Data Integration (PDI) to transform a database in
> production to some other database, correctly structured for my business
> intelligence analysis purpose.
> PDI uses the JDBC driver to perform the transformation. The issue is
> that JDBC only works with UTF-8 as client_encoding while the charset of
> the database in production is SQL_ASCII and was filled with ISO-8859-1
> characters (and is full of characters such as é è ô ...). The only way I
> can get the correct string back is to use client_encoding='LATIN1'.
>
> Changing the character set of the original database is not an option as
> it is in production.
>
> Anyone has a good idea on how could I proceed to get correctly the
> content ?

Take a copy of the production database, change the database encoding to be LATIN1, and do your conversion from that copy?

-O
Oliver Jowett wrote:
> Emmanuel Guiton wrote:
>> [...]
>>
>> Anyone has a good idea on how could I proceed to get correctly the
>> content ?
>
> Take a copy of the production database, change the database encoding to
> be LATIN1, and do your conversion from that copy?
>
> -O

Thanks for the idea, but the data volume is too large, and the original database already has serious performance problems; overloading it with an additional daily dump is not workable. This is not just a one-shot issue: the analysis tool I am setting up is meant to continually analyze the activity of my company.

Would there be a way to get the binary content of a text field? Maybe that would be the solution: performing the encoding conversion at the application level.

- Emmanuel

--
Emmanuel GUITON
Development Engineer, Intrinsec
Standard: +33 1 41 91 77 77 | Fax: +33 1 41 91 77 78
215, avenue Georges Clemenceau, 92024 Nanterre
http://www.intrinsec.com/
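[Editor's note: a sketch of the "binary content" idea, not from the thread itself. The driver does not charset-convert bytea values, so one workaround is to ask the server for the column's raw bytes and decode them yourself. This assumes `convert_to(col, 'SQL_ASCII')` is a byte-for-byte copy when the database encoding is already SQL_ASCII; the table, column, and helper names here are made up:]

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class RawTextFetch {
    // Decode raw bytes that were originally written by a LATIN1 client.
    static String decodeLatin1(byte[] raw) {
        return raw == null ? null : new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical query: convert_to(..., 'SQL_ASCII') hands the column's
    // bytes back untouched as bytea, which the driver passes through
    // without any UTF-8 decoding; we then decode as ISO-8859-1 ourselves.
    static List<String> fetchNames(Connection conn) throws Exception {
        List<String> names = new ArrayList<>();
        String sql = "SELECT convert_to(name, 'SQL_ASCII') FROM customer";
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                names.add(decodeLatin1(rs.getBytes(1)));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // "Guiton é" in ISO-8859-1 bytes, as it would come back in the bytea:
        byte[] raw = {0x47, 0x75, 0x69, 0x74, 0x6F, 0x6E, 0x20, (byte) 0xE9};
        System.out.println(decodeLatin1(raw)); // "Guiton é"
    }
}
```

The decoding step is a pure function, so it can be applied in a PDI JavaScript or user-defined step as well; only the SQL needs to change per table.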
Emmanuel Guiton wrote:
> Would there be a way to get the binary content of text field ?
> Maybe that could be the solution, performing the encoding conversion at
> the application level, then.

That's not simple; the conversion from network data to String happens very early, well before the data gets anywhere near the application level.

You may have to run with a modified driver that you have patched to understand encodings other than UTF-8.

-O
On Thu, 11 Mar 2010, Oliver Jowett wrote:

> You may have to run with a modified driver that you have patched to
> understand encodings other than UTF-8.

The JDBC driver does support running with a non-UTF-8 encoding, but only for server versions prior to 7.3. There's no reason it couldn't work for later versions, so the easiest thing to do is to tweak the v2 protocol setup code to work for your server version and then use that.

Kris Jurka