Thread: How to have ant's task insert special chars
appropriately?
Hi! What I'm trying to do is to insert some data from a sql file into a postgres DB by calling the <sql> ant task. My problem is that I can't get special characters (even if they can be represented by the standard ASCII charset, such as ä, ö, ü, é, etc.) to be inserted correctly. When I check the DB with a DB browser such as Squirrel or pgAdmin, after running the <sql> task they can only see question marks where these characters are supposed to be. So does the actual web application using the DB. (The web app can store and read such special characters seamlessly, though. It's also OK if I enter some of them by hand with the DB browsers mentioned above.) This all suggests that the problem is with the way I call the <sql> task. I've tried several versions: inserting from a sql file with iso-8859-1 encoding, then from one with utf-8 encoding. (In pgAdmin I can see the default encoding of the DB is utf-8.) Something like this: <sql encoding="ISO-8859-1" classpath="${db.pg.classpath}" driver="${db.pg.driver}" url="${db.pg.connection}/${db.database.name}" userid="${db.schema.owner.name}" password="${db.schema.owner.pwd}" src="${sql.pg.dir}/update/testdata.sql" /> (This works perfectly as long as there are no special characters.) I tried changing the "encoding" attribute above to UTF-8, adding "SET client_encoding = 'UTF8';" to the UTF-8 file, then to the iso-8859-1 file; "SET client_encoding = 'LATIN1';" to the iso-8859-1 file (this latter in addition produced the error "The server's client_encoding parameter was changed to LATIN1. The JDBC driver requires client_encoding to be UNICODE for correct operation.") Does anyone have any idea how to get those special characters loaded into the DB from a pre-written sql file (possibly with the ant <sql> task)? Thanks, Agoston -- View this message in context: http://www.nabble.com/How-to-have-ant%27s-%3Csql%3E-task-insert-special-chars-appropriately--tp25530663p25530663.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
agostonbejo wrote: > > Hi! > > What I'm trying to do is to insert some data from a sql file into a postgres > DB by calling the <sql> ant task. My problem is that I can't get special > characters (even if they can be represented by the standard ASCII charset, > such as ä, ö, ü, é, etc.) to be inserted correctly. Those aren't ASCII. > When I check the DB with a DB browser such as Squirrel or pgAdmin, after > running the <sql> task they can only see question marks where these > characters are supposed to be. So does the actual web application using the > DB. (The web app can store and read such special characters seamlessly, > though. It's also OK if I enter some of them by hand with the DB browsers > mentioned above.) [snip] > Does anyone have any idea how to get those special characters loaded into > the DB from a pre-written sql file (possibly with the ant <sql> task)? There are three places you need to get this right: 1. The database encoding 2. The client encoding 3. The encoding of the contents of the .sql file Now, since the database is UTF8 that means it can accept the entire range of unicode characters, including all ISO-8859-1. PostgreSQL can automatically convert from ISO-8859-1 to UTF-8 for you, so it doesn't matter which you have in your .sql file. What *does* matter is that you know what encoding your .sql file is using and that you set the client encoding appropriately. Since you're using Java, it's probably simplest just to use UTF-8 all the way through. Crucially, make sure you know what the character-set of the .sql file is - any good text editor should be able to tell you / set this. -- Richard Huxton Archonet Ltd
On Wednesday 23 September 2009 11:29:02 Richard Huxton wrote: > agostonbejo wrote: > > Hi! > > > > What I'm trying to do is to insert some data from a sql file into a > > postgres DB by calling the <sql> ant task. My problem is that I can't get > > special characters (even if they can be represented by the standard ASCII > > charset, such as ä, ö, ü, é, etc.) to be inserted correctly. > > Those aren't ASCII. Actually, they are: ä = 132 ö = 148 ü = 129 é = 130 See: http://www.ascii.nl Site is in dutch, but these characters are in the table headed "Extended ASCII Codes" But for this to work, the source-file needs to be created using the ASCII codepage. Not some extension to it. -- Joost
J. Roeleveld wrote: > On Wednesday 23 September 2009 11:29:02 Richard Huxton wrote: >> agostonbejo wrote: >>> Hi! >>> >>> What I'm trying to do is to insert some data from a sql file into a >>> postgres DB by calling the <sql> ant task. My problem is that I can't get >>> special characters (even if they can be represented by the standard ASCII >>> charset, such as ä, ö, ü, é, etc.) to be inserted correctly. >> Those aren't ASCII. > > Actually, they are: Sorry, but actually, they aren't. ASCII is 7-bits only and doesn't include those accented characters. > ä = 132 > ö = 148 > ü = 129 > é = 130 > See: http://www.ascii.nl > Site is in dutch, but these characters are in the table headed "Extended ASCII > Codes" To the best of my knowledge there is no standard list of "extended ascii codes" - there are many different extensions. See all the various codepages on Windows/IBM, original Mac character set, the various 8-bit era computers etc. > But for this to work, the source-file needs to be created using the ASCII > codepage. Not some extension to it. Not sure what this last bit means. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > J. Roeleveld wrote: >> Site is in dutch, but these characters are in the table headed "Extended ASCII >> Codes" > To the best of my knowledge there is no standard list of "extended ascii > codes" - Oh, there are plenty of them ;-). The OP's character set is most likely ISO-8859-1 or one of the other ISO-8859-n extensions to ASCII. regards, tom lane
Hi Richard, thanks for the answer! Nevertheless, see below... ;) > Richard Huxton wrote: > agostonbejo wrote: > > > > Hi! > > > > What I'm trying to do is to insert some data from a sql file into a > postgres > > DB by calling the <sql> ant task. My problem is that I can't get special > > characters (even if they can be represented by the standard ASCII > charset, > > such as ä, ö, ü, é, etc.) to be inserted correctly. > > Those aren't ASCII. OK, probably my idea of what ASCII is is a bit too vague: by ASCII I simply meant the ISO-8859-1 charset. (Which might make further discussions about what exactly belongs to ASCII unnecessary...?) Eclipse (the editor which I'm using) says that the original SQL file's encoding is ISO-8859-1, the special characters are shown correctly, also in other text editors. > > There are three places you need to get this right: > 1. The database encoding > 2. The client encoding > 3. The encoding of the contents of the .sql file > > Now, since the database is UTF8 that means it can accept the entire > range of unicode characters, including all ISO-8859-1. > > PostgreSQL can automatically convert from ISO-8859-1 to UTF-8 for you, > so it doesn't matter which you have in your .sql file. > > What *does* matter is that you know what encoding your .sql file is > using and that you set the client encoding appropriately. How do I set the client encoding to ISO-8859-1? As I wrote, the <sql> task complains if I set the client encoding to LATIN1 (which is the PostGres equivalent of ISO-8859-1 if I'm right) that the JDBC driver is not going to like it. (And so it seems indeed.) > > Since you're using Java, it's probably simplest just to use UTF-8 all > the way through. Crucially, make sure you know what the character-set of > the .sql file is - any good text editor should be able to tell you / set > this. As I wrote in my original post, I *have* tried using UTF-8 "all the way through" by converting the original ISO-8859-1 file to UTF-8 and calling the <sql> task with 'encoding="UTF-8"'. It didn't help, the special characters still became question marks. I've also set the client_encoding parameter in the sql file explicitly and I know, i.e., pgAdmin tells me the DB's encoding is UTF-8. (And it should be right, since *that* is able to insert special characters) So, to my best knowledge I got it right on all three places, and it still doesn't work. That's why I opened the topic in the first place. So any other idea what can be wrong here? Thanks! Agoston > > -- > Richard Huxton > Archonet Ltd > -- View this message in context: http://www.nabble.com/How-to-have-ant%27s-%3Csql%3E-task-insert-special-chars-appropriately--tp25530663p25577683.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
agostonbejo wrote: > > > Hi Richard, > > thanks for the answer! Nevertheless, see below... ;) > >> Richard Huxton wrote: >> agostonbejo wrote: >>> Hi! >>> >>> What I'm trying to do is to insert some data from a sql file into a >> postgres >>> DB by calling the <sql> ant task. My problem is that I can't get special >>> characters (even if they can be represented by the standard ASCII >> charset, >>> such as ä, ö, ü, é, etc.) to be inserted correctly. >> Those aren't ASCII. > > OK, probably my idea of what ASCII is is a bit too vague: by ASCII I simply > meant the ISO-8859-1 charset. (Which might make further discussions about > what exactly belongs to ASCII unnecessary...?) > > Eclipse (the editor which I'm using) says that the original SQL file's > encoding is ISO-8859-1, the special characters are shown correctly, also in > other text editors. OK. >> There are three places you need to get this right: >> 1. The database encoding >> 2. The client encoding >> 3. The encoding of the contents of the .sql file >> >> Now, since the database is UTF8 that means it can accept the entire >> range of unicode characters, including all ISO-8859-1. >> >> PostgreSQL can automatically convert from ISO-8859-1 to UTF-8 for you, >> so it doesn't matter which you have in your .sql file. >> >> What *does* matter is that you know what encoding your .sql file is >> using and that you set the client encoding appropriately. > > How do I set the client encoding to ISO-8859-1? As I wrote, the <sql> task > complains if I set the client encoding to LATIN1 (which is the PostGres > equivalent of ISO-8859-1 if I'm right) that the JDBC driver is not going to > like it. (And so it seems indeed.) Correct LATIN1 == ISO-8859-1. Can't help with the JDBC. >> Since you're using Java, it's probably simplest just to use UTF-8 all >> the way through. Crucially, make sure you know what the character-set of >> the .sql file is - any good text editor should be able to tell you / set >> this. > > As I wrote in my original post, I *have* tried using UTF-8 "all the way > through" by converting the original ISO-8859-1 file to UTF-8 and calling the > <sql> task with 'encoding="UTF-8"'. It didn't help, the special characters > still became question marks. I've also set the client_encoding parameter in > the sql file explicitly and I know, i.e., pgAdmin tells me the DB's encoding > is UTF-8. (And it should be right, since *that* is able to insert special > characters) > > So, to my best knowledge I got it right on all three places, and it still > doesn't work. That's why I opened the topic in the first place. Check again - something isn't right. Take the original ,sql file, save it as UTF8 and add a line at the top "set client_encoding=utf8;" Run this through psql and it should work fine. If not, then the database isn't in utf8 after all. Assuming it works, then something in your java setup isn't correct. -- Richard Huxton Archonet Ltd
Thanks, you were right - actually the problem was somewhere else. During the build I had the sql files copied over to another location, feeding them to ant's <filterchain> -- and it was there actually where the encodings got mixed up. (Needless to say, I was looking at the original file and thought it was the <sql> or PostGres that messed something up.) Anyway, after correcly copying over the file, the one with the ISO-8859-1 could be used perfectly. (I suppose the UTF-8 could have as well, but I didn't try it once this one worked.) Thanks, Agoston Richard Huxton wrote: > > Check again - something isn't right. Take the original ,sql file, save > it as UTF8 and add a line at the top "set client_encoding=utf8;" > > Run this through psql and it should work fine. If not, then the database > isn't in utf8 after all. > > Assuming it works, then something in your java setup isn't correct. > > -- > Richard Huxton > Archonet Ltd > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > -- View this message in context: http://www.nabble.com/How-to-have-ant%27s-%3Csql%3E-task-insert-special-chars-appropriately--tp25530663p25748884.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.