Thread: How to have ant's task insert special chars appropriately?

How to have ant's task insert special chars appropriately?

From
agostonbejo
Date:

Hi!

What I'm trying to do is to insert some data from a sql file into a postgres
DB by calling the <sql> ant task. My problem is that I can't get special
characters (even if they can be represented by the standard ASCII charset,
such as ä, ö, ü, é, etc.) to be inserted correctly.

When I check the DB with a DB browser such as Squirrel or pgAdmin, after
running the <sql> task they can only see question marks where these
characters are supposed to be. So does the actual web application using the
DB. (The web app can store and read such special characters seamlessly,
though. It's also OK if I enter some of them by hand with the DB browsers
mentioned above.)

This all suggests that the problem is with the way I call the <sql> task.
I've tried several versions: inserting from a sql file with iso-8859-1
encoding, then from one with utf-8 encoding. (In pgAdmin I can see the
default encoding of the DB is utf-8.)

Something like this:

<sql
    encoding="ISO-8859-1"
    classpath="${db.pg.classpath}"
    driver="${db.pg.driver}"
    url="${db.pg.connection}/${db.database.name}"
    userid="${db.schema.owner.name}"
    password="${db.schema.owner.pwd}"
    src="${sql.pg.dir}/update/testdata.sql"
/>

(This works perfectly as long as there are no special characters.)

I tried changing the "encoding" attribute above to UTF-8, adding "SET
client_encoding = 'UTF8';" to the UTF-8 file, then to the iso-8859-1 file;
"SET client_encoding = 'LATIN1';" to the iso-8859-1 file (this latter in
addition produced the error "The server's client_encoding parameter was
changed to LATIN1. The JDBC driver requires client_encoding to be UNICODE
for correct operation.")

Does anyone have any idea how to get those special characters loaded into
the DB from a pre-written sql file (possibly with the ant <sql> task)?


Thanks,
Agoston
--
View this message in context:
http://www.nabble.com/How-to-have-ant%27s-%3Csql%3E-task-insert-special-chars-appropriately--tp25530663p25530663.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: How to have ant's task insert special chars appropriately?

From
Richard Huxton
Date:
agostonbejo wrote:
>
> Hi!
>
> What I'm trying to do is to insert some data from a sql file into a postgres
> DB by calling the <sql> ant task. My problem is that I can't get special
> characters (even if they can be represented by the standard ASCII charset,
> such as ä, ö, ü, é, etc.) to be inserted correctly.

Those aren't ASCII.

> When I check the DB with a DB browser such as Squirrel or pgAdmin, after
> running the <sql> task they can only see question marks where these
> characters are supposed to be. So does the actual web application using the
> DB. (The web app can store and read such special characters seamlessly,
> though. It's also OK if I enter some of them by hand with the DB browsers
> mentioned above.)
[snip]
> Does anyone have any idea how to get those special characters loaded into
> the DB from a pre-written sql file (possibly with the ant <sql> task)?

There are three places you need to get this right:
1. The database encoding
2. The client encoding
3. The encoding of the contents of the .sql file

Now, since the database is UTF8 that means it can accept the entire
range of unicode characters, including all ISO-8859-1.

PostgreSQL can automatically convert from ISO-8859-1 to UTF-8 for you,
so it doesn't matter which you have in your .sql file.

What *does* matter is that you know what encoding your .sql file is
using and that you set the client encoding appropriately.

Since you're using Java, it's probably simplest just to use UTF-8 all
the way through. Crucially, make sure you know what the character-set of
the .sql file is - any good text editor should be able to tell you / set
 this.

--
  Richard Huxton
  Archonet Ltd

Re: How to have ant's task insert special chars appropriately?

From
"J. Roeleveld"
Date:
On Wednesday 23 September 2009 11:29:02 Richard Huxton wrote:
> agostonbejo wrote:
> > Hi!
> >
> > What I'm trying to do is to insert some data from a sql file into a
> > postgres DB by calling the <sql> ant task. My problem is that I can't get
> > special characters (even if they can be represented by the standard ASCII
> > charset, such as ä, ö, ü, é, etc.) to be inserted correctly.
>
> Those aren't ASCII.

Actually, they are:
ä = 132
ö = 148
ü = 129
é = 130
See: http://www.ascii.nl
Site is in dutch, but these characters are in the table headed "Extended ASCII
Codes"

But for this to work, the source-file needs to be created using the ASCII
codepage. Not some extension to it.

--
Joost

Re: How to have ant's task insert special chars appropriately?

From
Richard Huxton
Date:
J. Roeleveld wrote:
> On Wednesday 23 September 2009 11:29:02 Richard Huxton wrote:
>> agostonbejo wrote:
>>> Hi!
>>>
>>> What I'm trying to do is to insert some data from a sql file into a
>>> postgres DB by calling the <sql> ant task. My problem is that I can't get
>>> special characters (even if they can be represented by the standard ASCII
>>> charset, such as ä, ö, ü, é, etc.) to be inserted correctly.
>> Those aren't ASCII.
>
> Actually, they are:

Sorry, but actually, they aren't. ASCII is 7-bits only and doesn't
include those accented characters.

> ä = 132
> ö = 148
> ü = 129
> é = 130
> See: http://www.ascii.nl
> Site is in dutch, but these characters are in the table headed "Extended ASCII
> Codes"

To the best of my knowledge there is no standard list of "extended ascii
codes" - there are many different extensions. See all the various
codepages on Windows/IBM, original Mac character set, the various 8-bit
era computers etc.

> But for this to work, the source-file needs to be created using the ASCII
> codepage. Not some extension to it.

Not sure what this last bit means.

--
  Richard Huxton
  Archonet Ltd

Re: How to have ant's task insert special chars appropriately?

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> J. Roeleveld wrote:
>> Site is in dutch, but these characters are in the table headed "Extended ASCII
>> Codes"

> To the best of my knowledge there is no standard list of "extended ascii
> codes" -

Oh, there are plenty of them ;-).  The OP's character set is most
likely ISO-8859-1 or one of the other ISO-8859-n extensions to ASCII.

            regards, tom lane

Re: How to have ant's task insert special chars appropriately?

From
agostonbejo
Date:


Hi Richard,

thanks for the answer! Nevertheless, see below... ;)

> Richard Huxton wrote:
> agostonbejo wrote:
> >
> > Hi!
> >
> > What I'm trying to do is to insert some data from a sql file into a
> postgres
> > DB by calling the <sql> ant task. My problem is that I can't get special
> > characters (even if they can be represented by the standard ASCII
> charset,
> > such as ä, ö, ü, é, etc.) to be inserted correctly.
>
> Those aren't ASCII.

OK, probably my idea of what ASCII is is a bit too vague: by ASCII I simply
meant the ISO-8859-1 charset. (Which might make further discussions about
what exactly belongs to ASCII unnecessary...?)

Eclipse (the editor which I'm using) says that the original SQL file's
encoding is ISO-8859-1, the special characters are shown correctly, also in
other text editors.


>
> There are three places you need to get this right:
> 1. The database encoding
> 2. The client encoding
> 3. The encoding of the contents of the .sql file
>
> Now, since the database is UTF8 that means it can accept the entire
> range of unicode characters, including all ISO-8859-1.
>
> PostgreSQL can automatically convert from ISO-8859-1 to UTF-8 for you,
> so it doesn't matter which you have in your .sql file.
>
> What *does* matter is that you know what encoding your .sql file is
> using and that you set the client encoding appropriately.

How do I set the client encoding to ISO-8859-1? As I wrote, the <sql> task
complains if I set the client encoding to LATIN1 (which is the PostGres
equivalent of ISO-8859-1 if I'm right) that the JDBC driver is not going to
like it. (And so it seems indeed.)

>
> Since you're using Java, it's probably simplest just to use UTF-8 all
> the way through. Crucially, make sure you know what the character-set of
> the .sql file is - any good text editor should be able to tell you / set
>  this.

As I wrote in my original post, I *have* tried using UTF-8 "all the way
through" by converting the original ISO-8859-1 file to UTF-8 and calling the
<sql> task with 'encoding="UTF-8"'. It didn't help, the special characters
still became question marks. I've also set the client_encoding parameter in
the sql file explicitly and I know, i.e., pgAdmin tells me the DB's encoding
is UTF-8. (And it should be right, since *that* is able to insert special
characters)

So, to my best knowledge I got it right on all three places, and it still
doesn't work. That's why I opened the topic in the first place.


So any other idea what can be wrong here?

Thanks!
Agoston


>
> --
>   Richard Huxton
>   Archonet Ltd
>

--
View this message in context:
http://www.nabble.com/How-to-have-ant%27s-%3Csql%3E-task-insert-special-chars-appropriately--tp25530663p25577683.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: How to have ant's task insert special chars appropriately?

From
Richard Huxton
Date:
agostonbejo wrote:
>
>
> Hi Richard,
>
> thanks for the answer! Nevertheless, see below... ;)
>
>> Richard Huxton wrote:
>> agostonbejo wrote:
>>> Hi!
>>>
>>> What I'm trying to do is to insert some data from a sql file into a
>> postgres
>>> DB by calling the <sql> ant task. My problem is that I can't get special
>>> characters (even if they can be represented by the standard ASCII
>> charset,
>>> such as ä, ö, ü, é, etc.) to be inserted correctly.
>> Those aren't ASCII.
>
> OK, probably my idea of what ASCII is is a bit too vague: by ASCII I simply
> meant the ISO-8859-1 charset. (Which might make further discussions about
> what exactly belongs to ASCII unnecessary...?)
>
> Eclipse (the editor which I'm using) says that the original SQL file's
> encoding is ISO-8859-1, the special characters are shown correctly, also in
> other text editors.

OK.

>> There are three places you need to get this right:
>> 1. The database encoding
>> 2. The client encoding
>> 3. The encoding of the contents of the .sql file
>>
>> Now, since the database is UTF8 that means it can accept the entire
>> range of unicode characters, including all ISO-8859-1.
>>
>> PostgreSQL can automatically convert from ISO-8859-1 to UTF-8 for you,
>> so it doesn't matter which you have in your .sql file.
>>
>> What *does* matter is that you know what encoding your .sql file is
>> using and that you set the client encoding appropriately.
>
> How do I set the client encoding to ISO-8859-1? As I wrote, the <sql> task
> complains if I set the client encoding to LATIN1 (which is the PostGres
> equivalent of ISO-8859-1 if I'm right) that the JDBC driver is not going to
> like it. (And so it seems indeed.)

Correct LATIN1 == ISO-8859-1. Can't help with the JDBC.

>> Since you're using Java, it's probably simplest just to use UTF-8 all
>> the way through. Crucially, make sure you know what the character-set of
>> the .sql file is - any good text editor should be able to tell you / set
>>  this.
>
> As I wrote in my original post, I *have* tried using UTF-8 "all the way
> through" by converting the original ISO-8859-1 file to UTF-8 and calling the
> <sql> task with 'encoding="UTF-8"'. It didn't help, the special characters
> still became question marks. I've also set the client_encoding parameter in
> the sql file explicitly and I know, i.e., pgAdmin tells me the DB's encoding
> is UTF-8. (And it should be right, since *that* is able to insert special
> characters)
>
> So, to my best knowledge I got it right on all three places, and it still
> doesn't work. That's why I opened the topic in the first place.

Check again - something isn't right. Take the original ,sql file, save
it as UTF8 and add a line at the top "set client_encoding=utf8;"

Run this through psql and it should work fine. If not, then the database
isn't in utf8 after all.

Assuming it works, then something in your java setup isn't correct.

--
  Richard Huxton
  Archonet Ltd

Re: How to have ant's task insert special chars appropriately?

From
agostonbejo
Date:

Thanks, you were right - actually the problem was somewhere else. During the
build I had the sql files copied over to another location, feeding them to
ant's <filterchain> -- and it was there actually where the encodings got
mixed up.

(Needless to say, I was looking at the original file and thought it was the
<sql> or PostGres that messed something up.)

Anyway, after correcly copying over the file, the one with the ISO-8859-1
could be used perfectly. (I suppose the UTF-8 could have as well, but I
didn't try it once this one worked.)


Thanks,
Agoston



Richard Huxton wrote:
>
> Check again - something isn't right. Take the original ,sql file, save
> it as UTF8 and add a line at the top "set client_encoding=utf8;"
>
> Run this through psql and it should work fine. If not, then the database
> isn't in utf8 after all.
>
> Assuming it works, then something in your java setup isn't correct.
>
> --
>   Richard Huxton
>   Archonet Ltd
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
>

--
View this message in context:
http://www.nabble.com/How-to-have-ant%27s-%3Csql%3E-task-insert-special-chars-appropriately--tp25530663p25748884.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.