On Tuesday 04 October 2005 16:16, Jeremy LaCivita wrote:
> Hmmm
>
> so it turns out if i take all my Strings and do this:
>
> str = new String(str.getBytes(), "utf-8");
>
> then it works.
>
> Correct me if i'm wrong, but that says to me that the Strings were
> in UTF-8 already, but Java didn't know it, so it couldn't send them
> to postgres properly.
It's meaningless to ask what encoding a String has. Strings are
sequences of chars -- they don't have an encoding. The notion of
"encoding" comes into play only when you have to represent a String as
a sequence of bytes.
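To make that concrete, here is a minimal sketch (assuming Java 7 or later for StandardCharsets; the class and method names are made up for illustration). The same String yields different byte sequences depending on which encoding you ask for:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode the String with the given charset and return the number of bytes produced.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String str = "Funny char: \u00e8"; // 13 chars, the last one is e-grave

        // ISO-8859-1 uses one byte per char here.
        System.out.println(encodedLength(str, StandardCharsets.ISO_8859_1)); // 13
        // UTF-8 needs two bytes for U+00E8.
        System.out.println(encodedLength(str, StandardCharsets.UTF_8));      // 14
    }
}
```

The String is identical in both calls; only the byte representation differs.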
So, if this returns true for you:
str.equals(new String(str.getBytes(), "utf-8"));
that means your default encoding is either utf-8 or one that produces
the same bytes as utf-8 (US-ASCII, for example), at least for the
characters found in str.
String#getBytes() uses the default encoding, which may be specified via
the environment variable LANG on Unix-like systems.
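If you want to check from plain Java which default is in effect, something along these lines should do (a sketch; the class and method names are hypothetical, and Charset.defaultCharset() requires Java 5 or later). Note that on JDK 18 and later the default is UTF-8 unless overridden, so LANG may not influence it there:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // True if the no-argument getBytes() matches an explicit encode with the default charset.
    static boolean sameAsDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // The charset that String#getBytes() falls back to.
        System.out.println(Charset.defaultCharset());
        // The system property inspected in the BeanShell sessions.
        System.out.println(System.getProperty("file.encoding"));
        System.out.println(sameAsDefault("Funny char: \u00e8"));
    }
}
```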
So, if my default encoding is UTF-8, I get this:
| $ echo $LANG
| en_US.UTF-8
| $ bsh2
| BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer (pat@pat.net)
| bsh % print(System.getProperty("file.encoding"));
| UTF-8
| bsh % str = "Funny char: \u00e8";
| bsh % print(str);
| Funny char: è
| bsh % print(str.equals(new String(str.getBytes(), "utf-8")));
| true
| bsh %
If I change the default encoding to ISO-8859-1, I get this:
| $ env LANG=en_US.iso88591 bsh2
| BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer (pat@pat.net)
| bsh % print(System.getProperty("file.encoding"));
| ISO-8859-1
| bsh % str = "Funny char: \u00e8";
| bsh % print(str);
| Funny char: è
| bsh % print(str.equals(new String(str.getBytes(), "utf-8")));
| false
| bsh %
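The two sessions can be reproduced without touching LANG by naming the encoding explicitly. This is only a sketch (Java 7 or later assumed for StandardCharsets; roundTrips is a hypothetical helper), but it shows why the second session prints false: the single ISO-8859-1 byte for \u00e8 is not a valid UTF-8 sequence, so decoding it as UTF-8 yields a replacement character instead of the original char.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encode with the given charset, decode as UTF-8, and compare with the original.
    static boolean roundTrips(String s, Charset encodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return s.equals(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String str = "Funny char: \u00e8";

        // Same situation as a UTF-8 default encoding.
        System.out.println(roundTrips(str, StandardCharsets.UTF_8));      // true
        // Same situation as an ISO-8859-1 default encoding.
        System.out.println(roundTrips(str, StandardCharsets.ISO_8859_1)); // false
        // Pure ASCII text is encoded identically by both, so it survives.
        System.out.println(roundTrips("Funny char", StandardCharsets.ISO_8859_1)); // true
    }
}
```

Passing the charset explicitly on both the encoding and the decoding side removes the dependence on the platform default altogether.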