Thread: Mis-interpreted extended character
Our database ( (PostgreSQL) 7.3.5 ) uses Unicode encoding:
egreek=# \l
List of databases
Name | Owner | Encoding
-------------+----------+----------
egreek | andrew | UNICODE
For some reason, If I try to use an extended character (ASCII code > 127) in a string, I get this peculiar result:
egreek=# SELECT LOWER ( 'Niño' );
egreek'#
(note: it fails to see the closing single-quote character; I hit CTRL-C to quit the operation)
and:
egreek=# SELECT LOWER ( 'Niño ' );
ERROR: Unicode >= 0x10000 is not supported
(note: I added a space before the closing quote).
It looks to me like the 'ñ' character (spanish n with tilde - ASCII 164, as far as I can tell) is messing with the parser somehow.
Furthermore -- this is data coming from a form in a web site used globally, so I have little control over what is passed to me (me = PostgreSQL DB). I can't (without lotsa hassles) "demand" that higher-ASCII characters be encoded on the way in, and decoded on the way out.
Any suggestions/info/hints?
Thanks,
Andrew
egreek=# \l
List of databases
Name | Owner | Encoding
-------------+----------+----------
egreek | andrew | UNICODE
For some reason, If I try to use an extended character (ASCII code > 127) in a string, I get this peculiar result:
egreek=# SELECT LOWER ( 'Niño' );
egreek'#
(note: it fails to see the closing single-quote character; I hit CTRL-C to quit the operation)
and:
egreek=# SELECT LOWER ( 'Niño ' );
ERROR: Unicode >= 0x10000 is not supported
(note: I added a space before the closing quote).
It looks to me like the 'ñ' character (spanish n with tilde - ASCII 164, as far as I can tell) is messing with the parser somehow.
Furthermore -- this is data coming from a form in a web site used globally, so I have little control over what is passed to me (me = PostgreSQL DB). I can't (without lotsa hassles) "demand" that higher-ASCII characters be encoded on the way in, and decoded on the way out.
Any suggestions/info/hints?
Thanks,
Andrew
> Our database ( (PostgreSQL) 7.3.5 ) uses Unicode encoding: > [...] > For some reason, If I try to use an extended character (ASCII code > > 127) in a string, I get this peculiar result: > [...] Probably your terminal is set to ISO-8859-1 ("latin 1") or something like that, while your database is set to unicode as you showed. Hence the mismatch. In unicode (for example UTF-8) non-US-ASCII characters are encoded with two bytes (as opposed to one byte > 127 as happens with ISO-8859-1). Solution is to have everything agree on the encoding. Terminal + DB or Web Browser + DB. Btw. you _do_ actually have an influence on what encoding a web browser uses by setting the "encoding" HTTP header. According to my experience, if you have to deal with only western european encodings, you're better off (still) with ISO-8859-1 (or ISO-8859-15 to have the EUR symbol too). Short answer: not PostgreSQL's fault. Bye, Chris.
Chris,
Thanks -- your answer is 90% of what I need! As for the other 10%:
Chris wrote:
It ALSO turns out that Java has its own issues with >127 characters, which I'm going to look into -- but it was nice to prove you right (and solve half my problem!) by setting the web page encoding...
Thanks again,
Andrew
Thanks -- your answer is 90% of what I need! As for the other 10%:
Chris wrote:
Our database ( (PostgreSQL) 7.3.5 ) uses Unicode encoding: [...]For some reason, If I try to use an extended character (ASCII code > 127) in a string, I get this peculiar result:[...]Probably your terminal is set to ISO-8859-1 ("latin 1") or something like that, while your database is set to unicode as you showed.
Agreed -- and thanks for the above info.Hence the mismatch. In unicode (for example UTF-8) non-US-ASCII characters are encoded with two bytes (as opposed to one byte > 127 as happens with ISO-8859-1). Solution is to have everything agree on the encoding. Terminal + DB or Web Browser + DB. Btw. you _do_ actually have an influence on what encoding a web browser uses by setting the "encoding" HTTP header. According to my experience, if you have to deal with only western european encodings, you're better off (still) with ISO-8859-1 (or ISO-8859-15 to have the EUR symbol too). Short answer: not PostgreSQL's fault.
It ALSO turns out that Java has its own issues with >127 characters, which I'm going to look into -- but it was nice to prove you right (and solve half my problem!) by setting the web page encoding...
Bye, Chris. ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Thanks again,
Andrew