SOLUTION: Insert a Euro symbol as UTF-8 from a latin1 charset. - Mailing list pgsql-hackers

From Roland Glenn McIntosh
Subject SOLUTION: Insert a Euro symbol as UTF-8 from a latin1 charset.
Date
Msg-id 5.1.0.14.2.20030613112410.05ef2260@lnxmain
Responses Re: SOLUTION: Insert a Euro symbol as UTF-8 from a latin1 charset.
List pgsql-hackers
This is my solution / bug report / RFC, cross-posted from [GENERAL], regarding insertion of hexadecimal characters from
the command line.
 
-----------------------------------

Okay.  I have NO IDEA why this works.  If someone could enlighten me as to the math involved I'd appreciate it.  First,
a little background:
 

The Euro symbol is Unicode value 0x20AC.  UTF-8 encoding is a way of representing most Unicode characters in two bytes,
and most Latin characters in one byte.
 

The only way I have found to insert a Euro symbol into the database from the command line psql client is this:

    INSERT INTO mytable VALUES('\342\202\254');
 

I don't know why this works.  In hex, those octal values are: E2 82 AC

I don't know why my "20" byte turned into two bytes of E2 and 82.  Furthermore, I was under the impression that a UTF-8
encoding of the Euro sign only took two bytes.  Corroborating this assumption, upon dumping that table with pg_dump and
examining the resultant file in a hex editor, I see this in that character position: AC 20
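
For reference, a minimal Java sketch of the standard UTF-8 bit layout, which may account for the three bytes. The class and method names (Utf8Sketch, encode, hex) are hypothetical and not part of the original post; surrogate code points are not handled. Under this layout, any code point from 0x800 to 0xFFFF takes three bytes, and the bits of 0x20AC are spread across all three. The last line prints the little-endian UTF-16 form of the same character, which is AC 20; whether that is what ended up in the dump file is only a guess.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Sketch {
    // Encodes one Unicode code point by hand using the standard UTF-8 layout.
    // Assumption: cp is a valid scalar value (surrogates are not checked).
    static byte[] encode(int cp) {
        if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x20AC = 0010 000010 101100 -> 1110[0010] 10[000010] 10[101100]
        System.out.println(hex(encode(0x20AC)));                                  // E2 82 AC
        // Cross-check against the JDK's own UTF-8 encoder.
        System.out.println(hex("\u20AC".getBytes(StandardCharsets.UTF_8)));       // E2 82 AC
        // The same character in little-endian UTF-16.
        System.out.println(hex("\u20AC".getBytes(StandardCharsets.UTF_16LE)));    // AC 20
    }
}
```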
 

Additionally, according to the psql online documentation and man page:
"Anything contained in single quotes is furthermore subject to C-like substitutions for \n (new line), \t (tab),
\digits, \0digits, and \0xdigits (the character with the given decimal, octal, or hexadecimal code)."
 

Those digits *should* be interpreted as decimal digits, but they aren't.  The man page for psql is either incorrect, or
the implementation is buggy.
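
The arithmetic behind that observation can be checked with a short Java sketch. The class and method names (OctalEscapeSketch, parse) are hypothetical, and the sketch only parses the digit groups; it does not reproduce how psql or the server actually processes the literal. Read as octal, the groups give E2 82 AC; read as decimal, the first group (342) does not even fit in one byte.

```java
// Hypothetical helper class, for illustration only.
class OctalEscapeSketch {
    // Splits a string such as \342\202\254 into its digit groups and
    // parses each group in the given radix (8 = octal, 10 = decimal).
    static int[] parse(String escaped, int radix) {
        // Drop the leading backslash, then split on the remaining backslashes.
        String[] groups = escaped.substring(1).split("\\\\");
        int[] values = new int[groups.length];
        for (int i = 0; i < groups.length; i++) {
            values[i] = Integer.parseInt(groups[i], radix);
        }
        return values;
    }

    public static void main(String[] args) {
        String literal = "\\342\\202\\254";   // the characters \342\202\254

        for (int v : parse(literal, 8)) {
            System.out.printf("octal   -> %d (0x%02X)%n", v, v);
        }
        for (int v : parse(literal, 10)) {
            System.out.printf("decimal -> %d, fits in one byte: %b%n", v, v <= 0xFF);
        }
    }
}
```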
 

I did try the '\0x20AC' method, and '\0x20\0xAC' without success.
It's worth noting that the field I'm inserting into is an SQL_ASCII field, and I'm reading my UTF-8 string out of it
like this, via JDBC:

    String value = new String( resultset.getBytes(1), "UTF-8");
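
The decoding step can be tried without a database. This is a sketch only: the byte array stands in for what resultset.getBytes(1) is assumed to return, the class and method names (DecodeSketch, decodeUtf8, decodeLatin1) are hypothetical, and StandardCharsets (Java 7 and later) is used in place of the "UTF-8" string name to avoid the checked exception.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class DecodeSketch {
    // Interprets the raw bytes as UTF-8, as the JDBC snippet above does.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Interprets the same bytes as latin1, one character per byte.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Stand-in for resultset.getBytes(1): the three bytes that were inserted.
        byte[] raw = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };

        String asUtf8 = decodeUtf8(raw);
        System.out.printf("UTF-8:  %d char(s), code point U+%04X%n",
                asUtf8.length(), asUtf8.codePointAt(0));

        String asLatin1 = decodeLatin1(raw);
        System.out.printf("latin1: %d char(s)%n", asLatin1.length());
    }
}
```

Decoded as UTF-8, the three bytes collapse into the single character U+20AC; decoded as latin1 they stay three separate characters.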
 

Can anyone help me make sense of this mumbo jumbo?
-Roland 


