Thread: Encoding nightmare! Pls help!
I've had a discussion on the general list about the implications of storing accented characters in a postgres (7.4.1) db.

As a result, I have created a database with no locale, i.e. the C locale (using initdb {other parms} --no-locale).

Here's the database (test) with UNICODE encoding:

          List of databases
        Name     |  Owner   | Encoding
   --------------+----------+----------
    test         | postgres | UNICODE
    template0    | postgres | UNICODE
    template1    | postgres | UNICODE

The problem I'm having is that I CANNOT write accented characters into the database or get them out correctly using my Java code and the pg74.1jdbc3.jar file.

Using psql, I select the data (with client encoding = UNICODE), and I get

   tést.jpg

With client encoding = LATIN1, I get

   tést.jpg

But in my little Java test app, I get:

   tést.jpg, tést.jpg, tést.jpg,

I want tést.jpg!!!!!

Here is the offending section of code:

   String filename = rset.getString(2);
   System.out.print(filename);
   System.out.print(", ");

   if (filename != null)
   {
     try
     {
       filename = new String(rset.getBytes(2), "UTF-8");
     }
     catch (UnsupportedEncodingException e)
     {
       System.out.println("Cannot decode string?");
     }

     System.out.print(filename);
     System.out.print(", ");

     try
     {
       filename = new String(rset.getBytes(2), "ISO-8859-1");
     }
     catch (UnsupportedEncodingException e)
     {
       System.out.println("Cannot decode string?");
     }

     System.out.print(filename);
     System.out.print(", ");
   }

Can anyone explain what I am doing wrong? I have become so confused by all this that I don't think I can see the problem straight anymore.

How can I read and write Unicode characters in the db? Is there some magick parameter that needs to be passed when setting up the connection/driver?

Thanks for any/all help!!

John Sidney-Woollett
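John's code decodes the same getBytes(2) result twice, once as UTF-8 and once as ISO-8859-1. The sketch below shows what each decoding does to an accented character, assuming the column's bytes are the UTF-8 bytes stored in the UNICODE database. It uses a hard-coded, hypothetical filename instead of a live ResultSet, and java.nio.charset.StandardCharsets, which is Java 7+ and so newer than the JVMs in this thread; the class name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

/**
 * Shows why decoding the same stored bytes with different charsets yields
 * different strings. The accented e is written as a Unicode escape so the
 * encoding of this source file cannot interfere.
 */
public class CharsetDecodeDemo {

    // Hypothetical test value standing in for rset.getString(2).
    static final String FILENAME = "t\u00e9st.jpg";

    /** Formats each byte as two upper-case hex digits, space separated. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the UTF-8 bytes of s as if they were ISO-8859-1 (the mismatch). */
    static String decodeUtf8AsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // In UTF-8 the accented e takes two bytes.
        byte[] utf8 = FILENAME.getBytes(StandardCharsets.UTF_8);
        // In ISO-8859-1 (LATIN1) the same character is a single byte.
        byte[] latin1 = FILENAME.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes:      " + hex(utf8));
        System.out.println("ISO-8859-1 bytes: " + hex(latin1));

        // Decoding with the matching charset recovers the original string.
        String ok = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(FILENAME)); // true

        // Decoding UTF-8 bytes as ISO-8859-1 turns every byte into its own
        // character, so the two-byte e becomes two unrelated characters.
        String garbled = decodeUtf8AsLatin1(FILENAME);
        System.out.println(garbled.length()); // 9, one more than the original 8
    }
}
```

Run as-is, the two byte lines should read 74 C3 A9 73 74 2E 6A 70 67 and 74 E9 73 74 2E 6A 70 67, and the garbled string is "tÃ©st.jpg". Note that all of the printed output is plain ASCII, so it is not affected by the console encoding.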
John,

My first guess is that your code is working fine; it is just your System.out.print() calls that are the problem. You haven't specified which character set to use when printing, so Java will use the default character set of your JVM. Some conversion is unavoidable, because Java strings are stored internally as UCS-2 and Java has to convert them to some other character set when printing them out.

thanks,
--Barry
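Barry's diagnosis can be tested by taking the console's default charset out of the picture. A minimal sketch, assuming the terminal itself is set to display UTF-8: it wraps standard output in a PrintStream with an explicit charset, and also prints each UTF-16 code unit as an ASCII U+XXXX label, which shows whether the String itself is correct no matter how the console is configured. The class name and test value are hypothetical; the PrintStream constructor taking a charset name has existed since Java 1.4.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8ConsoleDemo {

    /** Lists each UTF-16 code unit of s as U+XXXX, using ASCII only. */
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical stand-in for the value returned by rset.getString(2).
        String filename = "t\u00e9st.jpg";

        // System.out encodes with the JVM default charset. This stream always
        // encodes as UTF-8; the terminal must also expect UTF-8 to show it.
        PrintStream utf8Out =
            new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
        utf8Out.println(filename);

        // ASCII-only diagnostic: U+00E9 here means the String is correct and
        // any garbling happens on the way to the screen.
        System.out.println(codeUnits(filename));
    }
}
```

If the code-unit line shows U+00E9 but the first line still looks wrong, the database and driver are fine and only the output path needs fixing, which is what Barry suspected.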
On 02/02/2004 18:44 John Sidney-Woollett wrote:
> [snip]
> How can I read and write unicode chars into the db. Is there some magick
> parameter that needs to be passed when setting up the connection/driver?

A total shot in the dark, but what have you set your JVM locale/encoding to? I've not had the need to delve too deeply into this area, but this link might be of some help:

http://java.sun.com/developer/JDCTechTips/2003/tt0110.html

-- 
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller Business |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+
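Paul's question can be answered from inside the JVM. A minimal sketch (hypothetical class name) that reports the settings Java falls back on when no charset is given explicitly, for example by System.out or String.getBytes() without an argument. Charset.defaultCharset() needs Java 5 or later, and newer JDKs (18 and later, per JEP 400) default to UTF-8 regardless of the OS locale, so a modern JVM may report something different from the JVMs in this thread.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class JvmEncodingReport {

    /** Collects the defaults that decide how text is encoded when no charset is named. */
    static String report() {
        return "file.encoding    = " + System.getProperty("file.encoding") + "\n"
             + "default charset  = " + Charset.defaultCharset().name() + "\n"
             + "default locale   = " + Locale.getDefault();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Comparing this output with the terminal's own encoding shows whether System.out can represent accented characters at all. Launching with java -Dfile.encoding=UTF-8 is the commonly used override, though how fully older JVMs honoured it varied.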
Paul, thanks for the link; I'll have a read.

I've tried (probably incorrectly) setting various JVM locales/encodings, and I still get the same rubbish output.

Perhaps a good night's sleep will help!

John
Thanks for your reply, but I'm still totally baffled and confused...

I've got JSP pages printing tést.jpg as tést.jpg even though I have set the content type to "text/html" with charset "UTF-8".

Also, I've got servlets writing garbage into the database even though I have explicitly set the encoding on the request object to UTF-8, but this may be because I don't really know what encoding scheme the browser has used.

It's been a long day...

Thanks for your help.

John
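On the servlet side, ServletRequest.setCharacterEncoding("UTF-8") only takes effect if it is called before the first getParameter() call; once parameters have been read, the container has already decoded them, typically as ISO-8859-1 when the request declares no charset. The servlet API is not part of the JDK, so the sketch below only illustrates a common workaround for a value that was already decoded that way, assuming the browser really sent UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ParameterRepair {

    /**
     * Re-interprets a string that was decoded as ISO-8859-1 but whose bytes
     * were really UTF-8. ISO-8859-1 maps each byte 0x00-0xFF to exactly one
     * char, so re-encoding with it returns the original bytes unchanged.
     */
    static String latin1MisreadToUtf8(String misread) {
        byte[] original = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a browser sending UTF-8 and a container decoding it as ISO-8859-1.
        byte[] sentByBrowser = "t\u00e9st.jpg".getBytes(StandardCharsets.UTF_8);
        String asReceived = new String(sentByBrowser, StandardCharsets.ISO_8859_1);

        String repaired = latin1MisreadToUtf8(asReceived);
        System.out.println(repaired.equals("t\u00e9st.jpg")); // true
    }
}
```

This repair is only safe when the string was genuinely decoded as ISO-8859-1: any character above U+00FF would be replaced by '?' during getBytes(), and a value that was already decoded correctly would be corrupted. Setting the request encoding before reading parameters avoids the problem altogether.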
OFF TOPIC: email postage should be of interest to those who use OSS newslists
From: "David Wall"
Sorry for this off-topic posting, but it should be of interest to those in the OSS communities since it threatens us. The following story appeared in the New York Times as well as various local papers (like the one here in Seattle).

http://www.nytimes.com/2004/02/02/technology/02spam.html

The gist is that Microsoft, Yahoo and others are trying to create a scam in which they charge postage for email. Note that this would be used against all open source projects that rely heavily on free emails going out to developers and users.

Spam filters at the big ISPs will only be made more restrictive in order to increase utilization of the postage scam. After all, nearly every email sent arrives at its destination today, so nobody will pay. But as they tighten the rules, more legitimate email will get blocked as spam, thus forcing us into paying for postage that provides no added services, and of course would cripple OSS projects that rely on email.

Below is my letter to the editor of the NYT and a few quotes from the article for those who aren't registered on the NYT site.

Thanks,
David

++++

Re: "Gates Backs E-Mail Stamp in War on Spam," http://www.nytimes.com/2004/02/02/technology/02spam.html

Dear Mr. Hansell:

Postage for sending email? That sounds like a greedy attempt to charge twice without providing any added services.

Our tax dollars paid to create the Internet and the communications protocols freely used by Microsoft and Yahoo. But now there's a new land grab to try to privatize our public, worldwide network that promotes freedom.

Our monthly ISP and telephone fees already pay for our usage of the Internet. Now we're told that for our own good, we should pay for each email sent, even though we've already paid for that privilege.

Open source projects rely heavily on email for developer communications and for user support. Is it surprising that Microsoft likes such a scheme? Will we next have to pay for each instant message or each web page we visit?
A typical email, like this one, is about 2 KB in size. Today's MSN homepage is 100 KB, with lots of unsolicited ads. Unsolicited popup ads often run in the 13-20 KB range. Should Microsoft have to pay me to view these ads? Their web page consumes 50 times more bandwidth than this email.

Lastly, there are commercial services today like Yozons.com that charge for sending secure messages that have no spam or viruses, but at least they offer lots of features beyond email (working return receipts, encrypted delivery to ensure privacy, electronic signatures, status tracking of messages sent, etc.), so many find it worth the extra money spent. We'll pay for services we want, but paying twice for no added service is bad all around.

Sincerely,
David Wall

+++++

Some quotes from the NYT story, since I realize I cannot post the entire story here for copyright reasons:

"Ten days ago, Bill Gates, Microsoft's chairman, told the World Economic Forum in Davos, Switzerland, that spam would not be a problem in two years, in part because of systems that would require people to pay money to send e-mail. Yahoo, meanwhile, is quietly evaluating an e-mail postage plan being developed by Goodmail, a Silicon Valley start-up company."

"'Damn if I will pay postage for my nice list,' said David Farber, a professor at Carnegie Mellon University, who runs a mailing list on technology and policy with 30,000 recipients. He said electronic postage systems are likely to be too complex and would charge noncommercial users who should be able to send e-mail free."

"But for the big Internet access providers, or I.S.P.'s, the prospect of e-mail postage creating a new revenue stream that could help offset the cost of their e-mail systems is undeniably attractive"