Thread: Reading £ character from DB is displaying œ character
Hi Everyone, I am trying to read to read a foreign character from database (Get *£*100) is giving some other character. I am using plain Java- JDBC connection. Just wondering if there is anything that needs to be done apart from the below? Below is the pseudo code: connection = DriverManager.getConnection( "jdbc:postgresql://127.0.0.1:5432/demodb?useUnicode=yes&characterEncoding=utf-8", "test", "test"); String sql = "select * from test_data"; ResultSet rs = stmt.executeQuery(sql); while(rs.next()){ System.out.println(rs.getString(1)) // Returns Get *œ*100 Expected : Get *£*100) } DB is UTF-8 Postgres jar version : postgresql-9.0-801.jdbc4.jar Appreciate if you could let me know if I am missing anything here? Thanks, Sai. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
> -----Original Message----- > I am trying to read to read a foreign character from database (Get *£*100) is > giving some other character. I am using plain Java- JDBC connection. Just > wondering if there is anything that needs to be done apart from the below? > > Below is the pseudo code: > > connection = DriverManager.getConnection( > > "jdbc:postgresql://127.0.0.1:5432/demodb?useUnicode=yes&characterEnco > ding=utf-8", > "test", > "test"); > > String sql = "select * from test_data"; > ResultSet rs = stmt.executeQuery(sql); > while(rs.next()){ > > System.out.println(rs.getString(1)) // Returns Get *œ*100 What is the OS? What software is *displaying* the output? Is it an IDE? A terminal? If the terminal isn't UTF-8 capable,or there's some output encoding going on you could have perfectly valid data in the resultset and see it corruptedon output to the console.
OS where database is installed is Linux OS where the java program is trying to read is Windows7 Using Eclipse IDE Not sure if there is a problem with the Postgres JDBC driver or ResultSet.getString() not reading it correctly from the DB. Thanks, Sai -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772060.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
On 9/23/2013 11:33 AM, saisantoshi wrote: > OS where database is installed is Linux > OS where the java program is trying to read is Windows7 > Using Eclipse IDE what was the client_encoding set to? Windows natively doesn't use UTF8, it uses UTF16, but console 'DOS' windows use DOS437 (the old CGA/VGA font) or some such silliness. it all gets very confusing. -- john r pierce 37N 122W somewhere on the middle of the left coast
client_encoding is set to "WIN1252" I am try to set to UTF8, Trying to set a £ character in the update statement is causing "invalid byte code sequence : 0x9x?" Additional information: server_encoding set to "UTF-8" as well. If I set the client_encoding to WIN1252, it saves £ character correctly into the database but the retrieval is a problem ( retrieves œ character) Thanks, Sai. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772093.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
Here is the exact error when trying to set to UT8 and updating £ character. ERROR: invalid byte sequence for encoding "UTF8": 0x9c -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772099.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
On 9/23/2013 1:58 PM, saisantoshi wrote: > Here is the exact error when trying to set to UT8 and updating £ character. > > ERROR: invalid byte sequence for encoding "UTF8": 0x9c WIn1252 is a Windows-specific variant of LATIN-1, an 8 bit character set. 0x9C is in fact the oe munged character in Win1252, and is an invalid character encoding in UTF8. the UK pound symbol is 0xA3 in Win1252, and in UTF8, a UK pound sign is: C2 A3, which is U+00A3 in UTF16/UCS2. Win1252 code table: http://en.wikipedia.org/wiki/Windows-1252#Code_page_layout partial UTF8 char table: http://www.utf8-chartable.de/ ahhhh. 0x9C is the pound symbol in CP437, the MSDOS CGA/EGA/VGA text-mode font. -- john r pierce 37N 122W somewhere on the middle of the left coast
OK... I tried saving the £ symbol into DB via java program as supposed to psql command line and it works fine in saving/retreiving from DB ( via java program) But the issue here is, when I run the sql commands, it does not show me the £ symbol in the returned query.. (some gibberish character) May be because DOS prompt default CP is 437. Not sure what do I need to do in order to see the £ symbol in the returned result set in running the queries or vewing the data in the database. testdb=> show server_encoding; server_encoding ----------------- UTF8 (1 row) testdb=> set client_encoding = 'UTF8'; SET testdb=> \encoding UTF8 Select text from test; *Get £500* // Actual ( Expected *Get £500* ) Not sure what the client_encoding and server_encoding does here? Thanks, Sai -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772120.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
On 9/23/2013 5:27 PM, saisantoshi wrote: > Not sure what the client_encoding and server_encoding does here? the server encoding gives the character encoding that the data is stored in. the client encoding specifies the encoding used for user input/output, so user data in SQL commands will be converted from client_encoding to server_encoding when its written, and coverted back to the current client_encoding when read. if you're running psql in a windows command shell, try setting the client_encoding to DOS437 or CP437 (I'm not sure what the correct notation is for this) and do your SELECT ... Java itself uses UCS2/UTF16 internally, so I don't know what the right way of displaying text from a java program, it proably depends on whether you're displaying it on a text mode "DOS" window, or in a GUI window, or on a html page. -- john r pierce 37N 122W somewhere on the middle of the left coast
Here is what I get: testdb=> set client_encoding='DOS437'; ERROR: invalid value for parameter "client_encoding": "DOS437" testdb=> set client_encoding='CP437'; ERROR: invalid value for parameter "client_encoding": "CP437" The other question I have is if I am setting the server_encoding/client_encodng to UTF8, then saving the £ character into DB should save £ and be able to see the same in the table. I am still not clear as to why it gets gibbersh characters... its much confusing as to believe if it saved the right characters. Not sure what's missing peice here?? -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772142.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
How do you determine that you "get gibberish characters"? Are you printing the string, or inspecting the individual charactersin memory using a debugger? The gibberish is probably the result of the way the text is displayed, not a problemwith the database or the JDBC. You need to correctly encode the string as bytes before sending it to System.out. "correctly" depends on the encoding expected by your terminal (cmd.exe, the console of an IDE, etc.). This answer on StackOverflow should be helpful; it is equally applicable to the original question of JSON as it is to yourquestion of JDBC. http://stackoverflow.com/questions/11868022/unicode-characters-appearing-as-question-marks-in-java-json-parsing/11868911#11868911 -----Original Message----- From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of saisantoshi Sent: Tuesday, September 24, 2013 12:52 AM To: pgsql-jdbc@postgresql.org Subject: [JDBC] Re: Reading £ character from DB is displaying œ character Here is what I get: testdb=> set client_encoding='DOS437'; ERROR: invalid value for parameter "client_encoding": "DOS437" testdb=> set client_encoding='CP437'; ERROR: invalid value for parameter "client_encoding": "CP437" The other question I have is if I am setting the server_encoding/client_encodng to UTF8, then saving the £ character intoDB should save £ and be able to see the same in the table. I am still not clear as to why it gets gibbersh characters...its much confusing as to believe if it saved the right characters. Not sure what's missing peice here??
I understand that its not gibberish.. but the problem is how can I set the editor to view the right character set. I am using DOS propmt ( and connectng it to PSQL client) to view the data in DB. Now, there are different audience viewing the data set from DB. If the vewer can't determine what the character is.. it would be very annoying.. There should be a way in DOS prompt to set the terminal to view the data which is UTF-8. Not able to view the data is a limitation on Wndows OS. Thanks, Sai. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772290.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
I have had problems of this sort while I was working under Windows. There are *numerous* places where character encoding needs to be specified correctly. Java is UNICODE by default, so is PostgreSQL, and correct character encoding is vital to making all the things work together. The places where things can go wrong are: - Character encoding in the editor that is used to write the source code - Character encoding when compiling the code - Character encoding used when inserting in the database (client-side encoding) - Character encoding used by the database (server-side) - Character encoding used when selecting from the database (client-side encoding) - Character encoding used when outputting the content - Character encoding of the output media One _could_ mix and match different encodings at different steps, but it's a process prone to errors. Using a single encoding everywhere helps (although it can mask problems). I've had numerous occasions where colleagues would write code in Windows, and saved the file with their local Code Page, which is *not* UTF-8. The code was compiled on a system with UTF-8 as default encoding, and the code ended up containing wrong literals. I've had the opposite too: colleagues have compiled UTF-8 code on their machines with a different local encoding, and got broken string literals. It is also possible, that the console you're using (command prompt?) uses one encoding, while your code that outputs the read data uses a different one. Here's what I mean: > C:\Program Files\PostgreSQL\9.3\bin>psql -h database lachezar lachezar > Password for user lachezar: > psql (9.3.0) > WARNING: Console code page (866) differs from Windows code page (1251) > 8-bit characters might not work correctly. See psql reference > page "Notes for Windows users" for details. > Type "help" for help. > > lachezar=> select username, personal_name from users where username = 'lachezar'; > username | personal_name > ----------+---------------- > lachezar | ╦·ўхчрЁ ─юсЁхт > (1 row) > > > lachezar=> \encoding WIN866 > lachezar=> select username, personal_name from users where username = 'lachezar'; > username | personal_name > ----------+---------------- > lachezar | Лъчезар Добрев > (1 row) Just because *you* can't see the correct content, it does not mean, that it's the database's fault (and I'm pretty sure it is not). Check every one of the elements I noted above. It might be the editor you're using! Using Notepad has yielded problems of that sort. In all honesty I've resorted to decompiling code to *catch* lazy/ignorant/newbie developers that fail to set up a development environment correctly. Still... Make sure your database is actually capable of storing those characters > lachezar=> \l+ lachezar > List of databases > Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description > ----------+----------+----------+-------------+-------------+-------------------+-------+------------+------------- > lachezar | lachezar | UTF8 | bg_BG.UTF-8 | bg_BG.UTF-8 | | 13 GB | pg_default | > (1 row) Strictly speaking you might be able to put incompatible characters, but that can yield unpredictable results. 2013/9/25 saisantoshi <mr@starsborn.com>: > I understand that its not gibberish.. but the problem is how can I set the > editor to view the right character set. I am using DOS propmt ( and > connectng it to PSQL client) to view the data in DB. Now, there are > different audience viewing the data set from DB. If the vewer can't > determine what the character is.. it would be very annoying.. There should > be a way in DOS prompt to set the terminal to view the data which is UTF-8. > Not able to view the data is a limitation on Wndows OS. > > Thanks, > Sai.