Thread: Reading £ character from DB is displaying œ character

Reading £ character from DB is displaying œ character

From

saisantoshi

Date:

23 September 2013, 16:46:15

Hi Everyone,

I am trying to read to read a foreign character from database (Get *£*100)
is giving some other character. I am using plain Java- JDBC connection. Just
wondering if there is anything that needs to be done apart from the below?

Below is the pseudo code:

connection = DriverManager.getConnection(

"jdbc:postgresql://127.0.0.1:5432/demodb?useUnicode=yes&characterEncoding=utf-8",
"test",
                    "test");

String sql = "select * from test_data";
ResultSet rs = stmt.executeQuery(sql);
            while(rs.next()){

System.out.println(rs.getString(1))  // Returns Get *œ*100

Expected : Get *£*100)

}

DB is UTF-8
Postgres jar version : postgresql-9.0-801.jdbc4.jar

Appreciate if you could let me know if I am missing anything here?

Thanks,
Sai.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.

RE: [JDBC] Reading £ character from DB is displaying œ character

From

Jim Garrison

Date:

23 September 2013, 17:04:37

> -----Original Message-----
> I am trying to read to read a foreign character from database (Get *£*100) is
> giving some other character. I am using plain Java- JDBC connection. Just
> wondering if there is anything that needs to be done apart from the below?
> 
> Below is the pseudo code:
> 
> connection = DriverManager.getConnection(
> 
> "jdbc:postgresql://127.0.0.1:5432/demodb?useUnicode=yes&characterEnco
> ding=utf-8",
> "test",
>                     "test");
> 
> String sql = "select * from test_data";
> ResultSet rs = stmt.executeQuery(sql);
>             while(rs.next()){
> 
> System.out.println(rs.getString(1))  // Returns Get *œ*100

What is the OS? What software is *displaying* the output? Is it an IDE? A terminal? If the terminal isn't UTF-8
capable,or there's some output encoding going on you could have perfectly valid data in the resultset and see it
corruptedon output to the console.

RE: Reading £ character from DB is displaying œ character

From

saisantoshi

Date:

23 September 2013, 18:59:05

OS where database is installed is Linux
OS where the java program is trying to read is Windows7
Using Eclipse IDE

Not sure if there is a problem with the Postgres JDBC driver or
ResultSet.getString() not reading it correctly from the DB.

Thanks,
Sai



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772060.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.

Re: Reading £ character from DB is displaying œ character

From

John R Pierce

Date:

23 September 2013, 19:16:52

On 9/23/2013 11:33 AM, saisantoshi wrote:
> OS where database is installed is Linux
> OS where the java program is trying to read is Windows7
> Using Eclipse IDE

what was the client_encoding set to?

Windows natively doesn't use UTF8, it uses UTF16, but console 'DOS'
windows use DOS437 (the old CGA/VGA font) or some such silliness.
it all gets very confusing.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Reading £ character from DB is displaying œ character

From

saisantoshi

Date:

23 September 2013, 22:03:59

client_encoding is set to "WIN1252"

I am try to set to UTF8, Trying to set a £ character in the update statement
is causing "invalid byte code sequence : 0x9x?"

Additional information:

server_encoding set to "UTF-8" as well.

If I set the client_encoding to WIN1252, it saves £ character correctly into
the database but the retrieval is a problem ( retrieves œ character)

Thanks,
Sai.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772093.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.

Re: Reading £ character from DB is displaying œ character

From

saisantoshi

Date:

23 September 2013, 22:04:06

Here is the exact error when trying to set to UT8 and updating £  character.

ERROR:  invalid byte sequence for encoding "UTF8": 0x9c



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772099.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.

Re: Re: Reading £ character from DB is displaying œ character

From

John R Pierce

Date:

23 September 2013, 23:02:55

On 9/23/2013 1:58 PM, saisantoshi wrote:
> Here is the exact error when trying to set to UT8 and updating £  character.
>
> ERROR:  invalid byte sequence for encoding "UTF8": 0x9c

WIn1252 is a Windows-specific variant of LATIN-1, an 8 bit character set.

0x9C is in fact the oe munged character in Win1252, and is an invalid
character encoding in UTF8.

the UK pound symbol is 0xA3 in Win1252, and in UTF8, a UK pound sign
is:  C2 A3, which is U+00A3 in UTF16/UCS2.

Win1252 code table:
http://en.wikipedia.org/wiki/Windows-1252#Code_page_layout

partial UTF8 char table:
http://www.utf8-chartable.de/

ahhhh.    0x9C is the pound symbol in CP437, the MSDOS CGA/EGA/VGA
text-mode font.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Reading £ character from DB is displaying œ character

From

saisantoshi

Date:

24 September 2013, 00:49:31

OK... I tried saving the  £ symbol into DB via java program as supposed to
psql command line and it works fine in saving/retreiving from DB ( via java
program)

But the issue here is, when I run the sql commands, it does not show me the
£ symbol in the returned query.. (some gibberish character) May be because
DOS prompt default CP is 437. Not sure what do I need to do in order to see
the  £  symbol in the returned result set in running the queries or vewing
the data in the database.

testdb=> show server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

testdb=> set client_encoding = 'UTF8';
SET

testdb=> \encoding
UTF8

Select text from test;

*Get Â£500*   // Actual    ( Expected *Get £500* )


Not sure what the client_encoding and server_encoding does here?

Thanks,
Sai



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772120.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.

Re: Re: Reading £ character from DB is displaying œ character

From

John R Pierce

Date:

24 September 2013, 03:06:18

On 9/23/2013 5:27 PM, saisantoshi wrote:
> Not sure what the client_encoding and server_encoding does here?

the server encoding gives the character encoding that the data is stored
in.  the client encoding specifies the encoding used for user
input/output, so user data in SQL commands will be converted from
client_encoding to server_encoding when its written, and coverted back
to the current client_encoding when read.

if you're running psql in a windows command shell, try setting the
client_encoding to DOS437 or CP437 (I'm not sure what the correct
notation is for this) and do your SELECT ...

Java itself uses UCS2/UTF16 internally, so I don't know what the right
way of displaying text from a java program, it proably depends on
whether you're displaying it on a text mode "DOS" window, or in a GUI
window, or on a html page.




--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Reading £ character from DB is displaying œ character

From

saisantoshi

Date:

24 September 2013, 13:34:05

Here is what I get:

testdb=> set client_encoding='DOS437';
ERROR:  invalid value for parameter "client_encoding": "DOS437"
testdb=> set client_encoding='CP437';
ERROR:  invalid value for parameter "client_encoding": "CP437"

The other question I have is if I am setting the
server_encoding/client_encodng to UTF8, then saving the £ character into DB
should save £ and be able to see the same in the table. I am still not clear
as to why it gets gibbersh characters... its much confusing as to believe if
it saved the right characters. Not sure what's missing peice here??





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772142.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.

RE: [JDBC] Re: Reading £ character from DB is displaying œ character

From

"Hudson, Derrick"

Date:

24 September 2013, 16:05:16

How do you determine that you "get gibberish characters"?  Are you printing the string, or inspecting the individual
charactersin memory using a debugger?  The gibberish is probably the result of the way the text is displayed, not a
problemwith the database or the JDBC.  You need to correctly encode the string as bytes before sending it to
System.out. "correctly" depends on the encoding expected by your terminal (cmd.exe, the console of an IDE, etc.).
 

This answer on StackOverflow should be helpful;  it is equally applicable to the original question of JSON as it is to
yourquestion of JDBC.
 


http://stackoverflow.com/questions/11868022/unicode-characters-appearing-as-question-marks-in-java-json-parsing/11868911#11868911




-----Original Message-----
From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of saisantoshi
Sent: Tuesday, September 24, 2013 12:52 AM
To: pgsql-jdbc@postgresql.org
Subject: [JDBC] Re: Reading £ character from DB is displaying œ character

Here is what I get:

testdb=> set client_encoding='DOS437';
ERROR:  invalid value for parameter "client_encoding": "DOS437"
testdb=> set client_encoding='CP437';
ERROR:  invalid value for parameter "client_encoding": "CP437"

The other question I have is if I am setting the server_encoding/client_encodng to UTF8, then saving the £ character
intoDB should save £ and be able to see the same in the table. I am still not clear as to why it gets gibbersh
characters...its much confusing as to believe if it saved the right characters. Not sure what's missing peice here??

RE: Re: Reading £ character from DB is displaying œ character

From

saisantoshi

Date:

25 September 2013, 04:49:27

I understand that its not gibberish.. but the problem is how can I set the
editor to view the right character set. I am using DOS propmt ( and
connectng it to PSQL client) to view the data in DB. Now, there are
different audience viewing the data set from DB. If the vewer can't
determine what the character is.. it would be very annoying.. There should
be a way in DOS prompt to set the terminal to view the data which is UTF-8.
Not able to view the data is a limitation on Wndows OS.

Thanks,
Sai.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Reading-character-from-DB-is-displaying-character-tp5771897p5772290.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.

Re: [JDBC] Re: Reading £ character from DB is displaying œ character

From

Lachezar Dobrev

Date:

25 September 2013, 09:06:10

  I have had problems of this sort while I was working under Windows.
  There are *numerous* places where character encoding needs to be
specified correctly. Java is UNICODE by default, so is PostgreSQL, and
correct character encoding is vital to making all the things work
together.
  The places where things can go wrong are:

  - Character encoding in the editor that is used to write the source code
  - Character encoding when compiling the code
  - Character encoding used when inserting in the database
(client-side encoding)
  - Character encoding used by the database (server-side)
  - Character encoding used when selecting from the database
(client-side encoding)
  - Character encoding used when outputting the content
  - Character encoding of the output media

  One _could_ mix and match different encodings at different steps,
but it's a process prone to errors. Using a single encoding everywhere
helps (although it can mask problems).

  I've had numerous occasions where colleagues would write code in
Windows, and saved the file with their local Code Page, which is *not*
UTF-8. The code was compiled on a system with UTF-8 as default
encoding, and the code ended up containing wrong literals. I've had
the opposite too: colleagues have compiled UTF-8 code on their
machines with a different local encoding, and got broken string
literals.

  It is also possible, that the console you're using (command prompt?)
uses one encoding, while your code that outputs the read data uses a
different one.

  Here's what I mean:

> C:\Program Files\PostgreSQL\9.3\bin>psql -h database lachezar lachezar
> Password for user lachezar:
> psql (9.3.0)
> WARNING: Console code page (866) differs from Windows code page (1251)
>         8-bit characters might not work correctly. See psql reference
>         page "Notes for Windows users" for details.
> Type "help" for help.
>
> lachezar=> select username, personal_name from users where username = 'lachezar';
>  username | personal_name
> ----------+----------------
>  lachezar | ╦·ўхчрЁ ─юсЁхт
> (1 row)
>
>
> lachezar=> \encoding WIN866
> lachezar=> select username, personal_name from users where username = 'lachezar';
>  username | personal_name
> ----------+----------------
>  lachezar | Лъчезар Добрев
> (1 row)

  Just because *you* can't see the correct content, it does not mean,
that it's the database's fault (and I'm pretty sure it is not).

  Check every one of the elements I noted above. It might be the
editor you're using! Using Notepad has yielded problems of that sort.
In all honesty I've resorted to decompiling code to *catch*
lazy/ignorant/newbie developers that fail to set up a development
environment correctly.

  Still... Make sure your database is actually capable of storing
those characters

> lachezar=> \l+ lachezar
>                                                 List of databases
>     Name  |  Owner   | Encoding |   Collate   |    Ctype    | Access privileges | Size  | Tablespace | Description
> ----------+----------+----------+-------------+-------------+-------------------+-------+------------+-------------
>  lachezar | lachezar | UTF8     | bg_BG.UTF-8 | bg_BG.UTF-8 |                   | 13 GB | pg_default |
> (1 row)

  Strictly speaking you might be able to put incompatible characters,
but that can yield unpredictable results.

2013/9/25 saisantoshi <mr@starsborn.com>:
> I understand that its not gibberish.. but the problem is how can I set the
> editor to view the right character set. I am using DOS propmt ( and
> connectng it to PSQL client) to view the data in DB. Now, there are
> different audience viewing the data set from DB. If the vewer can't
> determine what the character is.. it would be very annoying.. There should
> be a way in DOS prompt to set the terminal to view the data which is UTF-8.
> Not able to view the data is a limitation on Wndows OS.
>
> Thanks,
> Sai.