Thread: Understanding Encoding
Hello All,
I am not able to understand how encoding is handled. I would be grateful if someone could explain what is happening in the following scenario:
1. I created a database with EUC_KR encoding, created a table, and inserted a Korean value into it.
=# CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
=# \c korean
korean=# SHOW client_encoding;
client_encoding
-----------------
UTF8
(1 row)
korean=# CREATE TABLE tbl (doc text);
korean=# INSERT INTO tbl VALUES ('그레스');
2. If I insert non-Korean values, it throws an error:
korean=# INSERT INTO tbl VALUES ('データベース');
ERROR: character with byte sequence 0xe3 0x83 0xbc in encoding "UTF8" has no equivalent in encoding "EUC_KR"
korean=# SELECT * FROM tbl;
doc
--------
그레스
(1 row)
3. I change the client encoding to EUC_KR and try inserting the same Korean characters, and it throws an error:
korean=# SET client_encoding = 'EUC_KR';
SET
korean=# INSERT INTO tbl VALUES ('그레스');
ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88
Even the SELECT statement now displays something different, and I do not understand why:
korean=# SELECT * FROM tbl;
doc
--------
����
(1 row)
Can someone please help me?
Thank you,
Beena Emerson
On Fri, Sep 6, 2013 at 2:56 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> 3. I change the client encoding to EUC_KR and try inserting the same korean
> characters and it throws an error:
>
> korean=# SET client_encoding = 'EUC_KR';
> SET
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88

I wonder if you have tried changing your "locale" to ko_KR; something like:

LANG=ko_KR LC_ALL=ko_KR \
psql -d korean

-- Amit Langote
Hi,
It still gives the same result:
$ LANG=ko_KR LC_ALL=ko_KR
$ psql -d korean
korean=# SHOW client_encoding;
client_encoding
-----------------
EUC_KR
(1 row)
korean=# INSERT INTO tbl VALUES ('그레스');
ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88
Beena Emerson <memissemerson@gmail.com> writes:
> It still gives same result:
> $ LANG=ko_KR LC_ALL=ko_KR
> $ psql -d korean
> korean=# SHOW client_encoding;
> client_encoding
> -----------------
> EUC_KR
> (1 row)
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88

What you need to figure out is what encoding the text you are typing
is in. You're telling psql it's EUC_KR but it evidently isn't.
If you're typing these characters manually then it's probably determined
by a setting of the terminal-emulator program you're using. But if
you're copying-and-pasting then things get more complicated.

Also, what you did above is not what Amit suggested: he wanted you to put
the variable assignments on the same command line as the psql invocation,
so that they'd affect the environment passed to psql. I'm suspicious of
his solution because I'd have thought the terminal program would set up
the right environment ... but you might as well try it.

regards, tom lane
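One way to check which encoding the typed text is actually in is to compare the bytes named in the error message with the bytes the same string produces under each candidate encoding. The sketch below is not part of the original thread. It uses Python's standard codecs (not PostgreSQL's conversion routines) and assumes that Python's "euc_kr" codec is a reasonable stand-in for the server's EUC_KR validation. It prints the UTF-8 and EUC-KR byte sequences for the string from the failing INSERT and checks whether the pair 0xa0 0x88 from the error appears in either.

```python
# Illustrative only: Python's codecs stand in for PostgreSQL's encoding checks.
text = "그레스"  # the string from the failing INSERT


def hex_bytes(data: bytes) -> str:
    """Format bytes the way the PostgreSQL error message does (0x.. 0x..)."""
    return " ".join(f"0x{b:02x}" for b in data)


utf8_bytes = text.encode("utf-8")
euckr_bytes = text.encode("euc_kr")

print("UTF-8 :", hex_bytes(utf8_bytes))
print("EUC-KR:", hex_bytes(euckr_bytes))

# Does the byte pair from the error message occur in either form?
error_pair = b"\xa0\x88"
print("pair in UTF-8 form :", error_pair in utf8_bytes)
print("pair in EUC-KR form:", error_pair in euckr_bytes)

# Try to read the UTF-8 bytes as if they were EUC-KR, which is roughly what
# happens when client_encoding says EUC_KR but the terminal sends UTF-8.
try:
    utf8_bytes.decode("euc_kr")
    decodes = True
except UnicodeDecodeError as exc:
    decodes = False
    print("UTF-8 bytes are not valid EUC-KR:", exc.reason)
```

If the error's byte pair shows up only in the UTF-8 form, that suggests the terminal is sending UTF-8 while client_encoding claims EUC_KR, which would fit the diagnosis above.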
On Fri, Sep 6, 2013 at 3:47 PM, Beena Emerson <memissemerson@gmail.com> wrote:
> It still gives same result:
>
> $ LANG=ko_KR LC_ALL=ko_KR
> $ psql -d korean
>
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88

I changed the encoding of the terminal emulator (GNOME Terminal 2.31.3)
using the Terminal menu as:

Terminal -> Set Character Encoding -> Korean (EUC-KR)

Note that, if the menu only lists UTF-8, you'd have to add EUC-KR using
"Add or Remove".

And it seems to work; could you try the same?

-- Amit Langote
On Fri, Sep 6, 2013 at 12:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What you need to figure out is what encoding the text you are typing
> is in. You're telling psql it's EUC_KR but it evidently isn't.
> If you're typing these characters manually then it's probably determined
> by a setting of the terminal-emulator program you're using. But if
> you're copying-and-pasting then things get more complicated.
>
> Also, what you did above is not what Amit suggested: he wanted you to put
> the variable assignments on the same command line as the psql invocation,
> so that they'd affect the environment passed to psql. I'm suspicious of
> his solution because I'd have thought the terminal program would set up
> the right environment ... but you might as well try it.
I tried with both the assignment and the invocation on the same line. Again, it gave the same result.
Maybe the problem is with copy and paste. I will look into it.
Thank you.
You can refer to: http://www.postgresql.org/docs/9.2/static/multibyte.html
> 2. If I insert non-korean values it throws error:
>
> korean=# INSERT INTO tbl VALUES ('データベース');
> ERROR: character with byte sequence 0xe3 0x83 0xbc in encoding "UTF8" has
> no equivalent in encoding "EUC_KR"

The error message says it all. PostgreSQL accepted 'データベース' encoded in
UTF-8, then tried to convert it to EUC_KR but failed, because EUC_KR does
not accept languages other than Korean (and ASCII). What else did you
expect?

> 3. I change the client encoding to EUC_KR and try inserting the same korean
> characters and it throws an error:
>
> korean=# SET client_encoding = 'EUC_KR';
> SET
> korean=# INSERT INTO tbl VALUES ('그레스');
> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88

0xa0 is definitely not part of EUC_KR. That's why PostgreSQL throws an
error. I guess you are using UHC (Unified Hangul Code) rather than
EUC_KR. They are different encodings. You should do either:

1) Make sure that your terminal encoding is EUC_KR.
2) set client_encoding = 'uhc';

> Even the SELECT statement displays something different. I am not able to
> understand why?
>
> korean=# SELECT * FROM tbl;
> doc
> --------
> ����
> (1 row)

This is for the same reason as above.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
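To see which characters of a string have no equivalent in a target encoding, each character can be tested on its own. The sketch below is not from the thread. The helper name `unencodable` is hypothetical, and it relies on Python's built-in "euc_kr" codec, whose mapping table is maintained separately from PostgreSQL's, so the two may not agree on every character. The byte sequence 0xe3 0x83 0xbc in the error above is the UTF-8 form of 'ー' (U+30FC), so the error names one specific character in the string rather than the string as a whole.

```python
# Illustrative only: a hypothetical helper built on Python's standard codecs.
def unencodable(text: str, encoding: str) -> list:
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# The character whose UTF-8 bytes appear in the PostgreSQL error message.
print(" ".join(f"0x{b:02x}" for b in "ー".encode("utf-8")))

# Characters from each INSERT that Python's euc_kr codec rejects, if any.
print(unencodable("그레스", "euc_kr"))
print(unencodable("データベース", "euc_kr"))
```

Running the same check with "cp949" (Python's name for UHC) in place of "euc_kr" gives a quick way to compare the two encodings mentioned above.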