Re: Re: Re: [GENERAL] can postgresql supported utf8mb4 character sets? - Mailing list pgsql-general

From Arjen Nienhuis
Subject Re: Re: Re: [GENERAL] can postgresql supported utf8mb4 character sets?
Date
Msg-id CAG6W84JPmOJP5F2eKtL-LvvVA63fuyGVV521hwQns7nk1_BLHw@mail.gmail.com
In response to Re: Re: [GENERAL] can postgresql supported utf8mb4 character sets?  ("lsliang" <lsliang@pconline.com.cn>)
Responses May "PostgreSQL server side GB18030 character set support" reconsidered?
List pgsql-general
On Fri, Mar 6, 2015 at 3:55 AM, lsliang <lsliang@pconline.com.cn> wrote:

2015-03-06

From: Adrian Klaver
Sent: 2015-03-05 21:31:39
To: lsliang; pgsql-general
Cc:
Subject: Re: [GENERAL] can postgresql supported utf8mb4 character sets?

On 03/05/2015 01:45 AM, lsliang wrote:
> can  postgresql supported   utf8mb4  character set?
> today   mobile  apps support   4-byte  character   and  utf8 can only
> support  1-3 bytes character
The docs would seem to indicate otherwise:
> if   load  string  to database which  contain  a  4-byte character
> will failed  .
Have you actually tried to load strings into Postgres?
If so, and it failed, what method did you use and what was the error?
> mysql   since  5.5.3 support utf8mb4 character sets
> I don't  find  some information about  postgresql
>   thanks
-- 
Adrian Klaver
 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Thanks for your help.

PostgreSQL can support 4-byte characters:
 
test=> select * from utf8mb4_test ;
ERROR:  character with byte sequence 0xf0 0x9f 0x98 0x84 in encoding "UTF8" has no equivalent in encoding "GB18030"
test=> \encoding utf8 
test=> select * from utf8mb4_test ;
 content 
---------
 ðŸ˜„
 ðŸ˜„
 
pcauto=> 
 
 

UTF-8 support works fine; the 3-byte limit is something MySQL invented. But 4-byte characters only come through correctly if your client encoding is UTF-8 as well. In your example, your terminal is not set to UTF-8.

create table test (glyph text);
insert into test values ('A'), ('馬'), ('𐁀'), ('😄'), ('🇪🇸');

select glyph, convert_to(glyph, 'utf-8'), length(glyph) FROM test;
 glyph |     convert_to     | length
-------+--------------------+--------
 A     | \x41               |      1
 馬    | \xe9a6ac           |      1
 𐁀     | \xf0908180         |      1
 😄     | \xf09f9884         |      1
 🇪🇸    | \xf09f87aaf09f87b8 |      2
(5 rows)
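
The byte sequences and lengths above can be cross-checked outside the database. The following is a minimal Python sketch, not part of the original test; the helper name describe is made up for illustration. It encodes the same glyphs as UTF-8 and counts code points, which is what length() reports for text in a UTF8 database. The last lines rest on an assumption: that garbled output such as the two rows in the quoted session can arise when a client decodes the UTF-8 bytes as Windows-1252. The actual terminal encoding in that session is not known.

```python
# Glyphs from the test table above.
glyphs = ["A", "馬", "𐁀", "😄", "🇪🇸"]


def describe(glyph):
    """Return (UTF-8 bytes as hex, number of code points) for a string."""
    return glyph.encode("utf-8").hex(), len(glyph)


for g in glyphs:
    hex_bytes, code_points = describe(g)
    print(f"{g}\t\\x{hex_bytes}\t{code_points}")

# Assumption for illustration: a client that treats UTF-8 bytes as
# Windows-1252 shows four unrelated characters instead of one emoji.
garbled = "😄".encode("utf-8").decode("cp1252")
print(garbled)
```

The flag is two code points (two regional indicator symbols), which is why its length is 2 even though it renders as a single glyph.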

What doesn't work is GB18030:

select glyph, convert_to(glyph, 'GB18030'), length(glyph) FROM test;
ERROR:  character with byte sequence 0xf0 0x90 0x81 0x80 in encoding "UTF8" has no equivalent in encoding "GB18030"
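
For comparison, GB18030 as a standard does assign four-byte sequences to code points outside the Basic Multilingual Plane. A small sketch, assuming CPython's built-in gb18030 codec (this says nothing about how PostgreSQL's conversion tables are built), round-trips the two characters that fail above; gb18030_roundtrip is a hypothetical helper name:

```python
def gb18030_roundtrip(glyph):
    """Encode a string as GB18030 and check that it decodes back unchanged."""
    encoded = glyph.encode("gb18030")
    if encoded.decode("gb18030") != glyph:
        raise ValueError(f"round trip failed for {glyph!r}")
    return encoded


# The two characters rejected by convert_to(glyph, 'GB18030') above.
for g in ["𐁀", "😄"]:
    print(g, gb18030_roundtrip(g).hex())
```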


I think that is a bug.

Gr. Arjen

