Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode? - Mailing list pgsql-general

From	Vick Khera
Subject	Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?
Date	December 21, 2016 17:08:31
Msg-id	CALd+dcfA2-p2CquiokLPxQKWzFP-ggtQ7uqcab3ozYsdajkGAQ@mail.gmail.com
In response to	Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode? (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses	Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?
List	pgsql-general

On Wed, Dec 21, 2016 at 2:56 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> A PostgreSQL database with encoding=UTF8 just accepts the whole
> range of Unicode, regardless that a character is defined for the
> code or not.

Interesting... when I converted my application and database to UTF-8 encoding, I discovered that Postgres is picky about UTF-8. Specifically, it rejects the UTF-8 byte sequence 0xed 0xa0 0x8d, which maps to Unicode code point 0xd80d. This looks like a proper character, but it is in fact not a defined character code point: it falls in the surrogate range (0xd800 to 0xdfff), which is reserved for UTF-16 and is not valid in UTF-8.

Given the above unicode table:

insert into unicode(id, string) values(1, E'\xed\xa0\x8d');
ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xa0 0x8d
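
To see why those three bytes are rejected, here is a small Python sketch that unpacks them by hand. It illustrates the UTF-8 bit layout only; it is not PostgreSQL's validation code, and the helper names (decode_3byte, is_surrogate, strict_decode_ok) are made up for this example.

```python
# The three bytes from the rejected INSERT.
raw = bytes([0xED, 0xA0, 0x8D])


def decode_3byte(b: bytes) -> int:
    """Unpack the 3-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx by hand.

    No validity checks are done here; this only extracts the payload bits.
    """
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


def is_surrogate(cp: int) -> bool:
    """True if cp lies in the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def strict_decode_ok(b: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


code_point = decode_3byte(raw)
print(f"U+{code_point:04X}")         # the code point the bytes would map to
print(is_surrogate(code_point))      # whether it is a surrogate
print(strict_decode_ok(raw))         # whether a strict decoder accepts it
```

The bit pattern is well formed, which is why the sequence looks like a real character, but the value it carries is a surrogate, so a strict UTF-8 decoder refuses it.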

So I think that when you present an actual string of UTF-8 encoded bytes, Postgres does reject sequences it considers invalid. However, as you observe, inserting the Unicode code point directly does not produce an error:

insert into unicode(id, string) values(1, U&'\d80d');
INSERT 0 1

I discovered this when that specific byte sequence was found in my database during the conversion. I have no idea what my customer entered in the form to make that sequence, but it was part of the Vietnamese spelling of Ho Chi Minh City as best I could figure.
