Re: How to display complicated Chinese character: Biang. - Mailing list pgsql-general

From	Laurenz Albe
Subject	Re: How to display complicated Chinese character: Biang.
Date	June 2, 2022 11:03:58
Msg-id	55a71975b11634d6554559064f24931bb25408b9.camel@cybertec.at
In response to	How to display complicated Chinese character: Biang. (jian he <jian.universality@gmail.com>)
List	pgsql-general

On Thu, 2022-06-02 at 12:45 +0530, jian he wrote:
> Trying to display some special Chinese characters in Postgresql.
> 
> localhost:5433 admin@test=# show LC_COLLATE;
> +------------+
> | lc_collate |
> +------------+
> | C.UTF-8    |
> +------------+ 
> 
> > with strings(s) as (
> >  values (U&'\+0030EDD')
> > )
> > select s,
> >   octet_length(s),
> >   char_length(s),
> >   (select count(*) from icu_character_boundaries(s,'en')) as graphemes from strings;
> > 
> 
> +-----+--------------+-------------+-----------+
> | s   | octet_length | char_length | graphemes |
> +-----+--------------+-------------+-----------+
> | ロD |            4 |           2 |         2 |
> +-----+--------------+-------------+-----------+
> 
> Seems not right. graphemes should be 1?

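You have an extra "0" there; "\+" Unicode escapes take exactly 6 hexadecimal digits,
so "\+0030ED" is read as U+30ED (ロ) and the trailing "D" as an ordinary character.
With the correct 6 digits: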

WITH strings(s) AS (
   VALUES (U&'\+030EDD')
)
SELECT s,
       octet_length(s),
       char_length(s)
FROM strings;

 s  │ octet_length │ char_length 
════╪══════════════╪═════════════
 𰻝 │            4 │           1
(1 row)
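
To make the difference between the two literals concrete, here is a rough Python sketch of how a six-digit "\+" escape is consumed. It is only an illustration, not PostgreSQL's actual parser: the function name decode_plus_escapes is made up, and it handles only the "\+XXXXXX" form, not the rest of the U&'...' syntax.

```python
import string


def decode_plus_escapes(body: str) -> str:
    """Decode only the \\+XXXXXX escapes in the body of a U&'...' literal.

    Hypothetical helper for illustration; everything else is copied as-is.
    """
    out = []
    i = 0
    while i < len(body):
        if body.startswith("\\+", i):
            digits = body[i + 2:i + 8]  # always exactly six characters
            if len(digits) != 6 or any(c not in string.hexdigits for c in digits):
                raise ValueError("\\+ must be followed by six hex digits")
            out.append(chr(int(digits, 16)))
            i += 8  # skip the backslash, the plus sign and the six digits
        else:
            out.append(body[i])  # ordinary character, kept literally
            i += 1
    return "".join(out)


for body in (r"\+0030EDD", r"\+030EDD"):
    s = decode_plus_escapes(body)
    # Print the UTF-8 byte count and the code point count for each literal.
    print(body, len(s.encode("utf-8")), len(s))
```

The seven-digit literal decodes to two code points (U+30ED followed by a literal "D", 3 + 1 bytes in UTF-8), while the six-digit one decodes to the single code point U+30EDD, which also takes 4 bytes. That matches the octet_length and char_length values in both query results.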

PostgreSQL doesn't have a function "icu_character_boundaries".

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
