RE: About Unicode IVS - Mailing list pgsql-admin

From Graham Myers
Subject RE: About Unicode IVS
Date
Msg-id d60efdf8caa7379a7483cd530ba5098e@mail.gmail.com
In response to RE: About Unicode IVS  (荒井元成 <n2029@ndensan.co.jp>)
Responses RE: About Unicode IVS  (荒井元成 <n2029@ndensan.co.jp>)
List pgsql-admin

Thank you for the explanation; Unicode always blows my mind 😊 The problem is that Postgres counts code points, and your example contains two of them.
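To illustrate the difference, here is a minimal Python sketch (not PostgreSQL code) that first counts code points, as char_length does, and then counts again while skipping variation selectors and combining marks. The helper names is_variation_selector and visible_length are made up for this example, and treating "base character plus selector" as one character is an assumption about the desired behaviour; this is not full Unicode grapheme-cluster segmentation.

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def visible_length(text: str) -> int:
    """Count code points, skipping variation selectors and combining marks.

    Hypothetical helper: only characters with a non-zero canonical combining
    class are treated as combining marks, which is a simplification.
    """
    return sum(
        1
        for ch in text
        if not is_variation_selector(ch) and unicodedata.combining(ch) == 0
    )


# The sequence from the thread: U+8FBA followed by U+E0102.
ivs = "\u8FBA\U000E0102"

print(len(ivs))             # 2 code points, matching char_length
print(visible_length(ivs))  # 1 once the selector is skipped
```

The same idea (drop the selector ranges, then count) could in principle be moved into a custom database function, but that is untested here.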

 


Graham Myers
 

From: 荒井元成 <n2029@ndensan.co.jp>
Sent: 29 March 2022 09:21
To: 'Graham Myers' <gmyers@retailexpress.com>; 'David G. Johnston' <david.g.johnston@gmail.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: RE: About Unicode IVS

 

Thank you for your reply.

 

This is because the two code points are displayed as a single character.

This applies to Unicode variation selectors and combining characters.

 

Moto.

 

From: Graham Myers <gmyers@retailexpress.com>
Sent: Tuesday, March 29, 2022 4:46 PM
To: 荒井元成 <n2029@ndensan.co.jp>; David G. Johnston <david.g.johnston@gmail.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: RE: About Unicode IVS

 

Why do you expect the concatenation of two characters to return a length of one? 

 

Graham Myers

 

From: 荒井元成 <n2029@ndensan.co.jp>
Sent: 29 March 2022 05:35
To: 'David G. Johnston' <david.g.johnston@gmail.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: RE: About Unicode IVS

 

 

Thank you for your reply.

char_length also returns 2 characters.

 

select char_length(U&'\+008FBA' || U&'\+0E0102');

 char_length
-------------
           2
(1 row)

 

select length('󠄂');

 length
--------
      2
(1 row)

 

select char_length('󠄂');

 char_length
-------------
           2
(1 row)

 

$ psql -l

                             List of databases
   Name    |  Owner  | Encoding | Collate | Ctype |  Access privileges
-----------+---------+----------+---------+-------+---------------------
 D209007   | D209007 | UTF8     | C       | C     |
 postgres  | D209007 | UTF8     | C       | C     |
 template0 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
(4 rows)

 

 

$ cat pgdata/PG_VERSION

13

 

Moto.

 

From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Tuesday, March 29, 2022 12:38 PM
To: 荒井元成 <n2029@ndensan.co.jp>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: About Unicode IVS

 



On Monday, March 28, 2022, 荒井元成 <n2029@ndensan.co.jp> wrote:

Hi,

 

The length() function returns 2 characters for a string that I want counted as 1 character.

Is it possible to handle this by changing a setting, such as the collation, as in SQL Server?
Also, if you know of a workaround (e.g., creating a custom function), I would appreciate any information you can provide.

 

 

Try char_length(text) instead.

 

David J.

 
