RE: About Unicode IVS - Mailing list pgsql-admin

From	荒井元成
Subject	RE: About Unicode IVS
Date	March 29, 2022 11:03:45
Msg-id	013501d8435c$a8f1c9e0$fad55da0$@ndensan.co.jp
In response to	Re: About Unicode IVS (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-admin

Thank you for your reply.

In SQL Server, a base character followed by a variation selector is treated as one character, even though it consists
of two code points. The collation is Japanese_XJIS_140_CS_AS_KS_WS_VSS_UTF8.

Moto.
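
For illustration only, the counting behavior described above can be approximated outside the database. The sketch below is Python, not SQL Server's implementation. The helper names is_variation_selector and count_vs_aware are hypothetical, and the sketch assumes that only code points in the two Unicode variation selector blocks (U+FE00..U+FE0F and U+E0100..U+E01EF) attach to the preceding character.

```python
def is_variation_selector(ch: str) -> bool:
    """Return True if ch lies in one of the two variation selector blocks."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def count_vs_aware(text: str) -> int:
    """Count code points, skipping variation selectors (hypothetical helper).

    Assumption: every selector follows a base character, so skipping it
    makes the base + selector pair count as one unit.
    """
    return sum(1 for ch in text if not is_variation_selector(ch))


# The pair from this thread: U+8FBA followed by U+E0102.
ivs = "\u8fba\U000E0102"

print(len(ivs))             # plain code point count
print(count_vs_aware(ivs))  # count with the selector folded into its base
```

A plain code point count sees two code points here, while the selector-aware count sees one unit, which matches the behavior described for the VSS collation.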

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, March 29, 2022 7:26 PM
To: Holger Jakobs <holger@jakobs.com>
Cc: pgsql-admin@lists.postgresql.org; n2029@ndensan.co.jp
Subject: Re: About Unicode IVS

Holger Jakobs <holger@jakobs.com> writes:
> It's totally correct that the two characters are still two characters.
> You would have to normalize the string first, so that the combination
> becomes one character.

Yeah.  In principle the normalize() function ought to do this for you.  But it doesn't seem to shorten the given
example for me; I'm not sure if that means the example is incorrect, or if it's a bug in normalize().

u8=# select octet_length(U&'\+008FBA' || U&'\+0E0102');
 octet_length
--------------
            7
(1 row)

u8=# select octet_length(normalize(U&'\+008FBA' || U&'\+0E0102'));
 octet_length
--------------
            7
(1 row)

            regards, tom lane
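
As a cross-check outside PostgreSQL, the same pair can be run through Python's standard unicodedata.normalize. This is only a sketch and says nothing about PostgreSQL's internals. Assuming both implement the standard Unicode normalization forms, an unchanged result here would suggest that the 7-byte output above is expected behavior rather than a bug, because a variation selector has no canonical composition with its base character.

```python
import unicodedata

# U+8FBA followed by U+E0102, as in the psql session above.
ivs = "\u8fba\U000E0102"

# UTF-8 uses 3 bytes for U+8FBA and 4 bytes for U+E0102.
original_bytes = len(ivs.encode("utf-8"))
print("original:", original_bytes, "bytes")

# Apply each normalization form and record whether the string changed.
results = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, ivs)
    results[form] = (normalized == ivs, len(normalized.encode("utf-8")))
    print(form, "unchanged:", normalized == ivs,
          "bytes:", len(normalized.encode("utf-8")))
```

If every form leaves the string unchanged, merging the pair into a single "character" is a matter of how characters are counted (for example by a variation-selector-aware collation), not of normalization.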
