Thread: About Unicode IVS

RE: About Unicode IVS

From: 荒井元成
Date: 29 March 2022, 04:34:49

Thank you for your reply.

It still returns 2 characters.

select char_length(U&'\+008FBA' || U&'\+0E0102');
 char_length
-------------
           2
(1 row)

select length('辺󠄂');

 length
--------
      2
(1 row)

select char_length('辺󠄂');

 char_length
-------------
           2
(1 row)

$ psql -l
                                  List of databases
   Name    |  Owner  | Encoding | Collate | Ctype |  Access privileges
-----------+---------+----------+---------+-------+---------------------
 D209007   | D209007 | UTF8     | C       | C     |
 postgres  | D209007 | UTF8     | C       | C     |
 template0 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
(4 rows)

$ cat pgdata/PG_VERSION
13

Moto.

From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Tuesday, March 29, 2022 12:38 PM
To: 荒井元成 <n2029@ndensan.co.jp>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: About Unicode IVS

On Monday, March 28, 2022, 荒井元成 <n2029@ndensan.co.jp> wrote:

> Hi,
>
> The length() function returns 2 characters where I want it to return 1.
>
> Is it possible to handle this by changing a setting, for example the collation, as in SQL Server?
>
> Also, if you know how to deal with it (e.g., by creating your own function), I would appreciate as much information as you can provide.

Try char_length(text) instead.

David J.
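The question above asks how one could write one's own function. The following is a minimal Python sketch of the counting logic only; it is not PostgreSQL code and not something proposed in the thread. The name `ivs_char_length` is hypothetical, and the sketch assumes that only the standard variation selectors VS1 to VS16 (U+FE00 to U+FE0F) and VS17 to VS256 (U+E0100 to U+E01EF) should be skipped.

```python
# Hypothetical sketch: a char_length() variant that does not count variation selectors.
# Assumption: only the standard ranges VS1..VS16 and VS17..VS256 are skipped;
# the Mongolian free variation selectors are deliberately left out.
VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))


def is_variation_selector(ch: str) -> bool:
    """True if the single code point ch lies in one of the standard VS ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)


def ivs_char_length(text: str) -> int:
    """Count code points like char_length(), but skip variation selectors."""
    return sum(1 for ch in text if not is_variation_selector(ch))


if __name__ == "__main__":
    s = "\u8FBA\U000E0102"  # same string as U&'\+008FBA' || U&'\+0E0102'
    print(len(s), ivs_char_length(s))  # prints: 2 1
```

Porting the same rule into a function inside PostgreSQL would be a separate exercise; the sketch only pins down which code points are left out of the count.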

RE: About Unicode IVS

From: Graham Myers
Date: 29 March 2022, 07:45:40

Why do you expect the concatenation of two characters to return a length of one?

Graham Myers

Attachment

image564879.png

RE: About Unicode IVS

From: 荒井元成
Date: 29 March 2022, 08:21:18

Thank you for your reply.

Because the two code points are displayed as a single character.

This is the case for Unicode variation selectors and for combining characters.

Moto.
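Moto's point is that a base character plus a variation selector or combining mark is displayed as one character. One rough way to approximate that, sketched in Python below, is to count only code points that are not combining marks (Unicode general categories Mn, Mc, Me); variation selectors are category Mn, so they are skipped too. This is an approximation of the extended grapheme clusters defined in Unicode UAX #29, not an implementation of them: ZWJ emoji sequences, Hangul jamo and similar cases are not handled. The name `base_char_count` is hypothetical.

```python
import unicodedata


def base_char_count(text: str) -> int:
    """Count code points whose general category is not a mark (Mn, Mc, Me)."""
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


if __name__ == "__main__":
    ivs = "\u8FBA\U000E0102"    # ideograph + VARIATION SELECTOR-19
    combining = "\u00E6\u0300"  # LATIN SMALL LETTER AE + COMBINING GRAVE ACCENT
    for s in (ivs, combining):
        print(len(s), base_char_count(s))  # prints "2 1" for each string
```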

Attachment

image001.png

RE: About Unicode IVS

From: Graham Myers
Date: 29 March 2022, 08:26:09

Thank you for the explanation; Unicode always blows my mind 😊 The problem is that Postgres counts code points, and in your example there are two.

Graham Myers
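Graham's observation can be checked by listing the code points one by one. The small Python sketch below uses the standard `unicodedata` module (the helper name `describe` is hypothetical) to show that U&'\+008FBA' || U&'\+0E0102' is an ideograph followed by VARIATION SELECTOR-19, which is why a code-point count gives 2.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' line per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    s = "\u8FBA\U000E0102"
    for line in describe(s):
        print(line)
    # U+8FBA CJK UNIFIED IDEOGRAPH-8FBA
    # U+E0102 VARIATION SELECTOR-19
    print(len(s))  # 2 code points
```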

RE: About Unicode IVS

From: 荒井元成
Date: 29 March 2022, 08:52:45

Where should I make a request if I want PostgreSQL to handle this?

Is this mailing list the right place?

Moto.

RE: About Unicode IVS

From: Michel SALAIS
Date: 29 March 2022, 09:34:40

Hi,

I think this has something to do with the collation and ctype. I see you have them set to “C” for all your databases (even if I don’t understand your column titles 😊).

Michel SALAIS

RE: About Unicode IVS

From: 荒井元成
Date: 29 March 2022, 09:55:29

Thank you for your reply.

Changing the collation and Ctype did not change the behavior.

   Name    |  Owner  | Encoding |   Collate   |    Ctype    |  Access privileges
-----------+---------+----------+-------------+-------------+---------------------
 D209007   | D209007 | UTF8     | C           | C           |
 postgres  | D209007 | UTF8     | C           | C           |
 template0 | D209007 | UTF8     | C           | C           | =c/D209007         +
           |         |          |             |             | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C           | C           | =c/D209007         +
           |         |          |             |             | D209007=CTc/D209007
 template2 | D209007 | UTF8     | ja_JP.UTF-8 | ja_JP.UTF-8 |
(5 rows)

D209007=# \c template2
You are now connected to database "template2" as user "D209007".
template2=# select char_length(U&'\+0000E6' || U&'\+000300');
 char_length
-------------
           2
(1 row)

template2=# select char_length(U&'\+008FBA' || U&'\+0E0102');
 char_length
-------------
           2
(1 row)

template2=# select length(U&'\+008FBA' || U&'\+0E0102');
 length
--------
      2
(1 row)

Moto.

Re: About Unicode IVS

From: Holger Jakobs
Date: 29 March 2022, 10:02:18

It's totally correct that the two characters are still two characters.

You would have to normalize the string first, so that the combination becomes one character.

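More information about this topic, which is in part beyond PostgreSQL:

* https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about
* https://en.wikipedia.org/wiki/Unicode_equivalence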

Regards,

Holger
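Normalization only merges a sequence into one code point when Unicode defines a precomposed character for it. The Python sketch below uses `unicodedata.normalize` as a stand-in for PostgreSQL's normalize() (NFC in both cases); 'e' + U+0301 is an illustrative example not taken from the thread. It composes to U+00E9, while the ideographic variation sequence from the thread has no precomposed form and keeps both code points. The helper name `nfc_length` is hypothetical.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Number of code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    print(nfc_length("e\u0301"))           # 1: composes to U+00E9
    print(nfc_length("\u8FBA\U000E0102"))  # 2: no precomposed form for this IVS
```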

On 29.03.22 at 11:55, 荒井元成 wrote:

Thank you for your reply.
Changing the collation and Ctype did not change the behavior.

   Name    |  Owner  | Encoding |   Collate   |    Ctype    |  Access privileges
-----------+---------+----------+-------------+-------------+---------------------
 D209007   | D209007 | UTF8     | C           | C           |
 postgres  | D209007 | UTF8     | C           | C           |
 template0 | D209007 | UTF8     | C           | C           | =c/D209007         +
           |         |          |             |             | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C           | C           | =c/D209007         +
           |         |          |             |             | D209007=CTc/D209007
 template2 | D209007 | UTF8     | ja_JP.UTF-8 | ja_JP.UTF-8 |
(5 rows)

D209007=# \c template2
You are now connected to database "template2" as user "D209007".
template2=# select char_length(U&'\+0000E6' || U&'\+000300');
char_length
-------------
           2
(1 row)

template2=# select char_length(U&'\+008FBA' || U&'\+0E0102');
char_length
-------------
           2
(1 row)

template2=# select length(U&'\+008FBA' || U&'\+0E0102');
length
--------
      2
(1 row)

Moto.

From: Michel SALAIS <msalais@msym.fr>
Sent: Tuesday, March 29, 2022 6:35 PM
To: '荒井元成' <n2029@ndensan.co.jp>; 'David G. Johnston' <david.g.johnston@gmail.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: RE: About Unicode IVS

Hi,
I think this has something to do with the collation and ctype. I see you have them set to “C” for all your databases (even if I don’t understand your column titles 😊).

Michel SALAIS

From: 荒井元成 <n2029@ndensan.co.jp>
Sent: Tuesday, March 29, 2022 06:35
To: 'David G. Johnston' <david.g.johnston@gmail.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: RE: About Unicode IVS

Thank you for your reply.
It still returns 2 characters.

select char_length(U&'\+008FBA' || U&'\+0E0102');
char_length
-------------
           2
(1 row)

select length('辺󠄂');
length
--------
      2
(1 row)

select char_length('辺󠄂');
char_length
-------------
           2
(1 row)

$ psql -l
                                  List of databases
   Name    |  Owner  | Encoding | Collate | Ctype |  Access privileges
-----------+---------+----------+---------+-------+---------------------
 D209007   | D209007 | UTF8     | C       | C     |
 postgres  | D209007 | UTF8     | C       | C     |
 template0 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
(4 rows)

$ cat pgdata/PG_VERSION
13

Moto.

From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Tuesday, March 29, 2022 12:38 PM
To: 荒井元成 <n2029@ndensan.co.jp>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: About Unicode IVS

On Monday, March 28, 2022, 荒井元成 <n2029@ndensan.co.jp> wrote:
> Hi,
>
> The length() function returns 2 characters where I want it to return 1.
> Is it possible to handle this by changing a setting, for example the collation, as in SQL Server?
>
> Also, if you know how to deal with it (e.g., by creating your own function), I would appreciate as much information as you can provide.

Try char_length(text) instead.

David J.

-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Attachment

OpenPGP_signature

Re: About Unicode IVS

From: Tom Lane
Date: 29 March 2022, 10:25:56

Holger Jakobs <holger@jakobs.com> writes:
> It's totally correct that the two characters are still two characters.
> You would have to normalize the string first, so that the combination 
> becomes one character.

Yeah.  In principle the normalize() function ought to do this for
you.  But it doesn't seem to shorten the given example for me;
I'm not sure if that means the example is incorrect, or if it's
a bug in normalize().

u8=# select octet_length(U&'\+008FBA' || U&'\+0E0102');
 octet_length 
--------------
            7
(1 row)

u8=# select octet_length(normalize(U&'\+008FBA' || U&'\+0E0102'));
 octet_length 
--------------
            7
(1 row)

            regards, tom lane
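The octet_length of 7 is simply the UTF-8 size of the two code points: U+8FBA lies in the Basic Multilingual Plane and needs 3 bytes, while U+E0102 lies in plane 14 and needs 4. A Python check of that arithmetic follows; the helper name `utf8_sizes` is hypothetical, and `unicodedata.normalize` stands in for PostgreSQL's normalize().

```python
import unicodedata


def utf8_sizes(text: str) -> list[int]:
    """UTF-8 byte count of each code point in text."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    s = "\u8FBA\U000E0102"
    print(utf8_sizes(s), len(s.encode("utf-8")))  # [3, 4] 7
    # NFC does not change this sequence, so the byte length stays 7.
    print(len(unicodedata.normalize("NFC", s).encode("utf-8")))  # 7
```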

RE: About Unicode IVS

From: 荒井元成
Date: 29 March 2022, 11:03:45

Thank you for your reply.

In SQL Server, a character followed by a variation selector (two code points) is treated as one character. The collation is
Japanese_XJIS_140_CS_AS_KS_WS_VSS_UTF8.

Moto.

RE: Re: About Unicode IVS

From: 荒井元成
Date: 30 March 2022, 00:06:06

Variant forms (variation sequences) cannot be resolved by normalization.

Moto.
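Because normalization leaves variation sequences alone, an application that wants 辺 and 辺 followed by a variation selector to compare as equal would have to remove the selector itself. Below is a hypothetical Python sketch (`strip_variation_selectors` is an illustrative name, and it assumes the standard selector ranges VS1 to VS16 and VS17 to VS256 are sufficient). Stripping discards the glyph distinction that the IVS exists to record, so it suits a search or comparison key rather than the stored value.

```python
import re

# Standard variation selectors: VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF).
_VS_PATTERN = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]")


def strip_variation_selectors(text: str) -> str:
    """Return text with every standard variation selector removed."""
    return _VS_PATTERN.sub("", text)


if __name__ == "__main__":
    plain, variant = "\u8FBA", "\u8FBA\U000E0102"
    print(plain == variant)                             # False
    print(plain == strip_variation_selectors(variant))  # True
```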

-------- Original Message --------
Subject: Re: About Unicode IVS
Date: 2022-03-29 19:02
From: Holger Jakobs <holger@jakobs.com>
To: pgsql-admin@lists.postgresql.org

It's totally correct that the two characters are still two characters.

You would have to normalize the string first, so that the combination becomes one character.

More information about this topic, which is in part beyond PostgreSQL:

* https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about [1]
* https://en.wikipedia.org/wiki/Unicode_equivalence [2]

Regards,

Holger


--
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012


Links:
------
[1] https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about
[2] https://en.wikipedia.org/wiki/Unicode_equivalence