Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8
Msg-id 53742.1604028272@sss.pgh.pa.us
In response to Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
Amit Langote <amitlangote09@gmail.com> writes:
> On Fri, Oct 30, 2020 at 9:44 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> Today while working on some other task related to database encoding, I
>> noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is
>> mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in
>> UTF-8. See below:
>> ...
>> Isn't this a bug?

> Can't tell what reason there was to do that, but there must have been
> some.  Maybe the Japanese character sets prefer full-width hyphen
> minus (unicode U+FF0D) over mathematical minus sign (U+2212)?

The way it's been explained to me in the past is that the conversion
between Unicode and the various Japanese encodings is not as well
defined as one could wish, because there are multiple quasi-standard
versions of the Japanese encodings.  So we shouldn't move too hastily
on changing this.  Maybe it's really a bug, but maybe there are good
reasons.

            regards, tom lane
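The following minimal Python sketch makes the byte sequences in this thread concrete. It makes no assumption about PostgreSQL's actual conversion tables. The two table names (`jis_style`, `vendor_style`) and the helper functions are hypothetical stand-ins for the competing quasi-standard mappings mentioned above, each assigning the EUC-JP bytes a1-dd to a different Unicode code point. The only fixed facts it relies on are the UTF-8 encodings of U+2212 and U+FF0D.

```python
# Illustrative sketch only: the table names below are hypothetical and merely
# model two quasi-standard mappings for a single EUC-JP character.
EUCJP_MINUS = bytes([0xA1, 0xDD])  # EUC-JP a1-dd (JIS X 0208 row 1, cell 61)

MAPPINGS = {
    "jis_style": {EUCJP_MINUS: "\u2212"},     # MINUS SIGN
    "vendor_style": {EUCJP_MINUS: "\uff0d"},  # FULLWIDTH HYPHEN-MINUS
}


def to_utf8(table_name: str, euc_bytes: bytes) -> bytes:
    """Convert one EUC-JP character to UTF-8 using the chosen (hypothetical) table."""
    return MAPPINGS[table_name][euc_bytes].encode("utf-8")


def hex_bytes(b: bytes) -> str:
    """Format bytes in the dash-separated style used in the thread, e.g. ef-bc-8d."""
    return "-".join(f"{x:02x}" for x in b)


if __name__ == "__main__":
    for name in MAPPINGS:
        print(name, hex_bytes(to_utf8(name, EUCJP_MINUS)))
```

Run as a script, it prints `jis_style e2-88-92` and then `vendor_style ef-bc-8d`. The second is the sequence reported in the original message. The EUC-JP input bytes are identical in both cases; only the choice of mapping table determines the UTF-8 result, which is why the question of which table is "right" cannot be settled by looking at the bytes alone.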
