Re: badly calculated width of emoji in psql - Mailing list pgsql-hackers

From John Naylor
Subject Re: badly calculated width of emoji in psql
Msg-id CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g@mail.gmail.com
In response to Re: badly calculated width of emoji in psql  (Jacob Champion <pchampion@vmware.com>)
Responses Re: badly calculated width of emoji in psql  (John Naylor <john.naylor@enterprisedb.com>)
Re: badly calculated width of emoji in psql  (Jacob Champion <pchampion@vmware.com>)
List pgsql-hackers
On Tue, Aug 24, 2021 at 1:50 PM Jacob Champion <pchampion@vmware.com> wrote:
>
> Does there need to be any sanity check for overlapping ranges between
> the combining and fullwidth sets? The Unicode data on a dev's machine
> would have to be broken somehow for that to happen, but it could
> potentially go undetected for a while if it did.

It turns out I should have done that to begin with. In the Unicode data, a character can apparently be both combining and wide, which causes ranges to overlap in my scheme:

302A..302D;W     # Mn     [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK

{0x3000, 0x303E, 2},
{0x302A, 0x302D, 0},

3099..309A;W     # Mn     [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

{0x3099, 0x309A, 0},
{0x3099, 0x30FF, 2},
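A minimal sketch of the overlap sanity check Jacob suggested, written in C to match the table initializers above. The names `struct width_interval`, `table_is_sorted` and `ranges_overlap` are hypothetical, not PostgreSQL's actual identifiers, and the sketch assumes each table is sorted by its first code point and has no overlaps within itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interval type mirroring the {first, last, width} entries above */
struct width_interval
{
	unsigned int first;
	unsigned int last;
	int			width;
};

/* True if the table is sorted by 'first' and has no internal overlaps */
static bool
table_is_sorted(const struct width_interval *t, size_t n)
{
	for (size_t i = 1; i < n; i++)
	{
		if (t[i - 1].last >= t[i].first)
			return false;
	}
	return true;
}

/*
 * Return true if any interval in 'a' overlaps any interval in 'b'.
 * Because both tables are sorted and internally disjoint, a single
 * merge-style pass over both is enough.
 */
static bool
ranges_overlap(const struct width_interval *a, size_t na,
			   const struct width_interval *b, size_t nb)
{
	size_t		i = 0;
	size_t		j = 0;

	assert(table_is_sorted(a, na) && table_is_sorted(b, nb));

	while (i < na && j < nb)
	{
		/* closed intervals [first, last] intersect */
		if (a[i].first <= b[j].last && b[j].first <= a[i].last)
			return true;

		/* the interval that ends first cannot overlap anything later */
		if (a[i].last < b[j].last)
			i++;
		else
			j++;
	}
	return false;
}
```

A table generator could run a check like this on the combining and wide tables before emitting them and fail loudly on a hit; fed the excerpts above, it would report an overlap.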

Going by the above, Jacob's patch from July 21 was correct only by chance, because its combining-character search happened first.

It seems the logical thing to do is to revert my 0001 and 0002 and go back to something much closer to Jacob's patch, plus a big comment explaining that the order in which the searches happen matters.
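To illustrate why the search order matters, here is a hypothetical lookup over the overlapping excerpts quoted above (the tables hold only those excerpts, not complete Unicode data, and the function names are made up). Because the combining table is consulted first, U+3099 comes out as zero width even though it also falls inside the wide range {0x3099, 0x30FF}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interval type mirroring the {first, last, width} entries above */
struct width_interval
{
	unsigned int first;
	unsigned int last;
	int			width;
};

/* Excerpts only, taken from the overlapping entries quoted above */
static const struct width_interval combining[] = {
	{0x302A, 0x302D, 0},
	{0x3099, 0x309A, 0},
};

static const struct width_interval wide[] = {
	{0x3000, 0x303E, 2},
	{0x3099, 0x30FF, 2},
};

#define LENGTHOF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Binary search of a sorted, internally disjoint interval table */
static bool
in_table(unsigned int ucs, const struct width_interval *table, size_t n)
{
	size_t		lo = 0;
	size_t		hi = n;			/* search the half-open range [lo, hi) */

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		assert(table[mid].first <= table[mid].last);
		if (ucs < table[mid].first)
			hi = mid;
		else if (ucs > table[mid].last)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}

/*
 * Display width of one code point. The combining search MUST come first:
 * code points such as U+302A..U+302D and U+3099..U+309A appear in both
 * tables, and checking the wide table first would report them as width 2.
 */
static int
char_width(unsigned int ucs)
{
	if (in_table(ucs, combining, LENGTHOF(combining)))
		return 0;
	if (in_table(ucs, wide, LENGTHOF(wide)))
		return 2;
	return 1;
}
```

Swapping the two `in_table` calls would make U+3099 come out as width 2, which is exactly the ordering dependency the proposed comment would document.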

EastAsianWidth.txt does list the general category "Mn" in the comments quoted above, so it's tempting to read combining status from there as well (which would also mean reading only one file for both properties). However, it seems risky to rely on comments, since their presence and format are probably less stable than the data format.
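Reading only the data fields might look like the following hypothetical C helper (the name `parse_eaw_line` is made up, and the real table generators may be structured differently or written in another language). It assumes the `first[..last];property` layout of the lines quoted above, tolerates optional spaces around the semicolon, and ignores everything after `#`, so combining status would still have to come from a separate data file.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for one EastAsianWidth.txt line, e.g.
 *     302A..302D;W     # Mn     [4] IDEOGRAPHIC LEVEL TONE MARK..
 * Only the fields before '#' are read; the comment (where "Mn" appears)
 * is ignored on purpose. Returns false for blank, comment-only, or
 * malformed lines. 'prop' receives the width property (A, F, H, N, Na, W).
 */
static bool
parse_eaw_line(const char *line, unsigned long *first, unsigned long *last,
			   char prop[3])
{
	const char *p = line;
	char	   *end;
	size_t		len = 0;

	assert(line != NULL && first != NULL && last != NULL && prop != NULL);

	while (isspace((unsigned char) *p))
		p++;
	if (*p == '\0' || *p == '#')
		return false;

	/* first code point, in hex */
	*first = strtoul(p, &end, 16);
	if (end == p)
		return false;
	p = end;

	/* optional "..last" for a range */
	if (strncmp(p, "..", 2) == 0)
	{
		const char *q = p + 2;

		*last = strtoul(q, &end, 16);
		if (end == q)
			return false;
		p = end;
	}
	else
		*last = *first;

	/* tolerate optional spaces around ';' */
	while (isspace((unsigned char) *p))
		p++;
	if (*p != ';')
		return false;
	p++;
	while (isspace((unsigned char) *p))
		p++;

	/* property value: one or two letters */
	while (len < 2 && isalpha((unsigned char) p[len]))
	{
		prop[len] = p[len];
		len++;
	}
	if (len == 0)
		return false;
	prop[len] = '\0';
	return true;
}
```

For example, the first data line quoted above yields first = 0x302A, last = 0x302D and prop = "W".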
--
John Naylor
EDB: http://www.enterprisedb.com
