
Re: badly calculated width of emoji in psql - Mailing list pgsql-hackers

From	John Naylor
Subject	Re: badly calculated width of emoji in psql
Date	August 25, 2021 23:15:34
Msg-id	CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g@mail.gmail.com
In response to	Re: badly calculated width of emoji in psql (Jacob Champion <pchampion@vmware.com>)
Responses	Re: badly calculated width of emoji in psql; Re: badly calculated width of emoji in psql
List	pgsql-hackers


On Tue, Aug 24, 2021 at 1:50 PM Jacob Champion <pchampion@vmware.com> wrote:
>
> Does there need to be any sanity check for overlapping ranges between
> the combining and fullwidth sets? The Unicode data on a dev's machine
> would have to be broken somehow for that to happen, but it could
> potentially go undetected for a while if it did.

It turns out I should have done that to begin with. In the Unicode data, a character can apparently be both combining and wide, which causes ranges to overlap in my scheme:

302A..302D;W # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK

{0x3000, 0x303E, 2},
{0x302A, 0x302D, 0},

3099..309A;W # Mn [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

{0x3099, 0x309A, 0},
{0x3099, 0x30FF, 2},
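For illustration, here is a minimal sketch of the kind of sanity check Jacob asked about. It assumes the combining and wide ranges are held as two separate tables, each sorted by first code point with no overlaps inside a single table. The struct and function names are hypothetical, not taken from either patch. Run against the rows quoted above, it reports both overlapping pairs.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical interval type mirroring the {first, last, width} rows above. */
struct width_interval
{
	unsigned int first;			/* first code point of the range */
	unsigned int last;			/* last code point, inclusive */
	int			width;			/* display width assigned to the range */
};

/*
 * Count (and print) overlapping pairs between two tables.  Each table must be
 * sorted by 'first' with no overlaps inside that table.  A single merge-style
 * pass over both tables finds every overlapping pair.
 */
static size_t
count_overlaps(const struct width_interval *a, size_t na,
			   const struct width_interval *b, size_t nb)
{
	size_t		i = 0,
				j = 0,
				found = 0;

	while (i < na && j < nb)
	{
		if (a[i].last < b[j].first)
			i++;				/* a[i] ends before b[j] starts */
		else if (b[j].last < a[i].first)
			j++;				/* b[j] ends before a[i] starts */
		else
		{
			fprintf(stderr, "overlap: %04X..%04X and %04X..%04X\n",
					a[i].first, a[i].last, b[j].first, b[j].last);
			found++;
			/* advance whichever range ends first; the other may overlap more */
			if (a[i].last < b[j].last)
				i++;
			else
				j++;
		}
	}
	return found;
}
```

Because overlaps really do occur in the Unicode data, such a check would report them rather than reject the data outright. Its job is to make sure nobody depends on overlap-free tables by accident.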

Going by the above, Jacob's patch from July 21 was correct only by chance, because the combining-character search happened first.

It seems the logical thing to do is to revert my 0001 and 0002 patches and go back to something much closer to Jacob's patch, plus a prominent comment explaining that the order of the searches matters.
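To show why the order matters, here is a simplified two-table lookup. It is a sketch only: the table contents are just the excerpts quoted above, the names `combining`, `east_asian_fw`, `in_table` and `char_width` are hypothetical, and control characters and other special cases are ignored. The combining search runs first, so the shared code points come out zero width.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interval type mirroring the {first, last, width} rows above. */
struct width_interval
{
	unsigned int first;
	unsigned int last;
	int			width;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Excerpts only: the rows quoted above, split into two separate tables. */
static const struct width_interval combining[] = {
	{0x302A, 0x302D, 0},
	{0x3099, 0x309A, 0},
};

static const struct width_interval east_asian_fw[] = {
	{0x3000, 0x303E, 2},
	{0x3099, 0x30FF, 2},
};

/* Binary search in a table sorted by 'first' with no internal overlaps. */
static bool
in_table(unsigned int ucs, const struct width_interval *tbl, size_t n)
{
	size_t		lo = 0,
				hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (ucs > tbl[mid].last)
			lo = mid + 1;
		else if (ucs < tbl[mid].first)
			hi = mid;
		else
			return true;
	}
	return false;
}

/*
 * Simplified display width of a code point.  The combining search MUST come
 * first: U+302A..U+302D and U+3099..U+309A are in both tables, and checking
 * the wide table first would report them as width 2.
 */
static int
char_width(unsigned int ucs)
{
	if (in_table(ucs, combining, ARRAY_LEN(combining)))
		return 0;
	if (in_table(ucs, east_asian_fw, ARRAY_LEN(east_asian_fw)))
		return 2;
	return 1;
}
```

With the two `if` statements swapped, `char_width(0x302A)` would return 2 instead of 0. That silent dependency is exactly what the comment needs to warn about.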

EastAsianWidth.txt does include the general category "Mn" in the comments shown above, so it's tempting to just read that (and then we would only need to read one file for these properties). However, relying on comments seems risky, since their presence and format are probably less stable than the data format.
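As a rough illustration of relying only on the data fields, the following sketch parses one EastAsianWidth.txt line and discards everything from `#` onward. The "Mn" in the comment is therefore never consulted. The function name is hypothetical. The parser handles both the `XXXX..YYYY;P` and `XXXX;P` forms and tolerates optional whitespace around `;`, but it is not the parser used by any actual generator script.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser for one EastAsianWidth.txt data line, e.g.
 *   302A..302D;W # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..
 * Only the fields before '#' are used; the comment (which happens to carry
 * the general category) is discarded rather than relied upon.
 * 'prop' receives the width property, e.g. "W", "F", "Na".
 */
static bool
parse_eaw_line(const char *line, unsigned int *first, unsigned int *last,
			   char prop[3])
{
	char		buf[256];
	size_t		len = strcspn(line, "#");	/* length of the data part */

	if (len >= sizeof(buf))
		return false;
	memcpy(buf, line, len);
	buf[len] = '\0';

	/* range form: XXXX..YYYY;P, with optional whitespace around ';' */
	if (sscanf(buf, "%x..%x ; %2[A-Za-z]", first, last, prop) == 3)
		return true;

	/* single code point form: XXXX;P */
	if (sscanf(buf, "%x ; %2[A-Za-z]", first, prop) == 2)
	{
		*last = *first;
		return true;
	}
	return false;				/* blank or comment-only line */
}
```

The general category would then have to come from its own data field in another file, which is the extra file read that using the comment would avoid.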
--
John Naylor
EDB: http://www.enterprisedb.com
