Re: badly calculated width of emoji in psql - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: badly calculated width of emoji in psql
Date	April 5, 2021 13:13:28
Msg-id	CAFj8pRC74VjsR9s3wuh0mrT+FAmLNvvxM7WObaoOFEiQdQTeog@mail.gmail.com
In response to	Re: badly calculated width of emoji in psql (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List	pgsql-hackers

On Mon, Apr 5, 2021 at 7:07, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 2 Apr 2021 11:51:26 +0200, Pavel Stehule <pavel.stehule@gmail.com> wrote in
> with this patch, the formatting is correct

I think the hardest point of this issue is that we don't have a
reasonably authoritative source that determines character width, and
that the presentation depends heavily on the environment.

Unicode 9 and/or 10 defines the character properties "Emoji" and
"Emoji_Presentation", and tr51[1] says that

> Emoji are generally presented with a square aspect ratio, which
> presents a problem for flags.
...
> Current practice is for emoji to have a square aspect ratio, deriving
> from their origin in Japanese. For interoperability, it is recommended
> that this practice be continued with current and future emoji. They
> will typically have about the same vertical placement and advance
> width as CJK ideographs. For example:

OK, even putting aside flags, the first table in [2] asserts that "#",
"*", and "0-9" are emoji characters. But we, and I think everyone else,
never present them in two columns. And the table has many mysterious
holes that I haven't looked into.

We could use Emoji_Presentation=yes for the purpose, but for example,
U+23E9 (BLACK RIGHT-POINTING DOUBLE TRIANGLE) has the property
Emoji_Presentation=yes, while U+23ED (BLACK RIGHT-POINTING DOUBLE TRIANGLE
WITH VERTICAL BAR) does not, for a reason that is unclear to me. It looks
like nothing other than some kind of mistake.
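
The difference between the two properties can be checked mechanically.
What follows is only a minimal sketch in Python: SAMPLE is a
hand-abbreviated excerpt written in the emoji-data.txt format and is
assumed to agree with the 13.0 file cited as [2]. The helper names
parse_emoji_data and has_property are hypothetical. For real use, read
the actual file instead of SAMPLE.

```python
import re

# Hand-abbreviated excerpt in the emoji-data.txt layout. This is an
# assumption for illustration; load the real file for anything serious.
SAMPLE = """\
0023          ; Emoji
002A          ; Emoji
0030..0039    ; Emoji
23E9..23EC    ; Emoji
23ED..23EE    ; Emoji
23E9..23EC    ; Emoji_Presentation
"""

# "XXXX ; Property" or "XXXX..YYYY ; Property"
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_emoji_data(text):
    """Return {property_name: [(first, last), ...]} for each data line."""
    props = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        m = LINE_RE.match(line)
        if not m:
            continue  # blank line or comment-only line
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        props.setdefault(m.group(3), []).append((first, last))
    return props


def has_property(props, name, cp):
    """True if code point cp falls in any range listed for the property."""
    return any(first <= cp <= last for first, last in props.get(name, []))


props = parse_emoji_data(SAMPLE)
for cp in (0x23, 0x23E9, 0x23ED):
    print(f"U+{cp:04X}",
          "Emoji=" + str(has_property(props, "Emoji", cp)),
          "Emoji_Presentation=" + str(has_property(props, "Emoji_Presentation", cp)))
```

With this excerpt, "#" and U+23ED come out as Emoji but not
Emoji_Presentation, while U+23E9 carries both properties.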

As for the environment, for example, U+23E9 is an emoji and has
Emoji_Presentation=yes, but it is shown in one column on my
xterm. (I'm not sure what font I am using.)

[1] http://www.unicode.org/reports/tr51/
[2] https://unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

A possible compromise is that we treat all Emoji=yes characters
excluding ASCII characters as double-width and manually merge the
fragmented regions into reasonably larger chunks.
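
The compromise above could be prototyped roughly as follows. This is a
sketch under stated assumptions, not a proposed patch: EMOJI_RANGES is
a small illustrative subset of Emoji=yes ranges, the max_gap threshold
is an arbitrary choice, and build_wide_table and display_width are
hypothetical names. Combining characters and the existing East Asian
wide ranges are deliberately ignored here.

```python
from bisect import bisect_right

# Illustrative Emoji=yes ranges only (an assumed subset, not the full table).
EMOJI_RANGES = [
    (0x0023, 0x0023), (0x002A, 0x002A), (0x0030, 0x0039),  # '#', '*', digits
    (0x23E9, 0x23EC), (0x23ED, 0x23EE), (0x23EF, 0x23EF),
    (0x23F0, 0x23F0), (0x23F1, 0x23F2), (0x23F3, 0x23F3),
    (0x23F8, 0x23FA),
    (0x1F600, 0x1F64F),
]


def build_wide_table(ranges, max_gap=4):
    """Drop ASCII, then merge ranges separated by at most max_gap code points."""
    table = []
    for first, last in sorted(ranges):
        if last < 0x80:
            continue  # purely ASCII range: keep single width
        first = max(first, 0x80)
        if table and first - table[-1][1] - 1 <= max_gap:
            # Close enough to the previous chunk: extend it. Code points in
            # the gap are absorbed and become double-width as well.
            table[-1] = (table[-1][0], max(table[-1][1], last))
        else:
            table.append((first, last))
    return table


def display_width(cp, table):
    """Return 2 if cp lies inside a merged chunk, otherwise 1."""
    i = bisect_right(table, (cp, float("inf"))) - 1
    if i >= 0 and table[i][0] <= cp <= table[i][1]:
        return 2
    return 1


WIDE = build_wide_table(EMOJI_RANGES)
for first, last in WIDE:
    print(f"U+{first:04X}..U+{last:04X}")
```

Merging trades accuracy for a smaller table: with max_gap=4 the
non-emoji code points U+23F4..U+23F7 fall inside a merged chunk and
would be reported as double-width too, so the threshold needs to be
chosen with that side effect in mind.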

It should be fixed in glibc,

https://sourceware.org/bugzilla/show_bug.cgi?id=20313

so we can check it

Regards

Pavel

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
