Re: badly calculated width of emoji in psql - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: badly calculated width of emoji in psql
Msg-id CAFj8pRCGkhApxBhtBP1abW9Wj+HtDaUuA63WudZb2oH8p445NQ@mail.gmail.com
In response to Re: badly calculated width of emoji in psql  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers


On Mon, 19 Jul 2021 at 9:46, Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 07, 2021 at 06:03:34PM +0000, Jacob Champion wrote:
> I would guess that's the key issue here. If we choose a particular
> width for emoji characters, is there anything keeping a terminal's font
> from doing something different anyway?

I'd say that we are doing our best in guessing what it should be,
then.  One cannot predict how fonts are designed.

> We could also keep the fragments as-is and generate a full interval
> table, like common/unicode_combining_table.h. It looks like there's
> roughly double the number of emoji intervals as combining intervals, so
> hopefully adding a second binary search wouldn't be noticeably slower.
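A minimal C sketch of the interval-table idea Jacob describes: a sorted array of inclusive code point ranges plus a binary search, in the spirit of common/unicode_combining_table.h. The type `cp_interval`, the function `interval_search` and the table contents are hypothetical; the three ranges are an abbreviated illustration, not a table generated from the Unicode data files.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive code point range [first, last]. */
typedef struct
{
    unsigned int first;
    unsigned int last;
} cp_interval;

/*
 * Illustrative, abbreviated table of wide emoji code points (hypothetical).
 * A real table would be generated from the Unicode data files; it must be
 * sorted by 'first' and free of overlaps for the binary search to work.
 */
static const cp_interval emoji_table[] = {
    {0x1F300, 0x1F320},
    {0x1F600, 0x1F64F},         /* Emoticons block */
    {0x1F680, 0x1F6C5},
};

#define N_EMOJI_INTERVALS ((int) (sizeof(emoji_table) / sizeof(emoji_table[0])))

/* Returns 1 if ucs lies in one of the n sorted intervals, else 0. */
static int
interval_search(unsigned int ucs, const cp_interval *table, int n)
{
    int lo = 0;
    int hi = n - 1;

    assert(table != NULL && n > 0);

    /* Cheap rejection of anything outside the table's overall span. */
    if (ucs < table[0].first || ucs > table[n - 1].last)
        return 0;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (ucs > table[mid].last)
            lo = mid + 1;       /* target is in a later interval */
        else if (ucs < table[mid].first)
            hi = mid - 1;       /* target is in an earlier interval */
        else
            return 1;           /* first <= ucs <= last */
    }
    return 0;
}
```

Each lookup costs O(log n) comparisons, so doubling the number of intervals adds roughly one comparison per lookup, which is why a second search over a table about twice the size of the combining table may not be noticeably slower.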

Hmm.  Such things have a cost, and this one sounds costly with a
limited impact.  What do we gain besides better display in psql?

The benefit is correct display. I checked the impact on the server side: ucs_wcwidth is used there only to calculate the error position. All other uses are in psql.

Moreover, I checked the Unicode ranges, and I think the performance impact for common languages should be zero, because they typically use code points below 0x1100. A possible (but very small) impact is limited to some historic scripts or special symbols. There is no impact at all on ranges that currently return a display width of 2, because the new range is at the end of the list.
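To make the ordering argument concrete, here is a simplified C sketch of a wcwidth-style function. The wide-character test begins with `ucs >= 0x1100 &&`, so code points below 0x1100 short-circuit before any range is examined, and the emoji range U+1F300..U+1FAFF discussed in this thread is appended as the last alternative. The name `sketch_wcwidth` is hypothetical, the wide ranges are an abbreviated subset, and the combining-character (width 0) lookup is omitted; this is an illustration of the argument, not the actual ucs_wcwidth or the patch text.

```c
/*
 * Hypothetical, simplified display-width function.
 * Returns 0 for NUL, -1 for control characters, otherwise 1 or 2.
 * The combining-character lookup a real implementation performs is omitted.
 */
int
sketch_wcwidth(unsigned int ucs)
{
    if (ucs == 0)
        return 0;
    if (ucs < 0x20 || (ucs >= 0x7f && ucs < 0xa0) || ucs > 0x10ffff)
        return -1;

    /*
     * For ucs < 0x1100 the first test is false and && short-circuits, so
     * no range below is evaluated. A character in an existing wide range
     * hits a true term before the emoji test because || short-circuits.
     */
    return 1 +
        (ucs >= 0x1100 &&
         (ucs <= 0x115f ||                            /* Hangul Jamo initial consonants */
          (ucs >= 0x2e80 && ucs <= 0xa4cf && ucs != 0x303f) || /* CJK ... Yi */
          (ucs >= 0xac00 && ucs <= 0xd7a3) ||         /* Hangul Syllables */
          (ucs >= 0xf900 && ucs <= 0xfaff) ||         /* CJK Compatibility Ideographs */
          (ucs >= 0xff00 && ucs <= 0xff5f) ||         /* Fullwidth Forms */
          (ucs >= 0x20000 && ucs <= 0x2ffff) ||
          (ucs >= 0x1f300 && ucs <= 0x1faff)));       /* emoji range, appended last */
}
```

With this ordering, only code points at or above 0x1100 that match none of the earlier wide ranges (for example U+2000 in General Punctuation) pay for the one extra pair of comparisons.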

I am not sure how widely PQdsplen is used outside psql, but I have no reason to think that developers would prefer this function over the built-in functionality of any development environment that supports Unicode. So in this case I strongly prefer correct results over the current speed. (Note: I have experience from pspg development, where this operation really is on the critical path. I tried some micro-optimizations without much effect: on a very big, unusual result (very wide and very long, 100K rows), the difference was about 500 ms, and at that moment the pager does nothing but string operations.)

Regards

Pavel

> In your opinion, would the current one-line patch proposal make things
> strictly better than they are today, or would it have mixed results?
> I'm wondering how to help this patch move forward for the current
> commitfest, or if we should maybe return with feedback for now.

Based on the following list, it seems to me that [U+1F300, U+1FAFF]
won't capture everything, such as the country flags:
http://www.unicode.org/emoji/charts/full-emoji-list.html
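A small C sketch of the gap Michael points out. A country flag is not a single code point but a pair of regional indicator symbols (U+1F1E6..U+1F1FF, one per letter of the two-letter region code), and those lie below U+1F300. Some emoji are also in the BMP, for example U+231A WATCH. The helper names `in_proposed_emoji_range` and `regional_indicator` are hypothetical; the sketch only tests membership in the proposed range.

```c
#include <stdbool.h>

/* Hypothetical helper: is ucs inside the range proposed in this thread? */
bool
in_proposed_emoji_range(unsigned int ucs)
{
    return ucs >= 0x1F300 && ucs <= 0x1FAFF;
}

/*
 * Hypothetical helper: map an uppercase ASCII letter 'A'..'Z' to its
 * REGIONAL INDICATOR SYMBOL LETTER (U+1F1E6..U+1F1FF); 0 for anything else.
 * A flag such as Czechia's is the pair regional_indicator('C'),
 * regional_indicator('Z').
 */
unsigned int
regional_indicator(char letter)
{
    if (letter < 'A' || letter > 'Z')
        return 0;
    return 0x1F1E6 + (unsigned int) (letter - 'A');
}
```

Because a flag is two code points, a per-code-point width table alone cannot describe it as one glyph; how many columns a terminal actually uses for the pair depends on the terminal and font.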
--
Michael
