On Tue, Jan 24, 2023 at 11:40:01AM -0500, Greg Stark wrote:
> On Sat, 21 Jan 2023 at 13:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Probably our long-term answer is to avoid depending on wcwidth
> > and use wcswidth instead. But it's hard to get excited about
> > doing the legwork for that until popular libc implementations
> > get it right.
>
> Here's an interesting blog post about trying to do this in Rust:
>
> https://tomdebruijn.com/posts/rust-string-length-width-calculations/
>
> TL;DR... Even counting the number of graphemes isn't enough because
> terminals typically (but not always) display emoji graphemes using two
> columns.
>
> At the end of the day Unicode kind of assumes a variable-width display
> where the rendering is handled by something that has access to the
> actual font metrics. So anything trying to line things up in columns
> in a way that works with any rendering system down the line using any
> font is going to be making a best guess.

Yes, good article, though I am still surprised this is not discussed
more often. Anyway, for psql, we assume a fixed-width output device, so
we can just assume that for computation. You are right that Unicode
just doesn't seem to consider fixed-width output cases and doesn't
provide much guidance.
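
To make the fixed-width assumption concrete, here is a minimal sketch of a
per-code-point column estimate, in the spirit of a wcwidth-style lookup. It
is not psql's actual implementation; the function name display_width is
hypothetical, and the rule "wide or fullwidth East Asian Width means two
columns, combining marks and format characters mean zero" is a simplifying
assumption.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate the columns `text` occupies on a fixed-width device.

    Assumption: each code point is judged on its own, so multi-code-point
    sequences are not treated as a single unit.
    """
    width = 0
    for ch in text:
        # Combining marks and format characters (e.g. zero-width joiner)
        # are assumed to occupy no column of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # Wide (W) and fullwidth (F) characters are assumed to take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width
```

Because each code point is measured separately, an emoji built from several
emoji joined by zero-width joiners is counted as the sum of its parts, while
many terminals draw it as one two-column glyph. That is the kind of mismatch
a whole-string approach would have to handle.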

Beyond psql, should we update our docs to say that character_length()
for Unicode returns the number of Unicode code points, and not
necessarily the number of displayed characters if grapheme clusters are
present?
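
The distinction between code points and displayed characters can be
illustrated outside the database. The sketch below uses Python, where len()
on a string counts code points. The helper approx_grapheme_count is
hypothetical and only skips combining marks; it is a rough approximation,
not the full grapheme cluster segmentation rules.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# Both strings display as one character, but the code point counts differ.
composed_len = len(composed)      # 1
decomposed_len = len(decomposed)  # 2


def approx_grapheme_count(text: str) -> int:
    """Count code points that are not combining marks.

    Assumption: a combining mark always attaches to the preceding base
    character. Real grapheme segmentation has more rules than this.
    """
    return sum(
        1 for ch in text
        if unicodedata.category(ch) not in ("Mn", "Me")
    )


# NFC normalization merges this particular pair into the precomposed form,
# but not every base-plus-mark sequence has a precomposed equivalent.
normalized = unicodedata.normalize("NFC", decomposed)
```

Here the decomposed string has two code points yet approx_grapheme_count
reports one, which is the gap a documentation note would be describing.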
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.