Re: Unicode grapheme clusters - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Unicode grapheme clusters
Msg-id Y9AvgA1+93WXp9gN@momjian.us
In response to Re: Unicode grapheme clusters  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Tue, Jan 24, 2023 at 11:40:01AM -0500, Greg Stark wrote:
> On Sat, 21 Jan 2023 at 13:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Probably our long-term answer is to avoid depending on wcwidth
> > and use wcswidth instead.  But it's hard to get excited about
> > doing the legwork for that until popular libc implementations
> > get it right.
> 
> Here's an interesting blog post about trying to do this in Rust:
> 
> https://tomdebruijn.com/posts/rust-string-length-width-calculations/
> 
> TL;DR... Even counting the number of graphemes isn't enough because
> terminals typically (but not always) display emoji graphemes using two
> columns.
> 
> At the end of the day Unicode kind of assumes a variable-width display
> where the rendering is handled by something that has access to the
> actual font metrics. So anything trying to line things up in columns
> in a way that works with any rendering system down the line using any
> font is going to be making a best guess.

Yes, good article, though I am still surprised this is not discussed
more often.  Anyway, for psql, we assume a fixed-width output device, so
we can rely on that assumption when computing display widths.  You are
right that Unicode doesn't seem to consider fixed-width output cases
and doesn't provide much guidance for them.
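
To make the fixed-width assumption concrete, here is a rough Python
sketch of a wcwidth-style, per-code-point column estimate.  It is not
psql's code; the helper name estimated_columns is hypothetical, and the
rule "wide or fullwidth code points take two columns, combining marks
and format characters take none" is a simplifying assumption, not what
every terminal does.

```python
import unicodedata


def estimated_columns(text: str) -> int:
    """Estimate display columns by summing a width guess per code point.

    Assumption (hypothetical rule, not psql's implementation): combining
    marks and format characters take 0 columns, East Asian wide/fullwidth
    code points take 2, and everything else takes 1.
    """
    columns = 0
    for ch in text:
        # Combining marks and format characters (such as zero-width
        # joiner) are treated as occupying no column of their own.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # "W" (wide) and "F" (fullwidth) code points are assumed to use
        # two columns on a fixed-width device.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns


if __name__ == "__main__":
    for sample in ("abc", "e\u0301", "\u65e5\u672c"):
        print(repr(sample), estimated_columns(sample))
```

A per-code-point sum like this counts each emoji in a joined sequence
separately, which is the kind of case where a string-level calculation
(the wcswidth direction mentioned upthread) would be needed.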

Beyond psql, should we update our docs to say that character_length()
for Unicode returns the number of Unicode code points, and not
necessarily the number of displayed characters if grapheme clusters are
present?
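
As an illustration of the difference a doc note would describe, here is
a small Python sketch comparing a code point count with an approximate
count of displayed characters.  Both helper names are hypothetical, and
the second one only skips combining marks; it is not a full grapheme
cluster segmentation (it ignores joiner sequences, for example).

```python
import unicodedata


def code_point_count(text: str) -> int:
    """Number of Unicode code points, the quantity the message attributes
    to character_length() for Unicode text."""
    return len(text)


def approx_displayed_chars(text: str) -> int:
    """Approximate count of displayed characters.

    Assumption: a code point with a nonzero canonical combining class
    attaches to the preceding character and is not counted.  This is a
    simplification, not full grapheme cluster segmentation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    # "cafe" followed by U+0301 COMBINING ACUTE ACCENT
    word = "cafe\u0301"
    print(code_point_count(word), approx_displayed_chars(word))
```

For the sample string, the two counts differ by one because the accent
is a separate code point that is drawn on top of the preceding letter.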

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.


