
Re: Unicode grapheme clusters - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Unicode grapheme clusters
Date	January 24, 2023 19:20:32
Msg-id	Y9AvgA1+93WXp9gN@momjian.us
In response to	Re: Unicode grapheme clusters (Greg Stark <stark@mit.edu>)
List	pgsql-hackers

On Tue, Jan 24, 2023 at 11:40:01AM -0500, Greg Stark wrote:
> On Sat, 21 Jan 2023 at 13:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Probably our long-term answer is to avoid depending on wcwidth
> > and use wcswidth instead.  But it's hard to get excited about
> > doing the legwork for that until popular libc implementations
> > get it right.
> 
> Here's an interesting blog post about trying to do this in Rust:
> 
> https://tomdebruijn.com/posts/rust-string-length-width-calculations/
> 
> TL;DR... Even counting the number of graphemes isn't enough because
> terminals typically (but not always) display emoji graphemes using two
> columns.
> 
> At the end of the day Unicode kind of assumes a variable-width display
> where the rendering is handled by something that has access to the
> actual font metrics. So anything trying to line things up in columns
> in a way that works with any rendering system down the line using any
> font is going to be making a best guess.

Yes, good article, though I am still surprised this is not discussed
more often.  Anyway, psql assumes a fixed-width output device, so we
can base our width computation on that assumption.  You are right that
Unicode just doesn't seem to consider fixed-width output cases and
doesn't provide much guidance.
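
To make the fixed-width assumption concrete, here is a minimal Python
sketch of a per-code-point column estimate.  It is not psql's actual
code: the function name estimate_columns is hypothetical, and the width
rule (zero columns for combining marks and format characters, two for
East Asian Wide/Fullwidth, one otherwise) is a simplifying assumption in
the spirit of wcwidth.

```python
import unicodedata


def estimate_columns(text: str) -> int:
    """Estimate terminal columns by summing a per-code-point width guess.

    Assumption: a simplified wcwidth-style rule, not psql's implementation.
    """
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            # Nonspacing/enclosing marks and format characters (e.g. ZWJ)
            # are assumed to occupy no column of their own.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # East Asian Wide/Fullwidth code points take two columns.
            total += 2
        else:
            total += 1
    return total


if __name__ == "__main__":
    # ASCII, three CJK ideographs, and "e" + COMBINING ACUTE ACCENT.
    for sample in ("abc", "\u65e5\u672c\u8a9e", "e\u0301"):
        print(repr(sample), estimate_columns(sample))
```

Because the estimate is summed per code point, an emoji built from
several code points joined by ZWJ is counted as the sum of its parts,
even where a terminal draws it as one glyph.  That is the kind of
overcount a string-level function such as wcswidth could in principle
avoid.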

Beyond psql, should we update our docs to say that character_length()
for Unicode returns the number of Unicode code points, and not
necessarily the number of displayed characters if grapheme clusters are
present?
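
As an illustration of the gap between code points and displayed
characters, here is a rough Python sketch.  Python's len() counts code
points, like the behavior described above.  The helper
approx_grapheme_count is hypothetical and only approximates grapheme
clusters (combining marks, variation selectors, emoji skin-tone
modifiers, and ZWJ joins); it is not an implementation of the Unicode
text segmentation rules.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_extender(ch: str) -> bool:
    """Assumption: code points that attach to the preceding character."""
    return (
        unicodedata.category(ch) in ("Mn", "Me")   # combining marks, VS1-16
        or 0x1F3FB <= ord(ch) <= 0x1F3FF           # emoji skin-tone modifiers
    )


def approx_grapheme_count(text: str) -> int:
    """Rough count of user-perceived characters; NOT full UAX #29."""
    count = 0
    join_next = False
    for ch in text:
        if ch == ZWJ:
            # A ZWJ glues the next code point onto the current cluster,
            # provided there is a cluster to attach to.
            join_next = count > 0
            continue
        attaches = join_next or (count > 0 and is_extender(ch))
        join_next = False
        if not attaches:
            count += 1
    return count


if __name__ == "__main__":
    samples = {
        "plain ASCII": "abc",
        "e + combining acute": "e\u0301",
        "man ZWJ woman ZWJ girl": "\U0001F468\u200d\U0001F469\u200d\U0001F467",
    }
    for label, s in samples.items():
        print(label, "code points:", len(s),
              "approx. graphemes:", approx_grapheme_count(s))
```

For the family emoji, len() reports five code points while the sketch
reports a single cluster, which is the discrepancy a docs note on
character_length() would be describing.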

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.
