Re: Unicode grapheme clusters - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Unicode grapheme clusters
Msg-id CAM-w4HNoonCZW3p=D9J2ev7LpOKXiAsgaH-XOUV=3gL_OJMwOA@mail.gmail.com
In response to Re: Unicode grapheme clusters  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Unicode grapheme clusters
Re: Unicode grapheme clusters
List pgsql-hackers
On Sat, 21 Jan 2023 at 13:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Probably our long-term answer is to avoid depending on wcwidth
> and use wcswidth instead.  But it's hard to get excited about
> doing the legwork for that until popular libc implementations
> get it right.
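As a rough illustration of the difference Tom describes: wcwidth() looks at one code point at a time. A string width built by summing it cannot recognize multi-code-point sequences, and neither can a wcswidth() that is implemented as that same sum. The codepoint_width() helper below is a hypothetical Python stand-in for wcwidth(), built only on the standard unicodedata module. Its rules (East Asian Width W/F counts as two columns, combining and format characters as zero) are an assumption, not any libc's actual tables.

```python
import unicodedata


def codepoint_width(ch: str) -> int:
    """Hypothetical stand-in for wcwidth() on ONE code point (assumed rules)."""
    # Combining marks and format characters such as ZWJ (U+200D) take no column.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # Wide and fullwidth East Asian characters (this includes most emoji) take two.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def summed_width(s: str) -> int:
    """String width computed code point by code point, blind to sequences."""
    return sum(codepoint_width(ch) for ch in s)


# "Family" emoji: MAN, ZWJ, WOMAN, ZWJ, GIRL (five code points).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for s in ["abc", "e\u0301", "\u6f22\u5b57", family]:
    print(repr(s), "code points:", len(s), "summed width:", summed_width(s))
```

Under these assumptions a combining accent is handled fine (it adds zero), but the family emoji sums to six columns, while a terminal that draws the ZWJ sequence as a single glyph typically uses two.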

Here's an interesting blog post about trying to do this in Rust:

https://tomdebruijn.com/posts/rust-string-length-width-calculations/

TL;DR: even counting the number of graphemes isn't enough, because
terminals typically (but not always) display emoji graphemes using two
columns.
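A slightly better guess first groups code points into approximate grapheme clusters and then assigns each cluster a width. The clusters() and display_width() functions below are hypothetical and deliberately simplified: they handle combining marks, variation selectors, skin-tone modifiers, ZWJ sequences, and regional-indicator flag pairs, but they are not a full UAX #29 segmenter. The "emoji clusters take two columns" rule is exactly the terminal convention that, as noted above, does not always hold.

```python
import unicodedata
from typing import List

ZWJ = "\u200d"
VS16 = "\ufe0f"  # variation selector requesting emoji presentation


def _extends(ch: str) -> bool:
    """Code points attached to the previous cluster (simplified subset of
    UAX #29 Extend: combining marks, variation selectors, skin-tone modifiers)."""
    cp = ord(ch)
    return (unicodedata.category(ch) in ("Mn", "Me")
            or 0xFE00 <= cp <= 0xFE0F
            or 0x1F3FB <= cp <= 0x1F3FF)


def _is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def clusters(s: str) -> List[str]:
    """Approximate grapheme clusters; NOT a complete UAX #29 implementation."""
    out: List[str] = []
    for ch in s:
        joins_previous = bool(out) and (
            _extends(ch)
            or ch == ZWJ
            or out[-1].endswith(ZWJ)  # the code point after a ZWJ joins too
            or (_is_regional_indicator(ch)  # pair regional indicators into flags
                and len(out[-1]) == 1
                and _is_regional_indicator(out[-1]))
        )
        if joins_previous:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def cluster_width(cluster: str) -> int:
    """Heuristic terminal width: 2 for emoji or wide clusters, 0 if every code
    point is zero-width, otherwise 1."""
    if (VS16 in cluster
            or _is_regional_indicator(cluster[0])
            or any(unicodedata.east_asian_width(c) in ("W", "F") for c in cluster)):
        return 2
    if all(unicodedata.category(c) in ("Mn", "Me", "Cf") for c in cluster):
        return 0
    return 1


def display_width(s: str) -> int:
    return sum(cluster_width(c) for c in clusters(s))


family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # MAN ZWJ WOMAN ZWJ GIRL
flag_jp = "\U0001F1EF\U0001F1F5"                        # regional indicators J + P
heart = "\u2764\ufe0f"                                  # HEAVY BLACK HEART + VS16

for s in ["abc", "e\u0301", family, flag_jp, heart]:
    print(repr(s), "code points:", len(s),
          "clusters:", len(clusters(s)), "width:", display_width(s))
```

Even with correct segmentation the last step remains a heuristic: a terminal or font without a glyph for a given ZWJ sequence or flag may fall back to drawing the individual components, each at its own width.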

At the end of the day, Unicode more or less assumes a variable-width
display, where rendering is handled by something that has access to the
actual font metrics. So anything that tries to line text up in columns
in a way that works with whatever rendering system and font sit
downstream can only make a best guess.

-- 
greg

