Re: Unicode grapheme clusters - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Unicode grapheme clusters
Date
Msg-id Y8wrKdVl/HpKDYrP@momjian.us
In response to Re: Unicode grapheme clusters  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Unicode grapheme clusters
List pgsql-hackers
On Sat, Jan 21, 2023 at 12:37:30PM -0500, Bruce Momjian wrote:
> Well, as one of the URLs I quoted said:
> 
>     This is by design. wcwidth() is utterly broken. Any terminal or
>     terminal application that uses it is also utterly broken. Forget
>     about emoji wcwidth() doesn't even work with combining characters,
>     zero width joiners, flags, and a whole bunch of other things.
> 
> So, either we have to find a function in the library that will do the
> looping over the string for us, or we need to identify the special
> Unicode characters that create grapheme clusters and handle them in our
> code.

I just checked whether wcswidth() would honor grapheme clusters, even
though wcwidth() does not, but it seems wcswidth() treats each character
just as wcwidth() does, so its result is simply the sum of the
per-character widths:

    $ LANG=en_US.UTF-8 grapheme_test
    wcswidth len=7
    
    bytes_consumed=4, wcwidth len=2
    bytes_consumed=4, wcwidth len=2
    bytes_consumed=3, wcwidth len=0
    bytes_consumed=3, wcwidth len=1
    bytes_consumed=3, wcwidth len=0
    bytes_consumed=4, wcwidth len=2

C test program attached.  This is on Debian 11.
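
The attachment is not reproduced in this archive copy.  As a rough
illustration only, and not the attached program, a per-character width
report of the kind shown above could be sketched as follows.  The names
utf8_decode() and report_widths() are hypothetical, the UTF-8 decoding
is done by hand rather than with the C library's multibyte functions,
and the sketch assumes wchar_t holds Unicode code points (true on
glibc).  The decoder does not reject overlong encodings.

```c
#define _XOPEN_SOURCE 700   /* asks <wchar.h> to declare wcwidth() */
#include <stdio.h>
#include <stdint.h>
#include <wchar.h>

/* Redundant but compatible prototype, in case the feature-test macro
 * above is seen too late to take effect. */
int wcwidth(wchar_t wc);

/*
 * Decode one UTF-8 sequence starting at s into *cp.
 * Returns the number of bytes consumed, or 0 at end of string or on
 * malformed input.
 */
int
utf8_decode(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *) s;
    int         len;
    int         i;

    if (p[0] == 0)
        return 0;
    if (p[0] < 0x80)
    {
        *cp = p[0];
        return 1;
    }
    else if ((p[0] & 0xE0) == 0xC0)
    {
        *cp = p[0] & 0x1F;
        len = 2;
    }
    else if ((p[0] & 0xF0) == 0xE0)
    {
        *cp = p[0] & 0x0F;
        len = 3;
    }
    else if ((p[0] & 0xF8) == 0xF0)
    {
        *cp = p[0] & 0x07;
        len = 4;
    }
    else
        return 0;

    /* Fold in the continuation bytes, six payload bits each. */
    for (i = 1; i < len; i++)
    {
        if ((p[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (p[i] & 0x3F);
    }
    return len;
}

/*
 * Print one line per code point in the style of the output above and
 * return the sum of the positive wcwidth() results.  No grapheme
 * cluster handling is attempted; that is the point of the exercise.
 */
int
report_widths(const char *s)
{
    uint32_t    cp;
    int         n;
    int         total = 0;

    while ((n = utf8_decode(s, &cp)) > 0)
    {
        int         w = wcwidth((wchar_t) cp);

        printf("bytes_consumed=%d, wcwidth len=%d\n", n, w);
        if (w > 0)
            total += w;
        s += n;
    }
    return total;
}
```

To try it, call setlocale(LC_ALL, "") from <locale.h> in main() before
report_widths(), and run under a UTF-8 locale such as en_US.UTF-8.
wcwidth() results depend on the LC_CTYPE locale and on the C library
version, so the numbers may differ from those shown above.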

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.
