Re: Unicode grapheme clusters - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Unicode grapheme clusters
Msg-id CAM-w4HMTeJ9nwd_9Ohvaka8qNQ8s0Xw=-URaCP5MCe2buDwHcw@mail.gmail.com
In response to Unicode grapheme clusters  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Unicode grapheme clusters
List pgsql-hackers
This is how we've always documented it. Postgres treats code points as "characters", not graphemes.

You don't need to go to anything as esoteric as emojis to see this, either. Accented characters like é have canonical (decomposed) forms that are multiple code points, and in some character sets some accented characters can only be represented that way.
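To make the é case concrete, here is a small sketch in Python rather than SQL. It assumes that Python's `len()` on a `str`, which counts code points, is a fair stand-in for how Postgres counts "characters" in a UTF-8 database. The precomposed and decomposed spellings render identically but have different code-point lengths.

```python
import unicodedata

# Precomposed form: LATIN SMALL LETTER E WITH ACUTE (U+00E9), one code point.
precomposed = "\u00e9"
# Canonical decomposition (NFD): "e" followed by COMBINING ACUTE ACCENT (U+0301).
decomposed = unicodedata.normalize("NFD", precomposed)

# Both display as the single user-perceived character "é" ...
print(precomposed, decomposed)
# ... but a code-point count sees one "character" versus two.
print(len(precomposed), len(decomposed))  # 1 2
# The two spellings are canonically equivalent: NFC recomposes them.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```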

But I don't think there's any reason to consider changing the existing functions. They have to stay consistent with substr and the other string manipulation functions.
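A rough illustration of the consistency argument, again using Python slicing as an assumed analogue of substr (both index by code point). As long as length and slicing use the same unit, taking the first `length` characters returns the whole string. If length alone switched to counting graphemes, that invariant would break, and the cut would land in the middle of a grapheme.

```python
import unicodedata

# "café" in decomposed form: c, a, f, e, U+0301 -> 5 code points, 4 graphemes.
word = unicodedata.normalize("NFD", "caf\u00e9")

n = len(word)             # counts code points, like slicing does: 5
print(n)
print(word[:n] == word)   # True: length and slicing agree on the unit
# A grapheme-based length (4) fed to a code-point slice drops the accent.
print(word[:4])           # cafe
```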

We could add new functions to work with graphemes, but keeping them up to date might be a lot of pain.
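For a sense of what a grapheme-aware function involves, here is a deliberately naive sketch. The helper name `naive_grapheme_count` is hypothetical. It only attaches combining marks (Unicode general categories Mn, Mc and Me) to the preceding code point. Real segmentation per Unicode UAX #29 also handles ZWJ emoji sequences, Hangul syllables, regional-indicator flags and more, so this version over-counts those cases. Even this simple version depends on the Unicode tables compiled into Python's `unicodedata` (see `unicodedata.unidata_version`), which hints at the maintenance burden: results can shift as new Unicode versions are released.

```python
import unicodedata

# General categories treated here as "extends the previous cluster".
# Approximation only: UAX #29 defines Extend/SpacingMark more precisely.
_MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def naive_grapheme_count(s: str) -> int:
    """Hypothetical helper: count user-perceived characters by merging
    combining marks into the preceding code point. Not UAX #29 compliant."""
    count = 0
    for i, ch in enumerate(s):
        # A mark after some base character continues the current cluster;
        # a mark at the very start of the string still counts as one cluster.
        if i > 0 and unicodedata.category(ch) in _MARK_CATEGORIES:
            continue
        count += 1
    return count


print(naive_grapheme_count("caf\u00e9"))                               # 4
print(naive_grapheme_count(unicodedata.normalize("NFD", "caf\u00e9")))  # 4
```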
