Thread: BUG #16222: [[:print:]] doesn't correctly handle Emoji skin tone modifiers on MacOS
BUG #16222: [[:print:]] doesn't correctly handle Emoji skin tone modifiers on MacOS
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16222
Logged by:          Mack Earnhardt
Email address:      mack@agilereasoning.com
PostgreSQL version: 11.6
Operating system:   MacOS Catalina
Description:

On Linux heroku-18, these expressions both eval true:

select '✌'~'\A[[:print:]]*\Z';
select '✌🏻'~'\A[[:print:]]*\Z';

On MacOS Catalina, the 1st evals true but the 2nd evals false.
Re: BUG #16222: [[:print:]] doesn't correctly handle Emoji skin tone modifiers on MacOS
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> On Linux heroku-18, these expressions both eval true:
> select '✌'~'\A[[:print:]]*\Z';
> select '✌🏻'~'\A[[:print:]]*\Z';
> On MacOS Catalina, the 1st evals true but the 2nd evals false.

This is entirely a function of what your operating system's locale support does. So it could be that you chose the wrong LC_CTYPE setting for the macOS database -- in C locale, for example, "false" is the right answer. However, we've observed that macOS's UTF8-based locales seem pretty brain-dead about handling of multibyte characters :-(. So it's likely that this boils down to being Apple's bug. I haven't detected any interest on their part in improving their POSIX locale support, unfortunately.

regards, tom lane
Re: BUG #16222: [[:print:]] doesn't correctly handle Emoji skin tone modifiers on MacOS
From
Mack Earnhardt
Date:
Hi Tom,

You're correct. I thought the fact that Terminal and Vim both display correct-ish was enough to rule out the OS. It wasn't. The database LC_CTYPE is set to en_US.UTF-8, as is my bash terminal. When I put the two queries in a text file and use `egrep '^[[:print:]]+$'`, only the first line is recognized.

Thanks for helping me narrow this down!

-M

> On Jan 21, 2020, at 12:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> PG Bug reporting form <noreply@postgresql.org> writes:
>> On Linux heroku-18, these expressions both eval true:
>> select '✌'~'\A[[:print:]]*\Z';
>> select '✌🏻'~'\A[[:print:]]*\Z';
>> On MacOS Catalina, the 1st evals true but the 2nd evals false.
>
> This is entirely a function of what your operating system's
> locale support does. So it could be that you chose the wrong
> LC_CTYPE setting for the macOS database -- in C locale, for
> example, "false" is the right answer. However, we've observed
> that macOS's UTF8-based locales seem pretty brain-dead about
> handling of multibyte characters :-(. So it's likely that this
> boils down to being Apple's bug. I haven't detected any interest
> on their part in improving their POSIX locale support, unfortunately.
>
> regards, tom lane
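
For anyone following along, it may help to see which code points the two test strings from the report contain. The sketch below is only an illustration: it uses Python's bundled `unicodedata` tables, which are not what PostgreSQL consults (per the explanation above, `[[:print:]]` depends on the operating system's locale support), and the helper name `describe` is made up for this example. It shows that the second string is the victory hand followed by a separate skin tone modifier, U+1F3FB, so a platform whose locale tables do not classify that one code point as printable would reject the whole string.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, general category) for each character.

    Hypothetical helper for illustration; it reads Python's own Unicode
    database, not the OS locale that PostgreSQL relies on.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


# The two strings from the bug report, written with escapes so the
# example does not depend on how this message is encoded or displayed.
plain = "\u270c"            # victory hand
toned = "\u270c\U0001f3fb"  # victory hand + light skin tone modifier

for label, text in (("plain", plain), ("toned", toned)):
    print(label, len(text), describe(text))
```

The general category of the modifier (`Sk`, modifier symbol) differs from that of the base emoji (`So`, other symbol), which is one plausible reason a locale implementation could treat the two differently.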