Thread: ts_parse reports different between MacOS, FreeBSD/Linux
Hello,

We have an application whose test suite fails on MacOS when running the search tests on Unicode characters. I've narrowed it down to the following:

macos=# select * from ts_parse('default','天');
 tokid | token
-------+-------
    12 | 天
(1 row)

freebsd=# select * from ts_parse('default','天');
 tokid | token
-------+-------
     2 | 天
(1 row)

This has been bugging me for a while, but it's a test our devs using MacOS just ignore for now, as we know it passes our CI/CD pipeline on FreeBSD/Linux. It seems that anyone shipping an app on MacOS and bundling Postgres is going to have a bad time with searching.

Please let me know if there's anything I can do to help. Will gladly test patches.

Thanks,

--
Mark Felder
ports-secteam & portmgr alumni
feld@FreeBSD.org
"Mark Felder" <feld@FreeBSD.org> writes: > We have an application whose test suite fails on MacOS when running the search tests on unicode characters. Yeah, known problem :-(. The text search parser relies on the C library's locale data to classify characters as being letters, digits, etc. Unfortunately, the UTF8 locales on macOS are just horribly bad, and report many results that are different from other platforms. I suppose that Apple has got reasonable Unicode character knowledge somewhere in their OS; they are just not very interested in making the POSIX locale APIs work well. Which leaves us with a bit of a problem for getting consistent results cross-platform. regards, tom lane