Re: Issues with german locale on CentOS 5,6,7 - Mailing list pgsql-general

From Peter Geoghegan
Subject Re: Issues with german locale on CentOS 5,6,7
Msg-id CAEYLb_XiiqAXFiZ=abBvMn6fi=wVwQ8mxMKFL8+DLrBc6+sj1A@mail.gmail.com
In response to Re: Issues with german locale on CentOS 5,6,7  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Wed, Oct 7, 2015 at 8:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. Being compatible with the operating system's collation behavior is a
> feature, not a bug.  If nothing else, it allows us to tell people that
> if we sort data the same way that sort(1) does, then it's not a bug that
> we're not sorting the way they think we should.  But quite aside from
> that, there are practical uses to being compatible with other tools.

I am not proposing to make that impossible.
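For readers following along, "the operating system's collation behavior" here means the C library's strcoll(), which is also what sort(1) uses. A minimal sketch of that comparison, using Python's standard locale module; the helper name sort_like_os is made up for illustration, and any locale other than "C" is assumed to be installed on the machine:

```python
import functools
import locale


def sort_like_os(words, locale_name="C"):
    """Sort words using the C library's collation for locale_name.

    Hypothetical helper for illustration. Raises locale.Error if the
    requested locale is not installed on this system.
    """
    # Remember the current LC_COLLATE setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # locale.strcoll wraps the C library's strcoll()/wcscoll().
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    # In the "C" locale the order is plain code point order.
    print(sort_like_os(["b", "A", "a", "B"], "C"))
```

Passing a name such as "de_DE.UTF-8" instead of "C" would use that locale's rules, and the result then depends on the locale definitions shipped with the operating system, which is the compatibility being discussed.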

> 2. Last I checked, ICU *only* supports Unicode, and not only that, but
> only UTF16.  This is a non-starter; not only for our Far Eastern users,
> but also those who find various LatinX encodings sufficient.  ICU would be
> a functional fail for the former and a performance fail for the latter.

UTF-16 is more space-efficient than UTF-8 for most East Asian text (two
bytes per character rather than three), so I'm not sure what you mean
about that. I realize that using UTF-16 is a non-starter, though.
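The size trade-off between the two encodings is easy to check. A small sketch, assuming nothing beyond Python's built-in codecs; the function name encoded_sizes and the sample strings are chosen only for illustration:

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text.

    "utf-16-le" is used so that no byte order mark is counted.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    # Three CJK characters: 3 bytes each in UTF-8, 2 bytes each in UTF-16.
    print(encoded_sizes("日本語"))
    # Mostly ASCII text: 1 byte per ASCII letter in UTF-8, 2 in UTF-16.
    print(encoded_sizes("Straße"))
```

So UTF-16 is smaller for the CJK sample, while UTF-8 is smaller for the Latin-script sample.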

I guess you were talking about people who don't use Unicode due to the
Han Unification controversy. Again, I'm not proposing to only support
Unicode, but realistically the vast majority of users want Unicode,
even in East Asia.

Yes, ICU only supports Unicode, but it has supported UTF-8 for years
now, and not as a second-class citizen. See
http://userguide.icu-project.org/strings/utf-8 . As it says there:

"""
If it is known that the default charset is always UTF-8 on the target
platform, then you should #define U_CHARSET_IS_UTF8 1 in or before
unicode/utypes.h. (For example, modify the default value there or pass
-DU_CHARSET_IS_UTF8=1 as a compiler flag.) This will change most of
the implementation code to use dedicated (simpler, faster) UTF-8 code
paths and avoid dependencies on the conversion framework. (Avoiding
such dependencies helps with statically linked libraries and may allow
the use of UCONFIG_NO_LEGACY_CONVERSION or even UCONFIG_NO_CONVERSION
[see unicode/uconfig.h].)

"""

> 3. As Thomas Munro already noted, whatcha gonna do when ICU changes their
> collations?  Or are their collations graven on stone tablets, unlike
> anyone else's?

See my response to Thomas.

--
Regards,
Peter Geoghegan

