Re: Move defaults toward ICU in 16? - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Move defaults toward ICU in 16? |
Date | |
Msg-id | CA+hUKGLd3ESz2yZ69HB6TO2E9J0Ku_eBJ5AddPHQvW+iPZ-puA@mail.gmail.com |
In response to | Re: Move defaults toward ICU in 16? (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Move defaults toward ICU in 16?
Re: Move defaults toward ICU in 16? |
List | pgsql-hackers |
On Fri, Feb 3, 2023 at 5:31 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2023-02-02 at 08:44 -0500, Robert Haas wrote:
> > On Thu, Feb 2, 2023 at 8:13 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > > If we don't want to nudge users toward ICU, is it because we are
> > > waiting for something, or is there a lack of consensus that ICU is
> > > actually better?
> >
> > Do you think it's better?
>
> Yes:
>
> * ICU more featureful: e.g. supports case-insensitive collations (the
> citext docs suggest looking at ICU instead).
> * It's faster: a simple non-contrived sort is something like 70%
> faster[1] than one using glibc.
> * It can provide consistent semantics across platforms.

+1

> * Easier for users to control what library version is available on
> their system. We can also ask packagers to keep some old versions of
> ICU available for an extended period of time.
> * If one of the ICU multilib patches makes it in, it will be easier
> for users to select which of the library versions Postgres will use.
> * Reports versions for individual collators, distinct from the
> library version.

+1

> The biggest disadvantage (rather, the flip side of its advantages) is
> that it's a separate dependency. Will ICU still be maintained in 10
> years or will we end up stuck maintaining it ourselves? Then again,
> we've already been shipping it, so I don't know if we can avoid that
> problem entirely now even if we wanted to.

It has a pretty special status, with an absolutely enormous amount of technology depending on it.

http://blog.unicode.org/2016/05/icu-joins-unicode-consortium.html
https://unicode.org/consortium/consort.html
https://home.unicode.org/membership/members/
https://home.unicode.org/about-unicode/

I mean, who knows what the future holds, but ultimately what we're doing here is taking the de facto reference implementation of the Unicode collation algorithm. Are Unicode and the consortium still going to be here in 10 years? We're all in on Unicode, and it's also tangled up with ISO standards, as are parts of the collation stuff. Sure, there could be a clean-room implementation that replaces it in some sense (just as there is a Java implementation) but it would very likely be "the same" because the real thing we're buying here is the set of algorithms and data maintenance that the whole industry has agreed on. Unless Britain decides to exit the Latin alphabet, terminate membership of ISO and revert to Anglo-Saxon runes with a sort order that is defined in the new constitution as "the opposite of whatever Unicode says", it's hard to see obstacles to ICU's long-term universal applicability.

It's still important to have libc support as an option, though, because it's a totally reasonable thing to want sort order to agree with the "sort" command on the same host, provided you are willing to deal with all the complexities that we're trying to escape.