Thread: [HACKERS] PostgreSQL 10 (latest beta) and older ICU
Collegues, PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above. (because it searches for libicu using pkgconfig, and pgkconfig support significantly changed in ICU version 4.6) However, there are some reasons to build PostgreSQL with older versions of ICU. For instance such Linux distributions as RHEL 6 ships ICU 4.2, and SLES 11 also ships older ICU. Sometimes, it is not feasible to compile newer ICU from sources on the servers with these (and derived) distributions. It is possible to compile PostgreSQL 10 with these versions of ICU by specifying ICU_CFLAGS and ICU_LIBS explicitely. However, backend startup fails with errors on some Spanish and Singalese locales. It turns out, that PostgreSQL enumerates collations for all ICU locales and passes it into uloc_toLanguageTag function with strict argument of this function set to TRUE. But for some locales (es*@collation=tradtiional, si*@collation=dictionary) only call with strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales can be resolved with strict=TRUE. ICU docs state that if strict=FALSE, some locale fields can be incomplete. What solition is better: 1. Just skip locale/collation combinations which cannot be resolved with strict=TRUE and issue warning instead of error if uloc_toLanguageTag fails on some locale. It would cause some ICU collations be missiong from the databases running on the systems with old ICU. 2. If we see ICU version earlier than 4.6, pass strict=FALSE to ucol_toLanguageTag. It would cause databases on such systems use fallback collation sequence for these collations. Personally I prefer first solition, because I doubt that any of my users would ever need Singalese collation, and probably they can survive without traditional Spanish too. -- Victor Wagner <vitus@wagner.pp.ru>
Victor Wagner <vitus@wagner.pp.ru> writes: > PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above. > (because it searches for libicu using pkgconfig, and pgkconfig support > significantly changed in ICU version 4.6) > However, there are some reasons to build PostgreSQL with older versions > of ICU. No doubt, but making that happen seems like a feature, and IMO we're too late in the v10 beta test cycle to be taking new features on board, especially ones with inherently-large portability risks. We could maybe consider patches for this for v11 ... but will anyone still care by then? (Concretely, my concern is that the alpha and beta testing that's happened so far hasn't covered pre-4.6 ICU at all. I'd be surprised if the incompatibility you found so far is the only one.) regards, tom lane
On Tue, 25 Jul 2017 16:50:38 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Victor Wagner <vitus@wagner.pp.ru> writes: > > PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above. > > (because it searches for libicu using pkgconfig, and pgkconfig > > support significantly changed in ICU version 4.6) > > > However, there are some reasons to build PostgreSQL with older > > versions of ICU. > > No doubt, but making that happen seems like a feature, and IMO we're > too late in the v10 beta test cycle to be taking new features on May be PGDG people could integrate it as a patch for RPMS for particular systems which are affected by the problem. I'd rather say that it is bugfix. Relaxing too strict checking. If we choose approach to allow non-strict language tags, it is oneline patch. If we choose more safe approach to ignore non-strict collations, it would take about five lines 1. Replace one ereport(ERROR with ereport(WARINING 2. Return special value (NULL) after issuing this warning 3. Handle this special value in calling function by continuing to next loop iteration 4,5 - curly braces around 1 and 2. see patch at the end > board, especially ones with inherently-large portability risks. We I don't think that there are so large portability risks. Patch can be done such way that it would behave exactly as now if ICU version is 4.6 or above. Moreover, it is easy to make old ICU support separate configure option (because another configure test is needed anyway). > could maybe consider patches for this for v11 ... but will anyone > still care by then? Undoubtedly will. v11 is expedted to be released in 2018. And RHEL 6 is supported until 2020, and extended support would end in Nov 2023. And it is possible that some derived distribution would last longer. > (Concretely, my concern is that the alpha and beta testing that's > happened so far hasn't covered pre-4.6 ICU at all. I'd be surprised > if the incompatibility you found so far is the only one.) diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c index b6c14c9..9e5da98 100644 --- a/src/backend/commands/collationcmds.c +++ b/src/backend/commands/collationcmds.c @@ -442,11 +442,13 @@ get_icu_language_tag(const char *localename) status = U_ZERO_ERROR; uloc_toLanguageTag(localename,buf, sizeof(buf), TRUE, &status); - if (U_FAILURE(status)) - ereport(ERROR, + if (U_FAILURE(status)) + { + ereport(WARNING, (errmsg("could not convert locale name \"%s\" to languagetag: %s", localename, u_errorName(status)))); - + return NULL; + } return pstrdup(buf);} @@ -733,6 +735,10 @@ pg_import_system_collations(PG_FUNCTION_ARGS) char *localeid = psprintf("%s@collation=%s",name, val); langtag = get_icu_language_tag(localeid); + if (langtag == NULL) { + /* No such collation in library, skip */ + continue; + } collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : localeid; /*
On 7/25/17 15:20, Victor Wagner wrote: > It turns out, that PostgreSQL enumerates collations for all ICU locales > and passes it into uloc_toLanguageTag function with strict argument of > this function set to TRUE. But for some locales > (es*@collation=tradtiional, si*@collation=dictionary) only call with > strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales > can be resolved with strict=TRUE. We are only calling uloc_toLanguageTag() with keyword/value combinations that ICU itself previously told us were supported. So just ignoring errors doesn't seem proper in this case. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 31 Jul 2017 19:42:30 -0400 Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 7/25/17 15:20, Victor Wagner wrote: > > It turns out, that PostgreSQL enumerates collations for all ICU > > locales and passes it into uloc_toLanguageTag function with strict > > argument of this function set to TRUE. But for some locales > > (es*@collation=tradtiional, si*@collation=dictionary) only call with > > strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all > > locales can be resolved with strict=TRUE. > > We are only calling uloc_toLanguageTag() with keyword/value > combinations that ICU itself previously told us were supported. So > just ignoring errors doesn't seem proper in this case. > We know that this version of ICU is broken. But what choice we have? Locale code in the glibc is much more broken. So we just have to work around bugs in underlaying libraries anyway. In case of ICU we just need to omit some collations. It might cause incompatibility with newer systems where these collations are used, but if we fall back to glibc locale, there would be much more incompatibilites. And also there is bug in strxfrm, which prevents use of abbreviated keys. --
On 8/1/17 02:12, Victor Wagner wrote: >> We are only calling uloc_toLanguageTag() with keyword/value >> combinations that ICU itself previously told us were supported. So >> just ignoring errors doesn't seem proper in this case. >> > We know that this version of ICU is broken. But what choice we have? I don't know that we can already reach that conclusion. Maybe the APIs have changed or we are not using them correctly. Are there any bug reports or changelog entries related to this? I don't think we should just give up and ignore errors. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 1 Aug 2017 08:16:54 -0400 Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 8/1/17 02:12, Victor Wagner wrote: > >> We are only calling uloc_toLanguageTag() with keyword/value > >> combinations that ICU itself previously told us were supported. So > >> just ignoring errors doesn't seem proper in this case. > >> > > We know that this version of ICU is broken. But what choice we > > have? > > I don't know that we can already reach that conclusion. Maybe the Because it was fixed in subsequent versions. And 4.2 is first version where this function appeared. So, we still have problems with SLES11 which use even older version and have to be supported at least two more years. > APIs have changed or we are not using them correctly. Are there any > bug reports or changelog entries related to this? I don't think we > should just give up and ignore errors. We also can provide configuration option or command-line option for initdb which would restrict list of languages for which collation sequences would be created. This would help for all but two languages.
Victor Wagner <vitus@wagner.pp.ru> writes: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >> I don't know that we can already reach that conclusion. Maybe the > Because it was fixed in subsequent versions. > And 4.2 is first version where this function appeared. > So, we still have problems with SLES11 which use even older version and > have to be supported at least two more years. I think the answer is very clear: we do not need to support old broken versions of ICU, especially not in our first release. We'll have enough to do making the code stable and performant with good versions of ICU. regards, tom lane
On 8/1/17 08:28, Victor Wagner wrote: > On Tue, 1 Aug 2017 08:16:54 -0400 > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > >> On 8/1/17 02:12, Victor Wagner wrote: >>>> We are only calling uloc_toLanguageTag() with keyword/value >>>> combinations that ICU itself previously told us were supported. So >>>> just ignoring errors doesn't seem proper in this case. >>>> >>> We know that this version of ICU is broken. But what choice we >>> have? >> I don't know that we can already reach that conclusion. Maybe the > Because it was fixed in subsequent versions. I propose the attached patch to address this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 8/1/17 11:28, Peter Eisentraut wrote: > On 8/1/17 08:28, Victor Wagner wrote: >> On Tue, 1 Aug 2017 08:16:54 -0400 >> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >> >>> On 8/1/17 02:12, Victor Wagner wrote: >>>>> We are only calling uloc_toLanguageTag() with keyword/value >>>>> combinations that ICU itself previously told us were supported. So >>>>> just ignoring errors doesn't seem proper in this case. >>>>> >>>> We know that this version of ICU is broken. But what choice we >>>> have? >>> I don't know that we can already reach that conclusion. Maybe the >> Because it was fixed in subsequent versions. > > I propose the attached patch to address this. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services