Thread: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

[HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Victor Wagner

Date:

26 July 2017, 01:20:52

Collegues,

PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
(because it searches for libicu using pkgconfig, and pgkconfig support
significantly changed in ICU  version 4.6)

However, there are some reasons to build PostgreSQL with older versions
of ICU. 

For instance such Linux distributions as RHEL 6 ships ICU 4.2, and SLES
11 also ships older ICU. Sometimes, it is not feasible to compile newer
ICU from sources on the servers with these (and derived) distributions.

It is possible to compile PostgreSQL 10 with these versions of ICU
by specifying ICU_CFLAGS and ICU_LIBS explicitely.

However, backend startup fails with errors on some Spanish and
Singalese locales.

It turns out, that PostgreSQL enumerates collations for all ICU locales
and passes it into uloc_toLanguageTag function with strict argument of
this function set to TRUE. But for some locales
(es*@collation=tradtiional, si*@collation=dictionary) only call with
strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales
can be resolved with strict=TRUE.

ICU docs state that if strict=FALSE, some locale fields can be
incomplete.

What solition is better:

1. Just skip locale/collation combinations which cannot be resolved
with strict=TRUE and issue warning instead of error if
uloc_toLanguageTag fails on some locale.

It would cause some ICU collations be missiong from the databases
running on the systems with old ICU.

2. If we see ICU version earlier than 4.6, pass strict=FALSE to
ucol_toLanguageTag.  It would cause databases on such systems use
fallback collation sequence for these collations.

Personally I prefer first solition, because I doubt that any of my
users would ever need Singalese collation, and probably they can
survive without traditional Spanish too.

--                                   Victor Wagner <vitus@wagner.pp.ru>

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Tom Lane

Date:

26 July 2017, 02:50:38

Victor Wagner <vitus@wagner.pp.ru> writes:
> PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
> (because it searches for libicu using pkgconfig, and pgkconfig support
> significantly changed in ICU  version 4.6)

> However, there are some reasons to build PostgreSQL with older versions
> of ICU. 

No doubt, but making that happen seems like a feature, and IMO we're too
late in the v10 beta test cycle to be taking new features on board,
especially ones with inherently-large portability risks.  We could maybe
consider patches for this for v11 ... but will anyone still care by then?

(Concretely, my concern is that the alpha and beta testing that's happened
so far hasn't covered pre-4.6 ICU at all.  I'd be surprised if the
incompatibility you found so far is the only one.)
        regards, tom lane

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Victor Wagner

Date:

26 July 2017, 12:33:53

On Tue, 25 Jul 2017 16:50:38 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Victor Wagner <vitus@wagner.pp.ru> writes:
> > PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
> > (because it searches for libicu using pkgconfig, and pgkconfig
> > support significantly changed in ICU  version 4.6)  
> 
> > However, there are some reasons to build PostgreSQL with older
> > versions of ICU.   
> 
> No doubt, but making that happen seems like a feature, and IMO we're
> too late in the v10 beta test cycle to be taking new features on

May be PGDG people could integrate it as a patch for RPMS for
particular systems which are affected by the problem.

I'd rather say that it is bugfix. Relaxing too strict checking.
If we choose approach to allow non-strict language tags, it is oneline
patch.

If we choose more safe approach to ignore non-strict collations, it
would take about five lines 
1. Replace one ereport(ERROR with ereport(WARINING
2. Return special value (NULL) after issuing this warning
3. Handle this special value in calling function by continuing to next
loop iteration
4,5 - curly braces around 1 and 2. see patch at the end

> board, especially ones with inherently-large portability risks.  We

I don't think that there are so large portability risks. Patch can be
done such way that it would behave exactly as now if ICU version is 4.6
or above. Moreover, it is easy to make old ICU support separate
configure option (because another configure test is needed anyway).

> could maybe consider patches for this for v11 ... but will anyone
> still care by then?

Undoubtedly will. v11 is expedted to be released in 2018. And RHEL 6
is supported until 2020, and extended support would end in Nov 2023.

And it is possible that some derived distribution would last longer.

> (Concretely, my concern is that the alpha and beta testing that's
> happened so far hasn't covered pre-4.6 ICU at all.  I'd be surprised
> if the incompatibility you found so far is the only one.)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index b6c14c9..9e5da98 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -442,11 +442,13 @@ get_icu_language_tag(const char *localename)       status = U_ZERO_ERROR;
uloc_toLanguageTag(localename,buf, sizeof(buf), TRUE, &status);

-       if (U_FAILURE(status))
-               ereport(ERROR,
+       if (U_FAILURE(status)) 
+       {
+               ereport(WARNING,                               (errmsg("could not convert locale name \"%s\" to
languagetag: %s",                                               localename, u_errorName(status))));

-
+               return NULL;
+       }       return pstrdup(buf);}

@@ -733,6 +735,10 @@ pg_import_system_collations(PG_FUNCTION_ARGS)                               char       *localeid =
psprintf("%s@collation=%s",name, val);                               langtag = get_icu_language_tag(localeid);

+                               if (langtag == NULL) {
+                                       /* No such collation in library, skip */
+                                       continue;
+                               }                               collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag :
localeid;                              /*

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Peter Eisentraut

Date:

01 August 2017, 05:42:30

On 7/25/17 15:20, Victor Wagner wrote:
> It turns out, that PostgreSQL enumerates collations for all ICU locales
> and passes it into uloc_toLanguageTag function with strict argument of
> this function set to TRUE. But for some locales
> (es*@collation=tradtiional, si*@collation=dictionary) only call with
> strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales
> can be resolved with strict=TRUE.

We are only calling uloc_toLanguageTag() with keyword/value combinations
that ICU itself previously told us were supported.  So just ignoring
errors doesn't seem proper in this case.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Victor Wagner

Date:

01 August 2017, 12:12:21

On Mon, 31 Jul 2017 19:42:30 -0400
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 7/25/17 15:20, Victor Wagner wrote:
> > It turns out, that PostgreSQL enumerates collations for all ICU
> > locales and passes it into uloc_toLanguageTag function with strict
> > argument of this function set to TRUE. But for some locales
> > (es*@collation=tradtiional, si*@collation=dictionary) only call with
> > strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all
> > locales can be resolved with strict=TRUE.  
> 
> We are only calling uloc_toLanguageTag() with keyword/value
> combinations that ICU itself previously told us were supported.  So
> just ignoring errors doesn't seem proper in this case.
> 

We know that this version of ICU is broken. But what choice we have?
Locale code in the glibc is much more broken.
So we just have to work around bugs in underlaying  libraries anyway.
In case of ICU we just need to omit some collations.

It might cause incompatibility with newer systems where these
collations are used, but if we fall back to glibc locale, there would
be much more incompatibilites. And also there is bug in strxfrm, which
prevents use of abbreviated keys.

--

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Peter Eisentraut

Date:

01 August 2017, 18:16:54

On 8/1/17 02:12, Victor Wagner wrote:
>> We are only calling uloc_toLanguageTag() with keyword/value
>> combinations that ICU itself previously told us were supported.  So
>> just ignoring errors doesn't seem proper in this case.
>>
> We know that this version of ICU is broken. But what choice we have?

I don't know that we can already reach that conclusion.  Maybe the APIs
have changed or we are not using them correctly.  Are there any bug
reports or changelog entries related to this?  I don't think we should
just give up and ignore errors.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Victor Wagner

Date:

01 August 2017, 18:28:06

On Tue, 1 Aug 2017 08:16:54 -0400
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 8/1/17 02:12, Victor Wagner wrote:
> >> We are only calling uloc_toLanguageTag() with keyword/value
> >> combinations that ICU itself previously told us were supported.  So
> >> just ignoring errors doesn't seem proper in this case.
> >>  
> > We know that this version of ICU is broken. But what choice we
> > have?  
> 
> I don't know that we can already reach that conclusion.  Maybe the

Because it was fixed in subsequent versions.

And 4.2 is first version where this function appeared.
So, we still have problems with SLES11 which use even older version and
have to be supported at least two more years.

> APIs have changed or we are not using them correctly.  Are there any
> bug reports or changelog entries related to this?  I don't think we
> should just give up and ignore errors.

We also can provide configuration option or command-line option for
initdb which would restrict list of languages for which collation
sequences would be created. This would help for all but two languages.

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Tom Lane

Date:

01 August 2017, 19:49:13

Victor Wagner <vitus@wagner.pp.ru> writes:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>> I don't know that we can already reach that conclusion.  Maybe the

> Because it was fixed in subsequent versions.

> And 4.2 is first version where this function appeared.
> So, we still have problems with SLES11 which use even older version and
> have to be supported at least two more years.

I think the answer is very clear: we do not need to support old broken
versions of ICU, especially not in our first release.  We'll have enough
to do making the code stable and performant with good versions of ICU.
        regards, tom lane

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Peter Eisentraut

Date:

01 August 2017, 21:28:26

On 8/1/17 08:28, Victor Wagner wrote:
> On Tue, 1 Aug 2017 08:16:54 -0400
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> 
>> On 8/1/17 02:12, Victor Wagner wrote:
>>>> We are only calling uloc_toLanguageTag() with keyword/value
>>>> combinations that ICU itself previously told us were supported.  So
>>>> just ignoring errors doesn't seem proper in this case.
>>>>  
>>> We know that this version of ICU is broken. But what choice we
>>> have?  
>> I don't know that we can already reach that conclusion.  Maybe the
> Because it was fixed in subsequent versions.

I propose the attached patch to address this.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

0001-Add-support-for-ICU-4.2.patch

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

From

Peter Eisentraut

Date:

05 August 2017, 19:36:52

On 8/1/17 11:28, Peter Eisentraut wrote:
> On 8/1/17 08:28, Victor Wagner wrote:
>> On Tue, 1 Aug 2017 08:16:54 -0400
>> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>>
>>> On 8/1/17 02:12, Victor Wagner wrote:
>>>>> We are only calling uloc_toLanguageTag() with keyword/value
>>>>> combinations that ICU itself previously told us were supported.  So
>>>>> just ignoring errors doesn't seem proper in this case.
>>>>>  
>>>> We know that this version of ICU is broken. But what choice we
>>>> have?  
>>> I don't know that we can already reach that conclusion.  Maybe the
>> Because it was fixed in subsequent versions.
> 
> I propose the attached patch to address this.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services