ICU 54 and earlier are too dangerous - Mailing list pgsql-hackers

From Jeff Davis
Subject ICU 54 and earlier are too dangerous
Msg-id ea927ede4e8a8f3ba515b15a083577a68e9f9201.camel@j-davis.com
Responses Re: ICU 54 and earlier are too dangerous  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: ICU 54 and earlier are too dangerous  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
In ICU 54 and earlier, if ucol_open() is unable to find a matching
locale, it will fall back to the *environment*.

Using ICU 54:

  initdb -D data -N --locale="en_US.UTF-8"
  pg_ctl -D data -l logfile start
  psql postgres -c "create collation asdf(provider=icu, locale='asdf')"
  # the following query returns true
  psql postgres -c "select 'abc' collate asdf < 'ABC' collate asdf"
  psql postgres -c "alter system set lc_messages='C'"
  pg_ctl -D data -l logfile restart
  # the same query now returns false and warns about a collation version mismatch
  psql postgres -c "select 'abc' collate asdf < 'ABC' collate asdf"

This was fixed in ICU 55 to fall back to the root locale instead[1],
which is stable, has a collator version, and is not dependent on the
environment. As far as I can tell, 55 and later never fall back to the
environment when opening a collator (unless you explicitly pass NULL to
ucol_open(), which is documented).

It would be nice if we could detect when this fallback-to-environment
happens, so that we could just refuse to create the bogus collation.
But I didn't find a good way. There are non-error return codes from
ucol_open() that seem promising[2], but they aren't actually very
useful to distinguish the fallback-to-environment case as far as I can
tell.

Unless someone has a better idea, I think we need to bump the minimum
required ICU version to 55. That would solve the issue in v16 and
later, but anyone running an older version of Postgres with an older
version of ICU would still be vulnerable to this kind of typo in a
locale name.
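
To make the proposed version floor concrete, here is a minimal sketch of
such a gate. It is an illustration only, not PostgreSQL code:
icu_version_is_safe() and MIN_SAFE_ICU_MAJOR are hypothetical names, and
the sketch assumes the ICU version is available as a dotted string such
as "54.1". A real check would more likely be enforced at build time.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Assumption: 55 is the floor proposed in this message, not an existing
 * PostgreSQL constant.
 */
#define MIN_SAFE_ICU_MAJOR 55

/*
 * Hypothetical helper: return true if a dotted ICU version string such as
 * "54.1" or "72.1" has a major version of at least MIN_SAFE_ICU_MAJOR.
 * Missing or unparseable input is treated as unsafe.
 */
static bool
icu_version_is_safe(const char *version)
{
    char *end;
    long  major;

    if (version == NULL || *version == '\0')
        return false;

    errno = 0;
    major = strtol(version, &end, 10);

    /* No number could be parsed, or it overflowed a long. */
    if (end == version || errno == ERANGE)
        return false;

    /* The major number must be followed by end of string or ".minor". */
    if (*end != '\0' && *end != '.')
        return false;

    return major >= MIN_SAFE_ICU_MAJOR;
}
```

With this sketch, icu_version_is_safe("54.1") is false and
icu_version_is_safe("55.1") is true. Treating unparseable input as unsafe
is a deliberate choice here: it refuses an unknown library rather than
risking the fallback-to-environment behavior.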

Regards,
    Jeff Davis


[1] https://icu.unicode.org/download/55m1
[2] https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utypes_8h.html#a3343c1c8a8377277046774691c98d78c


