Thread: pg_upgrade fails to preserve old versions of the predefinedcollations

pg_upgrade fails to preserve old versions of the predefinedcollations

From
Alexander Lakhin
Date:
Hello hackers,

When dealing with an OS upgrade, a some kind of anomaly related to
collations was found.
Suppose, we have Debian 8 with postgresql 12 installed.
Then we create a custom collation:
CREATE COLLATION russian (provider=icu, locale='ru_RU');
and
SELECT oid, collname, collnamespace, collprovider, collversion FROM
pg_collation WHERE collname like 'ru%';
returns
 12847 | ru-RU-x-icu |            11 | i            | 58.0.0.50
...
 16384 | russian     |          2200 | i            | 58.0.0.50
Then let's create two tables with text columns and indexes and fill them
with some data:
CREATE TABLE test_icu_ru (f1 varchar COLLATE "ru-RU-x-icu", i int);
INSERT INTO test_icu_ru SELECT chr(x), x FROM generate_series(1, 2000)
as y(x); CREATE INDEX ON test_icu_ru (f1);

CREATE TABLE test_icu_russian (f1 varchar COLLATE "russian", i int);
INSERT INTO test_icu_russian SELECT chr(x), x FROM generate_series(1,
2000) as y(x); CREATE INDEX ON test_icu_russian (f1);

Perform two test queries:
postgres=# select * from test_icu_ru where f1=chr(821);
 f1 |  i 
----+-----
  ̵ | 821
(1 row)

postgres=# select * from test_icu_russian where f1=chr(821);
 f1 |  i 
----+-----
  ̵ | 821
(1 row)

postgres=# EXPLAIN select * from test_icu_ru where f1=chr(821);
                                      QUERY
PLAN                                     
--------------------------------------------------------------------------------------
 Index Scan using test_icu_ru_f1_idx on test_icu_ru  (cost=0.28..8.29
rows=1 width=6)
   Index Cond: ((f1)::text = '̵'::text)
(2 rows)

postgres=# EXPLAIN select * from test_icu_russian where f1=chr(821);
                                           QUERY
PLAN                                          
------------------------------------------------------------------------------------------------
 Index Scan using test_icu_russian_f1_idx on test_icu_russian 
(cost=0.28..8.29 rows=1 width=6)
   Index Cond: ((f1)::text = '̵'::text)
(2 rows)
(The indexes are indeed used by the above queries.)

Now suppose that the OS is upgraded to Debian 9 (or the pgdata just
moved to Debian 9 with the postgresql 12).
The same queries return:
postgres=# select * from test_icu_ru where f1=chr(821);
WARNING:  collation "ru-RU-x-icu" has version mismatch
DETAIL:  The collation in the database was created using version
58.0.0.50, but the operating system provides version 153.64.29.
HINT:  Rebuild all objects affected by this collation and run ALTER
COLLATION pg_catalog."ru-RU-x-icu" REFRESH VERSION, or build PostgreSQL
with the right library version.
 f1 | i
----+---
(0 rows)

postgres=# select * from test_icu_russian where f1=chr(821);
WARNING:  collation "russian" has version mismatch
DETAIL:  The collation in the database was created using version
58.0.0.50, but the operating system provides version 153.64.29.
HINT:  Rebuild all objects affected by this collation and run ALTER
COLLATION public.russian REFRESH VERSION, or build PostgreSQL with the
right library version.
 f1 | i
----+---
(0 rows)

We get no data due to the real collation/sort order change but the
warning says what to do.
The query presented at
https://www.postgresql.org/docs/12/sql-altercollation.html returns:
        Collation        |               Object               
-------------------------+-------------------------------------
 collation "ru-RU-x-icu" | column f1 of table test_icu_ru
 collation "ru-RU-x-icu" | index test_icu_ru_f1_idx
 collation russian       | column f1 of table test_icu_russian
 collation russian       | index test_icu_russian_f1_idx
So the documented behavior is observed.

But after pg_upgrade:
pg_createcluster 12 new
/usr/lib/postgresql/12/bin/pg_upgrade -b /usr/lib/postgresql/12/bin -B
/usr/lib/postgresql/12/bin -d /etc/postgresql/12/main -D
/etc/postgresql/12/new
In the new cluster the same queries return:
postgres=# select * from test_icu_russian where f1=chr(821);
WARNING:  collation "russian" has version mismatch
DETAIL:  The collation in the database was created using version
58.0.0.50, but the operating system provides version 153.64.29.
HINT:  Rebuild all objects affected by this collation and run ALTER
COLLATION public.russian REFRESH VERSION, or build PostgreSQL with the
right library version.
 f1 | i
----+---
(0 rows)

postgres=# select * from test_icu_ru where f1=chr(821);
 f1 | i
----+---
(0 rows)
(There is no warning for the predefined collation now.)

The query presented at
https://www.postgresql.org/docs/12/sql-altercollation.html returns:
     Collation     |               Object               
-------------------+-------------------------------------
 collation russian | column f1 of table test_icu_russian
 collation russian | index test_icu_russian_f1_idx
(2 rows)

and
SELECT oid, collname, collnamespace, collprovider, collversion FROM
pg_collation WHERE collname like 'ru%';
returns
  oid  |  collname   | collnamespace | collprovider | collversion
-------+-------------+---------------+--------------+-------------
 12884 | ru-BY-x-icu |            11 | i            | 153.64.29
 12885 | ru-KG-x-icu |            11 | i            | 153.64.29
 12886 | ru-KZ-x-icu |            11 | i            | 153.64.29
 12887 | ru-MD-x-icu |            11 | i            | 153.64.29
 12888 | ru-RU-x-icu |            11 | i            | 153.64.29
 12889 | ru-UA-x-icu |            11 | i            | 153.64.29
 12883 | ru-x-icu    |            11 | i            | 153.64.29
 12329 | ru_RU       |            11 | c            |
 12328 | ru_RU.utf8  |            11 | c            |
 16402 | russian     |          2200 | i            | 58.0.0.50
So only the custom collation' version is actual, but predefined ones
correspond to a newer libicu, but not to actual data.

For all that, REINDEX repairs both indexes:
postgres=# REINDEX INDEX test_icu_russian_f1_idx;
REINDEX
postgres=# select * from test_icu_russian where f1=chr(821);
 f1 |  i 
----+-----
  ̵ | 821
(1 row)

postgres=# REINDEX INDEX test_icu_ru_f1_idx;
REINDEX
postgres=# select * from test_icu_ru where f1=chr(821);
 f1 |  i 
----+-----
  ̵ | 821
(1 row)

So for now it seems dangerous to use predefined collations as their old
versions are not preserved by pg_upgrade and the user doesn't know which
indexes affected by the actual ICU collation changes.


Best regards,
Alexander



Re: pg_upgrade fails to preserve old versions of the predefined collations

From
Thomas Munro
Date:
On Fri, Nov 29, 2019 at 9:08 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> So for now it seems dangerous to use predefined collations as their old
> versions are not preserved by pg_upgrade and the user doesn't know which
> indexes affected by the actual ICU collation changes.

Yeah, we noticed this while working on a proposal for new
per-database-object version dependency tracking, and Peter E has
written a patch to address it:

https://commitfest.postgresql.org/25/2328/



Re: pg_upgrade fails to preserve old versions of the predefinedcollations

From
Alexander Lakhin
Date:
28.11.2019 23:25, Thomas Munro пишет:
> On Fri, Nov 29, 2019 at 9:08 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> So for now it seems dangerous to use predefined collations as their old
>> versions are not preserved by pg_upgrade and the user doesn't know which
>> indexes affected by the actual ICU collation changes.
> Yeah, we noticed this while working on a proposal for new
> per-database-object version dependency tracking, and Peter E has
> written a patch to address it:
>
> https://commitfest.postgresql.org/25/2328/
Thank you! This patch is working for me. After pg_upgrade with the
applied patch I'm getting:
postgres=# SELECT oid, collname, collnamespace, collprovider,
collversion FROM pg_collation WHERE collname like 'ru%';
  oid  |  collname   | collnamespace | collprovider | collversion
-------+-------------+---------------+--------------+-------------
 17561 | ru-BY-x-icu |            11 | i            | 58.0.0.50
 17562 | ru-KG-x-icu |            11 | i            | 58.0.0.50
 17563 | ru-KZ-x-icu |            11 | i            | 58.0.0.50
 17564 | ru-MD-x-icu |            11 | i            | 58.0.0.50
 17565 | ru-RU-x-icu |            11 | i            | 58.0.0.50
 17566 | ru-UA-x-icu |            11 | i            | 58.0.0.50
 17567 | ru-x-icu    |            11 | i            | 58.0.0.50
 17568 | ru_RU       |            11 | c            |
 17569 | ru_RU.utf8  |            11 | c            |
 17696 | russian     |          2200 | i            | 58.0.0.50
(10 rows)

Best regards,
Alexander