Thread: COLLATION update in 13.1

COLLATION update in 13.1

From
Matthias Apitz
Date:
Hello,

When the Linux OS is updated, for example from SLES 15 SP5 to SP6, the version of the glibc is sometimes updated, for example from 2.31 to 2.38. For existing databases this gives on SQL a warning as:

user@rechner: $SC_SQL -Usisis sisis
WARNING:  database "sisis" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.38.
HINT:  Rebuild all objects in this database that use the default collation and run 
ALTER DATABASE sisis REFRESH COLLATION VERSION, or build PostgreSQL with the right library version. ...
This HINT works fine on 15.x and 16.x server versions.

On 13.1 it gives an SQL ERROR:

sisis=# ALTER DATABASE sisis REFRESH COLLATION VERSION;
ERROR:  syntax error at or near "REFRESH"
LINE 1: ALTER DATABASE sisis REFRESH COLLATION VERSION;


What is the procedure on 13.1 to bring the external (glibc) version in sync with. the used version in the PostgreSQL database?


Thanks

matthias



Re: COLLATION update in 13.1

From
Jeremy Schneider
Date:
On Mon, 24 Feb 2025 11:08:43 +0100
Matthias Apitz <gurucubano@googlemail.com> wrote:

> 
> What is the procedure on 13.1 to bring the external (glibc) version
> in sync with. the used version in the PostgreSQL database?

If I recall correctly, between versions 10 and 14 you need to use ALTER
COLLATION name REFRESH VERSION on every collation where there's a
version mismatch. (This can happen with ICU collations since version 10,
and with glibc collations since version 13.)

I only know of a small number of changes between glibc versions 2.31 and
2.38 ~ I suspect you already know to look out for glibc 2.28 which was
the crazy one. (SLE15 Service Pack 3.) Most databases would be
corrupted by that update, and running "refresh version" would simply
stop the warnings without fixing the corruption.

-Jeremy



Re: COLLATION update in 13.1

From
Matthias Apitz
Date:
Thanks for your hint, Jeremy. But this does not work either:

postgres=# SELECT collname, collversion FROM pg_collation where collname = 'de_DE.utf8';
  collname  | collversion
------------+-------------
 de_DE.utf8 | 2.38
(1 row)

postgres=# ALTER COLLATION de_DE.utf8 REFRESH VERSION;
ERROR:  schema "de_de" does not exist

What do I wrong?

Matthia

On Mon, Feb 24, 2025 at 11:32 AM Jeremy Schneider <schneider@ardentperf.com> wrote:
On Mon, 24 Feb 2025 11:08:43 +0100
Matthias Apitz <gurucubano@googlemail.com> wrote:

>
> What is the procedure on 13.1 to bring the external (glibc) version
> in sync with. the used version in the PostgreSQL database?

If I recall correctly, between versions 10 and 14 you need to use ALTER
COLLATION name REFRESH VERSION on every collation where there's a
version mismatch. (This can happen with ICU collations since version 10,
and with glibc collations since version 13.)

I only know of a small number of changes between glibc versions 2.31 and
2.38 ~ I suspect you already know to look out for glibc 2.28 which was
the crazy one. (SLE15 Service Pack 3.) Most databases would be
corrupted by that update, and running "refresh version" would simply
stop the warnings without fixing the corruption.

-Jeremy

Re: COLLATION update in 13.1

From
Dominique Devienne
Date:
On Mon, Feb 24, 2025 at 12:33 PM Matthias Apitz <gurucubano@googlemail.com> wrote:
Thanks for your hint, Jeremy. But this does not work either:

postgres=# SELECT collname, collversion FROM pg_collation where collname = 'de_DE.utf8';
  collname  | collversion
------------+-------------
 de_DE.utf8 | 2.38
(1 row)

postgres=# ALTER COLLATION de_DE.utf8 REFRESH VERSION;
ERROR:  schema "de_de" does not exist

What do I wrong?

Missing quotes.  ALTER COLLATION "de_DE.utf8" REFRESH VERSION;

Re: COLLATION update in 13.1

From
Laurenz Albe
Date:
On Mon, 2025-02-24 at 02:32 -0800, Jeremy Schneider wrote:
> On Mon, 24 Feb 2025 11:08:43 +0100
> Matthias Apitz <gurucubano@googlemail.com> wrote:
> > What is the procedure on 13.1 to bring the external (glibc) version
> > in sync with. the used version in the PostgreSQL database?
> 
> If I recall correctly, between versions 10 and 14 you need to use ALTER
> COLLATION name REFRESH VERSION on every collation where there's a
> version mismatch. (This can happen with ICU collations since version 10,
> and with glibc collations since version 13.)

Perhaps I need not say that, but ALTER COLLATION ... REFRESH VERSION only
makes the warning disappear.

If you want to avoid data corruption, rebuild all indexes on strings,
then make the warning disappear.

Yours,
Laurenz Albe

-- 

*E-Mail Disclaimer*
Der Inhalt dieser E-Mail ist ausschliesslich fuer den 
bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat 
dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte, 
dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder 
Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich 
in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.

*CONFIDENTIALITY NOTICE & DISCLAIMER
*This message and any attachment are 
confidential and may be privileged or otherwise protected from disclosure 
and solely for the use of the person(s) or entity to whom it is intended. 
If you have received this message in error and are not the intended 
recipient, please notify the sender immediately and delete this message and 
any attachment from your system. If you are not the intended recipient, be 
advised that any use of this message is prohibited and may be unlawful, and 
you must not copy this message or attachment or disclose the contents to 
any other person.



Re: COLLATION update in 13.1

From
Matthias Apitz
Date:
Thanks. I tried a lot of combinations. Based on the output of \l

                                                  List of databases
    Name    |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider |   Access privileges
------------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
 bar        | foo      | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |            | libc            |
 customers  | sisis    | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |            | libc            |
...

postgres=# ALTER COLLATION "de_DE.UTF8" REFRESH VERSION;
ERROR:  collation "de_DE.UTF8" for encoding "UTF8" does not exist

yours (Dominique) seems to work:

postgres=# ALTER COLLATION "de_DE.utf8" REFRESH VERSION;
NOTICE:  version has not changed
ALTER COLLATION

If I understand the other reply from Laurenz Albe right, the correct procedure would be:

pgsql -Usisis sisis
sisis=# REINDEX (VERBOSE) DATABASE sisis;
sisis=# ALTER COLLATION "de_DE.utf8" REFRESH VERSION;
ALTER COLLATION

Correct?



On Mon, Feb 24, 2025 at 12:35 PM Dominique Devienne <ddevienne@gmail.com> wrote:
On Mon, Feb 24, 2025 at 12:33 PM Matthias Apitz <gurucubano@googlemail.com> wrote:
Thanks for your hint, Jeremy. But this does not work either:

postgres=# SELECT collname, collversion FROM pg_collation where collname = 'de_DE.utf8';
  collname  | collversion
------------+-------------
 de_DE.utf8 | 2.38
(1 row)

postgres=# ALTER COLLATION de_DE.utf8 REFRESH VERSION;
ERROR:  schema "de_de" does not exist

What do I wrong?

Missing quotes.  ALTER COLLATION "de_DE.utf8" REFRESH VERSION;

Re: COLLATION update in 13.1

From
Laurenz Albe
Date:
On Mon, 2025-02-24 at 12:53 +0100, Matthias Apitz wrote:
> If I understand the other reply from Laurenz Albe right, the correct procedure would be:
> 
> pgsql -Usisis sisis
> sisis=# REINDEX (VERBOSE) DATABASE sisis;
> sisis=# ALTER COLLATION "de_DE.utf8" REFRESH VERSION;
> ALTER COLLATION
> 
> Correct?

That REINDEX is certainly correct, even though it rebuilds way more
indexes than necessary.

If the ALTER COLLATION statement is correct or not depends on the
collation you are using.  You could look at "\l" and "pg_collation"
to get the name right.

Yours,
Laurenz Albe

-- 

*E-Mail Disclaimer*
Der Inhalt dieser E-Mail ist ausschliesslich fuer den 
bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat 
dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte, 
dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder 
Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich 
in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.

*CONFIDENTIALITY NOTICE & DISCLAIMER
*This message and any attachment are 
confidential and may be privileged or otherwise protected from disclosure 
and solely for the use of the person(s) or entity to whom it is intended. 
If you have received this message in error and are not the intended 
recipient, please notify the sender immediately and delete this message and 
any attachment from your system. If you are not the intended recipient, be 
advised that any use of this message is prohibited and may be unlawful, and 
you must not copy this message or attachment or disclose the contents to 
any other person.



Re: COLLATION update in 13.1

From
Matthias Apitz
Date:
Thanks. I did \l before which gives:

                                                  List of databases
    Name    |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider |   Access privileges
------------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
 bar        | foo      | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |            | libc            |
 customers  | sisis    | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |            | libc            |
...

But why the ALTER statement needs the spelling different as the output of \l :

sisis=# ALTER COLLATION "de_DE.UTF-8" REFRESH VERSION;
ERROR:  collation "de_DE.UTF-8" for encoding "UTF8" does not exist

sisis=# ALTER COLLATION "de_DE.utf8" REFRESH VERSION;
NOTICE:  version has not changed
ALTER COLLATION

?


On Mon, Feb 24, 2025 at 12:57 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Mon, 2025-02-24 at 12:53 +0100, Matthias Apitz wrote:
> If I understand the other reply from Laurenz Albe right, the correct procedure would be:
>
> pgsql -Usisis sisis
> sisis=# REINDEX (VERBOSE) DATABASE sisis;
> sisis=# ALTER COLLATION "de_DE.utf8" REFRESH VERSION;
> ALTER COLLATION
>
> Correct?

That REINDEX is certainly correct, even though it rebuilds way more
indexes than necessary.

If the ALTER COLLATION statement is correct or not depends on the
collation you are using.  You could look at "\l" and "pg_collation"
to get the name right.

Yours,
Laurenz Albe

--

*E-Mail Disclaimer*
Der Inhalt dieser E-Mail ist ausschliesslich fuer den
bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat
dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte,
dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder
Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich
in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.

*CONFIDENTIALITY NOTICE & DISCLAIMER
*This message and any attachment are
confidential and may be privileged or otherwise protected from disclosure
and solely for the use of the person(s) or entity to whom it is intended.
If you have received this message in error and are not the intended
recipient, please notify the sender immediately and delete this message and
any attachment from your system. If you are not the intended recipient, be
advised that any use of this message is prohibited and may be unlawful, and
you must not copy this message or attachment or disclose the contents to
any other person.

Re: COLLATION update in 13.1

From
Ron Johnson
Date:
On Mon, Feb 24, 2025 at 6:53 AM Matthias Apitz <gurucubano@googlemail.com> wrote:
[snip] 
pgsql -Usisis sisis
sisis=# REINDEX (VERBOSE) DATABASE sisis;
sisis=# ALTER COLLATION "de_DE.utf8" REFRESH VERSION;
ALTER COLLATION

Correct?

Just reindex those with text columns.

create or replace view dba.all_indices_types as
    select tbcl.relnamespace::regnamespace::text||'.'||tbcl.relname as table_name
            , ndcl.relname as index_name
            , array_agg(ty.typname order by att.attnum) as index_types
    from pg_class ndcl
        inner join pg_index nd
            on (ndcl.oid = nd.indexrelid and ndcl.relkind = 'i')
        inner join pg_class tbcl
            on (nd.indrelid = tbcl.oid and tbcl.relkind = 'r')
        inner join pg_attribute att
            on att.attrelid = nd.indexrelid
        inner join pg_type ty
            on att.atttypid = ty.oid
    where tbcl.relnamespace::regnamespace::text != 'pg_catalog'
    group by tbcl.relnamespace::regnamespace::text||'.'||tbcl.relname
            , ndcl.relname
    order by 1, 2;
select * from dba.all_indices_types where index_types && '{"text","varchar","char"}';

(This view might not handle indices on the parents of declared-partition tables.)

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: COLLATION update in 13.1

From
Laurenz Albe
Date:
On Mon, 2025-02-24 at 13:07 +0100, Matthias Apitz wrote:
> Thanks. I did \l before which gives:
>
>                                                   List of databases
>     Name    |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider |   Access privileges
> ------------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
>  bar        | foo      | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |            | libc            |
>  customers  | sisis    | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |            | libc            |
> ...
>
> But why the ALTER statement needs the spelling different as the output of \l :
>
> sisis=# ALTER COLLATION "de_DE.UTF-8" REFRESH VERSION;
> ERROR:  collation "de_DE.UTF-8" for encoding "UTF8" does not exist
>
> sisis=# ALTER COLLATION "de_DE.utf8" REFRESH VERSION;
> NOTICE:  version has not changed
> ALTER COLLATION

Yes, that is annoying.  You can use other collation names in CREATE DATABASE
than the ones in pg_collation.  You have to choose the name that is used in
pg_collation, which is probably "de_DE.utf8".

Really, they refer to the same collation.

Yours,
Laurenz Albe

--

*E-Mail Disclaimer*
Der Inhalt dieser E-Mail ist ausschliesslich fuer den
bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat
dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte,
dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder
Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich
in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.

*CONFIDENTIALITY NOTICE & DISCLAIMER
*This message and any attachment are
confidential and may be privileged or otherwise protected from disclosure
and solely for the use of the person(s) or entity to whom it is intended.
If you have received this message in error and are not the intended
recipient, please notify the sender immediately and delete this message and
any attachment from your system. If you are not the intended recipient, be
advised that any use of this message is prohibited and may be unlawful, and
you must not copy this message or attachment or disclose the contents to
any other person.



Re: COLLATION update in 13.1

From
"Daniel Verite"
Date:
    Matthias Apitz wrote:

> Thanks. I did \l before which gives:
>
>                                                  List of databases
>    Name    |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale
> | Locale Provider |   Access privileges
> ------------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
> bar        | foo      | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |
> | libc            |
> customers  | sisis    | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |
> | libc            |
> ...
>
> But why the ALTER statement needs the spelling different as the output of
> \l :
>
> sisis=# ALTER COLLATION "de_DE.UTF-8" REFRESH VERSION;
> ERROR:  collation "de_DE.UTF-8" for encoding "UTF8" does not exist

The "Collate" and "Ctype" columns in the output of \l refer to a
locale name that is passed to libc to use locale-dependant functions.
It's somewhat counter-intuitive, but it's technically not a database
collation, and it's independent from the de_DE.utf8 collation that
exists in the database, in the sense that de_DE.utf8 is the name
of a database object whereas a locale is not a database object.

In fact, if your applications always use the default collation like
most apps do (that is, it never uses explicit COLLATE clauses), then
you could even issue DROP COLLATION "de_DE.utf8" or
ALTER COLLATION "de_DE.utf8" RENAME TO "foobar" and it
would not have any notable effect.
 \l would still report "de_DE.UTF-8" as it did previously.
That's because "de_DE.utf8" is not the default collation, it's
a collation that happens to correspond to the same locale as the
default collation. The default collation is named "default", it
lives in the "pg_catalog" namespace, and it cannot be dropped since
it's a system object.

Technically the ALTER DATABASE xxx REFRESH COLLATION VERSION updates
the pg_database.datcollversion field (for PG15+, before that it did
not exist), whereas the ALTER COLLATION xxx REFRESH VERSION updates
the pg_collation.collversion field.

With PG15+, ALTER DATABASE xxx REFRESH COLLATION VERSION does
not imply any ALTER COLLATION. If you do only the ALTER DATABASE,
all the collations in pg_collation still have their collversion
fields that lag behind. But it only matters if these collations
are actually used by explicit COLLATE clauses, otherwise
Postgres will never use them and thus never emit any warning.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/