Thread: Q: error on updating collation version information

Q: error on updating collation version information

From
Karsten Hilbert
Date:
Dear all,

following an ICU upgrade, collations in a stock Debian PG 15.1
cluster now have divergent version information in pg_collations.

Now

    gnumed_v22=> ALTER COLLATION pg_catalog."br_FR@euro" REFRESH VERSION;
    ERROR:  collation "pg_catalog.br_FR@euro" for encoding "UTF8" does not exist

despite

    gnumed_v22=> select *, pg_encoding_to_char(collencoding) from pg_collation where collname = 'br_FR@euro';
    -[ RECORD 1 ]-------+-----------
    oid                 | 12413
    collname            | br_FR@euro
    collnamespace       | 11
    collowner           | 10
    collprovider        | c
    collisdeterministic | t
    collencoding        | 16
    collcollate         | br_FR@euro
    collctype           | br_FR@euro
    colliculocale       |
    collversion         | 2.35
    pg_encoding_to_char | LATIN9


However, note the UTF8 vs LATIN9.

The manual sayeth:

    Some (less frequently used) encodings are not supported
    by ICU. When the database encoding is one of these, ICU
    collation entries in pg_collation are ignored. Attempting
    to use one will draw an error along the lines of
    “collation "de-x-icu" for encoding "WIN874" does not
    exist”.

which sounds somewhat related.

The database encoding is UTF8. That br_FR@euro.LATIN9 had
_not_ been added manually. It is also not actively used in my
database(s).

What is the canonical advice on the way forward here ?  Is
the _suggested_ solution to delete the collation or am I
missing to see the "proper" approach to fixing it ?

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Q: error on updating collation version information

From
Karsten Hilbert
Date:
Am Sun, Dec 04, 2022 at 01:22:02PM +0100 schrieb Karsten Hilbert:

>     gnumed_v22=> ALTER COLLATION pg_catalog."br_FR@euro" REFRESH VERSION;
>     ERROR:  collation "pg_catalog.br_FR@euro" for encoding "UTF8" does not exist

The OS (libc) does seem to know that collation:

    @hermes:~$ locale -a | grep br_FR
    br_FR
    br_FR@euro
    br_FR.utf8

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Q: error on updating collation version information

From
Karsten Hilbert
Date:
Am Sun, Dec 04, 2022 at 01:22:02PM +0100 schrieb Karsten Hilbert:

> following an ICU upgrade, collations in a stock Debian PG 15.1
> cluster now have divergent version information in pg_collations.

Correction: this is following a libc upgrade 2.35 -> 2.36

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Q: error on updating collation version information

From
Adrian Klaver
Date:
On 12/4/22 04:35, Karsten Hilbert wrote:
> Am Sun, Dec 04, 2022 at 01:22:02PM +0100 schrieb Karsten Hilbert:
> 
>> following an ICU upgrade, collations in a stock Debian PG 15.1
>> cluster now have divergent version information in pg_collations.
> 
> Correction: this is following a libc upgrade 2.35 -> 2.36

So to be clear this database is not using ICU, but collations from libc?

How was the database installed?

In first post you had:

gnumed_v22=> select *, pg_encoding_to_char(collencoding) from 
pg_collation where collname = 'br_FR@euro';
    -[ RECORD 1 ]-------+-----------
    oid                 | 12413
    collname            | br_FR@euro
    collnamespace       | 11
    collowner           | 10
    collprovider        | c
    collisdeterministic | t
    collencoding        | 16
    collcollate         | br_FR@euro
    collctype           | br_FR@euro
    colliculocale       |
    collversion         | 2.35
    pg_encoding_to_char | LATIN9

where collprovider c means libc and collversion 2.35.

Not exactly sure how that interacts with from here:

https://www.postgresql.org/docs/current/catalog-pg-collation.html

collversion text

Provider-specific version of the collation. This is recorded when the 
collation is created and then checked when it is used, to detect changes 
in the collation definition that could lead to data corruption.

> 
> Karsten
> --
> GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Q: error on updating collation version information

From
Karsten Hilbert
Date:
Am Sun, Dec 04, 2022 at 10:09:47AM -0800 schrieb Adrian Klaver:

> >>following an ICU upgrade, collations in a stock Debian PG 15.1
> >>cluster now have divergent version information in pg_collations.
> >
> >Correction: this is following a libc upgrade 2.35 -> 2.36
>
> So to be clear this database is not using ICU, but collations from libc?

Sorry for the confusion.

This database carries collations from _both_ libc and ICU in
pg_collations.

The collation in question (br_FR@euro) is _not_ in use (as in
being depended on by any in-database object).

> How was the database installed?

stock Debian

    apt-get install postgresql-15  (which gives 15.1)

followed by

    CREATE DATABASE "gnumed_v22" with owner = "redacted :-)" template = "template1" encoding = 'unicode';

as "postgres".

> In first post you had:
>
> gnumed_v22=> select *, pg_encoding_to_char(collencoding) from pg_collation where
> collname = 'br_FR@euro';
>     -[ RECORD 1 ]-------+-----------
>     oid                 | 12413
>     collname            | br_FR@euro
>     collnamespace       | 11
>     collowner           | 10
>     collprovider        | c
>     collisdeterministic | t
>     collencoding        | 16
>     collcollate         | br_FR@euro
>     collctype           | br_FR@euro
>     colliculocale       |
>     collversion         | 2.35
>     pg_encoding_to_char | LATIN9
>
> where collprovider c means libc and collversion 2.35.

Yeah, that's when I figured that I misspoke about the ICU upgrade.

Yes, there was an ICU upgrade, and yes, it did affect
collations. Those I was able to fix up (the "reindex /
revalidate constraint / refresh collation version" dance).

There also was a libc upgrade which also affected locales.
Most of them were fixable by that dance but some popped up
(such as br_FR@euro) to not be "correctable" showing the
"does not exist for encoding" error.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Q: error on updating collation version information

From
Karsten Hilbert
Date:
Is this to be expected ?

PG 15.1 on Debian:

    gnumed_v22=# select *, pg_collation_actual_version(oid), pg_encoding_to_char(collencoding) from pg_collation where
collname= 'zh_TW'; 
    -[ RECORD 1 ]---------------+------------
    oid                         | 12985
    collname                    | zh_TW
    collnamespace               | 11
    collowner                   | 10
    collprovider                | c
    collisdeterministic         | t
    collencoding                | 4
    collcollate                 | zh_TW.euctw
    collctype                   | zh_TW.euctw
    colliculocale               |
    collversion                 | 2.35
    pg_collation_actual_version | 2.36
    pg_encoding_to_char         | EUC_TW
    -[ RECORD 2 ]---------------+------------
    oid                         | 12986
    collname                    | zh_TW
    collnamespace               | 11
    collowner                   | 10
    collprovider                | c
    collisdeterministic         | t
    collencoding                | 6
    collcollate                 | zh_TW.utf8
    collctype                   | zh_TW.utf8
    colliculocale               |
    collversion                 | 2.36
    pg_collation_actual_version | 2.36
    pg_encoding_to_char         | UTF8

    gnumed_v22=# begin;
    BEGIN
    gnumed_v22=*# alter collation pg_catalog."zh_TW" refresh version ;
    NOTICE:  version has not changed
    ALTER COLLATION
    gnumed_v22=*# alter collation pg_catalog."zh_TW.utf8" refresh version ;
    NOTICE:  version has not changed
    ALTER COLLATION
    gnumed_v22=*# alter collation pg_catalog."zh_TW.euctw" refresh version ;
    ERROR:  collation "pg_catalog.zh_TW.euctw" for encoding "UTF8" does not exist
    gnumed_v22=!#

As far as I can tell the documentation asserts that since the
database encoding is UTF8 the pg_catalog."zh_TW.euctw" will
be ignored by the server for all practical purposes.

Does this mean it is impossible to "correct" its version
information ?

And if so, that is expected to be non-harmful and is not
expected to trigger nag messages ?

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Q: error on updating collation version information

From
"Daniel Verite"
Date:
    Karsten Hilbert wrote:

> The database encoding is UTF8. That br_FR@euro.LATIN9 had
> _not_ been added manually. It is also not actively used in my
> database(s).

br_FR@euro.LATIN9 cannot be used actively in an UTF-8 database
because it's for a different encoding than the database.

It was probably available in the OS at initdb time. Every available
locale (see locale -a) gets imported into the template databases,
and then into the other databases, since CREATE DATABASE copies a
template without filtering out the locales that are incompatible
with the target database.

There's no need to upgrade anything in the OS to get the
ALTER COLLATION... REFRESH error you mention.
It's sufficient to have a mix of collations for different encodings
and try to refresh collations whose encoding are not compatible
with the current database.


> What is the canonical advice on the way forward here ?  Is
> the _suggested_ solution to delete the collation or am I
> missing to see the "proper" approach to fixing it ?

ISTM that dropping that collation from any non-LATIN9 database
is the more sensible action.


Maybe it could be discussed as a possible improvement to have
ALTER COLLATION... REFRESH ignore the database encoding and still
refresh the version number.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



dropped default locale

From
Karsten Hilbert
Date:
Dear all,

I managed to drop the "special" collations default, C, and
POSIX with OIDs 100, 950, 951.

Is there a way to recreate them (short of restoring a backup)
? Naive attempts with create collation do not seem to work
out.

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: dropped default locale

From
Laurenz Albe
Date:
On Wed, 2022-12-21 at 15:24 +0100, Karsten Hilbert wrote:
> I managed to drop the "special" collations default, C, and
> POSIX with OIDs 100, 950, 951.
> 
> Is there a way to recreate them (short of restoring a backup)
> ? Naive attempts with create collation do not seem to work
> out.

I would definitely go for the backup, but here is how you can
create these three rows in PostgreSQL v15:

  INSERT INTO pg_collation
    (oid, collname, collnamespace, collowner, collprovider,
     collisdeterministic, collencoding, collcollate, collctype)
  VALUES
    (100, 'default', 11, 10, 'd', TRUE, -1, NULL,    NULL),
    (950, 'C',       11, 10, 'c', TRUE, -1, 'C',     'C'),
    (951, 'POSIX',   11, 10, 'c', TRUE, -1, 'POSIX', 'POSIX');

Yours,
Laurenz Albe



Re: dropped default locale

From
Karsten Hilbert
Date:
Am Wed, Dec 21, 2022 at 03:41:24PM +0100 schrieb Laurenz Albe:

> I would definitely go for the backup, but here is how you can
> create these three rows in PostgreSQL v15:
>
>   INSERT INTO pg_collation
>     (oid, collname, collnamespace, collowner, collprovider,
>      collisdeterministic, collencoding, collcollate, collctype)
>   VALUES
>     (100, 'default', 11, 10, 'd', TRUE, -1, NULL,    NULL),
>     (950, 'C',       11, 10, 'c', TRUE, -1, 'C',     'C'),
>     (951, 'POSIX',   11, 10, 'c', TRUE, -1, 'POSIX', 'POSIX');

Many thanks !  I wasn't so sure whether inserting appropriate
rows would be equivalent to create collation... (pg_collation
might have been a view projecting inner workings of the
server engine).

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: dropped default locale

From
Karsten Hilbert
Date:
Am Wed, Dec 21, 2022 at 03:46:48PM +0100 schrieb Karsten Hilbert:

> I wasn't so sure whether inserting appropriate
> rows would be equivalent to create collation...

For that matter, is DELETE FROM pg_collation ... equivalent
to DROP COLLATION ?

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: dropped default locale

From
Tom Lane
Date:
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> For that matter, is DELETE FROM pg_collation ... equivalent
> to DROP COLLATION ?

There's also entries in pg_depend and pg_shdepend to worry
about.

For these built-in collations, as of v15 there are no such
entries, but prior versions had explicit "pin" entries.

            regards, tom lane