Thread: Q: fixing collation version mismatches

Q: fixing collation version mismatches

From
Karsten Hilbert
Date:
Dear all,

just to confirm my understanding:

Is it correct to say that the following sequence will "fix"
all current collation version issues in a given database ?

    REINDEX DATABASE db_in_question;
    ALTER DATABASE db_in_question REFRESH COLLATION VERSION;
    ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION;

Note that I am currently _not_ concerned with minimizing
work by running this on objects only that really need a
reindex/refresh.

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Q: fixing collation version mismatches

From
Christophe Pettus
Date:

> On Nov 13, 2022, at 12:45, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
>     REINDEX DATABASE db_in_question;
>     ALTER DATABASE db_in_question REFRESH COLLATION VERSION;
>     ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION;

I may be totally off-base here, but shouldn't the REINDEX be the last step?



Re: Q: fixing collation version mismatches

From
Karsten Hilbert
Date:
Am Sun, Nov 13, 2022 at 12:46:53PM -0800 schrieb Christophe Pettus:

> > On Nov 13, 2022, at 12:45, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> >     REINDEX DATABASE db_in_question;
> >     ALTER DATABASE db_in_question REFRESH COLLATION VERSION;
> >     ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION;
>
> I may be totally off-base here, but shouldn't the REINDEX be the last step?

To my understanding, the REFRESH statements "merely" update
the version information stored in the related objects. They
do not change anything else; and the REINDEX does not
reference them in any way.

I suppose the REINDEX goes first as it does the actual fixing
of now-invalid objects by rebuilding them. After that one is
back to a usable database state, even if left with pesky
(albeit harmless) warnings on version mismatches -- which to
get rid of one runs the REFRESH statements.

Or so my understanding...

Which is why my question still stands: does the above
three-strikes operation safely take care of any collation
issues that may currently exist in a database ?

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Q: fixing collation version mismatches

From
Julien Rouhaud
Date:
Le lun. 14 nov. 2022 à 05:58, Karsten Hilbert <Karsten.Hilbert@gmx.net> a écrit :
Am Sun, Nov 13, 2022 at 12:46:53PM -0800 schrieb Christophe Pettus:

> > On Nov 13, 2022, at 12:45, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> >     REINDEX DATABASE db_in_question;
> >     ALTER DATABASE db_in_question REFRESH COLLATION VERSION;
> >     ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION;
>
> I may be totally off-base here, but shouldn't the REINDEX be the last step?

To my understanding, the REFRESH statements "merely" update
the version information stored in the related objects. They
do not change anything else; and the REINDEX does not
reference them in any way.

I suppose the REINDEX goes first as it does the actual fixing
of now-invalid objects by rebuilding them. After that one is
back to a usable database state, even if left with pesky
(albeit harmless) warnings on version mismatches -- which to
get rid of one runs the REFRESH statements.

Or so my understanding...

yes exactly. but it's likely that people will have some form of automation to run the reindex if there's any discrepancy between the recorded collation version and recorded version, so if you first fix the versions metada and then encounter any error during the reindex, you won't know if you need to reindex or not and might end up with corrupt indexes. 

Which is why my question still stands: does the above
three-strikes operation safely take care of any collation
issues that may currently exist in a database ?

yes

Re: Q: fixing collation version mismatches

From
Julien Rouhaud
Date:
Le lun. 14 nov. 2022 à 13:10, Julien Rouhaud <rjuju123@gmail.com> a écrit :
yes exactly. but it's likely that people will have some form of automation to run the reindex if there's any discrepancy between the recorded collation version and recorded version, 

sorry I meant "and the current version" 

Re: Q: fixing collation version mismatches

From
"Daniel Verite"
Date:
    Karsten Hilbert wrote:

> Which is why my question still stands: does the above
> three-strikes operation safely take care of any collation
> issues that may currently exist in a database ?

For the indexes, yes, but theorically, all constraints involving collatable
types need a recheck.

For foreign key constraints with non-deterministic collations, there
might be equality tests that pass with an older Unicode version and fail
with a newer Unicode version.

For check constraints as well, checks applied to strings with recent
Unicode characters can give different results after an upgrade.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



Re: Q: fixing collation version mismatches

From
Karsten Hilbert
Date:
Am Mon, Nov 14, 2022 at 05:42:16PM +0100 schrieb Daniel Verite:

> > Which is why my question still stands: does the above
> > three-strikes operation safely take care of any collation
> > issues that may currently exist in a database ?
>
> For the indexes, yes, but theorically, all constraints involving collatable
> types need a recheck.
>
> For foreign key constraints with non-deterministic collations, there
> might be equality tests that pass with an older Unicode version and fail
> with a newer Unicode version.

Which gives weight to the argument that using real-world data
(instead of surrogate keys) may lead to trouble.

> For check constraints as well, checks applied to strings with recent
> Unicode characters can give different results after an upgrade.

Thanks for pointing this out more clearly. My thinking
already evolved towards also including VALIDATE CONSTRAINT.

I shall, for the record, update the sequence in question:

    -- indices
    REINDEX DATABASE db_in_question;
    -- constraints (check, foreign key)
    UPDATE pg_constraint SET convalidated = false WHERE all_check_and_FK_constraints;
    ALTER TABLE table_with_constraint VALIDATE CONSTRAINT constraint_on_that_table;
    -- other things, see below
    -- ...
    -- refresh collation versions if no errors above
    ALTER DATABASE db_in_question REFRESH COLLATION VERSION;
    ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION;

What else needs to be taken care of, and how ?

partitions

    Need to re-sort rows into the proper partition as needed.

    Can this be achievd by

        UPDATE each_partitioned_table SET each_partitioned_key = each_partitioned_key;

    ?

    Courtesy of (ddl-partitioning.html#DDL-PARTITIONING-CONSTRAINT-EXCLUSION):

        Updating the partition key of a row will cause it to
        be moved into a different partition if it no longer
        satisfies the partition bounds of its original
        partition.

range types

    Will this

        UPDATE table_with_range_type_column SET ranged_column = ranged_column

    find all relevant issues ?

domains

    Will this

        UPDATE table_with_domain_type_column SET domained_column = domained_column

    find all relevant issues ?

custom types

    ??

function immutability ??

    It can be argued that functions marked IMMUTABLE really
    are not in case they involve sorting of a collatable data
    type, and are thus wrongly marked as IMMUTABLE.

    IOW pre-existing user error.

If all this has been discussed in detail, I'd be glad for a
pointer into the archive.

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B