Thread: Q: fixing collation version mismatches
Dear all, just to confirm my understanding: Is it correct to say that the following sequence will "fix" all current collation version issues in a given database ? REINDEX DATABASE db_in_question; ALTER DATABASE db_in_question REFRESH COLLATION VERSION; ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION; Note that I am currently _not_ concerned with minimizing work by running this on objects only that really need a reindex/refresh. Thanks, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
> On Nov 13, 2022, at 12:45, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > REINDEX DATABASE db_in_question; > ALTER DATABASE db_in_question REFRESH COLLATION VERSION; > ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION; I may be totally off-base here, but shouldn't the REINDEX be the last step?
Am Sun, Nov 13, 2022 at 12:46:53PM -0800 schrieb Christophe Pettus: > > On Nov 13, 2022, at 12:45, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > > REINDEX DATABASE db_in_question; > > ALTER DATABASE db_in_question REFRESH COLLATION VERSION; > > ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION; > > I may be totally off-base here, but shouldn't the REINDEX be the last step? To my understanding, the REFRESH statements "merely" update the version information stored in the related objects. They do not change anything else; and the REINDEX does not reference them in any way. I suppose the REINDEX goes first as it does the actual fixing of now-invalid objects by rebuilding them. After that one is back to a usable database state, even if left with pesky (albeit harmless) warnings on version mismatches -- which to get rid of one runs the REFRESH statements. Or so my understanding... Which is why my question still stands: does the above three-strikes operation safely take care of any collation issues that may currently exist in a database ? Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
Le lun. 14 nov. 2022 à 05:58, Karsten Hilbert <Karsten.Hilbert@gmx.net> a écrit :
Am Sun, Nov 13, 2022 at 12:46:53PM -0800 schrieb Christophe Pettus:
> > On Nov 13, 2022, at 12:45, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> > REINDEX DATABASE db_in_question;
> > ALTER DATABASE db_in_question REFRESH COLLATION VERSION;
> > ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION;
>
> I may be totally off-base here, but shouldn't the REINDEX be the last step?
To my understanding, the REFRESH statements "merely" update
the version information stored in the related objects. They
do not change anything else; and the REINDEX does not
reference them in any way.
I suppose the REINDEX goes first as it does the actual fixing
of now-invalid objects by rebuilding them. After that one is
back to a usable database state, even if left with pesky
(albeit harmless) warnings on version mismatches -- which to
get rid of one runs the REFRESH statements.
Or so my understanding...
yes exactly. but it's likely that people will have some form of automation to run the reindex if there's any discrepancy between the recorded collation version and recorded version, so if you first fix the versions metada and then encounter any error during the reindex, you won't know if you need to reindex or not and might end up with corrupt indexes.
Which is why my question still stands: does the above
three-strikes operation safely take care of any collation
issues that may currently exist in a database ?
yes
Le lun. 14 nov. 2022 à 13:10, Julien Rouhaud <rjuju123@gmail.com> a écrit :
yes exactly. but it's likely that people will have some form of automation to run the reindex if there's any discrepancy between the recorded collation version and recorded version,
sorry I meant "and the current version"
Karsten Hilbert wrote: > Which is why my question still stands: does the above > three-strikes operation safely take care of any collation > issues that may currently exist in a database ? For the indexes, yes, but theorically, all constraints involving collatable types need a recheck. For foreign key constraints with non-deterministic collations, there might be equality tests that pass with an older Unicode version and fail with a newer Unicode version. For check constraints as well, checks applied to strings with recent Unicode characters can give different results after an upgrade. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
Am Mon, Nov 14, 2022 at 05:42:16PM +0100 schrieb Daniel Verite: > > Which is why my question still stands: does the above > > three-strikes operation safely take care of any collation > > issues that may currently exist in a database ? > > For the indexes, yes, but theorically, all constraints involving collatable > types need a recheck. > > For foreign key constraints with non-deterministic collations, there > might be equality tests that pass with an older Unicode version and fail > with a newer Unicode version. Which gives weight to the argument that using real-world data (instead of surrogate keys) may lead to trouble. > For check constraints as well, checks applied to strings with recent > Unicode characters can give different results after an upgrade. Thanks for pointing this out more clearly. My thinking already evolved towards also including VALIDATE CONSTRAINT. I shall, for the record, update the sequence in question: -- indices REINDEX DATABASE db_in_question; -- constraints (check, foreign key) UPDATE pg_constraint SET convalidated = false WHERE all_check_and_FK_constraints; ALTER TABLE table_with_constraint VALIDATE CONSTRAINT constraint_on_that_table; -- other things, see below -- ... -- refresh collation versions if no errors above ALTER DATABASE db_in_question REFRESH COLLATION VERSION; ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION; What else needs to be taken care of, and how ? partitions Need to re-sort rows into the proper partition as needed. Can this be achievd by UPDATE each_partitioned_table SET each_partitioned_key = each_partitioned_key; ? Courtesy of (ddl-partitioning.html#DDL-PARTITIONING-CONSTRAINT-EXCLUSION): Updating the partition key of a row will cause it to be moved into a different partition if it no longer satisfies the partition bounds of its original partition. range types Will this UPDATE table_with_range_type_column SET ranged_column = ranged_column find all relevant issues ? domains Will this UPDATE table_with_domain_type_column SET domained_column = domained_column find all relevant issues ? custom types ?? function immutability ?? It can be argued that functions marked IMMUTABLE really are not in case they involve sorting of a collatable data type, and are thus wrongly marked as IMMUTABLE. IOW pre-existing user error. If all this has been discussed in detail, I'd be glad for a pointer into the archive. Thanks, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B