Home > mailing lists

Re: Ways to change a database collation with removing duplicates - Mailing list pgsql-admin

From	Alexey Murz Korepov
Subject	Re: Ways to change a database collation with removing duplicates
Date	March 12, 2023 05:52:00
Msg-id	CAL5pyKt6fW7wErKUf0ev7i6+=rszdR3Qo4rxbOH9kjCiwo9F2Q@mail.gmail.com Whole thread
In response to	Re: Ways to change a database collation with removing duplicates (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-admin

Tree view

Thanks for the explanation! Yes, it seems this is not related to collations, because I see an integer duplicate there now, thanks for pointing to this. Maybe the restoration of the database is started twice somehow, that produces that duplicates, so I will try to restart the migration process from scratch.

On Sun, Mar 12, 2023 at 12:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Erik Wienhold <ewie@ewie.name> writes:
> On 11/03/2023 19:23 CET Alexey Murz Korepov <murznn@gmail.com> wrote:
>> Could anyone suggest to me the ways to change a database collation with
>> removing all the duplicates, caused by this change?

> Collations can only affect uniqueness if they are nondeterministic or if you
> have functional indexes, e.g. using lower(text) for a case-insensitive unique
> index. Otherwise the collations only affect text ordering.

Yeah. I suspect that what actually happened here was a previous change in
the host system's sort ordering (cf [1]), leading to text indexes becoming
functionally corrupt and unable to enforce uniqueness correctly, after
which you accumulated some unintentional duplicates. If you try
reindex'ing on the source database you'll probably find that it fails with
the same errors. I don't know of any automatic tools for fixing up such
duplications, and wouldn't trust one hugely anyway --- you'll probably
need manual curation of the fixes.

regards, tom lane

[1] https://wiki.postgresql.org/wiki/Locale_data_changes

Best regards,
Alexey Murz Korepov.
E-mail: murznn@gmail.com
Messengers: Matrix - https://matrix.to/#/@murz:ru-matrix.org Telegram - @MurzNN

pgsql-admin by date:

From: Erik Wienhold
Date: 11 March 2023, 21:23:11
Subject: Re: LOG: could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat"

From: Laurenz Albe
Date: 12 March 2023, 13:47:18
Subject: Re: Tooling for per table autovacuum tuning

Re: Ways to change a database collation with removing duplicates - Mailing list pgsql-admin

Previous

Next