Re: 16: Collation versioning and dependency helpers - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: 16: Collation versioning and dependency helpers
Date
Msg-id CA+hUKGJv6V94Px59hR9KpSTJoqSP64cRxmrdqhs7EFhVODOFrA@mail.gmail.com
In response to 16: Collation versioning and dependency helpers  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: 16: Collation versioning and dependency helpers  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Sun, Oct 30, 2022 at 5:41 PM Jeff Davis <pgsql@j-davis.com> wrote:
> We haven't fully solved the changing collation-provider problem. An
> upgrade of the OS may change the version of libc or icu, and that might
> affect the collation, which could leave you with various corrupt
> database objects including:
>
>   * indexes
>   * constraints
>   * range types or multiranges (or other types dependent
>     on collation for internal consistency)
>   * materialized views
>   * partitioned tables (range or hash)

Check.

> There's discussion about trying to reliably detect these changes and
> remedy them. But there are major challenges; for instance, glibc
> doesn't give a reliable signal that a collation may have changed, which
> would leave us with a lot of false positives and create a new set of
> problems (e.g. reindexing when it's unnecessary). And even with ICU, we
> don't have a way to support multiple versions of a provider or of a
> single collation, so trying to upgrade would still be a hassle.

FWIW some experimental code for multi-version ICU is proposed for
discussion here:

https://commitfest.postgresql.org/40/3956/

> Proposal:
>
> Add in some tools to make it easier for administrators to find out if
> they are at risk and solve the problem for themselves in a systematic
> way.

Excellent goal.

> Patches:
>
>   0001: Treat "default" collation as unpinned, so that entries in
> pg_depend are created. The rationale is that, since the "default"
> collation can change, it's not really an immutable system object, and
> it's worth tracking which objects are affected by it. It seems to bloat
> pg_depend by about 5-10% though -- that doesn't seem great, but I'm not
> sure if it's a real problem or not.

FWIW we did this (plus a lot more) in the per-index version tracking
feature reverted from 14.

>   0002: Enable pg_collation_actual_version() to work on the default
> collation (oid=100) so that it doesn't need to be treated as a special
> case.

Makes sense.
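
For illustration only (this is not taken from the patch set), here is a minimal sketch of the recorded-versus-actual comparison that such a function enables. The query uses pg_collation.collversion and pg_collation_actual_version(), which already exist; the helper name find_version_mismatches and the sample rows are hypothetical, and the rows are assumed to come from whatever database driver you use.

```python
# Catalog query: the version recorded at creation time next to the version
# the provider reports now. Not executed here; shown for reference.
VERSION_QUERY = """
SELECT collname, collversion, pg_collation_actual_version(oid)
FROM pg_collation
WHERE collversion IS NOT NULL
"""


def find_version_mismatches(rows):
    """Return names of collations whose recorded and actual versions differ.

    rows: iterable of (collname, recorded_version, actual_version) tuples,
    shaped like the result of VERSION_QUERY. Collations with no version on
    either side (for example "C") are skipped.
    """
    mismatched = []
    for name, recorded, actual in rows:
        if recorded is None or actual is None:
            continue  # unversioned collation, nothing to compare
        if recorded != actual:
            mismatched.append(name)
    return mismatched


# Hypothetical sample rows; the version strings are made up.
sample_rows = [
    ("en-x-icu", "153.14", "153.14"),
    ("de-x-icu", "153.14", "153.112"),
    ("C", None, None),
]
mismatches = find_version_mismatches(sample_rows)
```

A mismatch here only says that the version strings differ; it does not say whether any stored data is actually misordered.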

>   0003: Fix ALTER COLLATION "default" REFRESH VERSION, which currently
> throws an unhelpful internal error. Instead, issue a more helpful error
> that suggests "ALTER DATABASE ... REFRESH COLLATION VERSION" instead.

Makes sense.

>   0004: Add system views:
>     pg_collation_versions: quickly see the current (from the catalog)
> and actual (from the provider) versions of each collation
>     pg_collation_dependencies: map of objects to the collations they
> depend on
>
> Along with these patches, you can use some tricks to verify data, such
> as /contrib/amcheck; or fix the data with things like:
>
>   * REINDEX
>   * VACUUM FULL/TRUNCATE/CLUSTER
>   * REFRESH MATERIALIZED VIEW
>
> And then refresh the collation version when you're confident that your
> data is valid.

Here you run into an argument that we had many times in that cycle:
what's the point of views that suffer both false positives and false
negatives?
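
To make the two failure modes concrete, here is a toy sketch. Everything in it is hypothetical: the two sort-key functions stand in for two releases of a collation provider, and the version strings are made up. It only shows that comparing version strings and comparing actual sort order can disagree in both directions.

```python
def version_check_flags(recorded_version, actual_version):
    """All a version-comparison view can see: the two version strings."""
    return recorded_version != actual_version


def order_really_changed(keys, old_sort_key, new_sort_key):
    """Ground truth: do the stored keys sort differently under the new rules?"""
    return sorted(keys, key=old_sort_key) != sorted(keys, key=new_sort_key)


# Toy stand-ins for collation rules (hypothetical, not real provider behavior).
case_insensitive = str.lower


def codepoint(s):
    return s


# False positive: the version string changed, but the stored keys sort the same.
fp_flagged = version_check_flags("2.31", "2.35")
fp_changed = order_really_changed(["a", "b", "c"], case_insensitive, case_insensitive)

# False negative: the version string is unchanged, but the order differs.
fn_flagged = version_check_flags("2.31", "2.31")
fn_changed = order_really_changed(["a", "B", "c"], case_insensitive, codepoint)
```

In the first case an administrator is told to rebuild an index that is fine; in the second, a broken index goes unreported.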

> TODO:
>
>   * Consider better tracking of which collation versions were active on
> a particular object since the last REINDEX (or REFRESH MATERIALIZED
> VIEW, TRUNCATE, or other command that would remove any trace of data
> affected by the previous collation version).

Right, the per-object dependency tracking feature, reverted from 14,
aimed to do exactly that.  It fell down on (1) some specific bugs that
were hard to fix, like dependencies inherited via composite types when
you change the composite type, and (2) doubt expressed by Tom, and
earlier Stephen, that pg_depend was a good place to store version
information.


