Re: Q: documentation improvement re collation version mismatch - Mailing list pgsql-general

From Julien Rouhaud
Subject Re: Q: documentation improvement re collation version mismatch
Date
Msg-id 20221110113621.lkjh25rnzlacd635@jrouhaud
Whole thread Raw
In response to Aw: Re: Q: documentation improvement re collation version mismatch  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
Responses Aw: Re: Q: documentation improvement re collation version mismatch  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
List pgsql-general
On Thu, Nov 10, 2022 at 11:47:01AM +0100, Karsten Hilbert wrote:
> Thanks, Julien, for your explanation.
> 
> > > regarding changed collation versions this
> > >
> > >     https://www.postgresql.org/docs/devel/sql-altercollation.html
> > >
> > > says:
> > >
> > >     The following query can be used to identify all
> > >     collations in the current database that need to be
> > >     refreshed and the objects that depend on them:
> > >
> > >     SELECT pg_describe_object(refclassid, refobjid, refobjsubid) AS "Collation",
> > >            pg_describe_object(classid, objid, objsubid) AS "Object"
> > >       FROM pg_depend d JOIN pg_collation c
> > >            ON refclassid = 'pg_collation'::regclass AND refobjid = c.oid
> > >       WHERE c.collversion <> pg_collation_actual_version(c.oid)
> > >       ORDER BY 1, 2;
> > >
> > > I feel the result of that query can be slightly surprising
> > > because it does not return (to my testing) any objects
> > > depending on the database default collation, nor the database
> > > itself (as per a collation version mismatch in pg_database).
> >
> > Indeed.  The default collation is "pinned", so we don't record any dependency
> > on it.
> 
> Indirectly we do, don't we ?  Or else
> 
> > >     WHERE
> > >         collprovider IN ('d', 'c')
> 
> would not make much sense, right ?

What I meant is that we don't insert record in pg_depend to track dependencies
on pinned object, including the default collation.  The collprovider here comes
from pg_index.indcollation which is a different thing.  It can indeed store the
default collation, but it's only a small step toward less false negative.

Try that query with e.g.

CREATE INDEX ON sometable ( (somecol > 'somevalue') );

or

CREATE INDEX ON sometable (someid) WHERE somecol > 'somevalue';

Both clearly can get corrupted if the underlying collation library changes the
result of somecol > 'somevalue', but wouldn't be detected by that query.  There
are likely a lot more cases that would be missed, you can refer to the
discussions from a couple years ago when we tried to properly track all index
collation dependencies.

> The comment above the query in the official documentation is rather assertive
> (even if may true to the letter) and may warrant some more cautionary
> wording ?   Added, perhaps, some variation of this:
> 
> > For now, the only safe way to go is either reindex everything, or everything
> > except some safe cases (non-partial indexes on plain-non-collatable datatypes
> > only).

I think the comment is very poorly worded, as it leads readers to believe that
objects with a pg_depend dependency on a collation are the only one that would
get corrupted in case of glibc/ICU upgrade.

I agree that there should be a big fat red warning saying something like
"reindex everything if there's any discrepancy between the recorded collation
version and the currently reported one unless you REALLY know what you're
doing."



pgsql-general by date:

Previous
From: Karsten Hilbert
Date:
Subject: Aw: Re: Q: documentation improvement re collation version mismatch
Next
From: Ron
Date:
Subject: Re: Q: documentation improvement re collation version mismatch