Re: Collation versioning - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation versioning
Date
Msg-id CA+hUKGJ-TqYomCAYgJt53_0b9KmfSyD2qW59xfzmZa3ftRJFzA@mail.gmail.com
Whole thread Raw
In response to Re: Collation versioning  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Collation versioning
List pgsql-hackers
On Thu, Feb 13, 2020 at 8:13 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Hearing no complaints on the suggestions, I'm attaching v8 to address that:
>
> - pg_dump is now using a binary_upgrade_set_index_coll_version() function
>   rather than plain DDL
> - the additional DDL is now of the form:
>   ALTER INDEX name ALTER COLLATION name REFRESH VERSION

+1

A couple of thoughts:

@@ -1115,21 +1117,44 @@ index_create(Relation heapRelation,
...
+               /*
+                * Get required distinct dependencies on collations
for all index keys.
+                * Collations of directly referenced column in hash
indexes can be
+                * skipped is they're deterministic.
+                */
                for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
                {
-                       if (OidIsValid(collationObjectId[i]) &&
-                               collationObjectId[i] != DEFAULT_COLLATION_OID)
+                       Oid colloid = collationObjectId[i];
+
+                       if (OidIsValid(colloid))
                        {
-                               referenced.classId = CollationRelationId;
-                               referenced.objectId = collationObjectId[i];
-                               referenced.objectSubId = 0;
+                               if ((indexInfo->ii_Am != HASH_AM_OID) ||
+
!get_collation_isdeterministic(colloid))

I still don't like the way catalog/index.c has hard-coded knowledge of
HASH_AM_OID here.  Although it errs on the side of the assuming that
there *is* a version dependency (good), there is already another AM in
the tree that could safely skip it for deterministic collations AFAIK:
Bloom indexes.  I suppose that any extension AM that is doing some
kind of hashing would also like to be able to be able to opt out of
collation version checking, when that is safe.  The question is how to
model that in our system...

One way would be for each AM to declare whether it is affected by
collations; the answer could be yes/maybe (default), no,
only-non-deterministic-ones.  But that still feels like the wrong
level, not taking advantage of knowledge about operators.

A better way might be to make declarations about that sort of thing in
the catalog, somewhere in the vicinity of the operator classes, or
maybe just to have hard coded knowledge about operator classes (ie
making declarations in the manual about what eg hash functions are
allowed to consult and when), and then check which of those an index
depends on.  I am not sure what would be best, I'd need to spend some
time studying the am operator system.

Perhaps for the first version of this feature, we should just add a
new local function
index_can_skip_collation_version_dependency(indexInfo, colloid) to
encapsulate your existing logic, but add a comment that in future we
might be able to support skipping in more cases through analysis of
the catalogs.

+   <varlistentry>
+    <term><literal>ALTER COLLATION</literal></term>
+    <listitem>
+     <para>
+      This command declares that the index is compatible with the currently
+      installed version of the collations that determine its order.  It is used
+      to silence warnings caused by collation version incompatibilities and
+      should be called after rebuilding the index or otherwise verifying its
+      consistency.  Be aware that incorrect use of this command can hide index
+      corruption.
+     </para>
+    </listitem>
+   </varlistentry>

This sounds like something that you need to do after you reindex, but
that's not true, is it?  This is something you can do *instead* of
reindex, to make it shut up about versions.  Shouldn't it be something
like "... should be issued only if the ordering is known not to have
changed since the index was built"?

+-- Test ALTER INDEX name ALTER COLLATION name REFRESH VERSION
+UPDATE pg_depend SET refobjversion = 'not a version'
+WHERE refclassid = 'pg_collation'::regclass
+AND objid::regclass::text = 'icuidx17_part'
+AND refobjversion IS NOT NULL;
+SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
+     objid
+---------------
+ icuidx17_part
+(1 row)
+
+ALTER INDEX icuidx17_part ALTER COLLATION "en-x-icu" REFRESH VERSION;
+SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
+ objid
+-------
+(0 rows)
+

Would it be better to put refobjversion = 'not a version' in the
SELECT list, instead of the WHERE clause, with a WHERE clause that
hits that one row, so that we can see that the row still exists after
the REFRESH VERSION (while still hiding the unstable version string)?



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: assert pg_class.relnatts is consistent
Next
From: Masahiko Sawada
Date:
Subject: Re: error context for vacuum to include block number