Re: Collation versioning - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Collation versioning |
Date | |
Msg-id | CA+hUKGJ-TqYomCAYgJt53_0b9KmfSyD2qW59xfzmZa3ftRJFzA@mail.gmail.com Whole thread Raw |
In response to | Re: Collation versioning (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
Re: Collation versioning
|
List | pgsql-hackers |
On Thu, Feb 13, 2020 at 8:13 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > Hearing no complaints on the suggestions, I'm attaching v8 to address that: > > - pg_dump is now using a binary_upgrade_set_index_coll_version() function > rather than plain DDL > - the additional DDL is now of the form: > ALTER INDEX name ALTER COLLATION name REFRESH VERSION +1 A couple of thoughts: @@ -1115,21 +1117,44 @@ index_create(Relation heapRelation, ... + /* + * Get required distinct dependencies on collations for all index keys. + * Collations of directly referenced column in hash indexes can be + * skipped is they're deterministic. + */ for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++) { - if (OidIsValid(collationObjectId[i]) && - collationObjectId[i] != DEFAULT_COLLATION_OID) + Oid colloid = collationObjectId[i]; + + if (OidIsValid(colloid)) { - referenced.classId = CollationRelationId; - referenced.objectId = collationObjectId[i]; - referenced.objectSubId = 0; + if ((indexInfo->ii_Am != HASH_AM_OID) || + !get_collation_isdeterministic(colloid)) I still don't like the way catalog/index.c has hard-coded knowledge of HASH_AM_OID here. Although it errs on the side of the assuming that there *is* a version dependency (good), there is already another AM in the tree that could safely skip it for deterministic collations AFAIK: Bloom indexes. I suppose that any extension AM that is doing some kind of hashing would also like to be able to be able to opt out of collation version checking, when that is safe. The question is how to model that in our system... One way would be for each AM to declare whether it is affected by collations; the answer could be yes/maybe (default), no, only-non-deterministic-ones. But that still feels like the wrong level, not taking advantage of knowledge about operators. A better way might be to make declarations about that sort of thing in the catalog, somewhere in the vicinity of the operator classes, or maybe just to have hard coded knowledge about operator classes (ie making declarations in the manual about what eg hash functions are allowed to consult and when), and then check which of those an index depends on. I am not sure what would be best, I'd need to spend some time studying the am operator system. Perhaps for the first version of this feature, we should just add a new local function index_can_skip_collation_version_dependency(indexInfo, colloid) to encapsulate your existing logic, but add a comment that in future we might be able to support skipping in more cases through analysis of the catalogs. + <varlistentry> + <term><literal>ALTER COLLATION</literal></term> + <listitem> + <para> + This command declares that the index is compatible with the currently + installed version of the collations that determine its order. It is used + to silence warnings caused by collation version incompatibilities and + should be called after rebuilding the index or otherwise verifying its + consistency. Be aware that incorrect use of this command can hide index + corruption. + </para> + </listitem> + </varlistentry> This sounds like something that you need to do after you reindex, but that's not true, is it? This is something you can do *instead* of reindex, to make it shut up about versions. Shouldn't it be something like "... should be issued only if the ordering is known not to have changed since the index was built"? +-- Test ALTER INDEX name ALTER COLLATION name REFRESH VERSION +UPDATE pg_depend SET refobjversion = 'not a version' +WHERE refclassid = 'pg_collation'::regclass +AND objid::regclass::text = 'icuidx17_part' +AND refobjversion IS NOT NULL; +SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version'; + objid +--------------- + icuidx17_part +(1 row) + +ALTER INDEX icuidx17_part ALTER COLLATION "en-x-icu" REFRESH VERSION; +SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version'; + objid +------- +(0 rows) + Would it be better to put refobjversion = 'not a version' in the SELECT list, instead of the WHERE clause, with a WHERE clause that hits that one row, so that we can see that the row still exists after the REFRESH VERSION (while still hiding the unstable version string)?
pgsql-hackers by date: