Re: Collation versioning - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Collation versioning
Date
Msg-id CAOBaU_YDq=ijm-BYfS7fr20TqZ0t5b+bNuS5Vwk832S5bk-iOg@mail.gmail.com
Whole thread Raw
In response to Re: Collation versioning  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Fri, Nov 8, 2019 at 2:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Thu, Nov 7, 2019 at 10:20 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Attached 4th patch handles default collation.  I went with an
> > ignore_systempin flag in recordMultipleDependencies.
>
> Thanks for working on this!  I haven't looked closely or tested yet,
> but this seems reasonable.  Obviously it assumes that the default
> provider is really "libc" in disguise for now, but Peter's other patch
> will extend that to cover ICU later.

Yes, that should require minimal changes here.

>
> > > > * Andres mentioned off-list that pg_depend rows might get blown away
> > > > and recreated in some DDL circumstances.  We need to look into that.
> >
> > I tried various flavour of DDL but I couldn't wipe out the pg_depend
> > rows without having an index rebuild triggered (like changing the
> > underlying column datatype).  Do you have any scenario where the index
> > rebuild wouldn't be triggered?
>
> Ah, OK, if we only do that when the old index contents will also be
> destroyed, that's great news.
>
> > > > * Another is that pg_upgrade won't preserve pg_depend rows, so you'd
> > > > need some catalog manipulation (direct or via new DDL) to fix that.
> >
> > Attached 5th patch add a new "ALTER INDEX idx_name DEPENDS ON
> > COLLATION coll_oid VERSION coll_version_text" that can only be
> > executed in binary upgrade mode, and teach pg_dump to generate such
> > commands (including for indexes created for constraints).
>
> It's nice that you were able to make up a reasonable grammar out of
> existing keywords.  I wonder if we should make this user accessible...
> it could be useful for expert users.  If so, maybe it should use
> collation names, not OIDs?

I thought about it, but I'm wondering if it's a good idea to expose
this to users.  The command is not really POLA compliant, as what it
actually mean is "update the recorded version for all existing
pg_depend rows, if any, for this index and collation", while one could
assume that it'll add a new pg_depend row if none exist.  Ideally,
users would only need to use command that says "trust me this index
(or collation version) is actually compatible with this collation's
current version" or similar, but not some user provided string.

> > I didn't do anything for the spurious warning when running a reindex,
> > and kept original approach of pg_depend catalog.
>
> I see three options:
>
> 1.  Change all or some of index_open(), relation_open(),
> RelationIdGetRelation(), RelationBuildDesc() and
> RelationInitIndexAccessInfo() to take some kind of flag so we can say
> NO_COLLATION_VERSION_CHECK_PLEASE, and then have ReindexIndex() pass
> that flag down when opening it for the purpose of rebuilding it.
> 2.  Use a global state to suppress these warnings while opening that
> index.  Perhaps ReindexIndex() would call RelCacheWarnings(false)
> before index_open(), and use PG_FINALLY to make sure it restores
> warnings with RelCacheWarnings(true).  (This is a less-code-churn
> bad-programming-style version of #1.)
> 3.  Move the place that we run the collation version check.  Instead
> of doing it in RelationInitIndexAccessInfo() so that it runs when the
> relation cache first loads the index, put it into a new routine
> RelationCheckValidity() and call that from ... I don't know, some
> other place that runs whenever we access indexes but not when we
> rebuild them.
>
> 3's probably a better idea, if you can find a reasonable place to call
> it from.  I'm thinking some time around the time the executor acquires
> locks, but using a flag in the relcache entry to make sure it doesn't
> run the check again if there were no warnings last time (so one
> successful version check turns the extra work off for the rest of this
> relcache entry's lifetime).  I think it'd be a feature, not a bug, if
> it gave you the warning every single time you executed a query using
> an index that has a mismatch...  but a practical alternative would be
> to check only once per index and that probably makes more sense.

OTOH, 2 is more generic, and could maybe be a better way with Peter's
idea of new catalog that would also fit other use cases?



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: heapam_index_build_range_scan's anyvisible
Next
From: Julien Rouhaud
Date:
Subject: Re: Collation versioning