Re: Collation versioning - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation versioning
Date
Msg-id CA+hUKGKi=Vjj0J1tk9=JbKOFqR7Ot4WcRdp7gkzam-JFJF0Agg@mail.gmail.com
Whole thread Raw
In response to Re: Collation versioning  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Collation versioning  (Michael Paquier <michael@paquier.xyz>)
Re: Collation versioning  (Julien Rouhaud <rjuju123@gmail.com>)
Re: Collation versioning  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Thu, Nov 7, 2019 at 10:20 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Attached 4th patch handles default collation.  I went with an
> ignore_systempin flag in recordMultipleDependencies.

Thanks for working on this!  I haven't looked closely or tested yet,
but this seems reasonable.  Obviously it assumes that the default
provider is really "libc" in disguise for now, but Peter's other patch
will extend that to cover ICU later.

> > > * Andres mentioned off-list that pg_depend rows might get blown away
> > > and recreated in some DDL circumstances.  We need to look into that.
>
> I tried various flavour of DDL but I couldn't wipe out the pg_depend
> rows without having an index rebuild triggered (like changing the
> underlying column datatype).  Do you have any scenario where the index
> rebuild wouldn't be triggered?

Ah, OK, if we only do that when the old index contents will also be
destroyed, that's great news.

> > > * Another is that pg_upgrade won't preserve pg_depend rows, so you'd
> > > need some catalog manipulation (direct or via new DDL) to fix that.
>
> Attached 5th patch add a new "ALTER INDEX idx_name DEPENDS ON
> COLLATION coll_oid VERSION coll_version_text" that can only be
> executed in binary upgrade mode, and teach pg_dump to generate such
> commands (including for indexes created for constraints).

It's nice that you were able to make up a reasonable grammar out of
existing keywords.  I wonder if we should make this user accessible...
it could be useful for expert users.  If so, maybe it should use
collation names, not OIDs?

> One issue
> is that older versions don't have pg_depend information,  so pg_dump
> can't find the dependencies to generate such commands and override the
> version with anything else.  It means that the upgraded cluster will
> show all indexes as depending on the current collation provider
> version.  I'm not sure if that's the best thing to do, or if we should
> change all pg_depend rows to mention "unknown" version or something
> like that.  It would generate so much noise that it's probably not
> worth it.

Right, so this is basically a policy decision: do we assume that all
pre-13 indexes that depend on collations are potentially corrupted, or
assume that they are not?  The "correct" thing to do would be to
assume they are potentially corrupted and complain until the user
reindexes, but I think the pragmatic thing to do would be to assume
that they're not and just let them adopt the current versions, even
though it's a lie.  I lean towards the pragmatic choice; we're trying
to catch future problems, not give the entire user base a load of
extra work to do on their next pg_upgrade for mostly theoretical
reasons.  (That said, given the new glibc versioning, we'll
effectively be giving most of our user base a load of extra work to do
on their next OS upgrade and that'll be a characteristic of PostgreSQL
going forward, once the versioning-for-default-provider patch goes
in.)  Any other opinions?

> I didn't do anything for the spurious warning when running a reindex,
> and kept original approach of pg_depend catalog.

I see three options:

1.  Change all or some of index_open(), relation_open(),
RelationIdGetRelation(), RelationBuildDesc() and
RelationInitIndexAccessInfo() to take some kind of flag so we can say
NO_COLLATION_VERSION_CHECK_PLEASE, and then have ReindexIndex() pass
that flag down when opening it for the purpose of rebuilding it.
2.  Use a global state to suppress these warnings while opening that
index.  Perhaps ReindexIndex() would call RelCacheWarnings(false)
before index_open(), and use PG_FINALLY to make sure it restores
warnings with RelCacheWarnings(true).  (This is a less-code-churn
bad-programming-style version of #1.)
3.  Move the place that we run the collation version check.  Instead
of doing it in RelationInitIndexAccessInfo() so that it runs when the
relation cache first loads the index, put it into a new routine
RelationCheckValidity() and call that from ... I don't know, some
other place that runs whenever we access indexes but not when we
rebuild them.

3's probably a better idea, if you can find a reasonable place to call
it from.  I'm thinking some time around the time the executor acquires
locks, but using a flag in the relcache entry to make sure it doesn't
run the check again if there were no warnings last time (so one
successful version check turns the extra work off for the rest of this
relcache entry's lifetime).  I think it'd be a feature, not a bug, if
it gave you the warning every single time you executed a query using
an index that has a mismatch...  but a practical alternative would be
to check only once per index and that probably makes more sense.



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: pg_waldump and PREPARE
Next
From: Michael Paquier
Date:
Subject: Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY