On Sun, Oct 4, 2020 at 4:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > As for the patch, I wonder if we want to make this change. I'm not very
> > familiar with how unaccent works, but if changes to unicode rules would
> > really silently break indexes, it's kinda similar to the collation
> > issues caused by glibc updates. And we've generally considered that a
> > case of data corruption, I think, so it'd be strange to allow that here.
>
> Yeah. The fact that we have a problem with collation updates doesn't
> mean that it's okay to introduce the same problem elsewhere.
>
> Note the extremely large amount of work that's ongoing to try to
> track collation changes so we can know whether such an update has
> invalidated indexes. Unless this patch comes with equivalent
> infrastructure to detect when the unaccent mapping has changed,
> I think it should be rejected.
>
> The problem here is somewhat different than for collations, in
> that collation changes are hidden behind more-or-less-obscure
> library APIs. Here I think we'd "just" have to track whether
> the user has changed the associated unaccent mapping file.
> However, detecting which indexes are invalidated by such a
> change still involves a lot of infrastructure that's not there.
> And it'd be qualitatively different, because a config file
> change could happen at any time --- we could not relegate
> the checks to pg_upgrade or the like.

As a potential next use for refobjversion (the system I'm hoping to
commit soon for collation version tracking), I have wondered about
declarative function versions. For those not following that thread,
the basic idea is that you get WARNING messages telling you to REINDEX
if certain things change, and the warnings only stop for each
dependent index once you've actually done it. Once the collation
stuff is in the tree, it wouldn't be too hard to do a naive function
version tracking system, but there are a couple of tricky problems to
solve to make it really useful: (1) if plpgsql function f1() calls
f2() and f2() changes, an index on ((f1(x))) needs to be rebuilt, and
(2) unaccent() and ts_parse() access other complicated objects,
perhaps even depending on the arguments passed in to them. You'd probably need to
design some kind of dependency analyser handler that PLs could
provide, and likewise, individual functions could perhaps provide
their own analyser, or maybe you could redesign the functions so that
a naive single version scheme could work. Or something like
that.</handwaving>
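
To make problem (1) a bit more concrete, here is a toy Python model of
what a naive scheme plus a dependency analyser might record. It is only
a sketch under assumptions: Func, transitive_deps, record_versions and
stale_deps are hypothetical names invented for illustration, not
anything in the refobjversion patch, and the "calls" set stands in for
whatever a PL-provided analyser would report.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    """Hypothetical catalog entry: a function with a declared version."""
    name: str
    version: int = 1
    # Names of functions this one calls directly (assumed to be
    # reported by some PL-specific dependency analyser).
    calls: set = field(default_factory=set)


def transitive_deps(funcs, root):
    """Return root plus every function reachable from it via calls."""
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(funcs[name].calls)
    return seen


def record_versions(funcs, root):
    """Snapshot the versions an index on root(...) depends on.

    In this model the snapshot is taken at CREATE INDEX or REINDEX time.
    """
    return {name: funcs[name].version for name in transitive_deps(funcs, root)}


def stale_deps(funcs, recorded):
    """List dependencies whose current version differs from the snapshot."""
    return sorted(n for n, v in recorded.items() if funcs[n].version != v)


# f1() calls f2(); an index on ((f1(x))) records both versions.
funcs = {"f1": Func("f1", calls={"f2"}), "f2": Func("f2")}
recorded = record_versions(funcs, "f1")

# f2() is changed: the index is now suspect until it is rebuilt.
funcs["f2"].version += 1
needs_reindex = stale_deps(funcs, recorded)

# "REINDEX": take a fresh snapshot, after which nothing is stale.
recorded = record_versions(funcs, "f1")
after_reindex = stale_deps(funcs, recorded)
```

A non-empty result from stale_deps() is where the WARNING would come
from, and it stays non-empty until the snapshot is retaken, which is the
"warnings only stop once you've actually done it" behaviour. Problem (2)
is not modelled at all here: the set of objects that unaccent() touches
can depend on its arguments, so a static "calls" set would not be enough.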