Re: Collation versioning - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Collation versioning
Date
Msg-id 20180916210215.GE4184@tamriel.snowman.net
Whole thread Raw
In response to Re: Collation versioning  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Collation versioning
Re: Collation versioning
List pgsql-hackers
Greetings,

* Thomas Munro (thomas.munro@enterprisedb.com) wrote:
> On Mon, Sep 17, 2018 at 6:13 AM Douglas Doole <dougdoole@gmail.com> wrote:
> > On Sun, Sep 16, 2018 at 1:20 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> >> 3.  Fix the tracking of when reindexes need to be rebuilt, so that you
> >> can't get it wrong (as you're alluding to above).
> >
> > I've mentioned this in the past, but didn't seem to get any traction, so I'll try it again ;-)
>
> Probably because we agree with you, but don't have all the answers :-)

Agreed.

> > The focus on indexes when a collation changes is, in my opinion, the least of the problems. You definitely have to
worryabout indexes, but they can be easily rebuilt. What about other places where collation is hardened into the
system,such as constraints? 
>
> We have to start somewhere and indexes are the first thing that people
> notice, and are much likely to actually be a problem (personally I've
> encountered many cases of index corruption due to collation changes in
> the wild, but never a constraint corruption, though I fully understand
> the theoretical concern).  Several of us have observed specifically
> that the same problems apply to CHECK constraints and PARTITION
> boundaries, and there may be other things like that.  You could
> imagine tracking collation dependencies on those, requiring a RECHECK
> or REPARTITION operation to update them after a depended-on collation
> version changes.
>
> Perhaps that suggests that there should be a more general way to store
> collation dependencies -- something more like pg_depend, rather than
> bolting something like indcollversion onto indexes and every other
> kind of catalog that might need it.  I don't know.

Agreed.  If we start thinking about pg_depend then maybe we realize
that this all comes back to pg_attribute as the holder of the
column-level information and maybe what we should be thinking about is a
way to encode version information into the typmod for text-based
types...

> > And constraints problems are even easier than triggers. Consider a database with complex BI rules that are
implementedthrough triggers that fire when values are/are not equal. If the equality of strings change, there could be
baddata throughout the tables. (At least with constraints the inter-column dependencies are explicit in the catalogs.
Withtriggers anything goes.) 
>
> Once you get into downstream effects of changes (whether they are
> recorded in the database or elsewhere), I think it's basically beyond
> our event horizon.  Why and when did the collation definition change
> (bug fix in CLDR, decree by the Académie Française taking effect on 1
> January 2019, ...)?  We could all use bitemporal databases and
> multi-version ICU, but at some point it all starts to look like an
> episode of Dr Who.  I think we should make a clear distinction between
> things that invalidate the correct working of the database, and more
> nebulous effects that we can't possibly track in general.

I tend to agree in general, but I don't think it's beyond us to consider
multi-version ICU and being able to perform online reindexing (such that
a given system could be migrated from one collation to another over a
time while the system is still online, instead of having to take a
potentially long downtime hit to rebuild indexes after an upgrade, or
having to rebuild the entire system using some kind of logical
replication...).

Thanks!

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Dave Cramer
Date:
Subject: Re: ssl tests README and certs
Next
From: Alexander Korotkov
Date:
Subject: Re: [HACKERS] [PATCH] kNN for SP-GiST