Jim Nasby <nasbyj@amazon.com> writes:
>> I think the real problem here is that the underlying software mostly
>> doesn't take this issue seriously.

> The first step to a solution is admitting that the problem exists.
> Ignoring broken backups, segfaults and data corruption as a "rant"
> implies that we simply throw in the towel and tell users to suck it up
> or switch engines. There are other ways to address this short of the
> community doing all the work itself. One simple example would be to
> refuse to start if the collation provider has changed since initdb
> (which we'd need to allow users to override).

You're conveniently skipping over the hard part, which is to tell whether the collation provider has changed behavior (which we'd better do with pretty darn high accuracy, if we're going to refuse to start on the basis of thinking it has). Unfortunately, giving a reliable indication of collation behavioral changes is *exactly* the thing that the providers aren't taking seriously.
Is this more involved than creating a list of all valid Unicode characters (~144 thousand), sorting them, then running crc32 over the sorted order to create the "version" for the library/collation pair? Far from free, but few databases use more than a couple of different collations.
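
To make the idea concrete, here is a rough sketch in Python of what I have in mind. It is only an illustration under several assumptions: the function names are made up, the character list comes from whatever Unicode tables Python itself ships (not from the provider), and the libc case assumes locale.strxfrm() reflects the provider's ordering. It sorts single code points only, so it would not notice a change that affects only multi-character sequences.

```python
import locale
import unicodedata
import zlib
from typing import Callable, Iterable, List


def assigned_characters() -> List[str]:
    """Characters known to Python's bundled Unicode database (an assumption:
    the collation provider may know a different Unicode version)."""
    skipped = {"Cn", "Cs", "Co"}  # unassigned, surrogates, private use
    return [
        chr(cp)
        for cp in range(1, 0x110000)  # U+0000 skipped: strxfrm rejects NULs
        if unicodedata.category(chr(cp)) not in skipped
    ]


def collation_fingerprint(chars: Iterable[str],
                          sort_key: Callable[[str], object]) -> int:
    """Sort chars with the given collation key and CRC32 the resulting order.

    Ties are broken by code point so the result does not depend on the
    order in which the characters were supplied.
    """
    ordered = sorted(chars, key=lambda ch: (sort_key(ch), ch))
    # Fixed-width encoding keeps character boundaries unambiguous.
    return zlib.crc32("".join(ordered).encode("utf-32-le"))


def current_locale_fingerprint() -> int:
    """Hypothetical helper: fingerprint the process's current LC_COLLATE."""
    return collation_fingerprint(assigned_characters(), locale.strxfrm)
```

One would call locale.setlocale(locale.LC_COLLATE, "...") for the collation in question, store the result of current_locale_fingerprint() at initdb or CREATE COLLATION time, and compare it at startup. An ICU-backed collation would need a different sort_key.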