Re: Collation versioning - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Collation versioning |
Date | |
Msg-id | CAOBaU_YHbuMbwDNNjCHcA6-qyRVeR1j6qFehA1gnz+bM3UZJNA@mail.gmail.com Whole thread Raw |
In response to | Re: Collation versioning (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Collation versioning
|
List | pgsql-hackers |
On Sat, Oct 31, 2020 at 10:28 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Oct 30, 2020 at 1:20 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > 4. I didn't really like the use of '' for unknown. I figured out how > > to use NULL for that. > > Hrmph. That logic didn't work for the pattern ops case. I fixed > that, by partially reintroducing a special value, but this time > restricting the code that knows about that to just pg_dump, and I used > the value 'unknown', so really it's not special at all as far as the > server is concerned and there is only one kind of warning message. Ok, I'm fine with that. > Furthermore, I realised that I really don't like the policy of > assuming that all text-related indexes imported from older releases > need the "unknown" warning. That'll just make this feature > unnecessarily noisy and unpopular when 14 comes out, essentially > crying wolf, even though it's technically true that the collations in > imported-from-before-14 indexes are of unknown version. Even worse, > instructions might be shared around the internet to show how to shut > the warnings up without reindexing, and then later when there really > is a version change, someone might find those instructions and follow > them! So I propose that the default should be to assume that indexes > are not corrupted, unless you opt into the more pessimistic policy > with --index-collation-versions-unknown. Done like that. Yes, I was also worried about spamming this kind of messages after an upgrade. Note that this was initially planned for REL_13_STABLE, whose release date was very close to glibc 2.28, so at that time this would have been more likely to have a real corruption on the indexes. I'm fine with the new behavior. > I also realised that I don't like carrying a bunch of C code to > support binary upgrades, when it's really just a hand-coded trivial > UPDATE of pg_depend. Is there any reason pg_dump --binary-upgrade > shouldn't just dump UPDATE statements, and make this whole feature a > lot less mysterious, and shorter? Done like that. I just thought that it wouldn't be acceptable to do plain DML on the catalogs. If that's ok, then definitely this approach is better. > While testing on another OS that will be encountered in the build farm > when I commit this, I realised that I needed to add --encoding=UTF8 to > tests under src/bin/pg_dump and src/test/locale, because they now do > things with ICU collations (if built with ICU support) and that only > works with UTF8. Oh I didn't know that. > Another option would be to find a way to skip those > tests if the encoding is not UTF8. Hmm, I wonder if it's bad to > effectively remove the testing that comes for free from buildfarm > animals running this under non-UTF8 encodings; but if we actually > valued that, I suppose we'd do it explicitly as another test pass with > SQL_ASCII. I guess it would be better to keep checking non-UTF8 encodings, but the current approach looks quite random and I don't have any better suggestions. Note that v34 now fails when run on a without that don't have defined(__GLIBC__) (e.g. macos). The failure are in collate.icu.utf8.sql, of the form: - icuidx06_d_en_fr_ga | "default" | up to date + icuidx06_d_en_fr_ga | "default" | version not tracked Given the new approach, the only option I can see is to simply remove any attempt to cover the default collation in the tests. An alternative file would be a pain to maintain and it wouldn't bring any value apart from checking that the default collation is either always tracked or never, but not a mix of those.
pgsql-hackers by date: