Home > mailing lists

Re: Collation versioning - Mailing list pgsql-hackers

From	Julien Rouhaud
Subject	Re: Collation versioning
Date	November 2, 2020 06:59:10
Msg-id	CAOBaU_YHbuMbwDNNjCHcA6-qyRVeR1j6qFehA1gnz+bM3UZJNA@mail.gmail.com Whole thread Raw
In response to	Re: Collation versioning (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: Collation versioning
List	pgsql-hackers

Tree view

On Sat, Oct 31, 2020 at 10:28 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, Oct 30, 2020 at 1:20 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > 4.  I didn't really like the use of '' for unknown.  I figured out how
> > to use NULL for that.
>
> Hrmph.  That logic didn't work for the pattern ops case.  I fixed
> that, by partially reintroducing a special value, but this time
> restricting the code that knows about that to just pg_dump, and I used
> the value 'unknown', so really it's not special at all as far as the
> server is concerned and there is only one kind of warning message.

Ok, I'm fine with that.

> Furthermore, I realised that I really don't like the policy of
> assuming that all text-related indexes imported from older releases
> need the "unknown" warning.  That'll just make this feature
> unnecessarily noisy and unpopular when 14 comes out, essentially
> crying wolf, even though it's technically true that the collations in
> imported-from-before-14 indexes are of unknown version.  Even worse,
> instructions might be shared around the internet to show how to shut
> the warnings up without reindexing, and then later when there really
> is a version change, someone might find those instructions and follow
> them!  So I propose that the default should be to assume that indexes
> are not corrupted, unless you opt into the more pessimistic policy
> with --index-collation-versions-unknown.  Done like that.

Yes, I was also worried about spamming this kind of messages after an
upgrade.  Note that this was initially planned for REL_13_STABLE,
whose release date was very close to glibc 2.28, so at that time this
would have been more likely to have a real corruption on the indexes.
I'm fine with the new behavior.

> I also realised that I don't like carrying a bunch of C code to
> support binary upgrades, when it's really just a hand-coded trivial
> UPDATE of pg_depend.  Is there any reason pg_dump --binary-upgrade
> shouldn't just dump UPDATE statements, and make this whole feature a
> lot less mysterious, and shorter?  Done like that.

I just thought that it wouldn't be acceptable to do plain DML on the
catalogs.  If that's ok, then definitely this approach is better.

> While testing on another OS that will be encountered in the build farm
> when I commit this, I realised that I needed to add --encoding=UTF8 to
> tests under src/bin/pg_dump and src/test/locale, because they now do
> things with ICU collations (if built with ICU support) and that only
> works with UTF8.

Oh I didn't know that.

> Another option would be to find a way to skip those
> tests if the encoding is not UTF8.  Hmm, I wonder if it's bad to
> effectively remove the testing that comes for free from buildfarm
> animals running this under non-UTF8 encodings; but if we actually
> valued that, I suppose we'd do it explicitly as another test pass with
> SQL_ASCII.

I guess it would be better to keep checking non-UTF8 encodings, but
the current approach looks quite random and I don't have any better
suggestions.

Note that v34 now fails when run on a without that don't have
defined(__GLIBC__) (e.g. macos).  The failure are in
collate.icu.utf8.sql, of the form:

- icuidx06_d_en_fr_ga               | "default"  | up to date
+ icuidx06_d_en_fr_ga               | "default"  | version not tracked

Given the new approach, the only option I can see is to simply remove
any attempt to cover the default collation in the tests.  An
alternative file would be a pain to maintain and it wouldn't bring any
value apart from checking that the default collation is either always
tracked or never, but not a mix of those.

pgsql-hackers by date:

From: Luc Vlaming
Date: 02 November 2020, 06:53:45
Subject: Re: should INSERT SELECT use a BulkInsertState?

From: Justin Pryzby
Date: 02 November 2020, 07:00:06
Subject: reindex partitioned indexes: refactor ReindexRelationConcurrently ?

Re: Collation versioning - Mailing list pgsql-hackers

Previous

Next