Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation version tracking for macOS
Date
Msg-id CA+hUKG+OSQtrRAk-bHwMJmuvqp-b-LGuxsF2PoDBVyQcT+VEAQ@mail.gmail.com
Whole thread Raw
In response to Re: Collation version tracking for macOS  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Collation version tracking for macOS
Re: Collation version tracking for macOS
List pgsql-hackers
Replying to Peter and Jeff in one email.

On Sat, Nov 12, 2022 at 3:57 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> On 22.10.22 03:22, Thomas Munro wrote:
> > I'd love to hear others' thoughts on how we can turn this into a
> > workable solution.  Hopefully while staying simple...
>
> I played with this patch a bit.  It looks like a reasonable approach.

Great news.

> Attached is a small patch to get the dynamic libicu* lookup working with
> the library naming on macOS.

Thanks, squashed.

> Instead of packing the ICU version into the locale field ('63:en'), I
> would make it a separate field in pg_collation and a separate argument
> in CREATE COLLATION.

I haven't tried this yet, as I focused on coming up with a way of testing in
this iteration.  I can try this next.  I'm imagining that we'd have
pg_collation.collicuversion and pg_database.daticuversion, and they'd default
to 0 for "use the GUC", and perhaps you'd even be able to ALTER them.  Perhaps
we wouldn't even need the GUC then...  0 could mean "the linked version", and
if you don't like it, you ALTER it.  Thinking about this.

> At this point, perhaps it would be good to start building some tests to
> demonstrate various upgrade scenarios and to ensure portability.

OK, here's what I came up with.  You enable it in PG_TEST_EXTRA, and
tell it about an alternative ICU version you have in the standard library
search path that is not the same as the main/linked one:

$ meson configure -DPG_TEST_EXTRA="icu=63"
$ meson test icu/020_multiversion

Another change from your feedback:  you mentioned that RHEL7 shipped with ICU
50, so I removed my suggestion of dropping some extra code we carry for
versions before 54 and set the minimum acceptable version to 50.  It probably
works further back than that, but that's a decent range, I think.

On Tue, Nov 15, 2022 at 1:55 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I looked at v6.

Thanks for jumping in and testing!

>   * We'll need some clearer instructions on how to build/install extra
> ICU versions that might not be provided by the distribution packaging.
> For instance, I got a cryptic error until I used --enable-rpath, which
> might not be obvious to all users.

Suggestions welcome.  No docs at all yet...

>   * Can we have a better error when the library was built with --
> disable-renaming? We can just search for the plain (no suffix) symbol.

I threw out that symbol probing logic, and wrote something simpler that should
now also work with --disable-renaming (though not tested).  Now it does a
cross-check with the library's self-reported major version, just to make
sure there wasn't a badly named library file, which may be more likely
with --disable-renaming.

>   * We should use dlerror() instead of %m to report dlopen() errors.

Fixed.

>   * It seems like the collation version is just there to issue WARNINGs
> when a user is using the non-versioned locale syntax and the library
> changes underneath them (or if there is collation version change within
> a single ICU major version)?

Correct.

I have now updated the warning messages you get when they don't match, to
provide a hint about what to do about it.  I am sure they need some more
word-smithing, though.

>   * How are you testing this?

Ad hoc noodling before now, but see attached.

> I realize your patch is experimental, but when there is a better
> consensus on the approach, we should consider adding declarative syntax
> such as:
>
>    CREATE COLLATION (or LOCALE?) PROVIDER icu67
>      TYPE icu VERSION '67' AS '/path/to/icui18n.so.67';
>
> It will offer more opportunities to catch errors early and offer better
> error messages. It would also enable it to function if the library is
> built with --disable-renaming (though we'd have to trust the user).

Earlier in this and other threads, we wondered if each ICU major version should
be a separate provider, which is what you're showing there, or should be an
independent property of an individual COLLATION, which is what v6 did with
'63:en' and what Peter suggested I make more formal with CREATE COLLATION foo
(..., ICU_VERSION=63).  I actually started out thinking we'd have multiple
providers, but I couldn't really think of any advantage, and I think it makes
some upgrade scenarios more painful.  Can you elaborate on why you'd want
that model?

> On Sat, 2022-10-22 at 14:22 +1300, Thomas Munro wrote:
> > Problem 1:  Suppose you're ready to start using (say) v72.  I guess
> > you'd use the REFRESH command, which would open the main linked ICU's
> > collversion and stamp that into the catalogue, at which point new
> > sessions would start using that, and then you'd have to rebuild all
> > your indexes (with no help from PG to tell you how to find everything
> > that needs to be rebuilt, as belaboured in previous reverted work).
> > Aside from the possibility of getting the rebuilding job wrong (as
> > belaboured elsewhere), it's not great, because there is still a
> > transitional period where you can be using the wrong version for your
> > data.  So this requires some careful planning and understanding from
> > the administrator.
>
> How is this related to the search-by-collversion design? It seems like
> it's hard no matter what.

Yeah.  I just don't like the way it *appears* to be doing something clever, but
it doesn't solve any fundamental problem at all because the collversion
information is under human control and so it's really doing something stupid.
Hence desire to build something that at least admits that it's primitive and
just gives you some controls, in a first version.  We could always reconsider
that in later work though, maybe even an optional policy or something?

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Allow single table VACUUM in transaction block
Next
From: Alvaro Herrera
Date:
Subject: Re: allowing for control over SET ROLE