Re: Collation version tracking for macOS - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Collation version tracking for macOS |
Date | |
Msg-id | CA+hUKGKWX6q_UXsDhVbhYns2+VsrPLW5o6ohqO6C=cZ6r23G7w@mail.gmail.com Whole thread Raw |
In response to | Re: Collation version tracking for macOS (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Collation version tracking for macOS
Re: Collation version tracking for macOS |
List | pgsql-hackers |
On Tue, Nov 29, 2022 at 7:51 PM Jeff Davis <pgsql@j-davis.com> wrote: > On Sat, 2022-11-26 at 18:27 +1300, Thomas Munro wrote: > > On Thu, Nov 24, 2022 at 5:48 PM Thomas Munro <thomas.munro@gmail.com> > > wrote: > > > On Thu, Nov 24, 2022 at 3:07 PM Jeff Davis <pgsql@j-davis.com> > > > wrote: > > > > I'd vote for 1 on the grounds that it's easier to document and > > > > understand a single collation version, which comes straight from > > > > ucol_getVersion(). This approach makes it a separate problem to > > > > find > > > > the collation version among whatever libraries the admin can > > > > provide; > > > > but adding some observability into the search should mitigate any > > > > confusion. > > > > > > OK, it sounds like I should code that up next. > > > > Here's the first iteration. > > Thank you. Thanks for the review. Responses further down. And thanks also for the really interesting discussion about how the version numbers work (or in some cases, don't work...), and practical packaging and linking problems. To have a hope of making something happen for PG16, which I think means we need a serious contender patch in the next few weeks, we really need to make some decisions. I enjoyed trying out search-by-collversion, but it's still not my favourite. On the ballot we have two main questions: 1. Should we commit to search-by-collversion, or one of the explicit library version ideas, and if the latter, which? 2. Should we try to support being specific about minor versions (in various different ways according to the choice made for #1)? My tentative votes are: 1. I think we should seriously consider provider = ICU63. I still think search-by-collversion is a little too magical, even though it clearly can be made to work. Of the non-magical systems, I think encoding the choice of library into the provider name would avoid the need to add a second confusing "X_version" concept alongside our existing "X_version" columns in catalogues and DDL syntax, while still making it super clear what is going on. This would include adding DDL commands so you can do ALTER DATABASE/COLLATION ... PROVIDER = ICU63 to make warnings go way. 2. I think we should ignore minor versions for now (other than reporting them in the relevant introspection functions), but not make any choices that would prevent us from changing our mind about that in a later release. For example, having two levels of specificity ICU and ICU68 in the libver-in-provider-name design wouldn't preclude us from adding support for ICU68_2 later I haven't actually tried that design out in code yet, but I'm willing to try to code that up very soon. So no new patch from me yet. Does anyone else want to express a view? > Proposed changes: > > * I attached a first pass of some documentation. Thanks. Looks pretty good, and much of it would stay if we changed to one of the other models. > * Should be another GUC to turn WARNING into an ERROR. Useful at least > for testing; perhaps too dangerous for production. OK, will add that into the next version. > * The libraries should be loaded in a more diliberate order. The "*" > should be expanded in a descending fashion so that later versions are > preferred. Yeah, I agree. > * GUCs should be validated. Will do. > * Should validate that loaded library has expected version. Will do. > * We need to revise or remove pg_collation_actual_version() and > pg_database_collation_actual_version(). I never liked that use of the word "actual"... > * The GUCs are PGC_SUSET, but don't take effect because > icu_library_list_fully_loaded is never reset. True. Just rought edges because I was trying to prototype search-by-collversion fast. Will consider this for the next version. > * The extra collations you're adding at bootstrap time are named based > on the library major version. I suppose it might be more "proper" to > name them based on the collation version, but that would be more > verbose, so I won't advocate for that. Just pointing it out. Ah, yes, the ones with names like "en-US-x-icu68". I agree that made a little less sense in the search-by-collversion patch. Maybe we wouldn't want these at all in the search-by-collversion model. But I think they're perfect the way they are in the provider = ICU68 model. The other idea I considered ages ago was that we could use namespaces: you could "icu68.en-US", or just "en-US" in some contexts to get what your search path sees, but that all seemed a little too cute and not really like anything else we do with system-created catalogues, so I gave that idea up. > * It looks hard (or impossible) to mix multiple ICU libraries with the > same major version and different minor versions. That's because, > e.g., libicui18n.so.63.1 links against libicuuc.63 and libicudata.63, > and when you install ICU 63.2, those dependencies get clobbered with > the 63.2 versions. That fails the sanity check I proposed above about > the library version number matching the requested library version > number. And it also just seems wrong -- why would you have minor- > version precision about an ICU library but then only major-version > precision about the ICU dependencies of that library? Doesn't that > defeat the whole purpose of this naming scheme? (Maybe another ICU > bug?). I don't think it's a bug exactly. That scheme is designed to advertise ABI stability, and not intended to support parallel installation of minor versions. It does seem a little silly for libraries that are shipped together as one atomic unit not to use fully qualified dependency names, though. I think there would be various technical solutions, if you're prepared to give up existing ready-made packages and build stuff yourself. Install them into different directories with different DT_RPATH so they can't see each other (but then our icu_library_path needs to support a list of paths or it won't find these ones which will have to be not in the usual system path), or clobber the DT_NEEDED (but I guess not the DT_SONAME) to mention the minor version, and equivalent concepts for other non-elf systems (at a glance the same problem applies on macOS), or re-roll the libraries into a single .so. Or convince them to support a single library build mode (maybe there is one already? I couldn't find it). That's all a bit against the grain for now, and makes me want to abandon the notion of minor versions completely for now but leave the option open for later exploration. In the meantime, I think the feature is still pretty useful. For example, it helps you with the common case of a major OS upgrade or streaming replication across major OS versions: just find the right .deb/rpm/whatever for the older one, and install it, until you're ready to upgrade and REFRESH. The story is not quite as good for someone with an index full of Chinese or Turkish text who gets a surprise warning after a minor apt-get update, because the Japanese have decided to invent a new character. We can't offer a nice solution to that: they have to determine that it is safe to REFRESH to clear the warning, with or without rebuild, or downgrade/pin the ICU package until they are ready to REFRESH. But that is already the case today and this patch neither helps nor hinders. The only reason we didn't know about this pre-existing type of problem is because (approximately) nobody uses ICU yet, because it wasn't available as a database default yet. > Minor comments: > > * ICU_I18N is defined in make_icu_library_name() but used outside of > it. One solution might be to have it return both library names to the > caller and rename it as make_icu_library_names(). Good idea, will do. > * get_icu_function() could use a clarifying comment or a better name. > Something that communicates that you are looking for the function in > the given library with the given major version number (which may or may > not be needed depending on how the library was compiled). Agreed. > * typo in comment over make_icu_collator: > s/u_getVersion/ucol_getVersion/ Thanks. > * The return value of make_icu_collator() seems backwards to me, > stylistically. I typically see the false-is-good pattern with integer > returns. Agreed. > * weird bracketing style in get_icu_collator for the "else" Yep. > > The version rosetta stone functions look like this: > > > > postgres=# select * from pg_icu_library_versions(); > > icu_version | unicode_version | cldr_version > > -------------+-----------------+-------------- > > 67.1 | 13.0 | 37.0 > > 63.1 | 11.0 | 34.0 > > 57.1 | 8.0 | 29.0 > > (3 rows) > > > > postgres=# select * from pg_icu_collation_versions('zh'); > > icu_version | uca_version | collator_version > > -------------+-------------+------------------ > > 67.1 | 13.0 | 153.14.37 > > 63.1 | 11.0 | 153.88.34 > > 57.1 | 8.0 | 153.64.29 > > (3 rows) > > I like these functions. Yeah, they've been quite educational. Now I'm wondering what form these functions would take in the provider = ICU68 patch.
pgsql-hackers by date: