Thread: Move defaults toward ICU in 16?
As a project, do we want to nudge users toward ICU as the collation provider as the best practice going forward? If so, is version 16 the right time to adjust defaults to favor ICU? * At build time, default to --with-icu (-Dicu=enabled); users who don't want ICU can specify --without-icu (-Dicu=disabled/auto) * At initdb time, default to --locale-provider=icu if built with ICU support If we don't want to nudge users toward ICU, is it because we are waiting for something, or is there a lack of consensus that ICU is actually better? Regards, Jeff Davis
On Thu, Feb 2, 2023 at 8:13 AM Jeff Davis <pgsql@j-davis.com> wrote: > If we don't want to nudge users toward ICU, is it because we are > waiting for something, or is there a lack of consensus that ICU is > actually better? Do you think it's better? -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, 2023-02-02 at 08:44 -0500, Robert Haas wrote: > On Thu, Feb 2, 2023 at 8:13 AM Jeff Davis <pgsql@j-davis.com> wrote: > > If we don't want to nudge users toward ICU, is it because we are > > waiting for something, or is there a lack of consensus that ICU is > > actually better? > > Do you think it's better? Yes: * ICU more featureful: e.g. supports case-insensitive collations (the citext docs suggest looking at ICU instead). * It's faster: a simple non-contrived sort is something like 70% faster[1] than one using glibc. * It can provide consistent semantics across platforms. I believe the above reasons are enough to call ICU "better", but it also seems like a better path for addressing/mitigating collator versioning problems: * Easier for users to control what library version is available on their system. We can also ask packagers to keep some old versions of ICU available for an extended period of time. * If one of the ICU multilib patches makes it in, it will be easier for users to select which of the library versions Postgres will use. * Reports versions for indiividual collators, distinct from the library version. The biggest disadvantage (rather, the flip side of its advantages) is that it's a separate dependency. Will ICU still be maintained in 10 years or will we end up stuck maintaining it ourselves? Then again, we've already been shipping it, so I don't know if we can avoid that problem entirely now even if we wanted to. I don't mean that ICU solves all of our problems -- far from it. But you asked if I think it's better, and my answer is yes. Regards, Jeff Davis [1] https://postgr.es/m/64039a2dbcba6f42ed2f32bb5f0371870a70afda.camel@j-davis.com
On Fri, Feb 3, 2023 at 5:31 AM Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2023-02-02 at 08:44 -0500, Robert Haas wrote: > > On Thu, Feb 2, 2023 at 8:13 AM Jeff Davis <pgsql@j-davis.com> wrote: > > > If we don't want to nudge users toward ICU, is it because we are > > > waiting for something, or is there a lack of consensus that ICU is > > > actually better? > > > > Do you think it's better? > > Yes: > > * ICU more featureful: e.g. supports case-insensitive collations (the > citext docs suggest looking at ICU instead). > * It's faster: a simple non-contrived sort is something like 70% > faster[1] than one using glibc. > * It can provide consistent semantics across platforms. +1 > * Easier for users to control what library version is available on > their system. We can also ask packagers to keep some old versions of > ICU available for an extended period of time. > * If one of the ICU multilib patches makes it in, it will be easier > for users to select which of the library versions Postgres will use. > * Reports versions for indiividual collators, distinct from the > library version. +1 > The biggest disadvantage (rather, the flip side of its advantages) is > that it's a separate dependency. Will ICU still be maintained in 10 > years or will we end up stuck maintaining it ourselves? Then again, > we've already been shipping it, so I don't know if we can avoid that > problem entirely now even if we wanted to. It has a pretty special status, with an absolutely enormous amount of technology depending on it. http://blog.unicode.org/2016/05/icu-joins-unicode-consortium.html https://unicode.org/consortium/consort.html https://home.unicode.org/membership/members/ https://home.unicode.org/about-unicode/ I mean, who knows what the future holds, but ultimately what we're doing here is taking the de facto reference implementation of the Unicode collation algorithm. Are Unicode and the consortium still going to be here in 10 years? We're all in on Unicode, and it's also tangled up with ISO standards, as are parts of the collation stuff. Sure, there could be a clean-room implementation that replaces it in some sense (just as there is a Java implementation) but it would very likely be "the same" because the real thing we're buying here is the set of algorithms and data maintenance that the whole industry has agreed on. Unless Britain decides to exit the Latin alphabet, terminate membership of ISO and revert to anglo-saxon runes with a sort order that is defined in the new constitution as "the opposite of whatever Unicode says", it's hard to see obstacles to ICU's long term universal applicability. It's still important to have libc support as an option, though, because it's a totally reasonable thing to want sort order to agree with the "sort" command on the same host, and you are willing to deal with all the complexities that we're trying to escape.
On Thu, Feb 2, 2023 at 2:15 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Sure, there could be a clean-room implementation that replaces it in > some sense (just as there is a Java implementation) but it would very > likely be "the same" because the real thing we're buying here is the > set of algorithms and data maintenance that the whole industry has > agreed on. I don't think that a clean room implementation is implausible. They seem to already exist, and be explicitly provided for by CLDR, which is not joined at the hip to ICU: https://github.com/elixir-cldr/cldr Most of the value that we tend to think of as coming from ICU actually comes from CLDR itself, as well as related Unicode Consortium and IETF standards/RFCs such as BCP-47. > Unless Britain decides to exit the Latin alphabet, terminate > membership of ISO and revert to anglo-saxon runes with a sort order > that is defined in the new constitution as "the opposite of whatever > Unicode says", it's hard to see obstacles to ICU's long term universal > applicability. It would have to literally be defined as "not unicode" for it to present a real problem. A key goal of Unicode is to accommodate political and cultural shifts, since even countries can come and go. In principle Unicode should be able to accommodate just about any change in preferences, except when there is an irreconcilable difference of opinion among people that are from the same natural language group. For example it can accommodate relatively minor differences of opinion about how text should be sorted among groups that each speak a regional dialect of the same language. Hardly anybody even notices this. Accommodating these variations can only come from making a huge investment. Most of the work is actually done by natural language scholars, not technologists. That effort is very unlikely to be duplicated by some other group with its own conflicting goals. AFAICT there is no great need for any schisms, since differences of opinion can usually be accommodated under the umbrella of Unicode. -- Peter Geoghegan
Thomas Munro <thomas.munro@gmail.com> writes: > It's still important to have libc support as an option, though, > because it's a totally reasonable thing to want sort order to agree > with the "sort" command on the same host, and you are willing to deal > with all the complexities that we're trying to escape. Yeah. I would be resistant to making ICU a required dependency, but it doesn't seem unreasonable to start moving towards it being our default collation support. regards, tom lane
On Thu, 2023-02-02 at 18:10 -0500, Tom Lane wrote: > Yeah. I would be resistant to making ICU a required dependency, > but it doesn't seem unreasonable to start moving towards it being > our default collation support. Patch attached. To get the default locale, the patch initializes a UCollator with NULL for the locale name, and then queries it for the locale name. Then it's converted to a language tag, which is consistent with the initial collation import. I'm not sure that's the best way, but it seems reasonable. If it's a user-provided locale (--icu-locale=), then the patch leaves it as-is, and does not convert it to a language tag (consistent with CREATE COLLATION and CREATE DATABASE). I opened another discussion about whether we want to try harder to validate or canonicalize the locale name: https://www.postgresql.org/message-id/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.camel@j-davis.com -- Jeff Davis PostgreSQL Contributor Team - AWS
Attachment
On 2023-02-08 12:16:46 -0800, Jeff Davis wrote: > On Thu, 2023-02-02 at 18:10 -0500, Tom Lane wrote: > > Yeah. I would be resistant to making ICU a required dependency, > > but it doesn't seem unreasonable to start moving towards it being > > our default collation support. > > Patch attached. Unfortunately this fails widely on CI, with both compile time and runtime issues: https://cirrus-ci.com/build/5116408950947840
On Wed, 2023-02-08 at 18:22 -0800, Andres Freund wrote: > On 2023-02-08 12:16:46 -0800, Jeff Davis wrote: > > On Thu, 2023-02-02 at 18:10 -0500, Tom Lane wrote: > > > Yeah. I would be resistant to making ICU a required dependency, > > > but it doesn't seem unreasonable to start moving towards it being > > > our default collation support. > > > > Patch attached. > > Unfortunately this fails widely on CI, with both compile time and > runtime New patches attached. 0001: build defaults to requiring ICU 0002: initdb defaults to using ICU (if server built with ICU) One CI test is failing: "Windows - Server 2019, VS 2019 - Meson & ninja"; if I apply Andres patch ( https://github.com/anarazel/postgres/commit/dde7c68 ), then it works. I ran into one annoyance with pg_upgrade, which is that a v15 cluster initialized with the defaults requires that the v16 cluster is initialized with --locale-provider=libc, because otherwise the old and new cluster will have mismatching template databases. Simple to fix once you see the error, but I wonder how many initdb scripts might be broken? I suppose it's just the cost of changing a default? Would an environment variable help for cases where it's difficult to pass that extra option down through a script? I also considered posting another patch to change the default for CREATE COLLATION, but there are a few issues I'm not sure about. Should the default be based on whether ICU support is available? Or the datlocprovider for the current database? And/or some kind of compatibility GUC? Notes on the tests I needed to fix, in case they are interesting or point to some kind of larger problem: * ecpg has a test that involves setting the client_encoding to LATIN1 which required a compatible server encoding so it was setting ENCODING=SQL_ASCII, which ICU doesn't support. The ecpg test did not look particularly sensitive to the locale, so I changed it to use client_encoding=SQL_ASCII instead, so that the server encoding doesn't matter. * citext has a test involving Turkish characters, which works for all libc locales, but in ICU the test only works in Turkish locales. I skip the test if datlocprovider='i', because citext doesn't seem very important in an ICU world. * unaccent is broken if the database provider is ICU and LC_CTYPE=C, because the t_isspace() (etc.) functions do not properly handle ICU. Probably some other things are broken with that combination, but only this test seems to exercise it. I just skipped the test for that broken combination, but perhaps it should be fixed in the future. * initdb was being built with ICU as a dependency in meson, but not autoconf. I assume it's fine to link ICU into initdb, so I changed the Makefile. * I changed a couple tests to initialize with --locale-provider=libc. They were testing that creating a database with the ICU provider but no ICU locale fails, and that's easiest to test if the template is libc. * The CI test CompilerWarnings:mingw_cross_warning was failing because ICU is not available. I added --without-icu in the .cirrus.yml file and it works. -- Jeff Davis PostgreSQL Contributor Team - AWS
Attachment
Hi, On 2023-02-10 16:17:00 -0800, Jeff Davis wrote: > One CI test is failing: "Windows - Server 2019, VS 2019 - Meson & > ninja"; if I apply Andres patch ( > https://github.com/anarazel/postgres/commit/dde7c68 ), then it works. Until something like my patch above is done more generally applicable, I think your patch should disable ICU on windows. Can't just fail to build. Perhaps we don't need to force ICU use to on with the meson build, given that it defaults to auto-detection? > I ran into one annoyance with pg_upgrade, which is that a v15 cluster > initialized with the defaults requires that the v16 cluster is > initialized with --locale-provider=libc, because otherwise the old and > new cluster will have mismatching template databases. Simple to fix > once you see the error, but I wonder how many initdb scripts might be > broken? I suppose it's just the cost of changing a default? Would an > environment variable help for cases where it's difficult to pass that > extra option down through a script? That seems problematic to me. But, shouldn't pg_upgrade be able to deal with this? As long as the databases are created with template0, we can create the collations at that point? > @@ -15323,7 +15311,7 @@ else > We can't simply define LARGE_OFF_T to be 9223372036854775807, > since some C++ compilers masquerading as C compilers > incorrectly reject 9223372036854775807. */ > -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) > +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) > int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 > && LARGE_OFF_T % 2147483647 == 1) > ? 1 : -1]; This stuff shouldn't be in here, it's due to a debian patched autoconf. Greetings, Andres Freund
On Thu, 2023-02-02 at 05:13 -0800, Jeff Davis wrote: > As a project, do we want to nudge users toward ICU as the collation > provider as the best practice going forward? One consideration here is security. Any vulnerability in ICU collation routines could easily become a vulnerability in Postgres. I looked at these lists: https://www.cvedetails.com/vulnerability-list/vendor_id-17477/Icu-project.html https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=icu https://unicode-org.atlassian.net/issues/?jql=labels%20%3D%20%22security%22 https://unicode-org.atlassian.net/issues/?jql=labels%20%3D%20%22was_sensitive%22 Here are the recent CVEs: CVE-2021-30535 https://unicode-org.atlassian.net/browse/ICU-21587 CVE-2020-21913 https://unicode-org.atlassian.net/browse/ICU-20850 CVE-2020-10531 https://unicode-org.atlassian.net/browse/ICU-20958 But there are quite a few JIRAs that look concerning that don't have a CVE assigned: 2021 https://unicode-org.atlassian.net/browse/ICU-21537 2021 https://unicode-org.atlassian.net/browse/ICU-21597 2021 https://unicode-org.atlassian.net/browse/ICU-21676 2021 https://unicode-org.atlassian.net/browse/ICU-21749 Not sure which of these are exploitable, and if they are, why they don't have a CVE. If someone else finds more issues, please let me know. The good news is that the Chrome/Chromium projects are actively finding and reporting issues. I didn't look for comparable information about glibc, but I would guess that exploitable memory errors in setlocale/strcoll are very rare, otherwise it would be a security disaster for many projects. -- Jeff Davis PostgreSQL Contributor Team - AWS
On Fri, 2023-02-10 at 18:00 -0800, Andres Freund wrote: > Until something like my patch above is done more generally > applicable, I think > your patch should disable ICU on windows. Can't just fail to build. > > Perhaps we don't need to force ICU use to on with the meson build, > given that > it defaults to auto-detection? Done. I changed it back to 'auto', and tests pass. > > But, shouldn't pg_upgrade be able to deal with this? As long as the > databases > are created with template0, we can create the collations at that > point? Are you saying that the upgraded cluster could have a different default collation for the template databases than the original cluster? That would be wrong to do, at least by default, but I could see it being a useful option. Or maybe I misunderstand what you're saying? > > This stuff shouldn't be in here, it's due to a debian patched > autoconf. Removed, thank you. -- Jeff Davis PostgreSQL Contributor Team - AWS
Attachment
Hi, On 2023-02-14 09:48:08 -0800, Jeff Davis wrote: > On Fri, 2023-02-10 at 18:00 -0800, Andres Freund wrote: > > But, shouldn't pg_upgrade be able to deal with this? As long as the > > databases > > are created with template0, we can create the collations at that > > point? > > Are you saying that the upgraded cluster could have a different default > collation for the template databases than the original cluster? > That would be wrong to do, at least by default, but I could see it > being a useful option. > > Or maybe I misunderstand what you're saying? I am saying that pg_upgrade should be able to deal with the difference. The details of how to implement that, don't matter that much. FWIW, I don't think it matters much what collation template0 has, since we allow to change the locale provider when using template0 as the template. We could easily update template0, if we think that's necessary. But I don't think it really is. As long as the newly created databases have the right provider, I'd lean towards not touching template0. But whatever... Greetings, Andres Freund
On 2/13/23 8:11 PM, Jeff Davis wrote: > On Thu, 2023-02-02 at 05:13 -0800, Jeff Davis wrote: >> As a project, do we want to nudge users toward ICU as the collation >> provider as the best practice going forward? > > One consideration here is security. Any vulnerability in ICU collation > routines could easily become a vulnerability in Postgres. Would it be any different than a vulnerability in OpenSSL et al? I know that's a general, nuanced question but it would be good to understand if we are exposing ourselves to any more vulnerabilities. And would it be any different than today, given people can build PG with libicu as is? Continuing on $SUBJECT, I wanted to understand performance comparisons. I saw your comments[1] in response to Robert's question, looked at your benchmarks[2] and one that ICU ran on older versions[3]. It seems that in general, users would see performance gains switching to ICU. The only one in [3] that stood out to me was the tests on the "ko_KR" collation underperformed on a list of Korean names, but maybe that is better in newer versions. I agree with most of your points in [1]. The platform-consistent behavior is a good point, especially with more PG deployments running on different systems. While taking on a new dependency is a concern, ICU was released in 1999[4], has an active community, and seems to follow standards (i.e. the Unicode Consortium). I do wonder about upgrades, beyond the ongoing work with pg_upgrade. I think the logical methods (pg_dumpall, logical replication) should generally be OK, but we should ensure we think of things that could go wrong and how we'd answer them. Based on the available data, I think it's OK to move towards ICU as the default, or preferred, collation provider. I agree (for now) in not taking a hard dependency on ICU. Thanks, Jonathan [1] https://www.postgresql.org/message-id/b676252eeb57ab8da9dbb411d0ccace95caeda0a.camel%40j-davis.com [2] https://www.postgresql.org/message-id/64039a2dbcba6f42ed2f32bb5f0371870a70afda.camel@j-davis.com [3] https://icu.unicode.org/charts/collation-icu4c48-glibc [4] https://en.wikipedia.org/wiki/International_Components_for_Unicode
Attachment
On Tue, 2023-02-14 at 16:27 -0500, Jonathan S. Katz wrote: > Would it be any different than a vulnerability in OpenSSL et al? In principle, no, but sometimes the details matter. I'm just trying to add data to the discussion. > It seems that > in general, users would see performance gains switching to ICU. That's great news, and consistent with my experience. I don't think it should be a driving factor though. If there's a choice is between platform-independent semantics (ICU) and performance, platform- independence should be the default. > I agree with most of your points in [1]. The platform-consistent > behavior is a good point, especially with more PG deployments running > on > different systems. Now I think semantics are the most important driver, being consistent across platforms and based on some kind of trusted independent organization that we can point to. It feels very wrong to me to explain that sort order is defined by the operating system on which Postgres happens to run. Saying that it's defined by ICU, which is part of the Unicode consotium, is much better. It doesn't eliminate versioning issues, of course, but I think it's a better explanation for users. Many users have other systems in their data infrastructure, running on a variety of platforms, and could (in theory) try to synchronize around a common ICU version to avoid subtle bugs in their data pipeline. > Based on the available data, I think it's OK to move towards ICU as > the > default, or preferred, collation provider. I agree (for now) in not > taking a hard dependency on ICU. I count several favorable responses, so I'll take it that we (as a community) are intending to change the default for build and initdb in v16. Robert expressed some skepticism[1], though I don't see an objection. If I read his concerns correctly, he's mainly concerned with quality issues like documentaiton, bugs, etc. I understand those concerns (I'm the one that raised them), but they seem like the kind of issues that one finds any time they dig into a dependency enough. "Setting our sights very high"[1], to me, would just be ICU with a bit more rigorous attention to quality issues. [1] https://www.postgresql.org/message-id/CA%2BTgmoYmeGJaW%3DPy9tAZtrnCP%2B_Q%2BzRQthv%3Dzn_HyA_nqEDM-A%40mail.gmail.com -- Jeff Davis PostgreSQL Contributor Team - AWS
On Thu, Feb 16, 2023 at 1:01 AM Jeff Davis <pgsql@j-davis.com> wrote: > It feels very wrong to me to explain that sort order is defined by the > operating system on which Postgres happens to run. Saying that it's > defined by ICU, which is part of the Unicode consotium, is much better. > It doesn't eliminate versioning issues, of course, but I think it's a > better explanation for users. The fact that we can't use ICU on Windows, though, weakens this argument a lot. In my experience, we have a lot of Windows users, and they're not any happier with the operating system collations than Linux users. Possibly less so. I feel like this is a very difficult kind of change to judge. If everyone else feels this is a win, we should go with it, and hopefully we'll end up better off. I do feel like there are things that could go wrong, though, between the imperfect documentation, the fact that a substantial chunk of our users won't be able to use it because they run Windows, and everybody having to adjust to the behavior change. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, 2023-02-16 at 15:05 +0530, Robert Haas wrote: > On Thu, Feb 16, 2023 at 1:01 AM Jeff Davis <pgsql@j-davis.com> wrote: > > It feels very wrong to me to explain that sort order is defined by the > > operating system on which Postgres happens to run. Saying that it's > > defined by ICU, which is part of the Unicode consotium, is much better. > > It doesn't eliminate versioning issues, of course, but I think it's a > > better explanation for users. > > The fact that we can't use ICU on Windows, though, weakens this > argument a lot. In my experience, we have a lot of Windows users, and > they're not any happier with the operating system collations than > Linux users. Possibly less so. > > I feel like this is a very difficult kind of change to judge. If > everyone else feels this is a win, we should go with it, and hopefully > we'll end up better off. I do feel like there are things that could go > wrong, though, between the imperfect documentation, the fact that a > substantial chunk of our users won't be able to use it because they > run Windows, and everybody having to adjust to the behavior change. Unless I misunderstand, the lack of Windows support is not a matter of principle and can be added later on, right? I am in favor of changing the default. It might be good to add a section to the documentation in "Server setup and operation" recommending that if you go with the default choice of ICU, you should configure your package manager not to upgrade the ICU library. Yours, Laurenz Albe
On 2/16/23 4:35 AM, Robert Haas wrote: > On Thu, Feb 16, 2023 at 1:01 AM Jeff Davis <pgsql@j-davis.com> wrote: >> It feels very wrong to me to explain that sort order is defined by the >> operating system on which Postgres happens to run. Saying that it's >> defined by ICU, which is part of the Unicode consotium, is much better. >> It doesn't eliminate versioning issues, of course, but I think it's a >> better explanation for users. > > The fact that we can't use ICU on Windows, though, weakens this > argument a lot. In my experience, we have a lot of Windows users, and > they're not any happier with the operating system collations than > Linux users. Possibly less so. This is one reason why we're discussing ICU as the "preferred default" vs. "the default." While it may not completely eliminate platform dependent behavior for collations, it takes a step forward. And AIUI, it does sound like ICU is available on newer versions of Windows[1]. > I feel like this is a very difficult kind of change to judge. If > everyone else feels this is a win, we should go with it, and hopefully > we'll end up better off. I do feel like there are things that could go > wrong, though, between the imperfect documentation, the fact that a > substantial chunk of our users won't be able to use it because they > run Windows, and everybody having to adjust to the behavior change. We should continue to improve our documentation. Personally, I found the biggest challenge was understanding how to set ICU locales / rules, particularly for nondeterministic collations as it was challenging to find where these were listed. I was able to overcome this with the examples in our docs + blogs, but I agree it's an area we can continue to improve upon. Thanks, Jonathan [1] https://learn.microsoft.com/en-us/dotnet/core/extensions/globalization-icu
Attachment
Hi, On 2023-02-16 15:05:10 +0530, Robert Haas wrote: > The fact that we can't use ICU on Windows, though, weakens this > argument a lot. In my experience, we have a lot of Windows users, and > they're not any happier with the operating system collations than > Linux users. Possibly less so. Why can't you use ICU on windows? It works today, afaict? Greetings, Andres Freund
On Thu, Feb 16, 2023 at 9:45 PM Andres Freund <andres@anarazel.de> wrote: > On 2023-02-16 15:05:10 +0530, Robert Haas wrote: > > The fact that we can't use ICU on Windows, though, weakens this > > argument a lot. In my experience, we have a lot of Windows users, and > > they're not any happier with the operating system collations than > > Linux users. Possibly less so. > > Why can't you use ICU on windows? It works today, afaict? Uh, I had the contrary impression from the discussion upthread, but it sounds like I might be misunderstanding the situation? -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On February 16, 2023 9:14:05 PM PST, Robert Haas <robertmhaas@gmail.com> wrote: >On Thu, Feb 16, 2023 at 9:45 PM Andres Freund <andres@anarazel.de> wrote: >> On 2023-02-16 15:05:10 +0530, Robert Haas wrote: >> > The fact that we can't use ICU on Windows, though, weakens this >> > argument a lot. In my experience, we have a lot of Windows users, and >> > they're not any happier with the operating system collations than >> > Linux users. Possibly less so. >> >> Why can't you use ICU on windows? It works today, afaict? > >Uh, I had the contrary impression from the discussion upthread, but it >sounds like I might be misunderstanding the situation? That was about the build environment in CI / cfbot, I think. Jeff was making icu a hard requirement by default, but ICU wasn'tinstalled in a usable way, so the build failed. The patch he referred to was just building ICU during the CI run. I do remember encountering issues with the mkvcbuild.pl build not building against a downloaded modern icu build, but thatwas just about library naming or directory structure, or such. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Fri, Feb 17, 2023 at 10:44:05AM +0530, Robert Haas wrote: > Uh, I had the contrary impression from the discussion upthread, but it > sounds like I might be misunderstanding the situation? IMO, it would be nice to be able to have the automatic detection of meson work in the CFbot to see how this patch goes. Perhaps that's not a reason enough to hold on this patch, though.. Separate question: what's the state of the Windows installers provided by the community regarding libicu? Is that embedded in the MSI? -- Michael
Attachment
Hi, On February 16, 2023 9:40:17 PM PST, Michael Paquier <michael@paquier.xyz> wrote: >On Fri, Feb 17, 2023 at 10:44:05AM +0530, Robert Haas wrote: >> Uh, I had the contrary impression from the discussion upthread, but it >> sounds like I might be misunderstanding the situation? > >IMO, it would be nice to be able to have the automatic detection of >meson work in the CFbot to see how this patch goes. Perhaps that's >not a reason enough to hold on this patch, though.. Fwiw, the manually triggered mingw task today builds with ICU support. One thing the patch could do is to just comment outthe "manual" piece in .cirrus.yml, then cfbot would run it for just this cf entry. I am planning to build the optional libraries that are easily built, as part of the image build for use by CI. Just haven'tgotten around to it. The patch Jeff linked to is part of the experimentation on the way to that. If somebody elsewants to finish that, even better. IIRC that prototype builds all optional dependencies except for kerberos and ossp-uuid. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Tue, 2023-02-14 at 09:59 -0800, Andres Freund wrote: > I am saying that pg_upgrade should be able to deal with the > difference. The > details of how to implement that, don't matter that much. To clarify, you're saying that pg_upgrade should simply update pg_database to set the new databases' collation fields equal to that of the old cluster? I'll submit it as a separate patch because it would be independently useful. Regards, Jeff Davis
On Fri, 2023-02-17 at 14:40 +0900, Michael Paquier wrote: > Separate question: what's the state of the Windows installers provided > by the community regarding libicu? Is that embedded in the MSI? The EDB installer installs a quite old version of the ICU library for compatibility reasons, as far as I know. Yours, Laurenz Albe
On Fri, 2023-02-17 at 00:06 -0800, Jeff Davis wrote: > On Tue, 2023-02-14 at 09:59 -0800, Andres Freund wrote: > > I am saying that pg_upgrade should be able to deal with the > > difference. The > > details of how to implement that, don't matter that much. > > To clarify, you're saying that pg_upgrade should simply update > pg_database to set the new databases' collation fields equal to that > of > the old cluster? Thinking about this more, it's not clear to me if this would be in scope for pg_upgrade or not. If pg_upgrade is fixing up the new cluster rather than checking for compatibility, why doesn't it just take over and do the initdb for the new cluster itself? That would be less confusing for users, and avoid some weirdness (like, if you drop the database "postgres" on the original, why does it reappear after an upgrade?). Someone might want to do something interesting to the new cluster before the upgrade, but it's not clear from the docs what would be both useful and safe. Regards, Jeff Davis
Hi, On 2023-02-17 09:01:54 -0800, Jeff Davis wrote: > On Fri, 2023-02-17 at 00:06 -0800, Jeff Davis wrote: > > On Tue, 2023-02-14 at 09:59 -0800, Andres Freund wrote: > > > I am saying that pg_upgrade should be able to deal with the > > > difference. The > > > details of how to implement that, don't matter that much. > > > > To clarify, you're saying that pg_upgrade should simply update > > pg_database to set the new databases' collation fields equal to that > > of > > the old cluster? Yes. > Thinking about this more, it's not clear to me if this would be in > scope for pg_upgrade or not. I don't think we should consider changing the default collation provider without making this more seamless, one way or another. > If pg_upgrade is fixing up the new cluster rather than checking for > compatibility, why doesn't it just take over and do the initdb for the new > cluster itself? That would be less confusing for users, and avoid some > weirdness (like, if you drop the database "postgres" on the original, why > does it reappear after an upgrade?). I've wondered about that as well. There are some initdb-time options you can set, but pg_upgrade could forward those. Greetings, Andres Freund
pá 17. 2. 2023 v 18:02 odesílatel Jeff Davis <pgsql@j-davis.com> napsal:
On Fri, 2023-02-17 at 00:06 -0800, Jeff Davis wrote:
> On Tue, 2023-02-14 at 09:59 -0800, Andres Freund wrote:
> > I am saying that pg_upgrade should be able to deal with the
> > difference. The
> > details of how to implement that, don't matter that much.
>
> To clarify, you're saying that pg_upgrade should simply update
> pg_database to set the new databases' collation fields equal to that
> of
> the old cluster?
Thinking about this more, it's not clear to me if this would be in
scope for pg_upgrade or not. If pg_upgrade is fixing up the new cluster
rather than checking for compatibility, why doesn't it just take over
and do the initdb for the new cluster itself? That would be less
confusing for users, and avoid some weirdness (like, if you drop the
database "postgres" on the original, why does it reappear after an
upgrade?).
Someone might want to do something interesting to the new cluster
before the upgrade, but it's not clear from the docs what would be both
useful and safe.
Today I tested icu for Czech sorting. It is a little bit slower, but not too much, but it produces partially different results.
select row_number() over (order by nazev collate "cs-x-icu"), nazev from obce
except select row_number() over (order by nazev collate "cs_CZ"), nazev from obce;
returns a not empty set. So minimally for Czech collate, an index rebuild is necessary
Regards
Pavel
Regards,
Jeff Davis
On Fri, 2023-02-17 at 09:05 -0800, Andres Freund wrote: > > Thinking about this more, it's not clear to me if this would be in > > scope for pg_upgrade or not. > > I don't think we should consider changing the default collation > provider > without making this more seamless, one way or another. I guess I'm fine hacking pg_upgrade, but I think I'd like to make it conditional on this specific case: only perform the fixup if the old cluster is 15 or earlier and using libc and the newer cluster is 16 or later and using icu. There's already a check that the new cluster is empty, so I think it's safe to hack the pg_database locale fields. Regards, Jeff Davis >
Hi, On 2023-02-17 10:00:41 -0800, Jeff Davis wrote: > I guess I'm fine hacking pg_upgrade, but I think I'd like to make it > conditional on this specific case: only perform the fixup if the old > cluster is 15 or earlier and using libc and the newer cluster is 16 or > later and using icu. -1. That's just going to cause pain one major version upgrade further down the line. Why would we want to incur that pain? > There's already a check that the new cluster is empty, so I think it's > safe to hack the pg_database locale fields. I don't think we need to, we do issue the CREATE DATABASEs. So we just need to make sure that includes the collation provider info, and the proper template database, in pg_upgrade mode. Greetings, Andres Freund
On Fri, Feb 17, 2023 at 09:01:54AM -0800, Jeff Davis wrote: > On Fri, 2023-02-17 at 00:06 -0800, Jeff Davis wrote: > > On Tue, 2023-02-14 at 09:59 -0800, Andres Freund wrote: > > > I am saying that pg_upgrade should be able to deal with the > > > difference. The > > > details of how to implement that, don't matter that much. > > > > To clarify, you're saying that pg_upgrade should simply update > > pg_database to set the new databases' collation fields equal to that > > of > > the old cluster? > > Thinking about this more, it's not clear to me if this would be in > scope for pg_upgrade or not. If pg_upgrade is fixing up the new cluster > rather than checking for compatibility, why doesn't it just take over > and do the initdb for the new cluster itself? That would be less > confusing for users, and avoid some weirdness (like, if you drop the > database "postgres" on the original, why does it reappear after an > upgrade?). > > Someone might want to do something interesting to the new cluster > before the upgrade, but it's not clear from the docs what would be both > useful and safe. This came up before - I'm of the opinion that it's unsupported and/or useless to try to do anything on the new cluster between initdb and pg_upgrade. https://www.postgresql.org/message-id/20220707184410.GB13040@telsasoft.com https://www.postgresql.org/message-id/20220905170322.GM31833@telsasoft.com -- Justin
On Fri, 2023-02-17 at 10:09 -0800, Andres Freund wrote: > -1. That's just going to cause pain one major version upgrade further > down the > line. Why would we want to incur that pain? OK, we can just always do the fixup as long as the old one is libc and the new one is ICU. I'm just trying to avoid this becoming a general mechanism to fix up an incompatible new cluster. > > There's already a check that the new cluster is empty, so I think > > it's > > safe to hack the pg_database locale fields. > > I don't think we need to, we do issue the CREATE DATABASEs. So we > just need to > make sure that includes the collation provider info, and the proper > template > database, in pg_upgrade mode. We must fixup template1/postgres in the new cluster (change it to libc to match the old cluster), because any objects existing in those databases in the old cluster may depend on the default collation. I don't see how we can do that without updating pg_database, so I'm not following your point. (I think you're right that template0 is optional; but since we're fixing up the other databases it would be less surprising if we also fixed up template0.) And if we do fixup template0/template1/postgres to match the old cluster, then CREATE DATABASE will have no issue. -- Jeff Davis PostgreSQL Contributor Team - AWS
On Fri, 2023-02-17 at 18:27 +0100, Pavel Stehule wrote: > Today I tested icu for Czech sorting. It is a little bit slower, but > not too much, but it produces partially different results. Thank you for trying it. If it's a significant slowdown, can you please send more information? ICU version, libc version, and testcase? > select row_number() over (order by nazev collate "cs-x-icu"), nazev > from obce > except select row_number() over (order by nazev collate "cs_CZ"), > nazev from obce; > > returns a not empty set. So minimally for Czech collate, an index > rebuild is necessary Yes, that's true of any locale change, provider change, or even provider version change. -- Jeff Davis PostgreSQL Contributor Team - AWS
Hi, On 2023-02-17 12:36:05 -0800, Jeff Davis wrote: > > > There's already a check that the new cluster is empty, so I think > > > it's > > > safe to hack the pg_database locale fields. > > > > I don't think we need to, we do issue the CREATE DATABASEs. So we > > just need to > > make sure that includes the collation provider info, and the proper > > template > > database, in pg_upgrade mode. > > We must fixup template1/postgres in the new cluster (change it to libc > to match the old cluster), because any objects existing in those > databases in the old cluster may depend on the default collation. I > don't see how we can do that without updating pg_database, so I'm not > following your point. I think we just drop/recreate template1 and postgres during pg_upgrade. Yep, looks like it. See create_new_objects(): /* * template1 database will already exist in the target installation, * so tell pg_restore to drop and recreate it; otherwise we would fail * to propagate its database-level properties. */ create_opts = "--clean --create"; and then: /* * postgres database will already exist in the target installation, so * tell pg_restore to drop and recreate it; otherwise we would fail to * propagate its database-level properties. */ if (strcmp(old_db->db_name, "postgres") == 0) create_opts = "--clean --create"; else create_opts = "--create"; Greetings, Andres Freund
On Fri, 2023-02-17 at 12:50 -0800, Andres Freund wrote: > I think we just drop/recreate template1 and postgres during > pg_upgrade. Thank you, that makes much more sense now. I was confused because pg_upgrade loops through to check compatibility with all the databases, which makes zero sense if it's going to drop all of them except template0 anyway. The checks on template1/postgres should be bypassed. So the two approaches are: 1. Don't bother with locale provider compatibility checks at all (even on template0). The emitted CREATE DATABASE statements already specify the locale provider, so that will take care of everything except template0. Maybe the locale provider of template0 shouldn't matter, but some users might be surprised if it changes during upgrade. It also raises some questions about the other properties of template0 like encoding, lc_collate, and lc_ctype, which also don't matter a whole lot (because they can all be changed when using template0 as a template). 2. Update the pg_database entry for template0. This has less potential for surprise in case people are actually using template0 for a template. -- Jeff Davis PostgreSQL Contributor Team - AWS
pá 17. 2. 2023 v 21:43 odesílatel Jeff Davis <pgsql@j-davis.com> napsal:
On Fri, 2023-02-17 at 18:27 +0100, Pavel Stehule wrote:
> Today I tested icu for Czech sorting. It is a little bit slower, but
> not too much, but it produces partially different results.
Thank you for trying it.
If it's a significant slowdown, can you please send more information?
ICU version, libc version, and testcase?
no - this slowdown is not significant - although 1% can looks too much - but it is just two ms
It looks so libicu has little bit more expensive initialization, but the execution is little bit faster
But when I try to repeat the measurements, the results are very unstable on my desktop :-/
SELECT * FROM obce ORDER BY nazev LIMIT 10 // is faster with glibc little bit
SELECT * FROM obce ORDER BY nazev // is faster with libicu
You can download dataset https://pgsql.cz/files/obce.sql
It is table of municipalities in czech republic (real names) - about 6000 rows
I use fedora 37 - so libicu 71.1, glibc 2.36
Regards
Pavel
> select row_number() over (order by nazev collate "cs-x-icu"), nazev
> from obce
> except select row_number() over (order by nazev collate "cs_CZ"),
> nazev from obce;
>
> returns a not empty set. So minimally for Czech collate, an index
> rebuild is necessary
Yes, that's true of any locale change, provider change, or even
provider version change.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
On Fri, Feb 17, 2023 at 10:32 PM Jeff Davis <pgsql@j-davis.com> wrote: > Thinking about this more, it's not clear to me if this would be in > scope for pg_upgrade or not. If pg_upgrade is fixing up the new cluster > rather than checking for compatibility, why doesn't it just take over > and do the initdb for the new cluster itself? That would be less > confusing for users, and avoid some weirdness (like, if you drop the > database "postgres" on the original, why does it reappear after an > upgrade?). > > Someone might want to do something interesting to the new cluster > before the upgrade, but it's not clear from the docs what would be both > useful and safe. I agree with all of this. I think it would be fantastic if pg_upgrade did the initdb itself. It would be simple to make this optional behavior, too: if the destination directory does not exist or is empty, initdb into it, otherwise skip that. That might be too automagical, so we could add add a --no-initdb option. If not specified, the destination directory must either not exist or be empty; else it must exist and look like a data directory. I completely concur with the idea that doing something with the new cluster before the upgrade is weird, and I don't think we should encourage people to do it. Nevertheless, as the threads to which Justin linked probably say, I'm not sure that it's a good idea to completely slam the door shut on that option. If we did want to move in that direction, though, having pg_upgrade do the initdb would be an excellent first step. We could at some later time decide to remove the --no-initdb option; or maybe we'll decide that it's good to keep it for emergencies, which is my present bias. In any event, the resulting system would be more usable and less error-prone than what we have today. -- Robert Haas EDB: http://www.enterprisedb.com
On 17.02.23 21:43, Jeff Davis wrote: >> select row_number() over (order by nazev collate "cs-x-icu"), nazev >> from obce >> except select row_number() over (order by nazev collate "cs_CZ"), >> nazev from obce; >> >> returns a not empty set. So minimally for Czech collate, an index >> rebuild is necessary > Yes, that's true of any locale change, provider change, or even > provider version change. I'm confused. We are not going to try to change existing databases to a different collation provider during pg_upgrade, are we?
On Mon, 2023-02-20 at 15:55 +0100, Peter Eisentraut wrote: > I'm confused. We are not going to try to change existing databases > to a > different collation provider during pg_upgrade, are we? No, certainly not. I interpreted Pavel's comments as a comparison of ICU and libc in general and not specific to this patch. Changing providers obviously requires an index rebuild to be safe. -- Jeff Davis PostgreSQL Contributor Team - AWS
On Fri, 2023-02-17 at 15:07 -0800, Jeff Davis wrote: > 2. Update the pg_database entry for template0. This has less > potential > for surprise in case people are actually using template0 for a > template. New patches attached. 0001: default autoconf to build with ICU (meson already uses 'auto') 0002: update template0 in new cluster (as described above) 0003: default initdb to use ICU Updating template0, as in 0002, seems straightforward and unsurprising, since only template0 is preserved and it was only initialized for the purposes of upgrading. Also, template0 is not sensitive to locale settings, and doesn't even have the datcollversion set. The patch updates encoding, datlocprovider, datcollate, datctype, and daticulocale on the new cluster. No doc update, because there are some initdb settings (like checksums) which still need to be compatible between the old and the new cluster. -- Jeff Davis PostgreSQL Contributor Team - AWS
Attachment
On Fri, 2023-02-24 at 15:54 -0800, Jeff Davis wrote: > 0001: default autoconf to build with ICU (meson already uses > 'auto') What's the best way to prepare for the impact of this on the buildfarm? How should we migrate to using --without-icu for those animals not currently specifying --with-icu? Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Fri, 2023-02-24 at 15:54 -0800, Jeff Davis wrote: >> 0001: default autoconf to build with ICU (meson already uses >> 'auto') > What's the best way to prepare for the impact of this on the buildfarm? > How should we migrate to using --without-icu for those animals not > currently specifying --with-icu? Tell the buildfarm owners to add --without-icu to their config if they don't have and don't want to install ICU. Wait a couple weeks. Commit, then nag the owners whose machines turn red. regards, tom lane
On 25.02.23 00:54, Jeff Davis wrote: > On Fri, 2023-02-17 at 15:07 -0800, Jeff Davis wrote: >> 2. Update the pg_database entry for template0. This has less >> potential >> for surprise in case people are actually using template0 for a >> template. > > New patches attached. > > 0001: default autoconf to build with ICU (meson already uses 'auto') I would skip this. There was a brief discussion about this at [0], where I pointed out that if we are going to do something like that, there would be other candidates among the optional dependencies to promote, such as certainly openssl and arguably lz4. If we don't do this consistently across dependencies, then there will be confusion. In practice, I don't think it matters. Almost all installations are made by packagers, who will make their own choices. Flipping the default in configure is only going to cause some amount of confusion and annoyance in some places, but won't actually have the ostensibly desired impact in practice. [0]: https://www.postgresql.org/message-id/flat/534fed4a262fee534662bd07a691c5ef%40postgrespro.ru > 0002: update template0 in new cluster (as described above) This makes sense. I'm confused what the code originally wanted to achieve, e.g., -/* - * Check that every database that already exists in the new cluster is - * compatible with the corresponding database in the old one. - */ -static void -check_databases_are_compatible(void) Was there once support for the new cluster having additional databases in place? Weird. In any case, I think you can remove additional code from get_db_infos() related to fields that are no longer used, such as db_encoding, and the corresponding struct fields in DbInfo. > 0003: default initdb to use ICU What are the changes in the citext tests about? Is it the same issue as in unaccent? In that case, the OR should be an AND? Maybe add a comment? Why is unaccent is "broken" if the default collation is provided by ICU and LC_CTYPE=C? Is that a general problem? Should we prevent this combination? What are the changes in the ecpg tests about? Looks harmless, but if there is a need, maybe it should be commented somewhere, otherwise what prevents someone from changing it back?
On Thu, 2023-03-02 at 10:37 +0100, Peter Eisentraut wrote: > I would skip this. There was a brief discussion about this at [0], > where I pointed out that if we are going to do something like that, > there would be other candidates among the optional dependencies to > promote, such as certainly openssl and arguably lz4. If we don't do > this consistently across dependencies, then there will be confusion. The difference is that ICU affects semantics of collations, and collations are not really an optional feature. If postgres is built without ICU, that will affect the default at initdb time (after patch 003, anyway), which will then affect the default collations in all databases. > In practice, I don't think it matters. Almost all installations are > made by packagers, who will make their own choices. Mostly true, but the discussion at [0] reveals that some people do build postgresql themselves for whatever reason. When I first started out with postgres I always built from source. That was quite a while ago, so maybe that means nothing; but it would be sad to think that the build-it-yourself experience doesn't matter. > > 0002: update template0 in new cluster (as described above) > > This makes sense. I'm confused what the code originally wanted to > achieve, e.g., > > -/* > - * Check that every database that already exists in the new cluster > is > - * compatible with the corresponding database in the old one. > - */ > -static void > -check_databases_are_compatible(void) > > Was there once support for the new cluster having additional > databases > in place? Weird. It looks like 33755e8edf was the last significant change here. CC Heikki for comment. > In any case, I think you can remove additional code from > get_db_infos() > related to fields that are no longer used, such as db_encoding, and > the > corresponding struct fields in DbInfo. You're right: there's not much of an intersection between the code that needs a DbInfo and the code that needs the locale fields. I created a separate DbLocaleInfo struct for the template0 locale information, and removed the locale fields from DbInfo. I also added a TAP test. > > 0003: default initdb to use ICU > > What are the changes in the citext tests about? There's a test in citext_utf8.sql for: SELECT 'i'::citext = 'İ'::citext AS t; citext_eq uses DEFAULT_COLLATION_OID, comparing the results after applying lower(). Apparently: lower('İ' collate "en_US") = 'i' -- true lower('İ' collate "tr-TR-x-icu") = 'i' -- true lower('İ' collate "en-US-x-icu") = 'i' -- false the test was passing before because it seems to be true for all libc locales. But for ICU, it seems to only be true in the "tr-TR" locale at colstrength=secondary (and true for other ICU locales at colstrength=primary). We can't fix the test by being explicit about the collation, because citext doesn't pass it down; it always uses the default collation. We could fix citext to pass it down properly, but that seems like a different patch. In any case, citext doesn't seem very important to those using ICU (we have a doc note suggesting ICU instead), so I don't see a strong reason to test the combination. So, I just exit the test early if it's ICU. I added a better comment. > Is it the same issue as > in unaccent? In that case, the OR should be an AND? Maybe add a > comment? > > Why is unaccent is "broken" if the default collation is provided by > ICU > and LC_CTYPE=C? Is that a general problem? Should we prevent this > combination? A different issue: unaccent is calling t_isspace(), which is just not properly handling locales. t_isspace() always passes NULL as the last argument to char2wchar. There are TODO comments throughout that file. Specifically what happens: lc_ctype_is_c(DEFAULT_COLLATION_OID) returns false so it calls char2wchar(), which calls mbstowcs() which returns an error because the LC_CTYPE=C Right now, that's a longstanding issue for all users of t_isspace() and related functions: tsearch, ltree, pg_trgm, dict_xsyn, and unaccent. I assume it was known and considered unimportant, otherwise we wouldn't have left the TODO comments in there. I believe it's only a problem when the provider is ICU and the LC_CTYPE is C. I think a quick fix would be to just test LC_CTYPE directly (from the environment or setlocale(LC_CTYPE, NULL)) rather than try to extract it from the default collation. It sounds like a separate patch, and should be handled as a bugfix and backported. A better fix would be to support character classification in ICU. I don't think that's hard, but ICU has quite a few options, and we'd need to discuss which are the right ones to support. We may also want to pass collation information down rather than just using the database default, but that may depend on the caller and we should discuss that, as well. > What are the changes in the ecpg tests about? Looks harmless, but if > there is a need, maybe it should be commented somewhere, otherwise > what > prevents someone from changing it back? ICU is not compatible with SQL_ASCII, so I had to remove the ENCODING=SQL_ASCII line from the ecpg test build. CC Michael Meskes who added the line in 1fa6be6f69 in case he has a comment. But when I did that, I got CI failures on windows because it couldn't transcode between LATIN1 and WIN1252. So I changed the ecpg test to just use SQL_ASCII for the client_encoding (not the server encoding). Michael Meskes added the client_encoding parameter test in 5e7710e725, so he might have a comment about that as well. Since I removed the code, I didn't see a clear place to add a comment, but if you have a suggestion I'll take it. -- Jeff Davis PostgreSQL Contributor Team - AWS
Attachment
On Fri, 2023-03-03 at 21:45 -0800, Jeff Davis wrote: > > > 0002: update template0 in new cluster (as described above) I think 0002 is about ready and I plan to commit it soon unless someone has more comments. I'm holding off on 0001 for now, because you objected. But I still think 0001 is a good idea so I'd like to hear more before I withdraw it. > > > 0003: default initdb to use ICU This is also about ready, and I plan to commit this soon after 0002. > A different issue: unaccent is calling t_isspace(), which is just not > properly handling locales. t_isspace() always passes NULL as the last > argument to char2wchar. There are TODO comments throughout that file. I posted a bug report and patch for this issue: https://www.postgresql.org/message-id/79e4354d9eccfdb00483146a6b9f6295202e7890.camel@j-davis.com Regards, Jeff Davis
On 08.03.23 06:55, Jeff Davis wrote: > On Fri, 2023-03-03 at 21:45 -0800, Jeff Davis wrote: >>>> 0002: update template0 in new cluster (as described above) > > I think 0002 is about ready and I plan to commit it soon unless someone > has more comments. 0002 seems fine to me. > I'm holding off on 0001 for now, because you objected. But I still > think 0001 is a good idea so I'd like to hear more before I withdraw > it. Let's come back to that after dealing with the other two. >>>> 0003: default initdb to use ICU > > This is also about ready, and I plan to commit this soon after 0002. This seems mostly ok to me. I have a few small comments. + default, ICU obtains the ICU locale from the ICU default collator. This seems to be a fancy way of saying, the default ICU locale will be set to something that approximates what you have set your operating system to. Which is what we want, I think. Can we say this in a more user-friendly way? +static void +check_icu_locale() should be check_icu_locale(void) + if (U_ICU_VERSION_MAJOR_NUM >= 54) + { If we're going to add more of these mysterious version checks, could we add a comment about what they are for? However, I suspect what this chunk is doing is some sort of canonicalization/language-tag conversion, which per the other thread, I have some questions about. How about for this patch, we skip this part and just do the else branch + icu_locale = pg_strdup(default_locale); and then put the canonicalization business into the canonicalization patch set?
On Thu, 2023-03-09 at 10:36 +0100, Peter Eisentraut wrote: > 0002 seems fine to me. Committed 0002 with some test improvements. > > Let's come back to that after dealing with the other two. Leaving 0001 open for now. 0003 committed after addressing your comments. -- Jeff Davis PostgreSQL Contributor Team - AWS
On 09.03.23 20:14, Jeff Davis wrote: >> Let's come back to that after dealing with the other two. > > Leaving 0001 open for now. I suspect making a change like this now would result in a bloodbath on the build farm that we could do without. I suggest revisiting this after the commit fest ends.
On 16.03.23 14:52, Peter Eisentraut wrote: > On 09.03.23 20:14, Jeff Davis wrote: >>> Let's come back to that after dealing with the other two. >> >> Leaving 0001 open for now. > > I suspect making a change like this now would result in a bloodbath on > the build farm that we could do without. I suggest revisiting this > after the commit fest ends. I don't object to this patch. I suggest waiting until next week to commit it and then see what happens. It's easy to revert if it goes terribly.
On Wed, Apr 05, 2023 at 09:33:25AM +0200, Peter Eisentraut wrote: > On 16.03.23 14:52, Peter Eisentraut wrote: > > On 09.03.23 20:14, Jeff Davis wrote: > > > > Let's come back to that after dealing with the other two. > > > > > > Leaving 0001 open for now. > > > > I suspect making a change like this now would result in a bloodbath on > > the build farm that we could do without. I suggest revisiting this > > after the commit fest ends. > > I don't object to this patch. I suggest waiting until next week to commit > it and then see what happens. It's easy to revert if it goes terribly. Is this still being considered for v16 ? -- Justin
On Mon, 2023-04-17 at 08:23 -0500, Justin Pryzby wrote: > > I don't object to this patch. I suggest waiting until next week to > > commit > > it and then see what happens. It's easy to revert if it goes > > terribly. > > Is this still being considered for v16 ? Yes, unless someone raises a procedural objection. Is now a reasonable time to check it in and see what breaks? It looks like there are quite a few buildfarm members that specify neither -- with-icu nor --without-icu. Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Is now a reasonable time to check it in and see what breaks? It looks > like there are quite a few buildfarm members that specify neither -- > with-icu nor --without-icu. I see you just pinged buildfarm-members again, so I'd think it's polite to give people 24 hours or so to deal with that before you break things. (My animals are all set, I believe.) regards, tom lane
On 4/17/23 2:33 PM, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: >> Is now a reasonable time to check it in and see what breaks? It looks >> like there are quite a few buildfarm members that specify neither -- >> with-icu nor --without-icu. > > I see you just pinged buildfarm-members again, so I'd think it's > polite to give people 24 hours or so to deal with that before > you break things. [RMT hat] This thread has fallen silent and the RMT wanted to check in. The RMT did have a brief discussion on $SUBJECT. We agree with several points that regardless of if/when ICU becomes the default collation provider for PostgreSQL, we'll likely have to flush out several issues. The question is how long we want that period to be before releasing the default. Right now, and in absence of critical issues or objections, the RMT is OK with leaving in ICU as the default collation provider for Beta 1. If we're to revert back to glibc, we recommend doing this before Beta 2. However, if there are strong objections to this proposal, please do state them. Thanks, Jonathan