Thread: EDB builds Postgres 13 with an obsolete ICU version

EDB builds Postgres 13 with an obsolete ICU version

From: Daniel Verite

Hi,

As a follow-up to bug #16570 [1] and other previous discussions
on the mailing-lists, I'm checking out PG13 beta for Windows
from:
 https://www.enterprisedb.com/postgresql-early-experience
and it ships with the same obsolete ICU 53 that was used
for PG 10,11,12.
Besides not having the latest Unicode features and fixes, ICU 53
ignores the BCP 47 tags syntax in collations used as examples
in Postgres documentation, which leads to confusion and
false bug reports.
The current version is ICU 67.

I don't see where the suggestion to upgrade it before the
next PG release should be addressed but maybe some people on
this list do know or have the leverage to make it happen?

[1]
https://www.postgresql.org/message-id/16570-58cc04e1a6ef3c3f%40postgresql.org

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Re: EDB builds Postgres 13 with an obsolete ICU version

From: Bruce Momjian

On Mon, Aug  3, 2020 at 08:56:06PM +0200, Daniel Verite wrote:
>  Hi,
> 
> As a follow-up to bug #16570 [1] and other previous discussions
> on the mailing-lists, I'm checking out PG13 beta for Windows
> from:
>  https://www.enterprisedb.com/postgresql-early-experience
> and it ships with the same obsolete ICU 53 that was used
> for PG 10,11,12.
> Besides not having the latest Unicode features and fixes, ICU 53
> ignores the BCP 47 tags syntax in collations used as examples
> in Postgres documentation, which leads to confusion and
> false bug reports.
> The current version is ICU 67.
> 
> I don't see where the suggestion to upgrade it before the
> next PG release should be addressed but maybe some people on
> this list do know or have the leverage to make it happen?

Well, you can ask EDB about this, but perhaps they have kept the same ICU
version so indexes will not need to be reindexed.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: EDB builds Postgres 13 with an obsolete ICU version

From: Dave Page


On Tue, Aug 4, 2020 at 1:04 AM Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Aug  3, 2020 at 08:56:06PM +0200, Daniel Verite wrote:
> > Hi,
> >
> > As a follow-up to bug #16570 [1] and other previous discussions
> > on the mailing-lists, I'm checking out PG13 beta for Windows
> > from:
> > https://www.enterprisedb.com/postgresql-early-experience
> > and it ships with the same obsolete ICU 53 that was used
> > for PG 10,11,12.
> > Besides not having the latest Unicode features and fixes, ICU 53
> > ignores the BCP 47 tags syntax in collations used as examples
> > in Postgres documentation, which leads to confusion and
> > false bug reports.
> > The current version is ICU 67.
> >
> > I don't see where the suggestion to upgrade it before the
> > next PG release should be addressed but maybe some people on
> > this list do know or have the leverage to make it happen?
>
> Well, you can ask EDB about this, but perhaps they have kept the same ICU
> version so indexes will not need to be reindexed.

Correct - updating ICU would mean a reindex is required following any upgrade, major or minor.

I would really like to find an acceptable solution to this however as it really would be good to be able to update ICU.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Re: EDB builds Postgres 13 with an obsolete ICU version

From: Thomas Kellerer

Dave Page wrote on 04.08.2020 at 10:06:
> Correct - updating ICU would mean a reindex is required following any
> upgrade, major or minor.
>
> I would really like to find an acceptable solution to this however as
> it really would be good to be able to update ICU.
>

What about providing a newer ICU version as a kind of "add-on" download containing only the needed DLLs (assuming it's
as easy as only replacing the DLLs)?

Then everyone who wishes to use a newer ICU version can manually install them.
If that download carries a big "ATTENTION: reindex required" I don't think this would be a big risk.

Thomas





Re: EDB builds Postgres 13 with an obsolete ICU version

From: Magnus Hagander

On Tue, Aug 4, 2020 at 10:07 AM Dave Page <dpage@pgadmin.org> wrote:
> On Tue, Aug 4, 2020 at 1:04 AM Bruce Momjian <bruce@momjian.us> wrote:
> > On Mon, Aug  3, 2020 at 08:56:06PM +0200, Daniel Verite wrote:
> > > Hi,
> > >
> > > As a follow-up to bug #16570 [1] and other previous discussions
> > > on the mailing-lists, I'm checking out PG13 beta for Windows
> > > from:
> > > https://www.enterprisedb.com/postgresql-early-experience
> > > and it ships with the same obsolete ICU 53 that was used
> > > for PG 10,11,12.
> > > Besides not having the latest Unicode features and fixes, ICU 53
> > > ignores the BCP 47 tags syntax in collations used as examples
> > > in Postgres documentation, which leads to confusion and
> > > false bug reports.
> > > The current version is ICU 67.
> > >
> > > I don't see where the suggestion to upgrade it before the
> > > next PG release should be addressed but maybe some people on
> > > this list do know or have the leverage to make it happen?
> >
> > Well, you can ask EDB about this, but perhaps they have kept the same ICU
> > version so indexes will not need to be reindexed.
>
> Correct - updating ICU would mean a reindex is required following any upgrade, major or minor.
>
> I would really like to find an acceptable solution to this however as it really would be good to be able to update ICU.

It certainly couldn't and shouldn't be done in a minor.

But doing so in v13 doesn't seem entirely unreasonable, especially given that I believe we will detect the requirement to reindex thanks to the versioning, and not just start returning invalid results (like, say, with those glibc updates). 
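
As a rough illustration of the versioning check mentioned above: PostgreSQL 10 and later record a version for each ICU collation in pg_collation.collversion and expose the version reported by the loaded library through pg_collation_actual_version(). The sketch below is only an assumption about how one might compare the two by hand. The helper name stale_collations and the sample rows are made up, and the query text would have to be run separately (for example in psql).

```python
# Query text only; this script does not connect to a server.
# pg_collation.collversion and pg_collation_actual_version() exist in
# PostgreSQL 10 and later; collprovider = 'i' selects ICU collations.
ICU_COLLATION_VERSIONS_SQL = """
SELECT collname, collversion, pg_collation_actual_version(oid)
FROM pg_collation
WHERE collprovider = 'i'
"""


def stale_collations(rows):
    """Return the names of collations whose recorded version differs from
    the version reported by the ICU library currently in use.

    rows: iterable of (collname, recorded_version, actual_version) tuples,
    i.e. the shape of the query result above. Rows without a recorded
    version are skipped because there is nothing to compare against.
    """
    return [
        name
        for name, recorded, actual in rows
        if recorded is not None and recorded != actual
    ]


# Made-up sample rows, for illustration only (not real ICU version strings).
sample = [
    ("de-x-icu", "153.14", "153.14"),
    ("fr-x-icu", "153.14", "153.80"),
    ("und-x-icu", None, "153.80"),
]
print(stale_collations(sample))
```

Indexes that depend on any collation returned by such a check would be the candidates for a reindex.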

Would it be possible to have the installer even check if there are any icu indexes in the database? If there aren't, just put in the new version of icu. If there are, give the user a choice of the old version or new version and reindex?


Re: EDB builds Postgres 13 with an obsolete ICU version

From: Dave Page


On Tue, Aug 4, 2020 at 10:29 AM Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Aug 4, 2020 at 10:07 AM Dave Page <dpage@pgadmin.org> wrote:
> > On Tue, Aug 4, 2020 at 1:04 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > On Mon, Aug  3, 2020 at 08:56:06PM +0200, Daniel Verite wrote:
> > > > Hi,
> > > >
> > > > As a follow-up to bug #16570 [1] and other previous discussions
> > > > on the mailing-lists, I'm checking out PG13 beta for Windows
> > > > from:
> > > > https://www.enterprisedb.com/postgresql-early-experience
> > > > and it ships with the same obsolete ICU 53 that was used
> > > > for PG 10,11,12.
> > > > Besides not having the latest Unicode features and fixes, ICU 53
> > > > ignores the BCP 47 tags syntax in collations used as examples
> > > > in Postgres documentation, which leads to confusion and
> > > > false bug reports.
> > > > The current version is ICU 67.
> > > >
> > > > I don't see where the suggestion to upgrade it before the
> > > > next PG release should be addressed but maybe some people on
> > > > this list do know or have the leverage to make it happen?
> > >
> > > Well, you can ask EDB about this, but perhaps they have kept the same ICU
> > > version so indexes will not need to be reindexed.
> >
> > Correct - updating ICU would mean a reindex is required following any upgrade, major or minor.
> >
> > I would really like to find an acceptable solution to this however as it really would be good to be able to update ICU.
>
> It certainly couldn't and shouldn't be done in a minor.
>
> But doing so in v13 doesn't seem entirely unreasonable, especially given that I believe we will detect the requirement to reindex thanks to the versioning, and not just start returning invalid results (like, say, with those glibc updates).
>
> Would it be possible to have the installer even check if there are any icu indexes in the database? If there aren't, just put in the new version of icu. If there are, give the user a choice of the old version or new version and reindex?

That would require fairly large changes to the installer to allow it to login to the database server (whether that would work would be dependent on how pg_hba.conf is configured), and also assumes that the ICU ABI hasn't changed between releases. It would also require some hacky renaming of DLLs, as they have the version number in them.
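
To make the point about version numbers in the DLL names concrete, here is a minimal sketch. It assumes the usual ICU naming on Windows, where the major version is embedded in names such as icuuc53.dll, icuin53.dll and icudt53.dll; the exact set of files shipped by the installers is an assumption, and the function name icu_major_versions is made up for illustration.

```python
import re

# Assumed naming pattern: "icu" + two-letter component + major version + ".dll",
# e.g. icuuc53.dll, icuin53.dll, icudt53.dll.
_ICU_DLL = re.compile(r"^icu[a-z]{2}(\d+)\.dll$", re.IGNORECASE)


def icu_major_versions(filenames):
    """Return the sorted ICU major versions found in a list of file names.

    Names that do not look like ICU DLLs are ignored.
    """
    versions = set()
    for name in filenames:
        match = _ICU_DLL.match(name)
        if match:
            versions.add(int(match.group(1)))
    return sorted(versions)


# Hypothetical directory listing.
print(icu_major_versions(["icuuc53.dll", "icuin53.dll", "icudt53.dll", "libpq.dll"]))
```

Because the version is part of the file name, swapping in a newer ICU is not just a matter of overwriting files: anything linked against the old names would still look for them.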

The chances of designing, building and testing that thoroughly before v13 is released are about zero I'd say.
 
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Re: EDB builds Postgres 13 with an obsolete ICU version

From: Magnus Hagander


On Tue, Aug 4, 2020 at 11:42 AM Dave Page <dpage@pgadmin.org> wrote:
> On Tue, Aug 4, 2020 at 10:29 AM Magnus Hagander <magnus@hagander.net> wrote:
> > On Tue, Aug 4, 2020 at 10:07 AM Dave Page <dpage@pgadmin.org> wrote:
> > > On Tue, Aug 4, 2020 at 1:04 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > > On Mon, Aug  3, 2020 at 08:56:06PM +0200, Daniel Verite wrote:
> > > > > Hi,
> > > > >
> > > > > As a follow-up to bug #16570 [1] and other previous discussions
> > > > > on the mailing-lists, I'm checking out PG13 beta for Windows
> > > > > from:
> > > > > https://www.enterprisedb.com/postgresql-early-experience
> > > > > and it ships with the same obsolete ICU 53 that was used
> > > > > for PG 10,11,12.
> > > > > Besides not having the latest Unicode features and fixes, ICU 53
> > > > > ignores the BCP 47 tags syntax in collations used as examples
> > > > > in Postgres documentation, which leads to confusion and
> > > > > false bug reports.
> > > > > The current version is ICU 67.
> > > > >
> > > > > I don't see where the suggestion to upgrade it before the
> > > > > next PG release should be addressed but maybe some people on
> > > > > this list do know or have the leverage to make it happen?
> > > >
> > > > Well, you can ask EDB about this, but perhaps they have kept the same ICU
> > > > version so indexes will not need to be reindexed.
> > >
> > > Correct - updating ICU would mean a reindex is required following any upgrade, major or minor.
> > >
> > > I would really like to find an acceptable solution to this however as it really would be good to be able to update ICU.
> >
> > It certainly couldn't and shouldn't be done in a minor.
> >
> > But doing so in v13 doesn't seem entirely unreasonable, especially given that I believe we will detect the requirement to reindex thanks to the versioning, and not just start returning invalid results (like, say, with those glibc updates).
> >
> > Would it be possible to have the installer even check if there are any icu indexes in the database? If there aren't, just put in the new version of icu. If there are, give the user a choice of the old version or new version and reindex?
>
> That would require fairly large changes to the installer to allow it to login to the database server (whether that would work would be dependent on how pg_hba.conf is configured), and also assumes that the ICU ABI hasn't changed between releases. It would also require some hacky renaming of DLLs, as they have the version number in them.

I assumed it had code for that stuff already. Mainly because I assumed it supported doing pg_upgrade, which requires similar things no?

 

> The chances of designing, building and testing that thoroughly before v13 is released is about zero I'd say.

Yeah, I can see how it would be for 13 -- unfortunately. But I definitely think it's something that should go high on the list of things to get fixed for 14.

//Magnus

Re: EDB builds Postgres 13 with an obsolete ICU version

From: Jaime Casanova

On Mon, 3 Aug 2020 at 13:56, Daniel Verite <daniel@manitou-mail.org> wrote:
>
>  Hi,
>
> As a follow-up to bug #16570 [1] and other previous discussions
> on the mailing-lists, I'm checking out PG13 beta for Windows
> from:
>  https://www.enterprisedb.com/postgresql-early-experience
> and it ships with the same obsolete ICU 53 that was used
> for PG 10,11,12.
> Besides not having the latest Unicode features and fixes, ICU 53
> ignores the BCP 47 tags syntax in collations used as examples
> in Postgres documentation, which leads to confusion and
> false bug reports.
> The current version is ICU 67.
>

Hi,

Sadly, that is managed by EDB and not by the community.

You can try https://www.2ndquadrant.com/en/resources/postgresql-installer-2ndquadrant/
which uses ICU 62.2. It is not the latest, but it should allow you to follow
the examples in the documentation.

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: EDB builds Postgres 13 with an obsolete ICU version

From: Thomas Kellerer

Jaime Casanova wrote on 11.08.2020 at 20:39:
>> As a follow-up to bug #16570 [1] and other previous discussions
>> on the mailing-lists, I'm checking out PG13 beta for Windows
>> from:
>>   https://www.enterprisedb.com/postgresql-early-experience
>> and it ships with the same obsolete ICU 53 that was used
>> for PG 10,11,12.
>> Besides not having the latest Unicode features and fixes, ICU 53
>> ignores the BCP 47 tags syntax in collations used as examples
>> in Postgres documentation, which leads to confusion and
>> false bug reports.
>> The current version is ICU 67.
>>
>
> Sadly, that is managed by EDB and not by the community.
>
> You can try https://www.2ndquadrant.com/en/resources/postgresql-installer-2ndquadrant/
> which uses ICU-62.2, is not the latest but should allow you to follow
> the examples in the documentation.


One of the reasons I prefer the EDB builds is that they provide a ZIP file without the installer overhead.
Any chance 2ndQuadrant can supply something like that as well?

Thomas



Re: EDB builds Postgres 13 with an obsolete ICU version

From: Jaime Casanova

On Tue, 11 Aug 2020 at 13:45, Thomas Kellerer <shammat@gmx.net> wrote:
>
> Jaime Casanova wrote on 11.08.2020 at 20:39:
> >> As a follow-up to bug #16570 [1] and other previous discussions
> >> on the mailing-lists, I'm checking out PG13 beta for Windows
> >> from:
> >>   https://www.enterprisedb.com/postgresql-early-experience
> >> and it ships with the same obsolete ICU 53 that was used
> >> for PG 10,11,12.
> >> Besides not having the latest Unicode features and fixes, ICU 53
> >> ignores the BCP 47 tags syntax in collations used as examples
> >> in Postgres documentation, which leads to confusion and
> >> false bug reports.
> >> The current version is ICU 67.
> >>
> >
> > Sadly, that is managed by EDB and not by the community.
> >
> > You can try https://www.2ndquadrant.com/en/resources/postgresql-installer-2ndquadrant/
> > which uses ICU-62.2, is not the latest but should allow you to follow
> > the examples in the documentation.
>
>
> One of the reasons I prefer the EDB builds is that they provide a ZIP file without the installer overhead.
> Any chance 2ndQuadrant can supply something like that as well?
>

I don't think so; an unattended install mode is the closest option.

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: EDB builds Postgres 13 with an obsolete ICU version

From: Bruce Momjian

On Tue, Aug 11, 2020 at 02:58:30PM +0200, Magnus Hagander wrote:
> On Tue, Aug 4, 2020 at 11:42 AM Dave Page <dpage@pgadmin.org> wrote:
>     That would require fairly large changes to the installer to allow it to
>     login to the database server (whether that would work would be dependent on
>     how pg_hba.conf is configured), and also assumes that the ICU ABI hasn't
>     changed between releases. It would also require some hacky renaming of
>     DLLs, as they have the version number in them.
> 
> I assumed it had code for that stuff already. Mainly because I assumed it
> supported doing pg_upgrade, which requires similar things no?

While pg_upgrade requires having the old and new cluster software in
place, I don't think it helps allowing different ICU versions for each
cluster.  I guess you can argue that if you know the user is _not_ going
to be using pg_upgrade, then a new ICU version should be used for the
new cluster.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: EDB builds Postgres 13 with an obsolete ICU version

From: Michael Paquier

On Fri, Aug 14, 2020 at 09:00:06AM -0400, Bruce Momjian wrote:
> On Tue, Aug 11, 2020 at 02:58:30PM +0200, Magnus Hagander wrote:
>> I assumed it had code for that stuff already. Mainly because I assumed it
>> supported doing pg_upgrade, which requires similar things no?
>
> While pg_upgrade requires having the old and new cluster software in
> place, I don't think it helps allowing different ICU versions for each
> cluster.  I guess you can argue that if you know the user is _not_ going
> to be using pg_upgrade, then a new ICU version should be used for the
> new cluster.

We have nothing in core, yet, that helps with this kind of problem
with binary upgrades.  In the last year, Julien and I worked on an
upgrade case where a glibc upgrade was involved with pg_upgrade used
for PG, and it could not afford the use of a new host to allow a
logical dump/restore to rebuild the indexes from scratch.  You can
always run a "reindex -a" after the upgrade to be sure that no indexes
are broken because of the changes with collation versions, but once
you have to give the guarantee that an upgrade does not take longer
than a certain amount of time, the reindex easily becomes the
bottleneck.  That's one motivation behind the recent work to add
collation versions to pg_depend entries, which would lead to more
filtering facilities for REINDEX on the backend to get for example the
option to only reindex collation-sensitive indexes (imagine just a
reindexdb --jobs with the collation filtering done at table-level,
that would be fast, or a script doing this work generated by
pg_upgrade).
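
To illustrate the kind of filtering described above, here is a small sketch that turns a list of collation-sensitive indexes into a REINDEX script. It is only an assumption about what such a generated script could look like, not what pg_upgrade or reindexdb do. The catalog query is untested and only catches ICU collations recorded in pg_index.indcollation; the names reindex_script and quote_ident are made up for this example.

```python
# Untested query text (assumption): lists indexes that have at least one
# column using an ICU collation (collprovider = 'i'). It is not executed here.
ICU_INDEXES_SQL = """
SELECT DISTINCT n.nspname, ic.relname
FROM pg_index i
JOIN pg_class ic ON ic.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = ic.relnamespace
JOIN pg_collation c ON c.oid = ANY (i.indcollation)
WHERE c.collprovider = 'i'
"""


def quote_ident(name):
    """Quote an SQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_script(indexes):
    """Build one REINDEX INDEX statement per distinct (schema, index) pair.

    indexes: iterable of (schema_name, index_name) tuples, i.e. the shape
    of the query result above. Output is sorted so the script is stable.
    """
    return [
        f"REINDEX INDEX {quote_ident(schema)}.{quote_ident(index)};"
        for schema, index in sorted(set(indexes))
    ]


# Hypothetical index names.
for statement in reindex_script([("public", "users_name_idx"), ("public", "users_name_idx")]):
    print(statement)
```

Splitting the resulting statements across several connections would be one way to get the parallelism of reindexdb --jobs while touching only the affected indexes.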
--
Michael


Re: EDB builds Postgres 13 with an obsolete ICU version

From: Bruce Momjian

On Fri, Aug 14, 2020 at 10:23:27PM +0900, Michael Paquier wrote:
> We have nothing in core, yet, that helps with this kind of problem
> with binary upgrades.  In the last year, Julien and I worked on an
> upgrade case where a glibc upgrade was involved with pg_upgrade used
> for PG, and it could not afford the use of a new host to allow a
> logical dump/restore to rebuild the indexes from scratch.  You can
> always run a "reindex -a" after the upgrade to be sure that no indexes
> are broken because of the changes with collation versions, but once
> you have to give the guarantee that an upgrade does not take longer
> than a certain amount of time, the reindex easily becomes the
> bottleneck.  That's one motivation behind the recent work to add
> collation versions to pg_depend entries, which would lead to more
> filtering facilities for REINDEX on the backend to get for example the
> option to only reindex collation-sensitive indexes (imagine just a
> reindexdb --jobs with the collation filtering done at table-level,
> that would be fast, or a script doing this work generated by
> pg_upgrade).

Agreed --- only a small percentage of indexes are affected by
collations, and it would be great if we could tell users how to easily
identify them.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: EDB builds Postgres 13 with an obsolete ICU version

From: Magnus Hagander


On Fri, Aug 14, 2020 at 3:00 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Aug 11, 2020 at 02:58:30PM +0200, Magnus Hagander wrote:
> > On Tue, Aug 4, 2020 at 11:42 AM Dave Page <dpage@pgadmin.org> wrote:
> > > That would require fairly large changes to the installer to allow it to
> > > login to the database server (whether that would work would be dependent on
> > > how pg_hba.conf is configured), and also assumes that the ICU ABI hasn't
> > > changed between releases. It would also require some hacky renaming of
> > > DLLs, as they have the version number in them.
> >
> > I assumed it had code for that stuff already. Mainly because I assumed it
> > supported doing pg_upgrade, which requires similar things no?
>
> While pg_upgrade requires having the old and new cluster software in
> place, I don't think it helps allowing different ICU versions for each
> cluster.

Depends on where they are installed (and disclaimer, I don't know how the windows installers do that). But as long as the ICU libraries are installed in separate locations for the two versions, which I *think* they are or at least used to be, it shouldn't have an effect on this in either direction.

That argument really only holds for different versions, not for different clusters of the same version. But I don't think the installers (natively) support multiple clusters of the same version anyway.

The tricky thing is if you want to allow the user to *choose* which ICU version should be used with postgres version <x>.  Because then the user might also expect an upgrade-path wherein they only upgrade the icu library on an existing install...
 
> I guess you can argue that if you know the user is _not_ going
> to be using pg_upgrade, then a new ICU version should be used for the
> new cluster.

Yes, that's exactly the argument I meant :) If the user got the option to "pick version of ICU: <old>, <new>", with a comment saying "pick old only if you plan to do a pg_upgrade based upgrade of a different cluster, or if this instance should participate in replication with a node using <old>", that would probably help for the vast majority of cases. And of course, if the installer through other options can determine with certainty that it's going to be running pg_upgrade for the user, then it can reword the dialog based on that (that is, it should still allow the user to pick the new version, as long as they know that their indexes are going to need reindexing)



Re: EDB builds Postgres 13 with an obsolete ICU version

From: Dave Page


On Mon, Aug 17, 2020 at 11:19 AM Magnus Hagander <magnus@hagander.net> wrote:
> On Fri, Aug 14, 2020 at 3:00 PM Bruce Momjian <bruce@momjian.us> wrote:
> > On Tue, Aug 11, 2020 at 02:58:30PM +0200, Magnus Hagander wrote:
> > > On Tue, Aug 4, 2020 at 11:42 AM Dave Page <dpage@pgadmin.org> wrote:
> > > > That would require fairly large changes to the installer to allow it to
> > > > login to the database server (whether that would work would be dependent on
> > > > how pg_hba.conf is configured), and also assumes that the ICU ABI hasn't
> > > > changed between releases. It would also require some hacky renaming of
> > > > DLLs, as they have the version number in them.
> > >
> > > I assumed it had code for that stuff already. Mainly because I assumed it
> > > supported doing pg_upgrade, which requires similar things no?

No, the installers don't support pg_upgrade directly. They ship it of course, and the user can manually run it, but the installers won't do that, and have no ability to login to a cluster except during the post-initdb phase.

> > While pg_upgrade requires having the old and new cluster software in
> > place, I don't think it helps allowing different ICU versions for each
> > cluster.
>
> Depends on where they are installed (and disclaimer, I don't know how the windows installers do that). But as long as the ICU libraries are installed in separate locations for the two versions, which I *think* they are or at least used to be, it shouldn't have an effect on this in either direction.

They are.

> That argument really only holds for different versions, not for different clusters of the same version. But I don't think the installers (natively) supports multiple clusters of the same version anyway.

They don't. You'd need to manually init a new cluster and register a new server instance. The installer only has any knowledge of the cluster it sets up.

> The tricky thing is if you want to allow the user to *choose* which ICU version should be used with postgres version <x>.  Because then the user might also expect an upgrade-path wherein they only upgrade the icu library on an existing install...
>
> > I guess you can argue that if you know the user is _not_ going
> > to be using pg_upgrade, then a new ICU version should be used for the
> > new cluster.
>
> Yes, that's exactly the argument I meant :) If the user got the option to "pick version of ICU: <old>, <new>", with a comment saying "pick old only if you plan to do a pg_upgrade based upgrade of a different cluster, or if this instance should participate in replication with a node using <old>", that would probably help for the vast majority of cases. And of course, if the installer through other options can determine with certainty that it's going to be running pg_upgrade for the user, then it can reword the dialog based on that (that is, it should still allow the user to pick the new version, as long as they know that their indexes are going to need reindexing)

That seems like a very hacky and extremely user-unfriendly approach. How many users are going to understand options in the installer to deal with that, or want to go decode the ICU filenames on their existing installations (which our installers may not actually know about) to figure out what their current version is?

I would suggest that the better way to handle this would be for pg_upgrade to (somehow) check the ICU version on the old and new clusters and if there's a mismatch perform a reindex of any ICU based indexes. I suspect that may require that the server exposes the ICU version though. That way, the installers could freely upgrade the ICU version with a new major release.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Re: EDB builds Postgres 13 with an obsolete ICU version

From: Magnus Hagander


On Mon, Aug 17, 2020 at 1:44 PM Dave Page <dpage@pgadmin.org> wrote:
> On Mon, Aug 17, 2020 at 11:19 AM Magnus Hagander <magnus@hagander.net> wrote:
> > On Fri, Aug 14, 2020 at 3:00 PM Bruce Momjian <bruce@momjian.us> wrote:
> > > On Tue, Aug 11, 2020 at 02:58:30PM +0200, Magnus Hagander wrote:
> > > > On Tue, Aug 4, 2020 at 11:42 AM Dave Page <dpage@pgadmin.org> wrote:
> > > > > That would require fairly large changes to the installer to allow it to
> > > > > login to the database server (whether that would work would be dependent on
> > > > > how pg_hba.conf is configured), and also assumes that the ICU ABI hasn't
> > > > > changed between releases. It would also require some hacky renaming of
> > > > > DLLs, as they have the version number in them.
> > > >
> > > > I assumed it had code for that stuff already. Mainly because I assumed it
> > > > supported doing pg_upgrade, which requires similar things no?
>
> No, the installers don't support pg_upgrade directly. They ship it of course, and the user can manually run it, but the installers won't do that, and have no ability to login to a cluster except during the post-initdb phase.

Oh, I just assumed it did :)

If it doesn't, I think shipping with a modern ICU is a much smaller problem really...

> > > While pg_upgrade requires having the old and new cluster software in
> > > place, I don't think it helps allowing different ICU versions for each
> > > cluster.
> >
> > Depends on where they are installed (and disclaimer, I don't know how the windows installers do that). But as long as the ICU libraries are installed in separate locations for the two versions, which I *think* they are or at least used to be, it shouldn't have an effect on this in either direction.
>
> They are.

Good. So putting both in wouldn't break things.

> > That argument really only holds for different versions, not for different clusters of the same version. But I don't think the installers (natively) supports multiple clusters of the same version anyway.
>
> They don't. You'd need to manually init a new cluster and register a new server instance. The installer only has any knowledge of the cluster it sets up.

I'd say that's "unsupported enough" to not be a scenario one has to consider.

> > The tricky thing is if you want to allow the user to *choose* which ICU version should be used with postgres version <x>.  Because then the user might also expect an upgrade-path wherein they only upgrade the icu library on an existing install...
> >
> > > I guess you can argue that if you know the user is _not_ going
> > > to be using pg_upgrade, then a new ICU version should be used for the
> > > new cluster.
> >
> > Yes, that's exactly the argument I meant :) If the user got the option to "pick version of ICU: <old>, <new>", with a comment saying "pick old only if you plan to do a pg_upgrade based upgrade of a different cluster, or if this instance should participate in replication with a node using <old>", that would probably help for the vast majority of cases. And of course, if the installer through other options can determine with certainty that it's going to be running pg_upgrade for the user, then it can reword the dialog based on that (that is, it should still allow the user to pick the new version, as long as they know that their indexes are going to need reindexing)
>
> That seems like a very hacky and extremely user-unfriendly approach. How many users are going to understand options in the installer to deal with that, or want to go decode the ICU filenames on their existing installations (which our installers may not actually know about) to figure out what their current version is?


That was more if the installer actually handles the whole chain. It clearly doesn't today (since it doesn't support upgrades), so I agree this might definitely be overkill. But then also I don't really see the problem with just putting a new version of ICU in with the newer versions of PostgreSQL. That just puts the user in the same position as they are with any other platform wrt manual pg_upgrade runs.

 

> I would suggest that the better way to handle this would be for pg_upgrade to (somehow) check the ICU version on the old and new clusters and if there's a mismatch perform a reindex of any ICU based indexes. I suspect that may require that the server exposes the ICU version though. That way, the installers could freely upgrade the ICU version with a new major release.

Having pg_upgrade spit out a script that does reindex specifically on the indexes required would certainly be useful in the generic case, and help others as well.



Re: EDB builds Postgres 13 with an obsolete ICU version

From: Dave Page


On Mon, Aug 17, 2020 at 4:14 PM Magnus Hagander <magnus@hagander.net> wrote:


On Mon, Aug 17, 2020 at 1:44 PM Dave Page <dpage@pgadmin.org> wrote:


On Mon, Aug 17, 2020 at 11:19 AM Magnus Hagander <magnus@hagander.net> wrote:


On Fri, Aug 14, 2020 at 3:00 PM Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Aug 11, 2020 at 02:58:30PM +0200, Magnus Hagander wrote:
> On Tue, Aug 4, 2020 at 11:42 AM Dave Page <dpage@pgadmin.org> wrote:
>     That would require fairly large changes to the installer to allow it to
>     login to the database server (whether that would work would be dependent on
>     how pg_hba.conf is configured), and also assumes that the ICU ABI hasn't
>     changed between releases. It would also require some hacky renaming of
>     DLLs, as they have the version number in them.
>
> I assumed it had code for that stuff already. Mainly because I assumed it
> supported doing pg_upgrade, which requires similar things no?

No, the installers don't support pg_upgrade directly. They ship it of course, and the user can manually run it, but the installers won't do that, and have no ability to login to a cluster except during the post-initdb phase.

Oh, I just assumed it did :)

If it doesn't, I think shipping with a modern ICU is a much smaller problem really...


While pg_upgrade requires having the old and new cluster software in
place, I don't think it helps allowing different ICU versions for each
cluster. 

Depends on where they are installed (and disclaimer, I don't know how the windows installers do that). But as long as the ICU libraries are installed in separate locations for the two versions, which I *think* they are or at least used to be, it shouldn't have an effect on this in either direction.

They are.

Good. So putting both in wouldn't break things.



That argument really only holds for different versions, not for different clusters of the same version. But I don't think the installers (natively) support multiple clusters of the same version anyway.

They don't. You'd need to manually init a new cluster and register a new server instance. The installer only has any knowledge of the cluster it sets up.

I'd say that's "unsupported enough" to not be a scenario one has to consider.

Agreed. Plus it's not really any different from running multiple clusters on other OSs where we're likely to be using a vendor supplied ICU that the user also couldn't change easily.
 



The tricky thing is if you want to allow the user to *choose* which ICU version should be used with postgres version <x>.  Because then the user might also expect an upgrade-path wherein they only upgrade the icu library on an existing install...
 
I guess you can argue that if you know the user is _not_ going
to be using pg_upgrade, then a new ICU version should be used for the
new cluster.

Yes, that's exactly the argument I meant :) If the user got the option to "pick version of ICU: <old>, <new>", with a comment saying "pick old only if you plan to do a pg_upgrade based upgrade of a different cluster, or if this instance should participate in replication with a node using <old>", that would probably help for the vast majority of cases. And of course, if the installer through other options can determine with certainty that it's going to be running pg_upgrade for the user, then it can reword the dialog based on that (that is, it should still allow the user to pick the new version, as long as they know that their indexes are going to need reindexing)

That seems like a very hacky and extremely user-unfriendly approach. How many users are going to understand options in the installer to deal with that, or want to go decode the ICU filenames on their existing installations (which our installers may not actually know about) to figure out what their current version is?


That was more if the installer actually handles the whole chain. It clearly doesn't today (since it doesn't support upgrades), so I agree this might definitely be overkill. But then also I don't really see the problem with just putting a new version of ICU in with the newer versions of PostgreSQL. That just puts the user in the same position as they are with any other platform wrt manual pg_upgrade runs.

Well we can certainly do that if everyone is happy in the knowledge that it'll mean pg_upgrade users will need to reindex if they've used ICU collations.

Sandeep; can you have someone do a test build with the latest ICU please (for background, this would be with the Windows and Mac installers)? If no one objects, we can push that into the v13 builds before GA. We'd also need to update the README if we do so.
 

 

I would suggest that the better way to handle this would be for pg_upgrade to (somehow) check the ICU version on the old and new clusters and if there's a mismatch perform a reindex of any ICU based indexes. I suspect that may require that the server exposes the ICU version though. That way, the installers could freely upgrade the ICU version with a new major release.

Having pg_upgrade spit out a script that does reindex specifically on the indexes required would certainly be useful in the generic case, and help others as well.

+1 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Re: EDB builds Postgres 13 with an obsolete ICU version

From
Bruce Momjian
Date:
On Mon, Aug 17, 2020 at 04:55:13PM +0100, Dave Page wrote:
>     That was more if the installer actually handles the whole chain. It clearly
>     doesn't today (since it doesn't support upgrades), I agree this might
>     definitely be overkill. But then also I don't really see the problem with
>     just putting a new version of ICU in with the newer versions of PostgreSQL.
>     That just puts the user in the same position as they are with any other
>     platform wrt manual pg_upgrade runs.
> 
> Well we can certainly do that if everyone is happy in the knowledge that it'll
> mean pg_upgrade users will need to reindex if they've used ICU collations.
> 
> Sandeep; can you have someone do a test build with the latest ICU please (for
> background, this would be with the Windows and Mac installers)? If no one
> objects, we can push that into the v13 builds before GA. We'd also need to
> update the README if we do so.

Whoa, we don't have any support in pg_upgrade to reindex just indexes
that use ICU collations, and frankly, if they have to reindex, they
might decide that they should just do pg_dump/reload of their cluster at
that point because pg_upgrade is going to be very slow, and they will be
surprised.  I can see a lot more people being disappointed by this than
will be happy to have Postgres using a newer ICU library.

Also, is it the ICU library version we should be tracking for reindex,
or each _collation_ version?  If the latter, do we store the collation
version for each index?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: EDB builds Postgres 13 with an obsolete ICU version

From
Michael Paquier
Date:
On Mon, Aug 17, 2020 at 02:23:57PM -0400, Bruce Momjian wrote:
> Also, is it the ICU library version we should be tracking for reindex,
> or each _collation_ version?  If the latter, do we store the collation
> version for each index?

You need to store the collation version(s) for each index.  This
thread deals with the problem:
https://commitfest.postgresql.org/29/2367/
https://www.postgresql.org/message-id/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com

That's not all of it as you would still need some filtering
capabilities in the backend to reindex only the collation-sensitive
indexes with a reindex, but that's one step forward into being able to
do that.
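
As a rough sketch of what per-index version tracking would allow, the Python below compares the collation versions recorded for each index against the versions the library reports now. The data layout, the `stale_indexes` name, and the version strings in the usage note are assumptions for illustration, not anything taken from the patch in that thread. An index with no recorded version is treated conservatively as needing a rebuild.

```python
from typing import Dict, List, Optional


def stale_indexes(
    recorded: Dict[str, Dict[str, Optional[str]]],
    current: Dict[str, str],
) -> List[str]:
    """Return the names of indexes whose recorded collation versions are
    unknown or differ from the versions currently reported.

    recorded: index name -> {collation name -> version stored at build time}
    current:  collation name -> version reported by the library today
    """
    stale = []
    for index_name, collations in recorded.items():
        for collation, version in collations.items():
            # Unknown (None) or mismatching versions both count as stale.
            if version is None or current.get(collation) != version:
                stale.append(index_name)
                break
    return sorted(stale)
```

For example, with `recorded = {"a_idx": {"en-x-icu": "153.14"}, "b_idx": {"en-x-icu": "153.80"}}` and `current = {"en-x-icu": "153.80"}`, only `a_idx` is reported.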
--
Michael


Re: EDB builds Postgres 13 with an obsolete ICU version

From
Bruce Momjian
Date:
On Tue, Aug 18, 2020 at 09:44:35AM +0900, Michael Paquier wrote:
> On Mon, Aug 17, 2020 at 02:23:57PM -0400, Bruce Momjian wrote:
> > Also, is it the ICU library version we should be tracking for reindex,
> > or each _collation_ version?  If the latter, do we store the collation
> > version for each index?
> 
> You need to store the collation version(s) for each index.  This
> thread deals with the problem:
> https://commitfest.postgresql.org/29/2367/
> https://www.postgresql.org/message-id/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
> 
> That's not all of it as you would still need some filtering
> capabilities in the backend to reindex only the collation-sensitive
> indexes with a reindex, but that's one step forward into being able to
> do that.

Oh, we don't even have the version in the system catalogs yet?  I guess
when pg_upgrade runs create_index we could grab it then, and for the
pg_upgrade _after_ that, do the checks.  This seems like it is years
away from being useful.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: EDB builds Postgres 13 with an obsolete ICU version

From
Dave Page
Date:


On Mon, Aug 17, 2020 at 7:23 PM Bruce Momjian <bruce@momjian.us> wrote:
On Mon, Aug 17, 2020 at 04:55:13PM +0100, Dave Page wrote:
>     That was more if the installer actually handles the whole chain. It clearly
>     doesn't today (since it doesn't support upgrades), I agree this might
>     definitely be overkill. But then also I don't really see the problem with
>     just putting a new version of ICU in with the newer versions of PostgreSQL.
>     That just puts the user in the same position as they are with any other
>     platform wrt manual pg_upgrade runs.
>
> Well we can certainly do that if everyone is happy in the knowledge that it'll
> mean pg_upgrade users will need to reindex if they've used ICU collations.
>
> Sandeep; can you have someone do a test build with the latest ICU please (for
> background, this would be with the Windows and Mac installers)? If no one
> objects, we can push that into the v13 builds before GA. We'd also need to
> update the README if we do so.

Whoa, we don't have any support in pg_upgrade to reindex just indexes
that use ICU collations, and frankly, if they have to reindex, they
might decide that they should just do pg_dump/reload of their cluster at
that point because pg_upgrade is going to be very slow, and they will be
surprised. 

Not necessarily. It's likely that not all indexes use ICU collations, and you still save time loading what may be large amounts of data.

I agree though, that it *could* be slow.
 
I can see a lot more people being disappointed by this than
will be happy to have Postgres using a newer ICU library.

Quite possibly, hence my hesitation to push ahead with anything more than a simple test build at this time.
 

Also, is it the ICU library version we should be tracking for reindex,
or each _collation_ version?  If the latter, do we store the collation
version for each index?

I wasn't aware that ICU had the concept of collation versions internally (which Michael seems to have confirmed downthread). That would potentially make the number of users needing a reindex even smaller, but as you point out won't help us for years as we don't store it anyway. 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Re: EDB builds Postgres 13 with an obsolete ICU version

From
Magnus Hagander
Date:


On Tue, Aug 18, 2020 at 11:24 AM Dave Page <dpage@pgadmin.org> wrote:


On Mon, Aug 17, 2020 at 7:23 PM Bruce Momjian <bruce@momjian.us> wrote:
On Mon, Aug 17, 2020 at 04:55:13PM +0100, Dave Page wrote:
>     That was more if the installer actually handles the whole chain. It clearly
>     doesn't today (since it doesn't support upgrades), I agree this might
>     definitely be overkill. But then also I don't really see the problem with
>     just putting a new version of ICU in with the newer versions of PostgreSQL.
>     That just puts the user in the same position as they are with any other
>     platform wrt manual pg_upgrade runs.
>
> Well we can certainly do that if everyone is happy in the knowledge that it'll
> mean pg_upgrade users will need to reindex if they've used ICU collations.
>
> Sandeep; can you have someone do a test build with the latest ICU please (for
> background, this would be with the Windows and Mac installers)? If no one
> objects, we can push that into the v13 builds before GA. We'd also need to
> update the README if we do so.

Whoa, we don't have any support in pg_upgrade to reindex just indexes
that use ICU collations, and frankly, if they have to reindex, they
might decide that they should just do pg_dump/reload of their cluster at
that point because pg_upgrade is going to be very slow, and they will be
surprised. 

Not necessarily. It's likely that not all indexes use ICU collations, and you still save time loading what may be large amounts of data.

I agree though, that it *could* be slow.

I agree it definitely could, but I'm not sure I see any case where it would actually be slower than the alternative (which would be dump/reload).

I'm also willing to say that given (1) the Windows installers don't provide a way to do it automatically, and (2) the "commandline challenge" of running pg_upgrade on Windows in general, I bet there's a larger percentage of users who are using dump/reload rather than pg_upgrade on Windows than on other platforms already in the first place.

 
I can see a lot more people being disappointed by this than
will be happy to have Postgres using a newer ICU library.

Quite possibly, hence my hesitation to push ahead with anything more than a simple test build at this time.

My guess would be in the other direction :) But in particular, the vast majority probably don't care at all, because they're not using ICU collations.

It might be a slightly larger percentage on Windows who use it, but I'm willing to bet it's still quite low.


Also, is it the ICU library version we should be tracking for reindex,
or each _collation_ version?  If the latter, do we store the collation
version for each index?

I wasn't aware that ICU had the concept of collation versions internally (which Michael seems to have confirmed downthread). That would potentially make the number of users needing a reindex even smaller, but as you point out won't help us for years as we don't store it anyway. 

It does -- and we track it in pg_collation at this point.

I think the part that Michael is referring to is we don't track enough details on a per-index basis. The suggested changes (in the separate thread) are to get rid of it from pg_collation and move it to a per-object dependency.

(And FWIW, it contains a patch to pg_upgrade that at least gives it the ability to say, for all old indexes, "I know that my ICU is compatible". But yeah, the full functionality won't be available until upgrading *from* 14.)
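
For reference, the collation-level tracking mentioned above can be inspected by comparing `pg_collation.collversion` with `pg_collation_actual_version(oid)`. The Python below is only a sketch: it carries the query as text and does not open a database connection, and `MISMATCH_QUERY` and `mismatched` are hypothetical names.

```python
from typing import Iterable, List, Optional, Tuple

# Query text only; running it needs a PostgreSQL connection, which this
# sketch deliberately does not open. collprovider = 'i' selects ICU collations.
MISMATCH_QUERY = """
SELECT collname,
       collversion,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collprovider = 'i'
"""

Row = Tuple[str, Optional[str], Optional[str]]


def mismatched(rows: Iterable[Row]) -> List[str]:
    """Given (collname, collversion, actual_version) rows, return the names
    of collations whose recorded version differs from the actual one."""
    return [name for name, recorded, actual in rows if recorded != actual]
```

This only says which collations changed; it cannot say which indexes were built under the old version, which is the per-index gap described above.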

--

Re: EDB builds Postgres 13 with an obsolete ICU version

From
Thomas Kellerer
Date:
Magnus Hagander schrieb am 18.08.2020 um 11:38:
> It might be a slightly larger percentage on Windows who use it, but
> I'm willing to bet it's still quite low.

I have seen increasingly more questions around ICU collations on Windows, because people who migrate from
SQL Server to Postgres very often keep Windows as the operating system and want to get SQL Server's
case-insensitivity (at least partially) using ICU collations.
 

Thomas





Re: EDB builds Postgres 13 with an obsolete ICU version

From
Julien Rouhaud
Date:
On Tue, Aug 18, 2020 at 11:39 AM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Tue, Aug 18, 2020 at 11:24 AM Dave Page <dpage@pgadmin.org> wrote:
>>
>> On Mon, Aug 17, 2020 at 7:23 PM Bruce Momjian <bruce@momjian.us> wrote:
>>>
>>> On Mon, Aug 17, 2020 at 04:55:13PM +0100, Dave Page wrote:
>> I wasn't aware that ICU had the concept of collation versions internally (which Michael seems to have confirmed downthread). That would potentially make the number of users needing a reindex even smaller, but as you point out won't help us for years as we don't store it anyway.
>
> It does -- and we track it in pg_collation at this point.
>
> I think the part that Michael is referring to is we don't track enough details on a per-index basis. The suggested changes (in the separate thread) are to get rid of it from pg_collation and move it to a per-object dependency.
>
> (And fwiw contains a patch to pg_upgrade to at least give it the ability to for all old indexes say "i know that my icu is compatible". But yeah, the full functionality won't be available until upgrading *from* 14)

Indeed, when upgrading from something older than 14, all indexes would
be marked as depending on an unknown collation version, that is, as
possibly corrupted.



Re: EDB builds Postgres 13 with an obsolete ICU version

From
Bruce Momjian
Date:
On Tue, Aug 18, 2020 at 11:38:38AM +0200, Magnus Hagander wrote:
> On Tue, Aug 18, 2020 at 11:24 AM Dave Page <dpage@pgadmin.org> wrote:
>     Not necessarily. It's likely that not all indexes use ICU collations, and
>     you still save time loading what may be large amounts of data.
> 
>     I agree though, that it *could* be slow.
> 
> I agree it definitely could, but I'm not sure I see any case where it would
> actually be slower than the alternative (which would be dump/reload).

Well, given that pg_upgrade is more complex to run than pg_dump/reload,
you then have to weigh the complexity of using pg_upgrade with index
rebuild vs. the simpler pg_dump.  Right now, you know pg_upgrade in link
mode is going to be fast, but with the reindex, you have a much more
complex decision to make.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee