Thread: Pg_upgrade and collation
The attached patch documents that pg_upgrade requires old/new servers to
use compatible collation library versions as well. I would like to apply
this to all PG branches.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

Bruce Momjian wrote:
> The attached patch documents that pg_upgrade requires old/new servers to
> use compatible collation library versions as well.

I think this is way too thin to be helpful:

> --- 61,68 ----
>   checking for compatible compile-time settings, including 32/64-bit
>   binaries. It is important that
>   any external modules are also binary compatible, though this cannot
> ! be checked by <application>pg_upgrade</>. Compatible collation
> ! library versions must also be used.
>   </para>

I think it would be useful to indicate what to do if they are not
compatible.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

On Fri, Jun 17, 2016 at 05:51:54PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > The attached patch documents that pg_upgrade requires old/new servers to
> > use compatible collation library versions as well.
>
> I think this is way too thin to be helpful:

Well, this is a much larger issue than pg_upgrade, e.g. moving a data
directory from one cluster to another with a different collation library
version could also cause problems, and I don't know that is documented
at all.

If we want to go larger, we have to do this in a more central location.

> > --- 61,68 ----
> >   checking for compatible compile-time settings, including 32/64-bit
> >   binaries. It is important that
> >   any external modules are also binary compatible, though this cannot
> > ! be checked by <application>pg_upgrade</>. Compatible collation
> > ! library versions must also be used.
> >   </para>
>
> I think it would be useful to indicate what to do if they are not
> compatible.

The indexes don't work reliably. We don't document what happens if shared
objects don't match either, but again, if we want to clarify this, we
need to do it more centrally. Ideas?

--
Bruce Momjian

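To make "the indexes don't work reliably" concrete, here is a minimal Python sketch. It is not PostgreSQL code: the two orderings `collate_v1` and `collate_v2` are hypothetical stand-ins for an old and a new collation library version, and the "index" is just a sorted list searched by bisection, which is roughly what a btree lookup relies on.

```python
import bisect

def collate_v1(s):
    # Hypothetical "old library" ordering: plain code-point order.
    return s

def collate_v2(s):
    # Hypothetical "new library" ordering: case-insensitive first,
    # with the original string as a tie-breaker.
    return (s.lower(), s)

def build_index(values, collate):
    # An "index" here is just the values stored in collation order.
    return sorted(values, key=collate)

def index_contains(index, value, collate):
    # Binary search that trusts the stored order to match `collate`.
    keys = [collate(v) for v in index]
    pos = bisect.bisect_left(keys, collate(value))
    return pos < len(index) and index[pos] == value

words = ["apple", "Banana", "cherry", "Date"]
index = build_index(words, collate_v1)  # built under the old ordering

found_old = [w for w in words if index_contains(index, w, collate_v1)]
found_new = [w for w in words if index_contains(index, w, collate_v2)]
print("found with old ordering:", found_old)
print("found with new ordering:", found_new)
```

Every entry is still stored, but once the comparison function changes underneath the index, the search can descend the wrong way and miss entries that are present. Nothing raises an error, which is what makes the failure hard to notice.
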
On Fri, Jun 17, 2016 at 06:01:59PM -0400, Bruce Momjian wrote:
> On Fri, Jun 17, 2016 at 05:51:54PM -0400, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > The attached patch documents that pg_upgrade requires old/new servers to
> > > use compatible collation library versions as well.
> >
> > I think this is way too thin to be helpful:
>
> Well, this is a much larger issue than pg_upgrade, e.g. moving a data
> directory from one cluster to another with a different collation library
> version could also cause problems, and I don't know that is documented
> at all.
>
> If we want to go larger, we have to do this in a more central location.

Frankly, pg_upgrade is, by definition, upgrading on the same server, so
I don't even see how they could have mismatched collation library
versions, but it seemed good to document it.

The larger issue of moving clusters is a separate issue that needs
documentation somewhere else.

--
Bruce Momjian

Bruce Momjian wrote:
> On Fri, Jun 17, 2016 at 06:01:59PM -0400, Bruce Momjian wrote:
> > On Fri, Jun 17, 2016 at 05:51:54PM -0400, Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > > The attached patch documents that pg_upgrade requires old/new servers to
> > > > use compatible collation library versions as well.
> > >
> > > I think this is way too thin to be helpful:
> >
> > Well, this is a much larger issue than pg_upgrade, e.g. moving a data
> > directory from one cluster to another with a different collation library
> > version could also cause problems, and I don't know that is documented
> > at all.
> >
> > If we want to go larger, we have to do this in a more central location.
>
> Frankly, pg_upgrade is, by definition, upgrading on the same server, so
> I don't even see how they could have mismatched collation library
> versions, but it seemed good to document it.

By this argument, the proposed patch seems pointless to me.

> The larger issue of moving clusters is a separate issue that needs
> documentation somewhere else.

Sure.

--
Álvaro Herrera

On Fri, Jun 17, 2016 at 06:11:58PM -0400, Alvaro Herrera wrote:
> > Frankly, pg_upgrade is, by definition, upgrading on the same server, so
> > I don't even see how they could have mismatched collation library
> > versions, but it seemed good to document it.
>
> By this argument, the proposed patch seems pointless to me.
>
> > The larger issue of moving clusters is a separate issue that needs
> > documentation somewhere else.
>
> Sure.

In looking at the docs, it seems it would go in the Backup section
somewhere:

    https://www.postgresql.org/docs/9.6/static/backup.html

Seems it would apply to both of these backup sections:

    24.2. File System Level Backup
    24.3. Continuous Archiving and Point-in-Time Recovery (PITR)

and also here:

    25.2. Log-Shipping Standby Servers

It seems odd to put it in all of these places, but where can we
centrally put it?

--
Bruce Momjian

On Mon, Jun 20, 2016 at 11:16:36AM -0400, Bruce Momjian wrote:
> In looking at the docs, it seems it would go in the Backup section
> somewhere:
>
>     https://www.postgresql.org/docs/9.6/static/backup.html
>
> Seems it would apply to both of these backup sections:
>
>     24.2. File System Level Backup
>     24.3. Continuous Archiving and Point-in-Time Recovery (PITR)
>
> and also here:
>
>     25.2. Log-Shipping Standby Servers
>
> It seems odd to put it in all of these places, but where can we
> centrally put it?

In looking at the docs, I found the section "Creating a Database
Cluster", which covers initdb and collations, to be the best place to put
this warning. Patch attached.

--
Bruce Momjian

On Fri, Jun 17, 2016 at 2:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I think this is way too thin to be helpful:
>
>> --- 61,68 ----
>>   checking for compatible compile-time settings, including 32/64-bit
>>   binaries. It is important that
>>   any external modules are also binary compatible, though this cannot
>> ! be checked by <application>pg_upgrade</>. Compatible collation
>> ! library versions must also be used.
>>   </para>

Unfortunately, the reality is that as things stand, there is no way to
test compatibility on all platforms. Glibc does have a notion of
collation versioning, though [1].

I have long advocated adopting ICU as our de facto standard "collation
provider", primarily so that we can directly control collations and
collation versioning. I think that doing this would solve many
problems. Besides, even SQLite has optional ICU support. PostgreSQL is
the only major database system that I'm aware of that relies on
operating system collations exclusively.

I've avoided committing to work on it because I'm concerned that it
would not be well received.

[1] https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Special-Shell-Variables.html

--
Peter Geoghegan

On Tue, Jun 28, 2016 at 02:58:58PM -0700, Peter Geoghegan wrote:
> On Fri, Jun 17, 2016 at 2:51 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > I think this is way too thin to be helpful:
> >
> >> --- 61,68 ----
> >>   checking for compatible compile-time settings, including 32/64-bit
> >>   binaries. It is important that
> >>   any external modules are also binary compatible, though this cannot
> >> ! be checked by <application>pg_upgrade</>. Compatible collation
> >> ! library versions must also be used.
> >>   </para>
>
> Unfortunately, the reality is that as things stand, there is no way to
> test compatibility on all platforms. Glibc does have a notion of
> collation versioning, though [1].

Yes, the patch text is clearly weasel-words in that we can't explain how
to detect incompatibility.

> I have long advocated adopting ICU as our de facto standard "collation
> provider", primarily so that we can directly control collations and
> collation versioning. I think that doing this would solve many
> problems. Besides, even SQLite has optional ICU support. PostgreSQL is
> the only major database system that I'm aware of that relies on
> operating system collations exclusively.

I am hopeful ICU has improved enough since we last researched that
support for it will soon be added.

--
Bruce Momjian

On Tue, Jun 28, 2016 at 3:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> I have long advocated adopting ICU as our de facto standard "collation
>> provider", primarily so that we can directly control collations and
>> collation versioning. I think that doing this would solve many
>> problems. Besides, even SQLite has optional ICU support. PostgreSQL is
>> the only major database system that I'm aware of that relies on
>> operating system collations exclusively.
>
> I am hopeful ICU has improved enough since we last researched that
> support for it will soon be added.

There is a patch available that is not ready to be submitted, and
doesn't have a real advocate, but is at least enough to convince me
that it's very doable. Performance is certainly no impediment to
adopting ICU, even without considering that it effectively
re-introduces abbreviated keys for text when the C collation is not
used.

The best argument for ICU is the evidently lax attitude that the glibc
people have towards the correctness and consistency of their
collations:

https://bugzilla.redhat.com/show_bug.cgi?id=1320356#c3

Here, Carlos O'Donnell, a glibc committer, says "Regarding (b), the
collations in glibc may change from build to build depending on
changes in the algorithms or locales. You cannot rely on the collation
stay the same once the process exits (nor can you rely upon it via a
shared memory mapping to another process sorting strings in memory)".
Frankly, we have no excuse for not heeding his warning.

I'm not annoyed at the glibc people for taking this position. There
is, quite simply, a misalignment of incentives. For the glibc people,
the assumption is that any problem with collations leads only to
slight annoyance from end users, as when the GUI produces subtly wrong
ordering. Whereas, for us, any inconsistency is an extremely serious
problem. Here we have the maintainers of glibc telling us that they
feel like it's okay that that can happen at any time. Surely that
isn't good enough.

ICU as a project has every incentive to see things the same way as we
do. The library explicitly decouples collation rule versions from
algorithm versions. All of this is carefully considered, for the
benefit of the numerous major database systems that use ICU.

--
Peter Geoghegan

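As an illustration of why separately versioned rules and algorithms are useful to a database, here is a small Python sketch under stated assumptions. None of these names are ICU or PostgreSQL APIs: `CollationVersion`, `IndexMeta`, `stale_indexes`, the index names, and the version strings are all hypothetical. The idea is only that if the version in effect is recorded when an index is built, a later mismatch can be detected instead of silently trusted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CollationVersion:
    rules: str       # version of the locale's collation rules (hypothetical)
    algorithm: str   # version of the comparison algorithm (hypothetical)

@dataclass
class IndexMeta:
    name: str
    built_with: CollationVersion  # recorded when the index was built

def stale_indexes(indexes, current):
    # Any index built under a different rules or algorithm version
    # can no longer be trusted and is reported for rebuild.
    return [ix.name for ix in indexes if ix.built_with != current]

old = CollationVersion(rules="r1", algorithm="a1")
new = CollationVersion(rules="r2", algorithm="a1")
indexes = [IndexMeta("users_name_idx", old), IndexMeta("tags_label_idx", new)]
print(stale_indexes(indexes, current=new))
```

With OS-provided collations there is, per the discussion above, no portable value to store in the `rules` and `algorithm` fields, which is the gap being described.
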
Peter Geoghegan wrote:
> The best argument for ICU is the evidently lax attitude that the glibc
> people have towards the correctness and consistency of their
> collations:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1320356#c3
>
> Here, Carlos O'Donnell, a glibc committer, says "Regarding (b), the
> collations in glibc may change from build to build depending on
> changes in the algorithms or locales. You cannot rely on the collation
> stay the same once the process exits (nor can you rely upon it via a
> shared memory mapping to another process sorting strings in memory)".
> Frankly, we have no excuse for not heeding his warning.
>
> I'm not annoyed at the glibc people for taking this position. There
> is, quite simply, a misalignment of incentives. For the glibc people,
> the assumption is that any problem with collations leads only to
> slight annoyance from end users, as when the GUI produces subtly wrong
> ordering. Whereas, for us, any inconsistency is an extremely serious
> problem. Here we have the maintainers of glibc telling us that they
> feel like it's okay that that can happen at any time. Surely that
> isn't good enough.

Uhmm. Until now I saw all this ICU thing as having fringe benefit on
strange platforms only, but it is seeming more and more like we need to
take it seriously. I'm not prepared to spend effort on it myself,
though.

--
Álvaro Herrera

On Tue, Jun 28, 2016 at 3:50 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Uhmm. Until now I saw all this ICU thing as having fringe benefit on
> strange platforms only, but it is seeming more and more like we need to
> take it seriously. I'm not prepared to spend effort on it myself,
> though.

Let me put it this way: If we lived in a world where
internationalization was a new idea, and someone proposed collation
support that relied on the OS today, the patch would be rejected in
about 2 minutes. The author would be pointed in the direction of
"Notes to Operator Class Implementors" within the nbtree README.

There are numerous user-visible benefits to ICU support, too, like:

* Case-insensitive collations become possible (with work in other
areas). No more contrib/citext hack. This is something that we seem to
want to work towards.

* Abbreviated keys in indexes with collated text become possible.
(Already mentioned that abbreviated keys for collated text + sorting
are effectively reintroduced.)

* More useful collations available for certain languages, such as
Japanese. Apparently, the JIS X 4061 algorithm produces results that
Japanese people find more useful, but glibc doesn't support it, and
never will.

* We might be able to document WAL compatibility usefully, now. The
documentation never gets around to explaining which two instances are
compatible for the purposes of physical replication. I can't think of
any other factor that prevents us from locking that down.

* Upgrade major OS versions without difficulty.

* User-defined collations, where you can mix and match certain facets
of how text is sorted as you please.

Basically, ICU offers rich functionality that we can bubble up to our
users without too much effort, as other database systems have.

--
Peter Geoghegan

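For readers unfamiliar with the abbreviated keys mentioned in the list above, here is a rough Python sketch of the idea, not PostgreSQL's implementation. `full_key` is a hypothetical stand-in for a collation library's binary sort key (the kind of blob a strxfrm-style call produces), and `ABBREV_LEN` is an arbitrary choice. Comparisons use a short fixed-size prefix of that key first and fall back to the full key only on a tie.

```python
from functools import cmp_to_key

ABBREV_LEN = 8  # bytes kept in the abbreviated key (arbitrary for this sketch)

def full_key(s):
    # Hypothetical binary sort key: case-folded UTF-8 bytes, then the
    # original bytes as a tie-breaker. A real collation library would
    # produce its own key format.
    return s.casefold().encode("utf-8") + b"\x00" + s.encode("utf-8")

def abbreviated_key(s):
    # Cheap, fixed-size prefix of the full key.
    return full_key(s)[:ABBREV_LEN]

def compare(a, b):
    ka, kb = abbreviated_key(a), abbreviated_key(b)
    if ka != kb:
        # The prefixes already decide the order.
        return -1 if ka < kb else 1
    # Tie on the prefix: fall back to the full, more expensive key.
    fa, fb = full_key(a), full_key(b)
    return (fa > fb) - (fa < fb)

words = ["internationalization", "International", "internal", "Zebra", "apple"]
print(sorted(words, key=cmp_to_key(compare)))
```

The shortcut is only safe if the binary keys order strings exactly as the collation's comparison function does, and keep doing so, which ties this benefit back to the stability concerns raised earlier in the thread.
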
On Tue, Jun 28, 2016 at 05:21:51PM -0400, Bruce Momjian wrote:
> On Mon, Jun 20, 2016 at 11:16:36AM -0400, Bruce Momjian wrote:
> > In looking at the docs, it seems it would go in the Backup section
> > somewhere:
> >
> >     https://www.postgresql.org/docs/9.6/static/backup.html
> >
> > Seems it would apply to both of these backup sections:
> >
> >     24.2. File System Level Backup
> >     24.3. Continuous Archiving and Point-in-Time Recovery (PITR)
> >
> > and also here:
> >
> >     25.2. Log-Shipping Standby Servers
> >
> > It seems odd to put it in all of these places, but where can we
> > centrally put it?
>
> In looking at the docs, I found the section "Creating a Database
> Cluster", which covers initdb and collations, to be the best place to put
> this warning. Patch attached.

Patch applied and backpatched.

--
Bruce Momjian

On 6/28/16 5:58 PM, Peter Geoghegan wrote:
> I have long advocated adopting ICU as our de facto standard "collation
> provider", primarily so that we can directly control collations and
> collation versioning. I think that doing this would solve many
> problems.

I plan to submit a patch for ICU support for September.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

On Sat, Jul 9, 2016 at 7:02 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 6/28/16 5:58 PM, Peter Geoghegan wrote:
>>
>> I have long advocated adopting ICU as our de facto standard "collation
>> provider", primarily so that we can directly control collations and
>> collation versioning. I think that doing this would solve many
>> problems.
>
> I plan to submit a patch for ICU support for September.

That's fantastic news! Your knowledge of packaging will be useful
here. I will review your patch.

--
Peter Geoghegan