Thread: Speeding up pg_upgrade
As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
zero-downtime upgrades. After the usual discussion of using logical
replication, Slony, and perhaps having the server be able to read old
and new system catalogs, we discussed speeding up pg_upgrade.

There are clusters that take a long time to dump the schema from the old
cluster and recreate it in the new cluster. One idea of speeding up
pg_upgrade would be to allow pg_upgrade to be run in two stages:

1. prevent system catalog changes while the old cluster is running, and
dump the old cluster's schema and restore it in the new cluster

2. shut down the old cluster and copy/link the data files

My question is whether the schema dump/restore is time-consuming enough
to warrant this optional more complex API, and whether people would
support adding a server setting that prevented all system table changes?

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
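[As a rough proxy for how much schema a cluster has to dump and restore, the catalog row counts are easy to check. This is only a sketch; dump time does not scale linearly with these counts, but large numbers here are a warning sign:

    -- Rough proxy for the schema volume pg_dump --schema-only must handle.
    SELECT (SELECT count(*) FROM pg_class)      AS relations,
           (SELECT count(*) FROM pg_attribute)  AS columns,
           (SELECT count(*) FROM pg_proc)       AS functions,
           (SELECT count(*) FROM pg_constraint) AS constraints;
]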
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades. After the usual discussion of using logical
> replication, Slony, and perhaps having the server be able to read old
> and new system catalogs, we discussed speeding up pg_upgrade.

Sounds familiar.

> There are clusters that take a long time to dump the schema from the old
> cluster and recreate it in the new cluster. One idea of speeding up
> pg_upgrade would be to allow pg_upgrade to be run in two stages:
>
> 1. prevent system catalog changes while the old cluster is running, and
> dump the old cluster's schema and restore it in the new cluster
>
> 2. shut down the old cluster and copy/link the data files

Perhaps a bit more complicated, but couldn't we copy/link while the
old cluster is online and in backup mode, finish backup mode, shut down
the old cluster, and then play forward the WAL to catch any relation
extents being added or similar, and then flip to the new PG version?

> My question is whether the schema dump/restore is time-consuming enough
> to warrant this optional more complex API, and whether people would
> support adding a server setting that prevented all system table changes?

When you say 'system table changes', you're referring to basically all
DDL, right? Just wish to clarify as there might be some confusion
between the terminology you're using here and allow_system_table_mods.

Would we need to have autovacuum shut down too..?

The other concern is if there's changes made to the catalogs by non-DDL
activity that needs to be addressed too (logical replication?); nothing
definite springs to mind off-hand for me, but perhaps others will think
of things.

Thanks!

Stephen
Hi
On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades. After the usual discussion of using logical
> replication, Slony, and perhaps having the server be able to read old
> and new system catalogs, we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster and recreate it in the new cluster. One idea of speeding up
> pg_upgrade would be to allow pg_upgrade to be run in two stages:
>
> 1. prevent system catalog changes while the old cluster is running, and
> dump the old cluster's schema and restore it in the new cluster
>
> 2. shut down the old cluster and copy/link the data files
When we were discussing this, I was thinking that the linking could be done in phase 1 too, as that's potentially slow on a very large schema.
> My question is whether the schema dump/restore is time-consuming enough
> to warrant this optional more complex API, and whether people would
> support adding a server setting that prevented all system table changes?
I've certainly heard of cases where pg_upgrade takes significant amounts of time to run on very complex databases.
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 5, 2017 at 09:16:02AM -0500, Stephen Frost wrote:
> Bruce,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades. After the usual discussion of using logical
> > replication, Slony, and perhaps having the server be able to read old
> > and new system catalogs, we discussed speeding up pg_upgrade.
>
> Sounds familiar.

Yeah. :-|

> > There are clusters that take a long time to dump the schema from the old
> > cluster and recreate it in the new cluster. One idea of speeding up
> > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> >
> > 1. prevent system catalog changes while the old cluster is running, and
> > dump the old cluster's schema and restore it in the new cluster
> >
> > 2. shut down the old cluster and copy/link the data files
>
> Perhaps a bit more complicated, but couldn't we copy/link while the
> old cluster is online and in backup mode, finish backup mode, shut down
> the old cluster, and then play forward the WAL to catch any relation
> extents being added or similar, and then flip to the new PG version?

Well, that would require reading the old WAL, which would add an
additional compatibility requirement that seems unwise.

> > My question is whether the schema dump/restore is time-consuming enough
> > to warrant this optional more complex API, and whether people would
> > support adding a server setting that prevented all system table changes?
>
> When you say 'system table changes', you're referring to basically all
> DDL, right? Just wish to clarify as there might be some confusion
> between the terminology you're using here and allow_system_table_mods.

Not only all DDL, but even updating them for the internal stuff, like
pg_class.relfrozenxid.

> Would we need to have autovacuum shut down too..?

Yes.

> The other concern is if there's changes made to the catalogs by non-DDL
> activity that needs to be addressed too (logical replication?); nothing
> definite springs to mind off-hand for me, but perhaps others will think
> of things.

Yes, it could extend to many parts of the system, which is why I am
asking if it is worth it.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
On Tue, Dec 5, 2017 at 11:16:26PM +0900, Dave Page wrote:
> Hi
>
> On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades. After the usual discussion of using logical
> replication, Slony, and perhaps having the server be able to read old
> and new system catalogs, we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster and recreate it in the new cluster. One idea of speeding up
> pg_upgrade would be to allow pg_upgrade to be run in two stages:
>
> 1. prevent system catalog changes while the old cluster is running, and
> dump the old cluster's schema and restore it in the new cluster
>
> 2. shut down the old cluster and copy/link the data files
>
> When we were discussing this, I was thinking that the linking could be done in
> phase 1 too, as that's potentially slow on a very large schema.

Uh, good point! You can create the hard links while the system is
running, no problem! It would only be the copy that can't be done while
the system is running.

Of course, a big question is whether hard linking takes any measurable
time.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
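[Whether linking takes measurable time mostly depends on how many files there are; a rough per-database count is easy to get. A sketch only: each relation can also have FSM/VM forks and extra 1GB segment files, so the real file count is somewhat higher:

    -- Approximate number of main-fork files pg_upgrade would hard-link
    -- for the current database.
    SELECT count(*) AS relation_files
    FROM pg_class
    WHERE relkind IN ('r', 'i', 't', 'm');  -- tables, indexes, TOAST, matviews
]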
Dave,

* Dave Page (dpage@pgadmin.org) wrote:
> On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades. After the usual discussion of using logical
> > replication, Slony, and perhaps having the server be able to read old
> > and new system catalogs, we discussed speeding up pg_upgrade.
> >
> > There are clusters that take a long time to dump the schema from the old
> > cluster and recreate it in the new cluster. One idea of speeding up
> > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> >
> > 1. prevent system catalog changes while the old cluster is running, and
> > dump the old cluster's schema and restore it in the new cluster
> >
> > 2. shut down the old cluster and copy/link the data files
>
> When we were discussing this, I was thinking that the linking could be done
> in phase 1 too, as that's potentially slow on a very large schema.

Right, I had that thought too when first reading this, but the problem
there is that new files can show up due to a relation being extended (at
least, and perhaps in other cases too..).

> > My question is whether the schema dump/restore is time-consuming enough
> > to warrant this optional more complex API, and whether people would
> > support adding a server setting that prevented all system table changes?
>
> I've certainly heard of cases where pg_upgrade takes significant amounts of
> time to run on very complex databases.

Right, but that doesn't really answer the question as to which part of
the pg_upgrade process is taking up the time.

In any case, of course, if we're able to move part of what pg_upgrade
does to be while the old server is online then that takes whatever the
cost of that is out of the downtime window. The question is if that's a
5% improvement in the overall performance of pg_upgrade or a 70% one.
This will be case-by-case, of course, but if, in the best-case, we only
get a 5% improvement then this might not be worth the risk.

Thanks!

Stephen
On Tue, Dec 5, 2017 at 09:23:49AM -0500, Stephen Frost wrote:
> Dave,
>
> * Dave Page (dpage@pgadmin.org) wrote:
> > On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > > zero-downtime upgrades. After the usual discussion of using logical
> > > replication, Slony, and perhaps having the server be able to read old
> > > and new system catalogs, we discussed speeding up pg_upgrade.
> > >
> > > There are clusters that take a long time to dump the schema from the old
> > > cluster and recreate it in the new cluster. One idea of speeding up
> > > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> > >
> > > 1. prevent system catalog changes while the old cluster is running, and
> > > dump the old cluster's schema and restore it in the new cluster
> > >
> > > 2. shut down the old cluster and copy/link the data files
> >
> > When we were discussing this, I was thinking that the linking could be done
> > in phase 1 too, as that's potentially slow on a very large schema.
>
> Right, I had that thought too when first reading this, but the problem
> there is that new files can show up due to a relation being extended (at
> least, and perhaps in other cases too..).

Oh, yikes, yes.

> > > My question is whether the schema dump/restore is time-consuming enough
> > > to warrant this optional more complex API, and whether people would
> > > support adding a server setting that prevented all system table changes?
> >
> > I've certainly heard of cases where pg_upgrade takes significant amounts of
> > time to run on very complex databases.
>
> Right, but that doesn't really answer the question as to which part of
> the pg_upgrade process is taking up the time.
>
> In any case, of course, if we're able to move part of what pg_upgrade
> does to be while the old server is online then that takes whatever the
> cost of that is out of the downtime window. The question is if that's a
> 5% improvement in the overall performance of pg_upgrade or a 70% one.
> This will be case-by-case, of course, but if, in the best-case, we only
> get a 5% improvement then this might not be worth the risk.

Yes, and who is going to know if they have a setup where the more
complex API is worth it? pg_upgrade is already complex enough to use.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Dec 5, 2017 at 09:16:02AM -0500, Stephen Frost wrote:
> > > There are clusters that take a long time to dump the schema from the old
> > > cluster and recreate it in the new cluster. One idea of speeding up
> > > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> > >
> > > 1. prevent system catalog changes while the old cluster is running, and
> > > dump the old cluster's schema and restore it in the new cluster
> > >
> > > 2. shut down the old cluster and copy/link the data files
> >
> > Perhaps a bit more complicated, but couldn't we copy/link while the
> > old cluster is online and in backup mode, finish backup mode, shut down
> > the old cluster, and then play forward the WAL to catch any relation
> > extents being added or similar, and then flip to the new PG version?
>
> Well, that would require reading the old WAL, which would add an
> additional compatibility requirement that seems unwise.

In my proposal, this would be the old version of PG reading the old
WAL. Thinking about it a bit further though, I'm not sure it'd end up
working in link mode anyway, due to post-backup-finish changes that
could be made by the old server on the data files before it's shut down.

We have to have a way of dealing with the delta between the hard link
trees after the old server is shut down though, because there could be
new relation extents, at least.

> > > My question is whether the schema dump/restore is time-consuming enough
> > > to warrant this optional more complex API, and whether people would
> > > support adding a server setting that prevented all system table changes?
> >
> > When you say 'system table changes', you're referring to basically all
> > DDL, right? Just wish to clarify as there might be some confusion
> > between the terminology you're using here and allow_system_table_mods.
>
> Not only all DDL, but even updating them for the internal stuff, like
> pg_class.relfrozenxid.

Good point. We'd really need a pretty bullet-proof way to ensure that
the catalog isn't changed during this time period and that seems like it
might be difficult without a lot of work.

> > Would we need to have autovacuum shut down too..?
>
> Yes.

Ok, makes sense.

> > The other concern is if there's changes made to the catalogs by non-DDL
> > activity that needs to be addressed too (logical replication?); nothing
> > definite springs to mind off-hand for me, but perhaps others will think
> > of things.
>
> Yes, it could extend to many parts of the system, which is why I am
> asking if it is worth it.

My initial reaction is that it's worth it, but then I also wonder about
other issues (having to get an ANALYZE done on the new cluster before
opening it up, for example..) and it makes me wonder if perhaps it'll
end up being too much risk for too little gain.

Thanks!

Stephen
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> > In any case, of course, if we're able to move part of what pg_upgrade
> > does to be while the old server is online then that takes whatever the
> > cost of that is out of the downtime window. The question is if that's a
> > 5% improvement in the overall performance of pg_upgrade or a 70% one.
> > This will be case-by-case, of course, but if, in the best-case, we only
> > get a 5% improvement then this might not be worth the risk.
>
> Yes, and who is going to know if they have a setup where the more
> complex API is worth it? pg_upgrade is already complex enough to use.

Sure, but the solution there is really to make pg_upgrade simpler to
use, even as we add these more complicated APIs to it. What that likely
means in practical terms is that we have another utility, which uses
pg_upgrade underneath, that you're able to configure to know about your
existing cluster and the version of PG you want to upgrade to and where
you want it and if you want a copy or if hard-links are ok, etc.

Having such a tool is actually what I'd been hoping would come out of
the documented process for doing a "pg_upgrade" on replicas that's
currently in our documentation. That's not happened yet, but it's
something that David Steele and I have been chatting about because the
procedure in the documentation is terribly difficult and dangerous for
those who aren't as familiar with the system.

Perhaps we could have one tool that handles both the more complicated
pg_upgrade API and deals with upgrading replicas. Alternatively, we
could have a config file for pg_upgrade instead, which might be a
simpler way for people to describe exactly their current configuration
and what they'd like to go to. Upgrading replicas involves using
something like SSH though..

Thanks!

Stephen
On 12/5/17 09:23, Stephen Frost wrote:
> Right, but that doesn't really answer the question as to which part of
> the pg_upgrade process is taking up the time.

Yeah, that should be measured before we do anything.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 5, 2017 at 09:30:53AM -0500, Stephen Frost wrote:
> > > The other concern is if there's changes made to the catalogs by non-DDL
> > > activity that needs to be addressed too (logical replication?); nothing
> > > definite springs to mind off-hand for me, but perhaps others will think
> > > of things.
> >
> > Yes, it could extend to many parts of the system, which is why I am
> > asking if it is worth it.
>
> My initial reaction is that it's worth it, but then I also wonder about
> other issues (having to get an ANALYZE done on the new cluster before
> opening it up, for example..) and it makes me wonder if perhaps it'll
> end up being too much risk for too little gain.

Yes, dump/reload of analyze statistics seems like a better use of time.
I have avoided it since it locks us into supporting the text
representation of data types, but at this point it might be worth it.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Hi,
> Yes, dump/reload of analyze statistics seems like a better use of time.
> I have avoided it since it locks us into supporting the text
> representation of data types, but at this point it might be worth it.
Couple of months ago we at Zalando upgraded a few databases of different sizes to 9.6.
During preparations for the upgrade I've found 2.5 pain-points:

1. We are using schema-based API deployment. Basically ~every week we create a new schema in the database and hundreds of stored procedures in it.
Of course we remove old API schemas, trying not to keep more than the last 10. Before the upgrade we basically dropped all API schemas except the one used in production.
And even in this case the dump-restore phase was taking much more time than relinking of data files.
Unfortunately I don't have any numbers right now, but usually a run of pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was spent in dump-restore.

2. The ANALYZE phase is a pain. I think everybody agrees with it.

2.5. Usually ANALYZE stage 1 completes quite fast and performance becomes reasonable, except in one case: some of the columns might have a non-default statistics target.
That breaks `vacuumdb --analyze-in-stages`, because for those specific columns it will not use the default_statistics_target value provided by vacuumdb.
What I did was reset those non-default values right before running pg_upgrade and restore them only when the analyze was completed. Of course, after that I re-analyzed those columns.
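For anyone hitting the same thing, the overridden columns are easy to find and reset from SQL. A minimal sketch (the table and column names are illustrative; attstattarget = -1 means "use the default" in these releases):

    -- Find columns whose per-column statistics target overrides the default.
    SELECT c.relname, a.attname, a.attstattarget
    FROM pg_attribute a
    JOIN pg_class c ON c.oid = a.attrelid
    WHERE a.attstattarget <> -1
      AND a.attnum > 0
      AND NOT a.attisdropped;

    -- Reset one such column before the upgrade (and restore the old
    -- value afterwards, then re-analyze).
    ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS -1;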
Regards,
Alexander Kukushkin
Alexander,

* Alexander Kukushkin (cyberdemn@gmail.com) wrote:
> Couple of months ago we at Zalando upgraded a few databases of different
> sizes to 9.6.

Thanks for sharing your experience!

> During preparations for the upgrade I've found 2.5 pain-points:
>
> 1. We are using schema-based API deployment. Basically ~every week we
> create a new schema in the database and hundreds of stored procedures in it.
> Of course we remove old API schemas, trying not to keep more than
> the last 10. Before the upgrade we basically dropped all API schemas except the
> one used in production.
> And even in this case the dump-restore phase was taking much more time than
> relinking of data files.
> Unfortunately I don't have any numbers right now, but usually a run of
> pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
> spent in dump-restore.

Ok, so eliminating 2/3 of the time would mean bringing it down to more
like 10 seconds. That certainly seems worthwhile to me. With the
linking time being much less than the dump/restore, we could at least
consider moving forward with Bruce's original idea where we do the
dump/restore while the system is online but then the linking with it
offline and get a serious performance boost out of it. That also avoids
the issue with new files showing up while the system is running that I
brought up when we were talking about having the linking done with the
system online.

> 2. The ANALYZE phase is a pain. I think everybody agrees with it.
>
> 2.5. Usually ANALYZE stage 1 completes quite fast and performance becomes
> reasonable, except in one case: some of the columns might have a non-default
> statistics target.

Ok, if the stage-1 is very fast and performance is reasonable enough
after that then perhaps it's not so bad to keep it as-is for now and
focus on the dump/restore time. That said, we should certainly also
work on improving this too.

> That breaks `vacuumdb --analyze-in-stages`, because for those specific
> columns it will not use the default_statistics_target value provided by
> vacuumdb.
> What I did was reset those non-default values right before running
> pg_upgrade and restore them only when the analyze was completed. Of course,
> after that I re-analyzed those columns.

Ah, yeah, ouch, that's unfortunate.. I wonder if there's something we
could do there to fix it..

Thanks!

Stephen
Stephen Frost wrote:
> Alexander,
> * Alexander Kukushkin (cyberdemn@gmail.com) wrote:
> > 2. The ANALYZE phase is a pain. I think everybody agrees with it.
> >
> > 2.5. Usually ANALYZE stage 1 completes quite fast and performance becomes
> > reasonable, except in one case: some of the columns might have a non-default
> > statistics target.
>
> Ok, if the stage-1 is very fast and performance is reasonable enough
> after that then perhaps it's not so bad to keep it as-is for now and
> focus on the dump/restore time. That said, we should certainly also
> work on improving this too.

It seems pretty clear to me that we should somehow transfer stats from
the old server to the new one. Shouldn't it just be a matter of
serializing the MCV/histogram/ndistinct values, then have capabilities
to load on the new server? I suppose it'd just be used during binary
upgrade, but the point seems painful enough for a lot of users.
Obviously it would not be the raw contents of pg_statistic{,_ext}, but
rather something a bit higher-level.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
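[Something close to that higher-level form already exists in the pg_stats view, which renders per-column statistics as text rather than raw pg_statistic rows; a sketch of what a serialization might start from:

    -- pg_stats presents MCVs and histograms in a text-rendered,
    -- type-independent form.
    SELECT schemaname, tablename, attname,
           null_frac, n_distinct,
           most_common_vals, most_common_freqs,
           histogram_bounds, correlation
    FROM pg_stats
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
]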
Alvaro,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Stephen Frost wrote:
> > * Alexander Kukushkin (cyberdemn@gmail.com) wrote:
> > > 2. The ANALYZE phase is a pain. I think everybody agrees with it.
> > >
> > > 2.5. Usually ANALYZE stage 1 completes quite fast and performance becomes
> > > reasonable, except in one case: some of the columns might have a non-default
> > > statistics target.
> >
> > Ok, if the stage-1 is very fast and performance is reasonable enough
> > after that then perhaps it's not so bad to keep it as-is for now and
> > focus on the dump/restore time. That said, we should certainly also
> > work on improving this too.
>
> It seems pretty clear to me that we should somehow transfer stats from
> the old server to the new one. Shouldn't it just be a matter of
> serializing the MCV/histogram/ndistinct values, then have capabilities
> to load on the new server? I suppose it'd just be used during binary
> upgrade, but the point seems painful enough for a lot of users.
> Obviously it would not be the raw contents of pg_statistic{,_ext}, but
> rather something a bit higher-level.

Right, I think that's what Bruce was getting at and certainly makes
sense to me as well. I agree that it's a definite pain point for
people.

One complication is going to be custom data types, of course..

Thanks!

Stephen
On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades. ... we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster

Maybe it isn't representative of a typical case, but I can offer a data
point: For us, we have ~40 customers with DBs ranging in size from
<100GB to ~25TB (for which ~90% is on a ZFS tablespace with
compression). We have what's traditionally considered to be an
excessive number of child tables, which works okay since planning time
is unimportant to us for the report queries which hit them.

Some of the tables are wide (historically up to 1600 columns). Some of
those have default values on nearly every column, and pg_attrdef was
large (>500MB), causing pg_dump --section pre-data to be slow (10+
minutes). Since something similar is run by pg_upgrade, I worked around
the issue for now by dropping defaults on the historic children in
advance of upgrades (at some point I'll figure out what I have to do to
allow DROPing DEFAULTs). It's not the first time we've seen an issue
with a larger number of children*columns.

Our slowest pg_upgrade was ~40min, caused by column defaults in a case
where I failed to re-DROP DEFAULTs after our first scheduled upgrade
date was pushed back by over a month. Most of the rest were completed
in less than 15min.

Justin
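[For anyone wondering whether their cluster is in similar territory, the size of pg_attrdef and the per-table default counts are easy to check; a minimal sketch:

    -- How large is the catalog holding column defaults?
    SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attrdef'));

    -- Which tables carry the most column defaults?
    SELECT c.relname, count(*) AS column_defaults
    FROM pg_attrdef d
    JOIN pg_class c ON c.oid = d.adrelid
    GROUP BY c.relname
    ORDER BY column_defaults DESC
    LIMIT 20;
]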
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> It seems pretty clear to me that we should somehow transfer stats from
> the old server to the new one. Shouldn't it just be a matter of
> serializing the MCV/histogram/ndistinct values, then have capabilities
> to load on the new server?

The reason pg_upgrade hasn't done that in the past is not wishing to
assume that the new version does stats identically to the old version.
Since we do in fact add stats or change stuff around from time to time,
that's not a negligible consideration.

			regards, tom lane
On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> It seems pretty clear to me that we should somehow transfer stats from
>> the old server to the new one. Shouldn't it just be a matter of
>> serializing the MCV/histogram/ndistinct values, then have capabilities
>> to load on the new server?
>
> The reason pg_upgrade hasn't done that in the past is not wishing to
> assume that the new version does stats identically to the old version.
> Since we do in fact add stats or change stuff around from time to time,
> that's not a negligible consideration.

Yes, but we don't do that for every release. We could put rules into
pg_upgrade about which releases changed the stats format incompatibly,
and not transfer the stats when crossing between two releases with
incompatible formats. That's more than zero effort, of course, but it
might be worth it.

We've already got CATALOG_VERSION_NO, XLOG_PAGE_MAGIC,
PG_CONTROL_VERSION, PG_PROTOCOL_LATEST, BTREE_VERSION, HASH_VERSION,
BRIN_CURRENT_VERSION, GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.

Now, I have to admit that I find the process of trying to remember to
bump the correct set of version numbers in every commit just a tad
frustrating; it adds a cognitive burden I'd just as well skip. However,
the failure to transfer stats over the years seems to have actually
caused real problems for many users, so I think in this case we might be
best off sucking it up and adding one more version number. We might
even want to make it a little more fine-grained and track it separately
by data type, but I'm not sure if that's really worth it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert, all,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> >> It seems pretty clear to me that we should somehow transfer stats from
> >> the old server to the new one. Shouldn't it just be a matter of
> >> serializing the MCV/histogram/ndistinct values, then have capabilities
> >> to load on the new server?
> >
> > The reason pg_upgrade hasn't done that in the past is not wishing to
> > assume that the new version does stats identically to the old version.
> > Since we do in fact add stats or change stuff around from time to time,
> > that's not a negligible consideration.
>
> Yes, but we don't do that for every release. We could put rules into
> pg_upgrade about which releases changed the stats format incompatibly,
> and not transfer the stats when crossing between two releases with
> incompatible formats. That's more than zero effort, of course, but it
> might be worth it. We've already got CATALOG_VERSION_NO,
> XLOG_PAGE_MAGIC, PG_CONTROL_VERSION, PG_PROTOCOL_LATEST,
> BTREE_VERSION, HASH_VERSION, BRIN_CURRENT_VERSION,
> GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
> PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
> K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.

If we go down that route, since this makes a pretty serious difference
in terms of what the user has to deal with post-pg_upgrade, I'd suggest
we require an additional option for the user to pass when stats aren't
going to be migrated, so they are aware of that. The concern I have
here is that we end up changing things in v13 and suddenly everyone has
to re-analyze, but they didn't to go from 10->11 or 11->12, and they'll
get caught off-guard by it.

Of course, this might end up having an entirely different effect: it
might mean that we're suddenly a lot shier about changing the stats in a
backwards-incompatible way, just as we now are basically stuck with the
existing on-disk heap format..

> Now, I have to admit that I find the process of trying to remember to
> bump the correct set of version numbers in every commit just a tad
> frustrating; it adds a cognitive burden I'd just as well skip.

Agreed, would be great if we could improve on this.

> However, the failure to transfer stats over the years seems to have
> actually caused real problems for many users, so I think in this case
> we might be best off sucking it up and adding one more version number.

Yes, it's definitely been an issue for users.

> We might even want to make it a little more fine-grained and track it
> separately by data type, but I'm not sure if that's really worth it.

This would have the nice property that we could just re-analyze the
data types where things changed, something that's more likely to happen
with new data types than existing ones, I'd guess, and so that might be
much more reasonable for users.

Thanks!

Stephen
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > It seems pretty clear to me that we should somehow transfer stats from
> > the old server to the new one. Shouldn't it just be a matter of
> > serializing the MCV/histogram/ndistinct values, then have capabilities
> > to load on the new server?
>
> The reason pg_upgrade hasn't done that in the past is not wishing to
> assume that the new version does stats identically to the old version.
> Since we do in fact add stats or change stuff around from time to time,
> that's not a negligible consideration.

Sure, but the new version can probably limp along with incomplete stats
until the next natural ANALYZE runs -- the system is operational in much
shorter time than if you have to make it all wait for the post-upgrade
full-database ANALYZE run. The serialization step is so that the
underlying representation doesn't have to remain identical -- surely the
new server would be able to represent whatever the old server was able
to, regardless of any improvement made.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> The reason pg_upgrade hasn't done that in the past is not wishing to
>> assume that the new version does stats identically to the old version.
>> Since we do in fact add stats or change stuff around from time to time,
>> that's not a negligible consideration.
>
> Sure, but the new version can probably limp along with incomplete stats
> until the next natural ANALYZE runs -- the system is operational in much
> shorter time than if you have to make it all wait for the post-upgrade
> full-database ANALYZE run. The serialization step is so that the
> underlying representation doesn't have to remain identical -- surely the
> new server would be able to represent whatever the old server was able
> to, regardless of any improvement made.

Well, this is assuming a lot of stuff not in evidence about how the
"serialization format" is designed and how we insert the data in the new
installation. But if you think you can come up with something that can
handle such cases, go for it.

(In the spirit of full disclosure, I actually wrote code that allowed
dumping and reloading stats while I was at Salesforce. But I've
forgotten the details of that design, and anyway I'm pretty sure it
didn't handle any cross-version scenarios, else I probably would have
offered it to the community.)

			regards, tom lane
Stephen Frost <sfrost@snowman.net> writes:
> If we go down that route, since this makes a pretty serious difference
> in terms of what the user has to deal with post-pg_upgrade, I'd suggest
> we require an additional option for the user to pass when stats aren't
> going to be migrated, so they are aware of that.

-1 ... you are forgetting that a lot of systems wrap pg_upgrade in some
sort of vendor-supplied upgrade script. Nanny switches don't help; the
vendors will just start passing them automatically.

> Of course, this might end up having an entirely different effect: it
> might mean that we're suddenly a lot shier about changing the stats in a
> backwards-incompatible way, just as we now are basically stuck with the
> existing on-disk heap format..

Yeah, there's that. But the rate of change in pg_statistic hasn't been
*that* large. Alvaro might be right that we can design some transmission
procedure that allows stats to be forward-migrated when compatible and
dropped when not.

			regards, tom lane
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > If we go down that route, since this makes a pretty serious difference
> > in terms of what the user has to deal with post-pg_upgrade, I'd suggest
> > we require an additional option for the user to pass when stats aren't
> > going to be migrated, so they are aware of that.
>
> -1 ... you are forgetting that a lot of systems wrap pg_upgrade in some
> sort of vendor-supplied upgrade script. Nanny switches don't help;
> the vendors will just start passing them automatically.

That really depends on the packagers.

> > Of course, this might end up having an entirely different effect: it
> > might mean that we're suddenly a lot shier about changing the stats in a
> > backwards-incompatible way, just as we now are basically stuck with the
> > existing on-disk heap format..
>
> Yeah, there's that. But the rate of change in pg_statistic hasn't been
> *that* large. Alvaro might be right that we can design some transmission
> procedure that allows stats to be forward-migrated when compatible and
> dropped when not.

Well, if it's dropped, I think we need to make sure that users are aware
of that going in and that's why I was suggesting a switch. If you've
got a better idea for that, great, but having certain pg_upgrade
migrations require running ANALYZE and some migrations not require it is
something we need to make users *very* clear about. No, I don't think a
note in the release notes is really enough..

Thanks!

Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Yeah, there's that. But the rate of change in pg_statistic hasn't been
>> *that* large. Alvaro might be right that we can design some transmission
>> procedure that allows stats to be forward-migrated when compatible and
>> dropped when not.
>
> Well, if it's dropped, I think we need to make sure that users are aware
> of that going in and that's why I was suggesting a switch. If you've
> got a better idea for that, great, but having certain pg_upgrade
> migrations require running ANALYZE and some migrations not require it is
> something we need to make users *very* clear about. No, I don't think a
> note in the release notes is really enough..

Seems like we could make this reasonably transparent if pg_upgrade
continues to emit an analyze script that you're supposed to run
afterwards. It just has to vary how much that script does.

			regards, tom lane
> If you've
> got a better idea for that, great, but having certain pg_upgrade
> migrations require running ANALYZE and some migrations not require it is
> something we need to make users *very* clear about. No, I don't think a
> note in the release notes is really enough..
Ideally this would be covered by:
"""
13. Post-Upgrade processing
If any post-upgrade processing is required, pg_upgrade will issue warnings as it completes. It will also generate script files that must be run by the administrator. The script files will connect to each database that needs post-upgrade processing. Each script should be run using: ...
"""
https://www.postgresql.org/docs/10/static/pgupgrade.html

Then whether ANALYZE is instructed as per #14 would be conditional.

It's arguable whether a warning/notice saying "ANALYZE doesn't need to be run" would be useful.
David J.
Tom, David,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Well, if it's dropped, I think we need to make sure that users are aware
> > of that going in and that's why I was suggesting a switch. If you've
> > got a better idea for that, great, but having certain pg_upgrade
> > migrations require running ANALYZE and some migrations not require it is
> > something we need to make users *very* clear about. No, I don't think a
> > note in the release notes is really enough..
>
> Seems like we could make this reasonably transparent if pg_upgrade
> continues to emit an analyze script that you're supposed to run
> afterwards. It just has to vary how much that script does.

That's post-upgrade though, meaning it's entirely likely to catch
someone by surprise. That's the issue that I have with it.

That said, I'm remembering that there's a '--check' option to
pg_upgrade. Let's make sure to include in that --check output what
additional post-upgrade work will have to happen; that'll work well
enough to deal with this issue, and I have little sympathy for people
who run pg_upgrade blindly and without running --check first, given that
there's an option for it.

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Thu, Dec 7, 2017 at 12:04 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > If you've
> > got a better idea for that, great, but having certain pg_upgrade
> > migrations require running ANALYZE and some migrations not require it is
> > something we need to make users *very* clear about. No, I don't think a
> > note in the release notes is really enough..
>
> Ideally this would be covered by:
>
> """
> 13. Post-Upgrade processing
>
> If any post-upgrade processing is required, pg_upgrade will issue warnings
> as it completes. It will also generate script files that must be run by the
> administrator. The script files will connect to each database that needs
> post-upgrade processing. Each script should be run using: ...
> """
>
> https://www.postgresql.org/docs/10/static/pgupgrade.html
>
> Then whether ANALYZE is instructed as per #14 would be conditional.
>
> It's arguable whether a warning/notice saying "ANALYZE doesn't need to be run"
> would be useful.

I'm fine with updating the docs to make it conditional, but our users
need an easy way to figure out if their particular from/to PG versions
require ANALYZE to be run or not, so they're able to prepare for it.
Having that be included in the --check output would work to handle that
nicely.

Also, really, we should update those steps to have --check run earlier,
imv, and before saying "stop the clusters".

Thanks!

Stephen
On Thu, Dec 7, 2017 at 10:37:30AM -0500, Stephen Frost wrote:
> Alexander,
>
> * Alexander Kukushkin (cyberdemn@gmail.com) wrote:
> > Couple of months ago we at Zalando upgraded a few databases of different
> > sizes to 9.6.
>
> Thanks for sharing your experience!
>
> > During preparations for the upgrade I've found 2.5 pain-points:
> >
> > 1. We are using schema-based API deployment. Basically ~every week we
> > create a new schema in the database and hundreds of stored procedures in it.
> > Of course we remove old API schemas, trying not to keep more than
> > the last 10. Before the upgrade we basically dropped all API schemas except the
> > one used in production.
> > And even in this case the dump-restore phase was taking much more time than
> > relinking of data files.
> > Unfortunately I don't have any numbers right now, but usually a run of
> > pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
> > spent in dump-restore.
>
> Ok, so eliminating 2/3 of the time would mean bringing it down to more
> like 10 seconds. That certainly seems worthwhile to me. With the
> linking time being much less than the dump/restore, we could at least
> consider moving forward with Bruce's original idea where we do the
> dump/restore while the system is online but then the linking with it
> offline and get a serious performance boost out of it. That also avoids
> the issue with new files showing up while the system is running that I
> brought up when we were talking about having the linking done with the
> system online.
>
> > 2. The ANALYZE phase is a pain. I think everybody agrees with it.
> >
> > 2.5. Usually ANALYZE stage 1 completes quite fast and performance becomes
> > reasonable, except in one case: some of the columns might have a non-default
> > statistics target.
>
> Ok, if the stage-1 is very fast and performance is reasonable enough
> after that then perhaps it's not so bad to keep it as-is for now and
> focus on the dump/restore time. That said, we should certainly also
> work on improving this too.

I think the big problem with two-stage pg_upgrade is that the user steps
are more complex, so what percentage of users are going to use the
two-stage method? The bad news is that only a small percentage of users
who will benefit from it will use it, and some who will not benefit from
it will use it. Also, this is going to require significant server
changes, which have to be maintained.

I think we need some statistics on how many users are going to benefit
from this, and how are users supposed to figure out if they will benefit
from it?

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
> On Dec 7, 2017, at 10:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Dec 7, 2017 at 10:37:30AM -0500, Stephen Frost wrote:
>> Alexander,
>>
>> * Alexander Kukushkin (cyberdemn@gmail.com) wrote:
>>> Couple of months ago we at Zalando upgraded a few databases of different
>>> sizes to 9.6.
>>
>> Thanks for sharing your experience!
>>
>>> During preparations for the upgrade I've found 2.5 pain-points:
>>>
>>> 1. We are using schema-based API deployment. Basically ~every week we
>>> create a new schema in the database and hundreds of stored procedures in it.
>>> Of course we remove old API schemas, trying not to keep more than
>>> the last 10. Before the upgrade we basically dropped all API schemas except the
>>> one used in production.
>>> And even in this case the dump-restore phase was taking much more time than
>>> relinking of data files.
>>> Unfortunately I don't have any numbers right now, but usually a run of
>>> pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
>>> spent in dump-restore.
>>
>> Ok, so eliminating 2/3 of the time would mean bringing it down to more
>> like 10 seconds. That certainly seems worthwhile to me. With the
>> linking time being much less than the dump/restore, we could at least
>> consider moving forward with Bruce's original idea where we do the
>> dump/restore while the system is online but then the linking with it
>> offline and get a serious performance boost out of it. That also avoids
>> the issue with new files showing up while the system is running that I
>> brought up when we were talking about having the linking done with the
>> system online.
>>
>>> 2. The ANALYZE phase is a pain. I think everybody agrees with it.
>>>
>>> 2.5. Usually ANALYZE stage 1 completes quite fast and performance becomes
>>> reasonable, except in one case: some of the columns might have a non-default
>>> statistics target.
>>
>> Ok, if the stage-1 is very fast and performance is reasonable enough
>> after that then perhaps it's not so bad to keep it as-is for now and
>> focus on the dump/restore time. That said, we should certainly also
>> work on improving this too.
>
> I think the big problem with two-stage pg_upgrade is that the user steps
> are more complex, so what percentage of users are going to use the
> two-stage method? The bad news is that only a small percentage of users
> who will benefit from it will use it, and some who will not benefit from
> it will use it. Also, this is going to require significant server
> changes, which have to be maintained.

In my fork of the project, back when I was tracking 9.5, I added an
option to vacuum/analyze to make it behave a bit more like autovac, so
that I could run

    ANALYZE CONDITIONALLY;

and it would only analyze those tables in the system which autovac would
analyze. In the grammar, CONDITIONALLY gets translated into a
VacuumOption flag. In vacuum (in src/backend/commands/vacuum.c), inside
the "Loop to process each selected relation", if this flag is set, it
checks the PgStat_StatTabEntry for the table to determine whether to
vacuum or analyze the table.

I think this extension would be helpful in the context of the current
conversation.
In those cases where pg_upgrade was able to migrate the statistics to
the new database, as long as it set the PgStat_StatTabEntry for each
table where statistics were migrated, then the user would just have to
execute a "VACUUM CONDITIONALLY" after upgrade, and the database would
either do a lot of analyze work, a little analyze work, or no analyze
work depending on which tables needed analyzing.

The main advantage here is that the user would always run this command
after pg_upgrade, without having to think about whether pg_upgrade had
migrated statistics or not.

If the community thinks this is useful, I could put together a patch.

mark
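[Absent such a command, a rough approximation of the idea is already possible from plain SQL, using the analyze timestamps the statistics collector tracks. A sketch only; it ignores the change-counter thresholds autovacuum actually consults:

    -- Analyze only user tables that have never been analyzed, roughly
    -- mimicking an "analyze only where needed" command.
    DO $$
    DECLARE
        t record;
    BEGIN
        FOR t IN
            SELECT schemaname, relname
            FROM pg_stat_user_tables
            WHERE last_analyze IS NULL
              AND last_autoanalyze IS NULL
        LOOP
            EXECUTE format('ANALYZE %I.%I', t.schemaname, t.relname);
        END LOOP;
    END
    $$;
]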
Mark,

* Mark Dilger (hornschnorter@gmail.com) wrote:
> > On Dec 7, 2017, at 10:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > I think the big problem with two-stage pg_upgrade is that the user steps
> > are more complex, so what percentage of users are going to use the
> > two-stage method? The bad news is that only a small percentage of users
> > who will benefit from it will use it, and some who will not benefit from
> > it will use it. Also, this is going to require significant server
> > changes, which have to be maintained.
>
> In my fork of the project, back when I was tracking 9.5, I added an option
> to vacuum/analyze to make it behave a bit more like autovac, so that I could
> run
>
>     ANALYZE CONDITIONALLY;
>
> and it would only analyze those tables in the system which autovac would
> analyze. In the grammar, CONDITIONALLY gets translated into a
> VacuumOption flag. In vacuum (in src/backend/commands/vacuum.c), inside
> the "Loop to process each selected relation", if this flag is set, it checks the
> PgStat_StatTabEntry for the table to determine whether to vacuum or analyze
> the table.
>
> I think this extension would be helpful in the context of the current conversation.
> In those cases where pg_upgrade was able to migrate the statistics to the
> new database, as long as it set the PgStat_StatTabEntry for each table where
> statistics were migrated, then the user would just have to execute a
> "VACUUM CONDITIONALLY" after upgrade, and the database would either
> do a lot of analyze work, a little analyze work, or no analyze work depending
> on which tables needed analyzing.
>
> The main advantage here is that the user would always run this command
> after pg_upgrade, without having to think about whether pg_upgrade had
> migrated statistics or not.
>
> If the community thinks this is useful, I could put together a patch.

This certainly sounds nice, though I have to admit to being a bit
skeptical on the keyword selection, but perhaps blue really is the right
color for that bike shed.

One thing I'd wonder about is if that makes 'CONDITIONALLY' into a
reserved keyword, which wouldn't be ideal. Perhaps a bit of a stretch,
but 'ANALYZE ALL NEEDED' might avoid that?

Thanks!

Stephen
> On Dec 8, 2017, at 9:21 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Mark,
>
> * Mark Dilger (hornschnorter@gmail.com) wrote:
>>> On Dec 7, 2017, at 10:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> I think the big problem with two-stage pg_upgrade is that the user steps
>>> are more complex, so what percentage of users are going to use the
>>> two-stage method? The bad news is that only a small percentage of users
>>> who will benefit from it will use it, and some who will not benefit from
>>> it will use it. Also, this is going to require significant server
>>> changes, which have to be maintained.
>>
>> In my fork of the project, back when I was tracking 9.5, I added an option
>> to vacuum/analyze to make it behave a bit more like autovac, so that I could
>> run
>>
>>     ANALYZE CONDITIONALLY;
>>
>> and it would only analyze those tables in the system which autovac would
>> analyze. In the grammar, CONDITIONALLY gets translated into a
>> VacuumOption flag. In vacuum (in src/backend/commands/vacuum.c), inside
>> the "Loop to process each selected relation", if this flag is set, it checks the
>> PgStat_StatTabEntry for the table to determine whether to vacuum or analyze
>> the table.
>>
>> I think this extension would be helpful in the context of the current conversation.
>> In those cases where pg_upgrade was able to migrate the statistics to the
>> new database, as long as it set the PgStat_StatTabEntry for each table where
>> statistics were migrated, then the user would just have to execute a
>> "VACUUM CONDITIONALLY" after upgrade, and the database would either
>> do a lot of analyze work, a little analyze work, or no analyze work depending
>> on which tables needed analyzing.
>>
>> The main advantage here is that the user would always run this command
>> after pg_upgrade, without having to think about whether pg_upgrade had
>> migrated statistics or not.
>>
>> If the community thinks this is useful, I could put together a patch.
>
> This certainly sounds nice, though I have to admit to being a bit
> skeptical on the keyword selection, but perhaps blue really is the right
> color for that bike shed.
>
> One thing I'd wonder about is if that makes 'CONDITIONALLY' into a
> reserved keyword, which wouldn't be ideal. Perhaps a bit of a stretch,
> but 'ANALYZE ALL NEEDED' might avoid that?

Yeah, I expected some complaint about CONDITIONALLY, and I don't have
any personal feelings about the choice of terms. I'm happy to go with
your choice, or whatever the community decides.

mark
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> I think the big problem with two-stage pg_upgrade is that the user steps
> are more complex, so what percentage of users are going to use the
> two-stage method? The bad news is that only a small percentage of users
> who will benefit from it will use it, and some who will not benefit from
> it will use it. Also, this is going to require significant server
> changes, which have to be maintained.

This is where I think we need to be considering a higher-level tool that
makes it easier to run pg_upgrade and which could handle these multiple
stages.

> I think we need some statistics on how many users are going to benefit
> from this, and how are users supposed to figure out if they will benefit
> from it?

If the complexity issue is addressed, then wouldn't all users who use
pg_upgrade in link mode benefit from this..? Or are you thinking we
need to figure out which users really need to have pg_upgrade be faster
from those who don't? The former would be 100% and the latter seems
extremely difficult to determine and not all that useful in the end;
not every user needs SELECT to be faster, but we sure want to make it
faster if we can do so reasonably.

I agree that we need to really consider if the additional complexity is
worth the performance improvement, but I don't think we really have a
gauge as to the complexity level or the ongoing maintenance effort
required, and without that we can't really say if it's too much or not.

Thanks!

Stephen
On Fri, Dec 8, 2017 at 12:26:55PM -0500, Stephen Frost wrote:
> Bruce,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > I think the big problem with two-stage pg_upgrade is that the user steps
> > are more complex, so what percentage of users are going to use the
> > two-stage method? The bad news is that only a small percentage of users
> > who will benefit from it will use it, and some who will not benefit from
> > it will use it. Also, this is going to require significant server
> > changes, which have to be maintained.
>
> This is where I think we need to be considering a higher-level tool that
> makes it easier to run pg_upgrade and which could handle these multiple
> stages.
>
> > I think we need some statistics on how many users are going to benefit
> > from this, and how are users supposed to figure out if they will benefit
> > from it?
>
> If the complexity issue is addressed, then wouldn't all users who use
> pg_upgrade in link mode benefit from this..? Or are you thinking we

The instructions in the docs are going to be more complex. We don't
have any planned way to make the two-stage approach the same complexity
as the single-stage approach.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Dec 8, 2017 at 12:26:55PM -0500, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > I think the big problem with two-stage pg_upgrade is that the user steps
> > > are more complex, so what percentage of users are going to use the
> > > two-stage method? The bad news is that only a small percentage of users
> > > who will benefit from it will use it, and some who will not benefit from
> > > it will use it. Also, this is going to require significant server
> > > changes, which have to be maintained.
> >
> > This is where I think we need to be considering a higher-level tool that
> > makes it easier to run pg_upgrade and which could handle these multiple
> > stages.
> >
> > > I think we need some statistics on how many users are going to benefit
> > > from this, and how are users supposed to figure out if they will benefit
> > > from it?
> >
> > If the complexity issue is addressed, then wouldn't all users who use
> > pg_upgrade in link mode benefit from this..? Or are you thinking we
>
> The instructions in the docs are going to be more complex. We don't
> have any planned way to make the two-stage approach the same complexity
> as the single-stage approach.

A lot of the complexity in the current approach is all the different
steps that we take in the pg_upgrade process, which is largely driven by
the lack of a higher-level tool to handle all those steps.

The distributions have tried to address that by having their own tools
and scripts to simplify the process. Debian-based systems have
pg_upgradecluster, which is a single command that handles everything.
The RPM-based systems have the 'setup' script that's pretty similar.

If pg_upgrade had a two-stage process, those scripts would be updated to
take advantage of that, I expect, and users would be largely shielded
from the increase in complexity.

One of the things that those scripts are able to take advantage of are
configuration files which they have (or hard-coded knowledge) about
where everything is on the system, what clusters exist, etc. That's
information that pg_upgrade itself doesn't have and therefore we can't
really automate in pg_upgrade. One option would be to build support for
pg_upgrade to have a config file that has all of that information and
then make pg_upgrade (or another tool) able to manage the process.

Given that the major distributions already have that, I'm not entirely
sure that it's something we should be trying to duplicate since I don't
think we'd be able to do so in a way that integrates to the same level
that the distribution-specific tools do, but perhaps we should try, or
maybe work with the distributions to have a way for them to generate a
config file with the info we need..? Just brain storming here.

Thanks!

Stephen
On Sat, Dec 9, 2017 at 08:45:14AM -0500, Stephen Frost wrote:
> Bruce,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Fri, Dec 8, 2017 at 12:26:55PM -0500, Stephen Frost wrote:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > I think the big problem with two-stage pg_upgrade is that the user
> > > > steps are more complex, so what percentage of users are going to use
> > > > the two-stage method? The bad news is that only a small percentage
> > > > of users who will benefit from it will use it, and some who will not
> > > > benefit from it will use it. Also, this is going to require
> > > > significant server changes, which have to be maintained.
> > >
> > > This is where I think we need to be considering a higher-level tool
> > > that makes it easier to run pg_upgrade and which could handle these
> > > multiple stages.
> > >
> > > > I think we need some statistics on how many users are going to
> > > > benefit from this, and how are users supposed to figure out if they
> > > > will benefit from it?
> > >
> > > If the complexity issue is addressed, then wouldn't all users who use
> > > pg_upgrade in link mode benefit from this..? Or are you thinking we
> >
> > The instructions in the docs are going to be more complex. We don't
> > have any planned way to make the two-stage approach as simple as the
> > single-stage approach.
>
> A lot of the complexity in the current approach comes from all the
> different steps we take in the pg_upgrade process, which is largely
> driven by the lack of a higher-level tool to handle those steps.

Sorry for the late reply. Yes, that is _exactly_ the problem, stated
better than I have in the past.

> The distributions have tried to address that by having their own tools
> and scripts to simplify the process. Debian-based systems have
> pg_upgradecluster, a single command that handles everything; the
> RPM-based systems have the 'setup' script, which is pretty similar.

Yes, and that is because the distribution scripts _have_ to deal with
these higher-level details, so they are the logical place to automate
pg_upgrade.

> If pg_upgrade had a two-stage process, I expect those scripts would be
> updated to take advantage of it, and users would be largely shielded
> from the increase in complexity.

Yes, that is certainly possible.

> One thing those scripts are able to take advantage of is their
> configuration files (or hard-coded knowledge) describing where
> everything is on the system, what clusters exist, etc. That's
> information pg_upgrade itself doesn't have, so we can't really automate

I think the big problem with distribution scripts automating pg_upgrade
is that each distribution has to solve the problem on its own, and
pg_upgrade is complex enough that this is a significant burden.

> the process in pg_upgrade today. One option would be to build support
> for a pg_upgrade config file holding all of that information and then
> make pg_upgrade (or another tool) able to manage the whole process.

Yes, that is an interesting idea.

> Given that the major distributions already have that, I'm not entirely
> sure it's something we should try to duplicate, since I don't think
> we'd be able to integrate to the same level the distribution-specific
> tools do; but perhaps we should try, or maybe work with the
> distributions on a way for them to generate a config file with the info
> we need..? Just brainstorming here.

Yes, thanks for the feedback.
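To make the idea concrete, I imagine something minimal would be enough,
along these lines (entirely hypothetical, no such file exists today;
the keys just mirror what pg_upgrade already takes via -b/-B/-d/-D and
--link):

    # pg_upgrade.conf (hypothetical sketch)
    old_bindir  = /usr/lib/postgresql/9.6/bin
    new_bindir  = /usr/lib/postgresql/10/bin
    old_datadir = /var/lib/postgresql/9.6/main
    new_datadir = /var/lib/postgresql/10/main
    link_mode   = on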
--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
On Thu, Dec 7, 2017 at 11:28 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades. ... we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster
Maybe it isn't representative of a typical case, but I can offer a data
point: we have ~40 customers with DBs ranging in size from <100GB to ~25TB
(of which ~90% is on a ZFS tablespace with compression). We have what's
usually considered an excessive number of child tables, which works okay
for us since planning time is unimportant for the report queries that hit
them. Some of the tables are wide (historically up to 1600 columns), and
some of those have default values on nearly every column, so pg_attrdef
was large (>500MB), causing pg_dump --section pre-data to be slow (10+
minutes). Since pg_upgrade runs something similar, I worked around the
issue for now by dropping the defaults on the historic children in advance
of upgrades (at some point I'll figure out what I have to do to allow
DROPping the DEFAULTs permanently). It's not the first time we've seen an
issue with a large number of children * columns.
This is probably worth fixing independently of other ways of speeding up
pg_upgrade.

Most of the time is spent making the column names unique while deparsing
the DEFAULT clause for each column. But I don't think the column names
that result from that deparsing are ever output, and since there is only
one table involved, the names should already be unique anyway, unless I
am missing something.

The time seems to be quadratic in the number of columns if all columns
have defaults, or proportional to the product of the number of columns in
the table and the number of columns with defaults.

The CREATE TABLE statement has a similar problem when the dump is
restored.
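For anyone who wants to poke at it, here is a rough reproduction of the
pathological shape (scratch database names and the column count are
arbitrary):

    # one table, 1000 columns, every column with a DEFAULT
    createdb wide_test
    psql -d wide_test -X -c "
      DO \$\$
      DECLARE
        cols text;
      BEGIN
        SELECT string_agg(format('c%s int DEFAULT %s', i, i), ', ')
          INTO cols
          FROM generate_series(1, 1000) AS i;
        EXECUTE 'CREATE TABLE wide_defaults (' || cols || ')';
      END
      \$\$;
    "
    # the dump side: deparsing each DEFAULT is where the time goes
    time pg_dump --section pre-data wide_test > pre-data.sql
    # the restore side exercises the CREATE TABLE problem
    createdb wide_test2
    time psql -d wide_test2 -X -f pre-data.sql

If the analysis above is right, dropping the defaults first (as Justin
does) should make both steps fast again.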
Cheers,

Jeff