Thread: Dump/Reload pg_statistic to cut time from pg_upgrade?
Originally posted on pg_admin but got no bites there...

Admins;

Planning to pg_upgrade some large (3TB) clusters using the hard-link method. Run time for the upgrade itself is around 5 minutes. Nice!! Origin version 8.4 and destination version 9.1.

Unfortunately, the post-upgrade analyze of the entire cluster is going to take a minimum of 1.5 hours running several threads to analyze all tables. This was measured in an R&D environment.

Got to thinking: what if we take a SAN snapshot a few hours prior to upgrade time, upgrade that, analyze it, and then dump the stats table? The assumption is that the DB will not have churned very much between then and when the real upgrade is performed, so the stats saved a few hours earlier might be good enough to go live with. Once the real upgrade is done, we'd populate its stats table using the saved stats.

I realize that there may be rows in the reloaded stats table corresponding to temporary objects that no longer exist, and we would have to deal with those. I'd immediately begin analyzing the tables so as to have the best stats once this is complete, but being able to turn out a working production instance in about 10 minutes rather than 2 hours is the goal here.

Anyway, perhaps there are other good reasons I should *not* attempt this, but it turns out that the stats table can't be reloaded from its own dump, so this of course is a show-stopper:

    psql:d:456: ERROR:  cannot accept a value of type anyarray
    CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{_assets,_income,_liabilities,assets,income,liabilities}"

Setting allow_system_table_mods to 'on' is also required prior to attempting the import.

Has anyone else attempted anything similar? Any feedback is appreciated.

Thanks

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
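[For concreteness, the naive round-trip described above looks roughly like this; the dump path is hypothetical, and this is a sketch of the failing approach, not a recipe:]

```sql
-- On the analyzed snapshot cluster: dump the stats table as text.
COPY pg_statistic TO '/tmp/pg_statistic.dump';

-- On the freshly upgraded cluster (allow_system_table_mods must be
-- enabled; in 8.4/9.1 that means setting it at server start):
COPY pg_statistic FROM '/tmp/pg_statistic.dump';
-- ERROR:  cannot accept a value of type anyarray
-- The stavalues* columns are of pseudo-type anyarray, which has no
-- usable input function, so a plain COPY back in is rejected.
```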
On Mon, 2013-07-08 at 14:20 -0500, Jerry Sievers wrote:
> Planning to pg_upgrade some large (3TB) clusters using hard link
> method.  Run time for the upgrade itself takes around 5 minutes.
> Nice!!  Origin version 8.4 and destination version 9.1.
>
> Unfortunately the post-upgrade analyze of the entire cluster is going
> to take a minimum of 1.5 hours running several threads to analyze all
> tables.  This was measured in an R&D environment.
...
> Anyway, perhaps there are other good reasons I should *not* attempt
> this but it turns out that the stats table can't be reloaded with its
> own dump so this of course is a show-stopper.
>
> psql:d:456: ERROR:  cannot accept a value of type anyarray
> CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{_assets,_income,_liabilities,assets,income,liabilities}"

[ late response, but might still be useful to someone ]

You can work around the problem with a little effort if you call array_in directly. It takes the type output (cstring), the element type (oid), and the element typmod (integer).

To dump the pg_statistic table, you have to output all of the columns plus the element type ID and the typmod, and then load it back in by doing something like:

    insert into pg_statistic
      select starelid, ...,
             array_in(stavalues1, the_element_type, -1), ...
      from my_statistics_dump;

The element typmod is always -1 for pg_statistic. To get the element type, you can join against pg_attribute. The only problem is that you don't actually want the attribute type; you want the type used for the statistics, which is normally the same but can differ. I don't think the statypid is stored in the catalog, so you'd have to inventory the types you use and figure out a mapping from each type to its statistics type by looking at the typanalyze routines.

So it's possible to do, but not worth the effort unless you are quite concerned about the analyze time post-upgrade.

It would be nice if we had a better way to back up, transfer, and upgrade statistics. However, allowing statistics to be upgraded could be a challenge if the statistics format changes between releases.

Regards,
Jeff Davis
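[A rough sketch of the dump side of Jeff's workaround. The table name my_statistics_dump and column element_type are made up, and it assumes the statistics element type equals the attribute type, which, per the caveat above, is not always true:]

```sql
-- Dump side: keep the anyarray columns as text, alongside the
-- attribute's element type, for reconstruction via array_in later.
CREATE TABLE my_statistics_dump AS
SELECT s.starelid, s.staattnum,
       s.stavalues1::text AS stavalues1,   -- anyarray rendered as text
       a.atttypid AS element_type          -- approximation: attribute type,
                                           -- not necessarily the stats type
FROM   pg_statistic s
JOIN   pg_attribute a
       ON a.attrelid = s.starelid
      AND a.attnum   = s.staattnum;

-- Reload side (remaining columns elided as "..."):
-- INSERT INTO pg_statistic
--   SELECT starelid, staattnum, ...,
--          array_in(stavalues1::cstring, element_type, -1), ...
--   FROM my_statistics_dump;
```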
On Sun, Sep  1, 2013 at 10:35:57AM -0700, Jeff Davis wrote:
> [ late response, but might still be useful to someone ]
>
> You can work around the problem with a little effort if you call
> array_in directly. It takes the type output (cstring), element type
> (oid), and element typmod (integer).
...
> It would be nice if we had a better way to backup, transfer, and upgrade
> statistics.  However, allowing statistics to be upgraded could be a
> challenge if the statistics format changes between releases.

FYI, the data stored in those statistics tables frequently changes in major releases, so a simple dump/reload of those statistics might give you the _wrong_ statistics. This is the big reason I have not done more work on migrating them.

The incremental statistics build seems to have worked for most people, i.e. statistics target = 1, 10, default.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
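[The incremental build Bruce mentions can be sketched as successive ANALYZE passes at increasing statistics targets, so the planner gets coarse-but-usable stats within minutes and full-quality stats later; the specific targets here follow his "1, 10, default" example:]

```sql
-- Pass 1: very coarse, very fast -- planner stops assuming empty tables.
SET default_statistics_target = 1;
ANALYZE;

-- Pass 2: refine.
SET default_statistics_target = 10;
ANALYZE;

-- Pass 3: back to the configured default (100) for full-quality stats.
RESET default_statistics_target;
ANALYZE;
```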