Thread: Dump/Reload pg_statistic to cut time from pg_upgrade?
Originally posted on pg_admin but got no bites there...

Admins;

Planning to pg_upgrade some large (3TB) clusters using the hard-link method. Run time for the upgrade itself is around 5 minutes. Nice!! Origin version 8.4 and destination version 9.1.

Unfortunately, the post-upgrade analyze of the entire cluster is going to take a minimum of 1.5 hours running several threads to analyze all tables. This was measured in an R&D environment.

Got to thinking: what if we take a SAN snapshot a few hours prior to upgrade time, upgrade that, analyze it, and then dump the stats table? The assumption is that the DB will not have churned very much between then and when the real upgrade is performed, so the stats saved a few hours earlier might be good enough to go live with. Once the real upgrade is done, we'd populate its stats table using the saved stats.

I realize that there may be rows in the reloaded stats table corresponding to temporary objects that no longer exist, and we would have to deal with those. I'd immediately begin analyzing the tables so as to have the best stats once this is complete, but being able to turn out a working production instance in about 10 minutes rather than 2 hours is the goal here.

Anyway, perhaps there are other good reasons I should *not* attempt this, but it turns out that the stats table can't be reloaded from its own dump, so this of course is a show-stopper:

    psql:d:456: ERROR:  cannot accept a value of type anyarray
    CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{_assets,_income,_liabilities,assets,income,liabilities}"

Setting allow_system_table_mods to 'on' is also required prior to attempting the import.

Has anyone else attempted anything similar? Any feedback is appreciated.

Thanks

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
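[For concreteness, the naive round-trip described above looks roughly like this; the dump path is hypothetical, and this is a sketch of the failing approach, not a recipe:]

```sql
-- On the analyzed snapshot cluster: dump the stats table as text.
COPY pg_statistic TO '/tmp/pg_statistic.dump';

-- On the freshly upgraded cluster (allow_system_table_mods must be
-- enabled; in 8.4/9.1 that means setting it at server start):
COPY pg_statistic FROM '/tmp/pg_statistic.dump';
-- ERROR:  cannot accept a value of type anyarray
-- The stavalues* columns are of pseudo-type anyarray, which has no
-- usable input function, so a plain COPY back in is rejected.
```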
On Mon, 2013-07-08 at 14:20 -0500, Jerry Sievers wrote:
> Planning to pg_upgrade some large (3TB) clusters using hard link
> method.  Run time for the upgrade itself takes around 5 minutes.
> Nice!!  Origin version 8.4 and destination version 9.1.
>
> Unfortunately the post-upgrade analyze of the entire cluster is going
> to take a minimum of 1.5 hours running several threads to analyze all
> tables.  This was measured in an R&D environment.
...
> Anyway, perhaps there are other good reasons I should *not* attempt
> this but it turns out that the stats table can't be reloaded with its
> own dump so this of course is a show-stopper.
>
> psql:d:456: ERROR:  cannot accept a value of type anyarray
> CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{_assets,_income,_liabilities,assets,income,liabilities}"

[ late response, but might still be useful to someone ]

You can work around the problem with a little effort if you call array_in directly. It takes the type output (cstring), the element type (oid), and the element typmod (integer).

To dump the pg_statistic table, you have to output all of the columns plus the element type ID and the typmod, and then load it back in by doing something like:

    insert into pg_statistic
      select starelid, ...,
             array_in(stavalues1, the_element_type, -1), ...
      from my_statistics_dump;

The element typmod is always -1 for pg_statistic. To get the element type, you can join against pg_attribute. The only problem is that you don't actually want the attribute type; you want the type used for the statistics, which is normally the same but can differ. I don't think the statypid is stored in the catalog, so you'd have to inventory the types you use and figure out a mapping from each type to its statistics type by looking at the typanalyze routines.

So it's possible to do, but not worth the effort unless you are quite concerned about the analyze time post-upgrade.

It would be nice if we had a better way to back up, transfer, and upgrade statistics. However, allowing statistics to be upgraded could be a challenge if the statistics format changes between releases.

Regards,
Jeff Davis
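[A rough sketch of the dump side of Jeff's workaround. The table name my_statistics_dump and column element_type are made up, and it assumes the statistics element type equals the attribute type, which, per the caveat above, is not always true:]

```sql
-- Dump side: keep the anyarray columns as text, alongside the
-- attribute's element type, for reconstruction via array_in later.
CREATE TABLE my_statistics_dump AS
SELECT s.starelid, s.staattnum,
       s.stavalues1::text AS stavalues1,   -- anyarray rendered as text
       a.atttypid AS element_type          -- approximation: attribute type,
                                           -- not necessarily the stats type
FROM   pg_statistic s
JOIN   pg_attribute a
       ON a.attrelid = s.starelid
      AND a.attnum   = s.staattnum;

-- Reload side (remaining columns elided as "..."):
-- INSERT INTO pg_statistic
--   SELECT starelid, staattnum, ...,
--          array_in(stavalues1::cstring, element_type, -1), ...
--   FROM my_statistics_dump;
```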
On Sun, Sep  1, 2013 at 10:35:57AM -0700, Jeff Davis wrote:
> [ late response, but might still be useful to someone ]
>
> You can work around the problem with a little effort if you call
> array_in directly. It takes the type output (cstring), element type
> (oid), and element typmod (integer).
...
> It would be nice if we had a better way to backup, transfer, and upgrade
> statistics.  However, allowing statistics to be upgraded could be a
> challenge if the statistics format changes between releases.

FYI, the data stored in those statistics tables frequently changes in major releases, so a simple dump/reload of those statistics might give you the _wrong_ statistics. This is the big reason I have not done more work on migrating them.

The incremental statistics build seems to have worked for most people, i.e. statistics target = 1, 10, default.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
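[The incremental build Bruce mentions can be sketched as successive ANALYZE passes at increasing statistics targets, so the planner gets coarse-but-usable stats within minutes and full-quality stats later; the specific targets here follow his "1, 10, default" example:]

```sql
-- Pass 1: very coarse, very fast -- planner stops assuming empty tables.
SET default_statistics_target = 1;
ANALYZE;

-- Pass 2: refine.
SET default_statistics_target = 10;
ANALYZE;

-- Pass 3: back to the configured default (100) for full-quality stats.
RESET default_statistics_target;
ANALYZE;
```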