Re: pg_upgrade and statistics - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: pg_upgrade and statistics
Msg-id 20120313141025.GI10441@momjian.us
In response to Re: pg_upgrade and statistics  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_upgrade and statistics  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On Tue, Mar 13, 2012 at 12:12:27AM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Copying the statistics from the old server is on the pg_upgrade TODO
> > list.  I have avoided it because it will add an additional requirement
> > that will make pg_upgrade more fragile in case of major version changes.
> 
> > Does anyone have a sense of how often we change the statistics data
> > between major versions?
> 
> I don't think pg_statistic is inherently any more stable than any other
> system catalog.  We've whacked it around significantly just last week,
> which might color my perception a bit, but there are other changes on
> the to-do list.  (For one example, see nearby complaints about
> estimating TOAST-related costs, which we could not fix without adding
> more stats data.)

Yes, that was my reaction too.  pg_upgrade has worked hard to avoid
copying any system tables, relying on pg_dump to handle that.  

I just received a sobering blog comment stating that pg_upgrade took 5
minutes on a 0.5TB database, but analyze took over an hour:
http://momjian.us/main/blogs/pgblog/2012.html#March_12_2012

Is there some type of intermediate format we could use to dump/restore
the statistics?  Is there an analyze "light" mode we could support that
would run faster?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

 + It's impossible for everything to be true. +

