On Mon, Mar 12, 2012 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Copying the statistics from the old server is on the pg_upgrade TODO
>> list. I have avoided it because it will add an additional requirement
>> that will make pg_upgrade more fragile in case of major version changes.
>
>> Does anyone have a sense of how often we change the statistics data
>> between major versions?
>
> I don't think pg_statistic is inherently any more stable than any other
> system catalog.
Agreed, but it would appear that in practice that a fair amount of it
carries forward. If someone ripped up the statistics system and did
them all over in such a way that the old fields had no meaning on
future costing metrics, that'd probably be reasonable cause for a
caveat involving full-blown reanalyze...still, that doesn't seem to
happen every year.
> We've whacked it around significantly just last week,
> which might color my perception a bit, but there are other changes on
> the to-do list. (For one example, see nearby complaints about
> estimating TOAST-related costs, which we could not fix without adding
> more stats data.)
Is accruing additional statistics likely going to be a big problem? I
noticed the addition of the new anyarray (presumably for
array-selectivity) features; would planning with an "empty" assumption
be disastrous vs. the old behavior, which had no concept of those at
all?
I don't think it's necessary to make statistics porting a feature of
pg_upgrade in all circumstances, but it would be "nice" when possible.That having been said, perhaps there are other
waysfor pg_upgrade to
be better invested in....or, best of all and somewhat unrelatedly,
full blown logical replication.
Although this conversation has taken focus on "how do we move stats
forward", I am about as interested in "how do I run statements (like
ANALYZE) more 'nicely'". The same general problem pervades many
background task issues, including autovacuum and large physical
reorganizations of data.
--
fdr