Re: pg_upgrade and statistics - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: pg_upgrade and statistics
Date
Msg-id CAAZKuFa0KUwWVjjFpWLyme07uUJ=h4zswM3wK2fzFK0A=qZs9A@mail.gmail.com
Whole thread Raw
In response to Re: pg_upgrade and statistics  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Mar 12, 2012 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Copying the statistics from the old server is on the pg_upgrade TODO
>> list.  I have avoided it because it will add an additional requirement
>> that will make pg_upgrade more fragile in case of major version changes.
>
>> Does anyone have a sense of how often we change the statistics data
>> between major versions?
>
> I don't think pg_statistic is inherently any more stable than any other
> system catalog.

Agreed, but it would appear that in practice that a fair amount of it
carries forward.  If someone ripped up the statistics system and did
them all over in such a way that the old fields had no meaning on
future costing metrics, that'd probably be reasonable cause for a
caveat involving full-blown reanalyze...still, that doesn't seem to
happen every year.

> We've whacked it around significantly just last week,
> which might color my perception a bit, but there are other changes on
> the to-do list.  (For one example, see nearby complaints about
> estimating TOAST-related costs, which we could not fix without adding
> more stats data.)

Is accruing additional statistics likely going to be a big problem?  I
noticed the addition of the new anyarray (presumably for
array-selectivity) features; would planning with an "empty" assumption
be disastrous vs. the old behavior, which had no concept of those at
all?

I don't think it's necessary to make statistics porting a feature of
pg_upgrade in all circumstances, but it would be "nice" when possible.That having been said, perhaps there are other
waysfor pg_upgrade to 
be better invested in....or, best of all and somewhat unrelatedly,
full blown logical replication.

Although this conversation has taken focus on "how do we move stats
forward", I am about as interested in "how do I run statements (like
ANALYZE) more 'nicely'".  The same general problem pervades many
background task issues, including autovacuum and large physical
reorganizations of data.

--
fdr


pgsql-hackers by date:

Previous
From: Daniel Farina
Date:
Subject: Re: pg_upgrade and statistics
Next
From: Andres Freund
Date:
Subject: Re: initdb and fsync