pg_upgrade and statistics - Mailing list pgsql-hackers
From: Daniel Farina
Subject: pg_upgrade and statistics
Date:
Msg-id: CAAZKuFaWdLkK8eozSAooZBets9y_mfo2HS6urPAKXEPbd-JLCA@mail.gmail.com
Responses:
  Re: pg_upgrade and statistics
  Re: pg_upgrade and statistics
List: pgsql-hackers
As noted by the manual, pg_statistic is not ported in any way when performing pg_upgrade. I have been investigating what it would take (even via just a connected SQL superuser client running UPDATE or INSERT against pg_statistic) to get at least some baseline statistics into the database as quickly as possible, since in practice the underlying implementation of the statistics and cost estimation does not change so dramatically between releases as to make the old statistics useless (AFAIK).

I eventually used a few contortions to be able to update the anyarray elements in pg_statistic:

    UPDATE pg_statistic
    SET stavalues1 = array_in(anyarray_out('{thearrayliteral}'::concrete_type[]),
                              'concrete_type'::regtype, atttypmod)
    WHERE staattnum = attnum AND starelid = therelation;

Notably, the type analysis phase is a bit too smart for me to simply cast from a concrete type to "anyarray", so I run the value through a deparse/reparse phase instead to fool it.

Now I'm stuck trying to ensure that autoanalyze will run at least once after we have committed the old statistics to the new catalogs, regardless of how much activity has taken place on the table and regardless of how cold it is (tuning the GUC thresholds is thus not attractive, because at what point would I tune them back to normal settings?). One idea I had was to jigger pg_stat to indicate that a lot of tuples have changed since the last analyze (which would be corrected automatically once autoanalyze on a relation completes), but because pg_stat is not a regular table, that doesn't look easy unless I break out a new C extension.

You are probably going to ask: "why not just run ANALYZE and be done with it?"
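To see why a quiet, freshly upgraded cluster may wait indefinitely for autoanalyze, here is a small sketch of the trigger condition the PostgreSQL documentation gives for autoanalyze (a table is analyzed once the number of tuples changed since the last analyze exceeds a base threshold plus a scale factor times the table size); the default GUC values and the table sizes below are illustrative:

```python
def analyze_due(changed_tuples: int, reltuples: int,
                threshold: int = 50, scale_factor: float = 0.1) -> bool:
    """Return True when autoanalyze would consider the table due,
    per the documented formula:
      changed > autovacuum_analyze_threshold
                + autovacuum_analyze_scale_factor * reltuples
    """
    return changed_tuples > threshold + scale_factor * reltuples

# A 100M-row table needs more than 10,000,050 changed tuples before
# autoanalyze fires, so a cold post-upgrade table never qualifies.
print(analyze_due(changed_tuples=0, reltuples=100_000_000))           # False
print(analyze_due(changed_tuples=10_000_051, reltuples=100_000_000))  # True
```

This is exactly the tension described above: lowering the thresholds post-upgrade would make every table qualify, but there is no clean moment at which to restore them.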
The reasons are:

* ANALYZE can take a sufficiently long time on large databases that the downtime of switching versions is not attractive.
* If we don't run ANALYZE and have no old statistics, then the plans can be disastrously bad for the user.
* If we do run the ANALYZE statement on a user's behalf as part of the upgrade, any compatibility fixups that require an exclusive lock (such as some ALTER TABLE statements) would have to block on this relatively long ANALYZE. autoanalyze/autovacuum, by comparison, backs off frequently, so disaster is averted.

If anyone has any insightful comments as to how to meet these requirements, I'd appreciate them; otherwise I'll consider it an interesting area for improvement, eat the ANALYZE, and salt the documentation with caveats.

--
fdr
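The array_in(anyarray_out(...)) cast workaround described earlier could be mechanized when porting many statistics slots; below is a minimal, hedged sketch that merely builds the UPDATE text for one slot (the quoting is naive and the relation/type arguments are hypothetical placeholders, not a robust implementation):

```python
def stats_update_sql(slot: int, array_literal: str, elem_type: str,
                     starelid: str, staattnum: int, atttypmod: int = -1) -> str:
    """Build the deparse/reparse UPDATE for one pg_statistic slot.

    The anyarray_out/array_in round trip sidesteps the type checker's
    refusal to cast a concrete array type directly to anyarray.
    NOTE: no SQL quoting/escaping is done here; a real tool must quote
    array_literal and identifiers properly.
    """
    return (
        f"UPDATE pg_statistic SET stavalues{slot} = "
        f"array_in(anyarray_out('{array_literal}'::{elem_type}[]), "
        f"'{elem_type}'::regtype, {atttypmod}) "
        f"WHERE starelid = {starelid} AND staattnum = {staattnum};"
    )

print(stats_update_sql(1, '{1,2,3}', 'integer', "'public.t'::regclass", 1))
```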