Thread: Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.
Re: Nathan Bossart > Update guidance for running vacuumdb after pg_upgrade. > > Now that pg_upgrade can carry over most optimizer statistics, we > should recommend using vacuumdb's new --missing-stats-only option > to only analyze relations that are missing statistics. I've been looking at vacuumdb --missing-stats-only because Debian's pg_upgradecluster is using that now. I am wondering if this is really good advice in the pg_upgrade documentation. Sure it's nice that optimizer statistics are carried over by pg_upgrade, but the pg_stat_user_tables statistics are not carried over, and afaict these are the numbers that determine when the next autovacuum or autoanalyze run is going to happen. By removing the "please run vacuumdb on all tables" step from the pg_upgrade docs, we are effectively telling everyone that they should be starting with these numbers all 0, postponing the next run to some indeterminate point. Running `vacuumdb --missing-stats-only` does not fix that because it's skipping the tables. Is that the message we want to send? (If I am misinterpreting the situation the docs should still explain why this is ok.) Christoph
On Tue, Apr 22, 2025 at 09:43:56PM +0200, Christoph Berg wrote: > Re: Nathan Bossart >> Update guidance for running vacuumdb after pg_upgrade. >> >> Now that pg_upgrade can carry over most optimizer statistics, we >> should recommend using vacuumdb's new --missing-stats-only option >> to only analyze relations that are missing statistics. > > I've been looking at vacuumdb --missing-stats-only because Debian's > pg_upgradecluster is using that now. > > I am wondering if this is really good advice in the pg_upgrade > documentation. Sure it's nice that optimizer statistics are carried > over by pg_upgrade, but the pg_stat_user_tables statistics are not > carried over, and afaict these are the numbers that determine when the > next autovacuum or autoanalyze run is going to happen. By removing the > "please run vacuumdb on all tables" step from the pg_upgrade docs, we > are effectively telling everyone that they should be starting with > these numbers all 0, postponing the next run to some indeterminate > point. Running `vacuumdb --missing-stats-only` does not fix that > because it's skipping the tables. Is that the message we want to send? > > (If I am misinterpreting the situation the docs should still explain > why this is ok.) relation_needs_vacanalyze() uses dead_tuples, ins_since_vacuum, and mod_since_analyze. IIUC a full post-upgrade vacuumdb run would only set dead_tuples to a nonzero value, so the worst-case scenario is that it would take longer before a vacuum is triggered based on autovacuum_vacuum_{threshold,max_threshold,scale_factor}. To address this, I think we'd need to recommend using "vacuumdb --all --analyze-only" instead. We could alternatively suggest first running "vacuumdb --all --analyze-in-stages --missing-stats-only" (to fill in any missing stats) followed by "vacuumdb --all --analyze-only" (to update dead_tuples). However, I'm not sure how concerned to be about this. It does seem bad that it might take longer for tables to be vacuumed for the first time after upgrade, but I believe that's already the case for any type of unclean shutdown (e.g., immediate shutdown, server crash, starting from a base backup, point-in-time recovery). I see that we do recommend running ANALYZE after pg_stat_reset(), though. In any case, IMO it's unfortunate that we might end up recommending roughly the same post-upgrade steps as before even though the optimizer statistics are carried over. -- nathan
Re: Nathan Bossart > In any case, IMO it's unfortunate > that we might end up recommending roughly the same post-upgrade steps as > before even though the optimizer statistics are carried over. Maybe the docs (and the pg_upgrade scripts) should recommend the old procedure by default until this gap is closed? People could then still opt to use the new procedure in specific cases. Christoph
On Tue, Apr 22, 2025 at 11:03:29PM +0200, Christoph Berg wrote: > Re: Nathan Bossart >> In any case, IMO it's unfortunate >> that we might end up recommending roughly the same post-upgrade steps as >> before even though the optimizer statistics are carried over. > > Maybe the docs (and the pg_upgrade scripts) should recommend the old > procedure by default until this gap is closed? People could then still > opt to use the new procedure in specific cases. I think we'd still want to modify the --analyze-in-stages recommendation (from what is currently recommended for supported versions). If we don't, you'll wipe out the optimizer stats you brought over from the old version. Here is a rough draft of what I am thinking. -- nathan
Attachment
Re: To Nathan Bossart > > Update guidance for running vacuumdb after pg_upgrade. > > > > Now that pg_upgrade can carry over most optimizer statistics, we > > should recommend using vacuumdb's new --missing-stats-only option > > to only analyze relations that are missing statistics. > > I've been looking at vacuumdb --missing-stats-only because Debian's > pg_upgradecluster is using that now. The reason I was looking closely yesterday is because Debian's regression tests were tripping over it, but I only figured out the problem today: If I create a table in a PG13-or-earlier cluster, never ANALYZE it, and then pg_upgrade to 18 and run vacuumdb --analyze-only --missing-stats-only, the table will not get analyzed. The only table visited there is pg_largeobject. Upgrades from 14..17 are fine. Christoph
Re: Nathan Bossart > pg_log(PG_REPORT, > + "Some statistics are not transferred by pg_upgrade.\n" > "Once you start the new server, consider running:\n" > + " %s/vacuumdb %s--all --analyze-in-stages --missing-stats-only\n" > + " %s/vacuumdb %s--all --analyze-only", I would make it "Once you start the new server, run these two commands:" to make it explicit that both should be run, not alternatively either. This patch addresses my concern, thanks. Christoph
On Wed, Apr 23, 2025 at 04:01:33PM +0200, Christoph Berg wrote: > If I create a table in a PG13-or-earlier cluster, never ANALYZE it, > and then pg_upgrade to 18 and run vacuumdb --analyze-only > --missing-stats-only, the table will not get analyzed. The only table > visited there is pg_largeobject. I suspect this is due to commit 3d351d9, which started using -1 for reltuples before the first vacuum/analyze. Before that, we set it to 0, which could also mean the table is empty. --missing-stats-only checks for reltuples != 0. My first reaction is that we should just remove the reltuples != 0 check. That means vacuumdb might analyze some empty tables, but that doesn't seem too terrible. -- nathan
Attachment
Re: Nathan Bossart > My first reaction is that we should just remove the reltuples != 0 check. > That means vacuumdb might analyze some empty tables, but that doesn't seem > too terrible. Or some tables that aren't empty but were never analyzed when they should have been. Sounds like a good thing. Thanks! Christoph
Here is what I have staged for commit. I'll aim to commit these patches sometime next week to give time for additional feedback. -- nathan
Attachment
Re: Nathan Bossart > Here is what I have staged for commit. I'll aim to commit these patches > sometime next week to give time for additional feedback. I confirm my PG13 test table gets analyzed now with the patch. Christoph