Re: Statistics Import and Export - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export
Date
Msg-id CADkLM=cwkxTFOawsi0FwVJH2AuRxLhJz8XF3oxipVwmy3eEhwA@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Statistics Import and Export
List pgsql-hackers
I don't have a strong opinion here, but I suspect that if I was creating
vacuumdb from scratch, I'd have suggested a --missing-only flag that would
only work for --analyze-only/--analyze-in-stages.  That way, folks can
still regenerate statistics if they want, but we also have an answer for
folks who use pg_upgrade and have extended statistics.

(combining responses to Bruce's para about surprise calculus and Nathan here)

I agree that a clean API is desirable and a goal. And as I stated before, a new flag (--analyze-missing-in-stages / --analyze-post-pgupgrade, etc) or a flag modifier ( --missing-only ) was my first choice.

But if we're going to go that route, we have a messaging problem. We need to reach our customers who plan to upgrade, and explain to them that the underlying assumption behind running vacuumdb has gone away for 99% of them, and that may be 100% in the next version, but for that 99% running vacuumdb in the old way now actively undoes one of the major improvements to pg_upgrade, but this one additional option keeps the benefits of the new pg_upgrade without the drawbacks.

That, and once we have extended statistics importing on upgrade, then the need for vacuumdb post-upgrade goes away entirely. So we'll have to re-message the users with that news too.

I'd be in favor of this, but I have to be honest, our messaging reach is not good, and takes years to sink in. Years in which the message will change at least one more time. And this outreach will likely confuse users who already weren't (and now shouldn't be) using vacuumdb. In light of that, the big risk was that an action that some users learned to do years ago was now actively undoing whatever gains they were supposed to get in their upgrade downtime, and that downtime is money to them, hence the surprise calculus.

One other possibilities we could consider:

* create a pg_stats_health_check script that lists tables missing stats, with --fix/--fix-in-stages options, effectively replacing vacuumdb for those purposes, and then crank up the messaging about that change. The "new shiny" effect of a new utility that has "stats", "health", and "check" in the name may be the search/click-bait we need to get the word out effectively. That last sentence may sound facetious, but it isn't, it's just accepting how search engines and eyeballs currently function. With that in place, we can then change the vacuumdb documentation to be deter future use in post-upgrade situations.

* move missing-stats rebuilds into pg_upgrade/pg_restore itself, and this would give us the simpler one-time message that users should stop using vacuumdb in upgrade situations.

* Making a concerted push to get extended stats import into v18 despite the high-effort/low-reward nature of it, and then we can go with the simple messaging of "Remember vacuumdb, that thing you probably weren't running post-upgrade but should have been? Now you can stop using it!". I had extended stats imports working back when the function took JSON input, so it's do-able, but the difficulty lies in how to represent an array of incomplete pg_statistic rows in a serial fashion that is cross-version compatible.



 

pgsql-hackers by date:

Previous
From: Kirill Reshke
Date:
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Next
From: Michel Pelletier
Date:
Subject: Re: Using Expanded Objects other than Arrays from plpgsql