Re: Statistics Import and Export - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Statistics Import and Export |
Date | |
Msg-id | Zz0T1BENIFDnXmwf@momjian.us Whole thread Raw |
In response to | Re: Statistics Import and Export (Corey Huinker <corey.huinker@gmail.com>) |
List | pgsql-hackers |
On Tue, Nov 19, 2024 at 03:47:20PM -0500, Corey Huinker wrote: > I don't have a strong opinion here, but I suspect that if I was creating > vacuumdb from scratch, I'd have suggested a --missing-only flag that would > only work for --analyze-only/--analyze-in-stages. That way, folks can > still regenerate statistics if they want, but we also have an answer for > folks who use pg_upgrade and have extended statistics. > > > (combining responses to Bruce's para about surprise calculus and Nathan here) > > I agree that a clean API is desirable and a goal. And as I stated before, a new > flag (--analyze-missing-in-stages / --analyze-post-pgupgrade, etc) or a flag > modifier ( --missing-only ) was my first choice. Yes, after a clean API is designed, you can then consider surprise calculus. This is an issue not only for this feature, but for all Postgres changes we consider, which is why I think it is worth stating this clearly. If I am thinking incorrectly, we can discuss that here too. > But if we're going to go that route, we have a messaging problem. We need to > reach our customers who plan to upgrade, and explain to them that the > underlying assumption behind running vacuumdb has gone away for 99% of them, > and that may be 100% in the next version, but for that 99% running vacuumdb in > the old way now actively undoes one of the major improvements to pg_upgrade, > but this one additional option keeps the benefits of the new pg_upgrade without > the drawbacks. How much are we supposed to consider users who do not read the major release notes? I realize we might be unrealistic to expect that from the majority of our users, but I also don't want to contort our API to adjust for them. > That, and once we have extended statistics importing on upgrade, then the need > for vacuumdb post-upgrade goes away entirely. So we'll have to re-message the > users with that news too. > > I'd be in favor of this, but I have to be honest, our messaging reach is not > good, and takes years to sink in. Years in which the message will change at > least one more time. And this outreach will likely confuse users who already > weren't (and now shouldn't be) using vacuumdb. In light of that, the big risk > was that an action that some users learned to do years ago was now actively > undoing whatever gains they were supposed to get in their upgrade downtime, and > that downtime is money to them, hence the surprise calculus. That is a big purpose of the major release notes. We can even list this as an incompatibility in the sense that the procedure has changed. > One other possibilities we could consider: > > * create a pg_stats_health_check script that lists tables missing stats, with > --fix/--fix-in-stages options, effectively replacing vacuumdb for those > purposes, and then crank up the messaging about that change. The "new shiny" > effect of a new utility that has "stats", "health", and "check" in the name may > be the search/click-bait we need to get the word out effectively. That last > sentence may sound facetious, but it isn't, it's just accepting how search > engines and eyeballs currently function. With that in place, we can then change > the vacuumdb documentation to be deter future use in post-upgrade situations. We used to create a script until the functionality was added to vacuumdb. Since 99% of users will not need to do anything after pg_upgrade, it would make sense to output the script only for the 1% of users who need it and tell users to run it, rather than giving instructions that are a no-op for 99% of users. > * move missing-stats rebuilds into pg_upgrade/pg_restore itself, and this would > give us the simpler one-time message that users should stop using vacuumdb in > upgrade situations. Uh, that would make pg_upgrade take longer for some users, which might be confusing. > * Making a concerted push to get extended stats import into v18 despite the > high-effort/low-reward nature of it, and then we can go with the simple > messaging of "Remember vacuumdb, that thing you probably weren't running > post-upgrade but should have been? Now you can stop using it!". I had extended > stats imports working back when the function took JSON input, so it's do-able, > but the difficulty lies in how to represent an array of incomplete pg_statistic > rows in a serial fashion that is cross-version compatible. I am not a big fan of that at this point. If we get it, we can adjust our API at that time, but I don't want to plan on it. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
pgsql-hackers by date: