Re: Statistics Import and Export - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Statistics Import and Export
Date
Msg-id Zz0T1BENIFDnXmwf@momjian.us
Whole thread Raw
In response to Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
List pgsql-hackers
On Tue, Nov 19, 2024 at 03:47:20PM -0500, Corey Huinker wrote:
>     I don't have a strong opinion here, but I suspect that if I was creating
>     vacuumdb from scratch, I'd have suggested a --missing-only flag that would
>     only work for --analyze-only/--analyze-in-stages.  That way, folks can
>     still regenerate statistics if they want, but we also have an answer for
>     folks who use pg_upgrade and have extended statistics.
> 
> 
> (combining responses to Bruce's para about surprise calculus and Nathan here)
> 
> I agree that a clean API is desirable and a goal. And as I stated before, a new
> flag (--analyze-missing-in-stages / --analyze-post-pgupgrade, etc) or a flag
> modifier ( --missing-only ) was my first choice.

Yes, after a clean API is designed, you can then consider surprise
calculus.  This is an issue not only for this feature, but for all
Postgres changes we consider, which is why I think it is worth stating
this clearly.  If I am thinking incorrectly, we can discuss that here too.

> But if we're going to go that route, we have a messaging problem. We need to
> reach our customers who plan to upgrade, and explain to them that the
> underlying assumption behind running vacuumdb has gone away for 99% of them,
> and that may be 100% in the next version, but for that 99% running vacuumdb in
> the old way now actively undoes one of the major improvements to pg_upgrade,
> but this one additional option keeps the benefits of the new pg_upgrade without
> the drawbacks.

How much are we supposed to consider users who do not read the major
release notes?  I realize we might be unrealistic to expect that from
the majority of our users, but I also don't want to contort our API to
adjust for them.

> That, and once we have extended statistics importing on upgrade, then the need
> for vacuumdb post-upgrade goes away entirely. So we'll have to re-message the
> users with that news too.
> 
> I'd be in favor of this, but I have to be honest, our messaging reach is not
> good, and takes years to sink in. Years in which the message will change at
> least one more time. And this outreach will likely confuse users who already
> weren't (and now shouldn't be) using vacuumdb. In light of that, the big risk
> was that an action that some users learned to do years ago was now actively
> undoing whatever gains they were supposed to get in their upgrade downtime, and
> that downtime is money to them, hence the surprise calculus.

That is a big purpose of the major release notes.  We can even list this
as an incompatibility in the sense that the procedure has changed.

> One other possibilities we could consider:
> 
> * create a pg_stats_health_check script that lists tables missing stats, with
> --fix/--fix-in-stages options, effectively replacing vacuumdb for those
> purposes, and then crank up the messaging about that change. The "new shiny"
> effect of a new utility that has "stats", "health", and "check" in the name may
> be the search/click-bait we need to get the word out effectively. That last
> sentence may sound facetious, but it isn't, it's just accepting how search
> engines and eyeballs currently function. With that in place, we can then change
> the vacuumdb documentation to be deter future use in post-upgrade situations.

We used to create a script until the functionality was added to
vacuumdb.  Since 99% of users will not need to do anything after
pg_upgrade, it would make sense to output the script only for the 1% of
users who need it and tell users to run it, rather than giving
instructions that are a no-op for 99% of users.

> * move missing-stats rebuilds into pg_upgrade/pg_restore itself, and this would
> give us the simpler one-time message that users should stop using vacuumdb in
> upgrade situations.

Uh, that would make pg_upgrade take longer for some users, which might
be confusing.

> * Making a concerted push to get extended stats import into v18 despite the
> high-effort/low-reward nature of it, and then we can go with the simple
> messaging of "Remember vacuumdb, that thing you probably weren't running
> post-upgrade but should have been? Now you can stop using it!". I had extended
> stats imports working back when the function took JSON input, so it's do-able,
> but the difficulty lies in how to represent an array of incomplete pg_statistic
> rows in a serial fashion that is cross-version compatible.

I am not a big fan of that at this point.  If we get it, we can adjust
our API at that time, but I don't want to plan on it.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Fix an error while building test_radixtree.c with TEST_SHARED_RT
Next
From: Nathan Bossart
Date:
Subject: Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.