Re: Optional skipping of unchanged relations during ANALYZE? - Mailing list pgsql-hackers

From Robert Treat
Subject Re: Optional skipping of unchanged relations during ANALYZE?
Date
Msg-id CAJSLCQ3CoEjd=DiANwyBybFaOu24PZFXo5f8EQUbsZ+UL0wL0A@mail.gmail.com
In response to Re: Optional skipping of unchanged relations during ANALYZE?  (VASUKI M <vasukianand0119@gmail.com>)
Responses Re: Optional skipping of unchanged relations during ANALYZE?
List pgsql-hackers
On Mon, Feb 16, 2026 at 4:38 AM VASUKI M <vasukianand0119@gmail.com> wrote:
>
> Hi Andreas,
>
> Thank you for raising this — it’s a very good design question.
>
> You’re right that in many practical cases, a user invoking something like ANALYZE (MODIFIED_STATS) would also want to include relations that currently have no statistics. From an operational perspective, “missing stats” and “modified stats” can overlap.
>
> In my earlier prototype, I did attempt to handle both concerns together. However, during the previous discussion in the thread, it became clear that combining the semantics made the behavior less predictable and harder to reason about. That led to splitting the functionality into two more clearly defined options:
>
> MISSING_STATS_ONLY → analyze relations lacking statistics.
>
> MODIFIED_STATS (proposed) → analyze relations whose statistics may be stale due to modifications.
>
> The motivation for separation was semantic clarity:
>
> MISSING_STATS_ONLY is catalog-based and persistent (derived from pg_statistic / pg_statistic_ext).
>
> MODIFIED_STATS would likely depend on modification counters or thresholds (similar to autoanalyze logic), which are transient and not crash-persistent.
>
> Keeping them distinct allows each option to have a well-defined and predictable contract.
>
> That said, your naming suggestion is interesting. A name such as SKIP_UNMODIFIED does express the behavior from the inverse perspective and may indeed be clearer. Another possible direction could be:
>
> ANALYZE (MISSING_STATS_ONLY)
>
> ANALYZE (SKIP_UNMODIFIED)
>
> Or potentially allowing both options together, if that proves semantically consistent.
>
> I’m very open to adjusting the naming and/or semantics if the consensus is that a combined approach would be more practical.
>

Well, going back to the beginning of the thread, we have two distinct
use cases at the individual level. One (MISSING_STATS) is to quickly
go through the database and ensure that statistics have been added for
anything that might be missing them, like new columns, new extended
statistics, etc. The other (MODIFIED_STATS) was having a way to
update statistics in active tables for databases with large numbers of
static tables in a way similar to how autoanalyze works, but available
on demand. While I suspect people will often run both of these
together, those are clearly separate concerns, and based on the
original discussions where this was being hashed out, it is easier to
reason about them separately. And while I think you might be able to
argue that MODIFIED_STATS should also include MISSING_STATS (I do
wonder though, does autoanalyze do that?), given the use case of
integrating MISSING_STATS into vacuumdb, it absolutely needs to be a
standalone flag for that scenario.
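
To make the separation concrete, here is a minimal Python sketch of the
two checks as independent predicates. It is not code from either patch.
The "modified" test assumes the same shape as the documented autoanalyze
threshold (autovacuum_analyze_threshold + autovacuum_analyze_scale_factor
* reltuples, with the default values 50 and 0.1), and the RelStats class
and its has_stats field are hypothetical stand-ins for whatever catalog
lookup a real implementation would use.

```python
from dataclasses import dataclass


@dataclass
class RelStats:
    """Hypothetical per-relation snapshot; not a PostgreSQL structure."""
    reltuples: float          # pg_class.reltuples (may be -1 if unknown)
    n_mod_since_analyze: int  # pg_stat_all_tables.n_mod_since_analyze
    has_stats: bool           # assumption: True if pg_statistic has rows for it


def is_modified(rel: RelStats,
                base_threshold: int = 50,
                scale_factor: float = 0.1) -> bool:
    """Autoanalyze-style test: modifications exceed base + scale * reltuples."""
    reltuples = max(rel.reltuples, 0.0)  # treat an unknown row count as zero
    threshold = base_threshold + scale_factor * reltuples
    return rel.n_mod_since_analyze > threshold


def is_missing_stats(rel: RelStats) -> bool:
    """Catalog-style test: the relation has no stored statistics at all."""
    return not rel.has_stats


# A freshly created, untouched table: missing stats, but not "modified".
new_table = RelStats(reltuples=-1, n_mod_since_analyze=0, has_stats=False)

# A busy table that was analyzed long ago: has stats, but they may be stale.
busy_table = RelStats(reltuples=1000, n_mod_since_analyze=200, has_stats=True)

# A static table with current stats: neither check selects it.
static_table = RelStats(reltuples=1000, n_mod_since_analyze=100, has_stats=True)
```

Under these assumptions the two predicates select different relations:
a threshold test driven only by modification counters never picks up
new_table, and a catalog test never picks up busy_table.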

One bookkeeping note for VASUKI, I didn't see any commitfest entries
for either patch; I would create one for each of these features
separately within https://commitfest.postgresql.org/58/.

Robert Treat
https://xzilla.net


