Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds - Mailing list pgsql-hackers

From VASUKI M
Subject Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds
Date
Msg-id CAE2r8H4KRCJ055utU4u+3rBYSgAmiFgMgswaBMN_iOx16iTubQ@mail.gmail.com
Whole thread
Responses Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds
List pgsql-hackers

Hi all,

Following up on the recent discussion around ANALYZE (MISSING_STATS_ONLY):[1]

I would like to start a separate discussion about a potential ANALYZE (MODIFIED_STATS) option.

The idea is to allow manual ANALYZE to reuse the same threshold logic that autoanalyze uses, so that when a user explicitly runs ANALYZE, only relations that have crossed the modification threshold are processed.

Conceptually, this would use the existing formula:


analyze threshold = analyze_base_threshold
                    + analyze_scale_factor * reltuples


and compare it against n_mod_since_analyze, similar to how autovacuum decides when to trigger analyze.

The goal is not to replace autoanalyze, but to expose its decision model at SQL level for deterministic, user-controlled execution. For example:

-Running maintenance in scripted environments
-Triggering analysis immediately after batch data loads
-Avoiding unnecessary work when running manual ANALYZE across many relations
- Environments where autovacuum is tuned conservatively or partially disabled

Autoanalyze runs opportunistically in the background. This proposal would allow a user to apply the same threshold logic explicitly and immediately.

A possible usage would look like: ANALYZE (MODIFIED_STATS);

I understand there is conceptual overlap with autovacuum, so I would especially appreciate feedback on:

-Whether exposing the threshold logic at SQL level makes sense architecturally
-Whether this should remain an explicit opt-in option
-Naming (e.g., MODIFIED_STATS vs SKIP_UNMODIFIED or something clearer)
-Whether the thresholds should reuse existing GUCs or accept per-command overrides

I intentionally kept this separate from MISSING_STATS_ONLY, since that option answers a different question (“are stats missing?”) while this one would answer (“have enough rows changed to justify re-analysis?”).

I would greatly appreciate thoughts before working on a prototype patch.

Thanks again for all the feedback so far — it has been very helpful ,expecting here the same.

Regards,
Vasuki M
C-DAC,Chennai

[1][https://www.postgresql.org/message-id/CAE2r8H61ZtT4ek3JmLKDPmr7ALQ0uE9WSwwJRFHbXm0WdOJnEQ@mail.gmail.com]

pgsql-hackers by date:

Previous
From: Kirill Reshke
Date:
Subject: Re: Release and unpin buffers after leaving CRIT section in GIN.
Next
From: Soumya S Murali
Date:
Subject: Re: [PATCH] Expose checkpoint reason to completion log messages.