Re: Optional skipping of unchanged relations during ANALYZE? - Mailing list pgsql-hackers

From VASUKI M
Subject Re: Optional skipping of unchanged relations during ANALYZE?
Date
Msg-id CAE2r8H5ZYiFxYzhWDAyHO1jxSn7oB-9hPV8VdhAkkJjCwfg0HQ@mail.gmail.com
Whole thread Raw
In response to Re: Optional skipping of unchanged relations during ANALYZE?  (Sami Imseih <samimseih@gmail.com>)
Responses Re: Optional skipping of unchanged relations during ANALYZE?
Re: Optional skipping of unchanged relations during ANALYZE?
List pgsql-hackers
Hi all,

I am back with a new patch.

As a follow-up to the previous discussion, I’m posting a revised v3 patch
that introduces an opt-in ANALYZE (MISSING_STATS) option.

The goal is to reduce unnecessary work when running manual ANALYZE over
many relations, while keeping the default behavior unchanged.

Overview:
ANALYZE (MISSING_STATS) analyzes only relations that currently have no
statistics entries in pg_statistic. Relations that already have statistics
are skipped.

This is conceptually similar to vacuumdb --missing-stats-only, but exposed
at the SQL ANALYZE level for interactive and scripted use.

The decision is intentionally table-level and conservative:

- If a relation has at least one pg_statistic entry, it is considered to
  “have stats” and may be skipped.
- If no pg_statistic rows exist (new table, stats removed, crash reset),
  the relation is analyzed.

No thresholds or modification counters are used in this option.

Behavior summary for ANALYZE(MISSING_STATS);


New empty table------------------------------->analyzed
Re-run on empty table------------------------> analyzed
Table with data but no stats-----------------> analyzed
Re-run after stats exist-----------------------> skipped
Add new column after ANALYZE----------> analyzed
Re-run after column stats exist-------------> skipped
Statistics manually deleted (pg_statistic) -> analyzed
Statistics lost after crash recovery-----------> analyzed
Regular ANALYZE -------------------------------> unchanged behavior

This ensures that ANALYZE (MISSING_STATS) converges toward a no-op on
subsequent runs, while still recovering from missing or invalid statistics.

Scope and limitations:
- Applies only to regular relations.
- Uses pg_statistic directly (not pg_stats or pg_stat views).
- Does not consider modification thresholds or autovacuum heuristics.
- Partitioned tables, inheritance, and extended statistics are not handled
  yet and can be considered separately.

I would appreciate feedback on:

- Whether this behavior and naming align with expectations.
- Any edge cases I may have missed.
- Whether this is a reasonable first step before considering more advanced
  options (e.g., modified-stats thresholds).

While testing i have noted this :
analyze_test=# ALTER TABLE ms1 ADD COLUMN b int;
ALTER TABLE
Time: 44.665 ms
analyze_test=# ANALYZE (MISSING_STATS);
DEBUG:  ANALYZE processing relation "ms1" (OID 32791)
analyze_test=# SELECT attname
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a
  ON a.attrelid = c.oid AND a.attnum = s.staattnum
WHERE c.relname = 'ms1'
ORDER BY attname;
 attname
---------
 a
 b
(2 rows)

Time: 1.390 ms
analyze_test=# ANALYZE (MISSING_STATS);
DEBUG:  ANALYZE (MISSING_STATS): skipping relation "ms1" (OID 32791)
analyze_test=# SELECT
  a.attname,
  s.stanullfrac,
  s.stadistinct
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a
  ON a.attrelid = c.oid AND a.attnum = s.staattnum
WHERE c.relname = 'ms1';
 attname | stanullfrac | stadistinct
---------+-------------+-------------
 a       |           0 |          -1
 b       |           1 |           0
(2 rows)

Time: 0.733 ms

Note:
[1] The empty table[if a table has 0 rows] --> no pg_statistic rows at all
[2] If a table has >= 1 row then postgres creates pg_statistic rows for every user column,even if the 
      - Column is entirely null 
      - The column was added later
      - The column has never had a non-null values 

Thanks for your time and review, I will post the next patch for modified_stats shortly.


Vasuki M
C-DAC,Chennai
Attachment

pgsql-hackers by date:

Previous
From: Nikolay Shaplov
Date:
Subject: Re: Custom oauth validator options
Next
From: Xuneng Zhou
Date:
Subject: Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery