Re: Optional skipping of unchanged relations during ANALYZE? - Mailing list pgsql-hackers
| From | VASUKI M |
|---|---|
| Subject | Re: Optional skipping of unchanged relations during ANALYZE? |
| Date | |
| Msg-id | CAE2r8H5ZYiFxYzhWDAyHO1jxSn7oB-9hPV8VdhAkkJjCwfg0HQ@mail.gmail.com Whole thread Raw |
| In response to | Re: Optional skipping of unchanged relations during ANALYZE? (Sami Imseih <samimseih@gmail.com>) |
| Responses |
Re: Optional skipping of unchanged relations during ANALYZE?
Re: Optional skipping of unchanged relations during ANALYZE? |
| List | pgsql-hackers |
Hi all,
I am back with a new patch.
As a follow-up to the previous discussion, I’m posting a revised v3 patch
that introduces an opt-in ANALYZE (MISSING_STATS) option.
The goal is to reduce unnecessary work when running manual ANALYZE over
many relations, while keeping the default behavior unchanged.
Overview:
ANALYZE (MISSING_STATS) analyzes only relations that currently have no
statistics entries in pg_statistic. Relations that already have statistics
are skipped.
This is conceptually similar to vacuumdb --missing-stats-only, but exposed
at the SQL ANALYZE level for interactive and scripted use.
The decision is intentionally table-level and conservative:
- If a relation has at least one pg_statistic entry, it is considered to
“have stats” and may be skipped.
- If no pg_statistic rows exist (new table, stats removed, crash reset),
the relation is analyzed.
No thresholds or modification counters are used in this option.
Behavior summary for ANALYZE(MISSING_STATS);
New empty table------------------------------->analyzed
Re-run on empty table------------------------> analyzed
Table with data but no stats-----------------> analyzed
Re-run after stats exist-----------------------> skipped
Add new column after ANALYZE----------> analyzed
Re-run after column stats exist-------------> skipped
Statistics manually deleted (pg_statistic) -> analyzed
Statistics lost after crash recovery-----------> analyzed
Regular ANALYZE -------------------------------> unchanged behavior
This ensures that ANALYZE (MISSING_STATS) converges toward a no-op on
subsequent runs, while still recovering from missing or invalid statistics.
Scope and limitations:
- Applies only to regular relations.
- Uses pg_statistic directly (not pg_stats or pg_stat views).
- Does not consider modification thresholds or autovacuum heuristics.
- Partitioned tables, inheritance, and extended statistics are not handled
yet and can be considered separately.
I am back with a new patch.
As a follow-up to the previous discussion, I’m posting a revised v3 patch
that introduces an opt-in ANALYZE (MISSING_STATS) option.
The goal is to reduce unnecessary work when running manual ANALYZE over
many relations, while keeping the default behavior unchanged.
Overview:
ANALYZE (MISSING_STATS) analyzes only relations that currently have no
statistics entries in pg_statistic. Relations that already have statistics
are skipped.
This is conceptually similar to vacuumdb --missing-stats-only, but exposed
at the SQL ANALYZE level for interactive and scripted use.
The decision is intentionally table-level and conservative:
- If a relation has at least one pg_statistic entry, it is considered to
“have stats” and may be skipped.
- If no pg_statistic rows exist (new table, stats removed, crash reset),
the relation is analyzed.
No thresholds or modification counters are used in this option.
Behavior summary for ANALYZE(MISSING_STATS);
New empty table------------------------------->analyzed
Re-run on empty table------------------------> analyzed
Table with data but no stats-----------------> analyzed
Re-run after stats exist-----------------------> skipped
Add new column after ANALYZE----------> analyzed
Re-run after column stats exist-------------> skipped
Statistics manually deleted (pg_statistic) -> analyzed
Statistics lost after crash recovery-----------> analyzed
Regular ANALYZE -------------------------------> unchanged behavior
This ensures that ANALYZE (MISSING_STATS) converges toward a no-op on
subsequent runs, while still recovering from missing or invalid statistics.
Scope and limitations:
- Applies only to regular relations.
- Uses pg_statistic directly (not pg_stats or pg_stat views).
- Does not consider modification thresholds or autovacuum heuristics.
- Partitioned tables, inheritance, and extended statistics are not handled
yet and can be considered separately.
I would appreciate feedback on:
- Whether this behavior and naming align with expectations.
- Any edge cases I may have missed.
- Whether this is a reasonable first step before considering more advanced
options (e.g., modified-stats thresholds).
While testing i have noted this :
analyze_test=# ALTER TABLE ms1 ADD COLUMN b int;
ALTER TABLE
Time: 44.665 ms
analyze_test=# ANALYZE (MISSING_STATS);
DEBUG: ANALYZE processing relation "ms1" (OID 32791)
analyze_test=# SELECT attname
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a
ON a.attrelid = c.oid AND a.attnum = s.staattnum
WHERE c.relname = 'ms1'
ORDER BY attname;
attname
---------
a
b
(2 rows)
Time: 1.390 ms
analyze_test=# ANALYZE (MISSING_STATS);
DEBUG: ANALYZE (MISSING_STATS): skipping relation "ms1" (OID 32791)
analyze_test=# SELECT
a.attname,
s.stanullfrac,
s.stadistinct
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a
ON a.attrelid = c.oid AND a.attnum = s.staattnum
WHERE c.relname = 'ms1';
attname | stanullfrac | stadistinct
---------+-------------+-------------
a | 0 | -1
b | 1 | 0
(2 rows)
Time: 0.733 ms
Note:
[1] The empty table[if a table has 0 rows] --> no pg_statistic rows at all
[2] If a table has >= 1 row then postgres creates pg_statistic rows for every user column,even if the
- Column is entirely null
- The column was added later
- The column has never had a non-null values
- Whether this behavior and naming align with expectations.
- Any edge cases I may have missed.
- Whether this is a reasonable first step before considering more advanced
options (e.g., modified-stats thresholds).
While testing i have noted this :
analyze_test=# ALTER TABLE ms1 ADD COLUMN b int;
ALTER TABLE
Time: 44.665 ms
analyze_test=# ANALYZE (MISSING_STATS);
DEBUG: ANALYZE processing relation "ms1" (OID 32791)
analyze_test=# SELECT attname
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a
ON a.attrelid = c.oid AND a.attnum = s.staattnum
WHERE c.relname = 'ms1'
ORDER BY attname;
attname
---------
a
b
(2 rows)
Time: 1.390 ms
analyze_test=# ANALYZE (MISSING_STATS);
DEBUG: ANALYZE (MISSING_STATS): skipping relation "ms1" (OID 32791)
analyze_test=# SELECT
a.attname,
s.stanullfrac,
s.stadistinct
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a
ON a.attrelid = c.oid AND a.attnum = s.staattnum
WHERE c.relname = 'ms1';
attname | stanullfrac | stadistinct
---------+-------------+-------------
a | 0 | -1
b | 1 | 0
(2 rows)
Time: 0.733 ms
Note:
[1] The empty table[if a table has 0 rows] --> no pg_statistic rows at all
[2] If a table has >= 1 row then postgres creates pg_statistic rows for every user column,even if the
- Column is entirely null
- The column was added later
- The column has never had a non-null values
Thanks for your time and review, I will post the next patch for modified_stats shortly.
Vasuki M
C-DAC,Chennai
C-DAC,Chennai
Attachment
pgsql-hackers by date: