Home > mailing lists

Re: Optional skipping of unchanged relations during ANALYZE? - Mailing list pgsql-hackers

From	VASUKI M
Subject	Re: Optional skipping of unchanged relations during ANALYZE?
Date	March 2 15:48:05
Msg-id	CAE2r8H7hYGYi4QM85Q7bxs4RbT0Vn63c9ONFbwhAjuSGDzah_A@mail.gmail.com Whole thread
In response to	Re: Optional skipping of unchanged relations during ANALYZE? (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
Responses	Re: Optional skipping of unchanged relations during ANALYZE?
List	pgsql-hackers

Tree view

Hello Robert, Sami, Ilia, and everyone,

Thank you all for the detailed review and thoughtful feedback. I have carefully gone through the comments and revised the patch accordingly. I truly appreciate the guidance — it helped clarify both the design and the implementation details.

Regarding Robert’s point about the two distinct use cases, I agree that MISSING_STATS_ONLY and MODIFIED_STATS represent separate concerns. MISSING_STATS_ONLY is catalog-driven and persistent in nature, ensuring that newly created tables, newly added columns, and newly defined extended statistics are not left without statistics. MODIFIED_STATS, on the other hand, is more closely aligned with modification thresholds and autoanalyze-like behavior, which is transient and threshold-based.

Keeping these concerns separate makes the semantics clearer and easier to reason about. In particular, as Robert mentioned, MISSING_STATS_ONLY must remain a standalone flag for scenarios such as integration into vacuumdb, where ensuring that missing statistics are generated is the primary goal.

To align with the CommitFest process, I have created a separate entry for this feature here:[1]

I will treat MODIFIED_STATS as a separate patch and discussion thread.

Addressing Sami’s technical comments:

1. Duplicate examine_attribute() calls

You were absolutely correct that the earlier version of the patch caused examine_attribute() to be invoked twice for each attribute: once in relation_has_missing_column_stats() and again during normal attribute processing. That approach was inefficient and awkward.
In the revised version, I have removed the early relation-level column scanning logic. Instead, the missing-statistics check is now integrated directly inside examine_attribute(). The function now takes a boolean missing_stats_only parameter. When this option is specified, examine_attribute() performs the pg_statistic lookup for that specific attribute. If statistics already exist, the function simply returns NULL, causing the column to be skipped naturally.

This ensures:
-Each attribute is examined only once.
-The missing-statistics logic fits naturally into the existing flow.
-We avoid redundant catalog lookups.

Inside do_analyze_rel(), after collecting vacattrstats, we now check whether:
-attr_cnt == 0
-no expression index attributes remain
-and there are no missing extended statistics
If all of these are true under MISSING_STATS_ONLY, the relation is skipped cleanly.This follows the structure you suggested and simplifies the overall design.

2. Test coverage

I have expanded the regression tests to cover the scenarios you mentioned.This ensures that MISSING_STATS_ONLY behaves correctly across more complex schema configurations.

3. Logging behavior

Thank you for the guidance on logging.I have removed the DEBUG-level elog() calls introduced in earlier versions. Logging is now consistent with existing ANALYZE behavior and uses ereport().
When VERBOSE mode is enabled and a relation is skipped due to MISSING_STATS_ONLY, the output now includes a single INFO-level message of the form:

INFO: Skipping analyzing "database.namespace.relation"

This matches the style used elsewhere in do_analyze_rel() and avoids unnecessary additional details when no statistics are collected.There is no additional logging when a relation is processed normally, keeping behavior aligned with existing ANALYZE semantics.

Regarding autoanalyze and MODIFIED_STATS
As Ilia clarified, autoanalyze decisions are purely threshold-driven and do not include missing-statistics logic. That reinforces the decision to keep MISSING_STATS_ONLY separate and explicit.[2]

If there are further suggestions regarding semantics, naming, or additional edge cases that should be tested, I would be very happy to incorporate them.
Thank you again for your time and detailed review.

Regards,
Vasuki M
C-DAC,Chennai

[1] https://commitfest.postgresql.org/patch/6516/
[2] https://www.postgresql.org/message-id/flat/aZSm77WEh8pxQYtf%40nathan#1ab5c06a7d2247d90e71fb995fa21a39

Attachment

v6-0001-ANALYZE-add-MISSING_STATS_ONLY-option.patch

pgsql-hackers by date:

From: Tatsuya Kawata
Date: 02 March, 15:47:23
Subject: Re: [Patch]Add tab completion for DELETE ... USING

From: David Geier
Date: 02 March, 15:50:37
Subject: Re: Convert NOT IN sublinks to anti-joins when safe

Re: Optional skipping of unchanged relations during ANALYZE? - Mailing list pgsql-hackers

Attachment

Previous

Next