[PATCH] Add sampling statistics to autoanalyze log output - Mailing list pgsql-hackers

From 河田達也
Subject [PATCH] Add sampling statistics to autoanalyze log output
Msg-id CAHza6qcTZd-eUBchXafeU2RwkhmBP1MVnoJthHbgi_Jy_UiyQA@mail.gmail.com
List pgsql-hackers
Hi,

I would like to propose a patch to add sampling statistics to autoanalyze log output, addressing an inconsistency between ANALYZE VERBOSE and autoanalyze logging.

## Problem

Currently, ANALYZE VERBOSE displays sampling statistics, but autoanalyze does not log this information.
This makes it harder to diagnose issues with automatic statistics collection.

Example (current behavior):
- ANALYZE VERBOSE: Shows "INFO:  "pg_class": scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total rows."
- autoanalyze: No sampling information

## Solution

This patch unifies the logging output by moving sampling statistics from acquire_sample_rows() to do_analyze_rel()'s instrumentation section. Now both ANALYZE VERBOSE and autoanalyze output the same sampling information in a consolidated log message.

Key changes:
1. Updated AcquireSampleRowsFunc typedef to include 4 new output parameters
2. Modified acquire_sample_rows() and acquire_inherited_sample_rows() to populate these parameters
3. Added sampling statistics output in do_analyze_rel()
4. Updated postgres_fdw and file_fdw implementations
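
To make change 1 more concrete, here is a minimal standalone sketch of what the extended callback type might look like. The first six parameters follow the existing AcquireSampleRowsFunc typedef; the four trailing output parameters (`totalpages`, `scannedpages`, `liverows`, `deadrows`) are hypothetical names chosen for illustration and may not match the attached patch. The PostgreSQL types are replaced by stand-ins so the fragment compiles without the server headers, and `toy_acquire_sample_rows()` is a made-up callback that simply reports the pg_class numbers from the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PostgreSQL types so this sketch compiles on its own. */
typedef struct RelationData *Relation;
typedef struct HeapTupleData *HeapTuple;
typedef unsigned int BlockNumber;

/*
 * Callback type with four extra output parameters.  The names of the
 * last four parameters are assumptions, not taken from the patch.
 */
typedef int (*AcquireSampleRowsFunc) (Relation relation, int elevel,
                                      HeapTuple *rows, int targrows,
                                      double *totalrows,
                                      double *totaldeadrows,
                                      BlockNumber *totalpages,   /* hypothetical */
                                      BlockNumber *scannedpages, /* hypothetical */
                                      double *liverows,          /* hypothetical */
                                      double *deadrows);         /* hypothetical */

/*
 * Toy callback: it samples nothing and only fills in the output
 * parameters with the pg_class figures quoted in this mail.
 */
int
toy_acquire_sample_rows(Relation relation, int elevel,
                        HeapTuple *rows, int targrows,
                        double *totalrows, double *totaldeadrows,
                        BlockNumber *totalpages, BlockNumber *scannedpages,
                        double *liverows, double *deadrows)
{
    (void) relation;
    (void) elevel;
    (void) rows;

    assert(totalrows != NULL && totaldeadrows != NULL);
    assert(totalpages != NULL && scannedpages != NULL);
    assert(liverows != NULL && deadrows != NULL);

    *totalrows = 434;
    *totaldeadrows = 11;
    *totalpages = 14;
    *scannedpages = 14;
    *liverows = 434;
    *deadrows = 11;

    /* Return the number of rows "in sample", capped at targrows. */
    return targrows < 434 ? targrows : 434;
}
```

With a signature along these lines, the caller (do_analyze_rel() in the patch's design) receives the counters through the output parameters and can decide how and when to log them, instead of the sampling function emitting the message itself.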

## Example Output

After the patch (the following line is added for both ANALYZE VERBOSE and autoanalyze):
sampling: scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total rows

For inherited tables, statistics are accumulated across all children.
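
The accumulation for inherited tables can be pictured with the following standalone sketch. It is not the patch's code: the `SamplingStats` struct and both helper functions are hypothetical, and the format string is only modeled on the example output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the values shown in the "sampling:" line. */
typedef struct SamplingStats
{
    unsigned int scanned_pages;
    unsigned int total_pages;
    double       live_rows;
    double       dead_rows;
    int          sample_rows;
    double       est_total_rows;
} SamplingStats;

/* Add one child's counters into the running total for the parent. */
void
sampling_stats_add(SamplingStats *total, const SamplingStats *child)
{
    assert(total != NULL && child != NULL);

    total->scanned_pages += child->scanned_pages;
    total->total_pages += child->total_pages;
    total->live_rows += child->live_rows;
    total->dead_rows += child->dead_rows;
    total->sample_rows += child->sample_rows;
    total->est_total_rows += child->est_total_rows;
}

/* Format the accumulated counters in the style of the example output. */
int
sampling_stats_format(const SamplingStats *s, char *buf, size_t buflen)
{
    assert(s != NULL && buf != NULL);

    return snprintf(buf, buflen,
                    "sampling: scanned %u of %u pages, "
                    "containing %.0f live rows and %.0f dead rows; "
                    "%d rows in sample, %.0f estimated total rows",
                    s->scanned_pages, s->total_pages,
                    s->live_rows, s->dead_rows,
                    s->sample_rows, s->est_total_rows);
}
```

Calling `sampling_stats_add()` once per child and `sampling_stats_format()` once at the end yields a single consolidated line, which is the behavior described here. Printing a line per child would mean calling the formatter inside the loop as well.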

## Design Question

For inherited tables, the current patch shows only the accumulated total.
An alternative approach would be to show per-child statistics followed by the total.
I wanted to align with do_analyze_rel()'s structure to properly support autoanalyze (autovacuum) logging.
However, I haven't found a clean way to preserve per-child output while maintaining this structure.
If there is a better approach that achieves both goals, I would welcome any advice or suggestions.

I would appreciate your feedback!

Regards,
Attachment
