Re: [PATCH] Add sampling statistics to autoanalyze log output - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [PATCH] Add sampling statistics to autoanalyze log output
Date
Msg-id CAHGQGwEY7c-KkE0zqdkLdVKmVQfnqgPUc3B2ugkp4d2wGrL1=g@mail.gmail.com
Whole thread Raw
In response to [PATCH] Add sampling statistics to autoanalyze log output  (河田達也 <kawatatatsuya0913@gmail.com>)
Responses Re: [PATCH] Add sampling statistics to autoanalyze log output
List pgsql-hackers
On Sun, Dec 7, 2025 at 2:40 AM 河田達也 <kawatatatsuya0913@gmail.com> wrote:
>
> Hi,
>
> I would like to propose a patch to add sampling statistics to autoanalyze log output, addressing an inconsistency
betweenANALYZE VERBOSE and autoanalyze logging. 
>
> ## Problem
>
> Currently, ANALYZE VERBOSE displays sampling statistics, but autoanalyze does not log this information.
> This makes it harder to diagnose issues with automatic statistics collection.
>
> Example (current behavior):
> - ANALYZE VERBOSE: Shows "INFO:  "pg_class": scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434
rowsin sample, 434 estimated total rows." 
> - autoanalyze: No sampling information
>
> ## Solution
>
> This patch unifies the logging output by moving sampling statistics from acquire_sample_rows() to do_analyze_rel()'s
instrumentationsection. Now 
> both ANALYZE VERBOSE and autoanalyze output the same sampling information in a consolidated log message.

+1 to including sampling information in the autoanalyze log message as well.


> Key changes:
> 1. Updated AcquireSampleRowsFunc typedef to include 4 new output parameters
> 2. Modified acquire_sample_rows() and acquire_inherited_sample_rows() to populate these parameters
> 3. Added sampling statistics output in do_analyze_rel()
> 4. Updated postgres_fdw and file_fdw implementations
>
> ## Example Output
>
> After the patch(adding both ANALYZE VERBOSE and autoanalyze) :
> sampling: scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total
rows
>
> For inherited tables, statistics are accumulated across all children.
>
> ## Design Question
>
> For inherited tables, the current patch shows only the accumulated total.
> An alternative approach would be to show per-child statistics followed by the total.
> I wanted to align with do_analyze_rel()'s structure to properly support autoanalyze (autovacuum) logging.
> However, I haven't found a clean way to preserve per-child output while maintaining this structure.
> I would appreciate any advice or suggestions on how to achieve both goals if there's a better approach I'm missing.
>
> I would appreciate your feedback!

I noticed an issue with the sampling information shown when
analyzing foreign tables. Based on my testing, the reported values seem
incorrect. Is this expected behavior?

For example, running ANALYZE VERBOSE on a regular table and its foreign
table produces different sampling output:

    CREATE EXTENSION postgres_fdw ;
    CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw ;
    CREATE USER MAPPING FOR public SERVER loopback ;
    CREATE TABLE t (i int) ;
    CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't') ;
    INSERT INTO ft SELECT n FROM generate_series(1, 100000) n ;

    The sampling output by ANALYZE VERBOSE t:
    sampling: scanned 443 of 443 pages, containing 100000 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows

    The sampling output by ANALYZE VERBOSE ft:
    sampling: scanned 1000 of 1000 pages, containing 30261 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows

In particular, the reported number of scanned pages and live rows for
the foreign table look wrong.


Also, the patch moves the sampling information to appear after
buffer usage, WAL usage, etc. In my opinion, it's more intuitive to
report analyze activity before buffer and WAL usage, as ANALYZE VERBOSE
currently does. VACUUM VERBOSE follows the same pattern,
reporting activity details before buffer and WAL usage.

Regards,

--
Fujii Masao



pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Next
From: Matthias van de Meent
Date:
Subject: Re: Proposal: Add a UNIQUE NOT ENFORCED constraint