Re: ANALYZE ONLY - Mailing list pgsql-hackers

From Melih Mutlu
Subject Re: ANALYZE ONLY
Date
Msg-id CAGPVpCQpVqkFq1ja6EDU6ZLyJxXSRk2yGbdV0-6GGs87Y_3qtA@mail.gmail.com
Whole thread Raw
In response to ANALYZE ONLY  (Michael Harris <harmic@gmail.com>)
Responses Re: ANALYZE ONLY
List pgsql-hackers
Hi Michael,

Thanks for starting this thread. I've also spent a bit time on this after reading your first thread on this issue [1] 

Michael Harris <harmic@gmail.com>, 20 Ağu 2024 Sal, 08:52 tarihinde şunu yazdı:
The problem is that giving an ANALYZE command targeting a partitioned table
causes it to update statistics for the partitioned table AND all the individual
partitions. There is currently no option to prevent it from including the
partitions.

This is wasteful for our application: for one thing the autovacuum
has already analyzed the individual partitions; for another most of
the partitions
will have had no changes, so they don't need to be analyzed repeatedly.

I agree that it's a waste to analyze partitions when they're already analyzed by autovacuum. It would be nice to have a way to run analyze only on a partitioned table without its partitions.

 
I took some measurements when running ANALYZE on one of our tables. It
took approx
4 minutes to analyze the partitioned table, then 29 minutes to analyze the
partitions. We have hundreds of these tables, so the cost is very significant.

I quickly tweaked the code a bit to exclude partitions when a partitioned table is being analyzed. I can confirm that there is a significant gain even on a simple case like a partitioned table with 10 partitions and 1M rows in each partition.

  1. Would such a feature be welcomed? Are there any traps I might not
have thought of?

  2. The existing ANALYZE command has the following structure:

     ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]

     It would be easiest to add ONLY as another option, but that
doesn't look quite
     right to me - surely the ONLY should be attached to the table name
     An alternative would be:

     ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]

I feel closer to adding this as an option instead of a new keyword in ANALYZE grammar. To me, it would be easier to have this option and then give the names of partitioned tables as opposed to typing ONLY before each partition table.
But we should think of another name as ONLY is used differently (attached to the table name as you mentioned) in other contexts.

I've been also thinking about how this new option should affect inheritance tables. Should it have just no impact on them or only analyze the parent table without taking child tables into account? There are two records for an inheritance parent table in pg_statistic, one row for only the parent table and a second row including children. We might only analyze the parent table if this new "ONLY" option is specified. I'm not sure if that would be something users would need or not, but I think this option should behave similarly for both partitioned tables and inheritance tables.

If we decide to go with only partition tables and not care about inheritance, then naming this option to SKIP_PARTITIONS as Jelte suggested sounds fine. But that name wouldn't work if this option will affect inheritance tables.


Thanks,
--
Melih Mutlu
Microsoft

pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: define PG_REPLSLOT_DIR
Next
From: Bertrand Drouvot
Date:
Subject: Re: define PG_REPLSLOT_DIR