ANALYZE ONLY - Mailing list pgsql-hackers

From Michael Harris
Subject ANALYZE ONLY
Date
Msg-id CADofcAWATx_haD=QkSxHbnTsAe6+e0Aw8Eh4H8cXyogGvn_kOg@mail.gmail.com
Whole thread Raw
Responses Re: ANALYZE ONLY
Re: ANALYZE ONLY
Re: ANALYZE ONLY
List pgsql-hackers
Hello Postgres Hackers

An application that I am developing uses Postgresql, and includes a fairly large
number of partitioned tables which are used to store time series data.

The tables are partitioned by time, and typically there is only one partition
at a time - the current partition - that is actually being updated.
Older partitions
are available for query and eventually dropped.

As per the documentation, partitioned tables are not analyzed by the autovacuum
workers, although their partitions are. Statistics are needed on the partitioned
table level for at least some query planning activities.

The problem is that giving an ANALYZE command targeting a partitioned table
causes it to update statistics for the partitioned table AND all the individual
partitions. There is currently no option to prevent it from including the
partitions.

This is wasteful for our application: for one thing the autovacuum
has already analyzed the individual partitions; for another most of
the partitions
will have had no changes, so they don't need to be analyzed repeatedly.

I took some measurements when running ANALYZE on one of our tables. It
took approx
4 minutes to analyze the partitioned table, then 29 minutes to analyze the
partitions. We have hundreds of these tables, so the cost is very significant.

For my use case at least it would be fantastic if we could add an ONLY option
to ANALYZE, which would cause it to analyze the named table only and not
descend into the partitions.

I took a look at the source and it looks doable, but before launching into it
I thought I would ask a few questions here.

  1. Would such a feature be welcomed? Are there any traps I might not
have thought of?

  2. The existing ANALYZE command has the following structure:

     ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]

     It would be easiest to add ONLY as another option, but that
doesn't look quite
     right to me - surely the ONLY should be attached to the table name?
     An alternative would be:

     ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]

Any feedback or advice would be great.

Regards
Mike.



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: define PG_REPLSLOT_DIR
Next
From: Kirill Reshke
Date:
Subject: Re: Incremental View Maintenance, take 2