Re: Partitioned table statistics vs autoanalyze - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: Partitioned table statistics vs autoanalyze
Date
Msg-id 20210722121559.GE19620@telsasoft.com
Whole thread Raw
In response to Partitioned table statistics vs autoanalyze  (Kamil Frydel <k.frydel@gmail.com>)
Responses Re: Partitioned table statistics vs autoanalyze  (Kamil Frydel <k.frydel@gmail.com>)
List pgsql-performance
On Thu, Jul 22, 2021 at 01:32:51PM +0200, Kamil Frydel wrote:
> table_1 and table_2 are hash partitioned using volume_id column. Usually we
> make analyze on partitions. We do not make analyze on the partitioned table
> (parent).
> However, if we run 'analyze' on the partitioned table then planner starts
> choosing hash join. As a comparison, the execution using nested loop takes
> about 15 minutes and if it is done using hash join then the query lasts for
> about 1 minute. When running 'analyze' for the partitioned table, postgres
> inserts statistics for the partitioned table into pg_stats (pg_statistics).
> Before that, there are only statistics for partitions. We suspect that this
> is the reason for selecting bad query plan.

> updated, the inheritance statistics will not be up to date unless you run
> ANALYZE manually.
> (https://www.postgresql.org/docs/13/sql-analyze.html)
> 
> I would appreciate if anyone could shed some light on the following
> questions:
> 1) Is this above paragraph from docs still valid in PG 13 and does it apply
> to declarative partitioning as well? Is running analyze manually on a
> partitioned table needed to get proper plans for queries on partitioned
> tables? Partitioned table (in the declarative way) is ”virtual” and does not
> keep any data so it seems that there are no statistics that can be gathered
> from the table itself and statistics from partitions should be sufficient.

Up through v13, autoanalyze doesn't collect stats on parent tables (neither
declarative nor inheritence).  I agree that this doesn't seem to be well
documented.  I think it should also be mentioned here:
https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-STATISTICS

In v14 (which is currently in beta), autoanalyze will process the partitioned
table automatically:
https://www.postgresql.org/docs/14/release-14.html
|Autovacuum now analyzes partitioned tables (Yuzuko Hosoya, Álvaro Herrera)
|Insert, update, and delete tuple counts from partitions are now propagated to their parent tables so autovacuum knows
whento process them.
 

> 2) Why does the planner need these statistics since they seem to be unused
> in the query plan. The query plan uses only partitions, not the partitioned
> table.

The "inherited" stats are used when you SELECT FROM table.  The stats for the
individual table would be needed when you SELECT FROM ONLY table (which makes
no sense for a partitioned table).

-- 
Justin



pgsql-performance by date:

Previous
From: Kamil Frydel
Date:
Subject: Partitioned table statistics vs autoanalyze
Next
From: "ldh@laurent-hasson.com"
Date:
Subject: RE: Big performance slowdown from 11.2 to 13.3