Thread: ANALYZE on partitioned tables vs on individual partitions
Hello Experts,
Our application has a database with a large number of partitioned tables used to store time series data.
Each table is partitioned by time: new data is written into the current partition, old partitions eventually get dropped, and the partitions in between are largely static.
I had not realized until recently that the partitioned tables themselves are not analyzed by the autovacuum processes, and I am now wondering whether we should be analyzing them manually.
The problem is that they are enormous and take a long time to analyze: I tested one yesterday and it took ~30 minutes, and we have hundreds of them, so it might not be practical to analyze them regularly.
My questions are:
1. What are the consequences of not having good statistics at the partitioned-table level, if you do have good statistics at the partition level?
2. Which planning operations rely on partitioned-table-level statistics?
My reason for asking is that I'm trying to understand the trade-off between the I/O cost of analyzing frequently and the risk of non-optimal plans being chosen.
Thanks in advance, and sorry if this has all been covered before. I could not find anything definitive in the docs, and while I did find a few references by googling, none of them really answered these questions.
Regards
Mike
> On Aug 6, 2024, at 21:13, Michael Harris <harmic@gmail.com> wrote:
>
> 1. What are the consequences of not having good statistics at the partitioned-table level, if you do have good statistics at the partition level?

The child partitions are just tables, and all of the same issues that can arise from bad statistics on a table can arise on them: specifically, bad query plans. (There are also some consequences to leaving a table unvacuumed, even an append-only table.) The root table in declarative partitioning has no rows, so there really is no such thing as "statistics at the partitioned-table level."

> 2. Which planning operations rely on partitioned-table-level statistics?

Just about every query uses the information that ANALYZE gathers in some way. If the query can't conceivably be executed in any way other than a sequential scan, or if it is a simple lookup on a unique column, the statistics will not contribute to the plan, but essentially every other plan will use them in one way or another.

Child partitions should be autovacuumed and autoanalyzed just like any other table; they are not prohibited from autovacuum in any way by default. It's probably a good idea to investigate why they are not being picked up by autovacuum. If they are created by a bulk load process, it's not a bad idea to do a VACUUM ANALYZE on them once the bulk load is complete.
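For example, something along these lines shows when autovacuum last vacuumed/analyzed each partition and how many rows have changed since the last analyze (the table name pattern is made up; adjust it to your schema):

    -- When did autovacuum last touch each partition, and how much has
    -- changed since the last analyze?
    SELECT relname,
           last_autovacuum,
           last_autoanalyze,
           n_mod_since_analyze
    FROM   pg_stat_user_tables
    WHERE  relname LIKE 'measurements_%'
    ORDER  BY relname;

A manual pass after a bulk load can be as simple as:

    VACUUM ANALYZE measurements_2024_08;   -- hypothetical partition name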
On Wed, 7 Aug 2024 at 16:44, Christophe Pettus <xof@thebuild.com> wrote:
> Child partitions should be autovacuumed and autoanalyzed just like any other table; they are not prohibited from autovacuum in any way by default. It's probably a good idea to investigate why they are not being picked up by autovacuum. If they are created by a bulk load process, it's not a bad idea to do a VACUUM ANALYZE on them once the bulk load is complete.

I think the complaint was about no autovacuum on the partitioned table, not the partitions. This is expected, as we don't track the counters (in particular n_mod_since_analyze) shown in pg_stat_all_tables at the partitioned-table level, so the trigger points that normally cause autovacuum to analyze or vacuum a table just won't be triggered for a partitioned table. For VACUUM that's fine as, as you mentioned, no rows are stored. But for ANALYZE, that does present a problem.

As for the aspects of planning that rely on statistics of the partitioned table: basically anything above the Append or MergeAppend which combines the partition results together. That doesn't include the scans of each partition, or any quals that are pushed down to the scan level, as those are able to use the partition-level statistics. However, it does include things like joins, GROUP BY and DISTINCT, as those require n_distinct estimates for the partitioned table. It's not all bad, though, as the row estimates for each individual partition will be totalled up through the Append / MergeAppend simply by adding up the row estimates for each Append / MergeAppend child plan. So it's really only an estimation problem for any node that comes after a join node or a group-by node, as the output rows for those nodes will depend on a good n_distinct estimate for the partitioned table.

Partition-wise joins and aggregates do change things a bit, as those features permit moving those operations below the Append / MergeAppend, in which case the statistics for the individual partitions can be used.

You could consider manually setting the n_distinct_inherited estimates for the columns that you join on or group by in the partitioned table. You might find that you're able to choose a suitable value for that if you review the documentation for that setting. In particular, please review what is mentioned about using negative numbers: you may be able to choose a value that scales correctly with the row estimate and doesn't get outdated as you add more rows to the partitions. You'll need to determine that based on the data you're storing. (A rough sketch of those commands, with made-up names, is included below.)

David
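To illustrate (the table and column names are made up, and the value is just a placeholder), a negative, scaling n_distinct_inherited override on a join column could look like this:

    -- A negative setting means "number of distinct values = |value| * row count",
    -- so -0.05 claims the column has roughly 5% of the row count as distinct
    -- values, and the estimate keeps scaling as partitions are added.
    ALTER TABLE metrics ALTER COLUMN node_id SET (n_distinct_inherited = -0.05);

    -- The override is only applied by subsequent ANALYZEs of the partitioned table.
    ANALYZE metrics;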
Many thanks David for the comprehensive response.

> I think the complaint was about no autovacuum on the partitioned
> table, not the partitions.

Yes, exactly.

One other piece of information: these tables contain a lot of columns, of which only 4 are normally used in WHERE clauses or joins. The table I was experimenting with has 150 columns, 156026832 rows and occupies 166GB.

I found that running an ANALYZE specifying only those 4 columns took 5 minutes, compared to the 30 minutes for the whole table (a sketch of that form of the command is below). That was a bit of a surprise, as I had imagined that actually reading the table would take most of the time and would be the same regardless of the number of columns being analyzed, but I guess that is wrong.

Regards,
Mike
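For reference, a column-limited ANALYZE looks something like this (the table and column names are invented; ours are different):

    -- Only gather statistics for the columns that actually appear in
    -- WHERE clauses and joins.
    ANALYZE measurements (node_id, metric_id, site_id, ts);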
On Wed, 7 Aug 2024 at 19:20, Michael Harris <harmic@gmail.com> wrote:
> I found that running an ANALYZE specifying only those 4 columns took
> 5 minutes, compared to the 30 minutes for the whole table.
>
> That was a bit of a surprise, as I had imagined that actually reading the
> table would take most of the time and would be the same regardless of the
> number of columns being analyzed, but I guess that is wrong.

ANALYZE does do sampling of the data in the table. It would only read all of the rows for fairly small tables. The docs in [1] mention this:

"For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row. This allows even very large tables to be analyzed in a small amount of time."

I don't think most of the time is going into the ANALYZE of the partitioned table itself; gathering the partitioned-table statistics just means sampling rows from each partition, proportionally to each partition's size. It's more likely due to the fact that each partition is also analyzed and the statistics for each of those are updated. There is no "ANALYZE ONLY" command similar to "FROM ONLY" in SELECT queries.

You could probably do some sampling of the pg_stat_progress_analyze view [2] to figure out what's taking the most time (a sketch of such a query is included below). If you find that the majority of the time is spent analyzing the partitions and not the partitioned table, then maybe we should expand ANALYZE to add the ONLY option...

David

[1] https://www.postgresql.org/docs/current/sql-analyze.html
[2] https://www.postgresql.org/docs/current/progress-reporting.html
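A sampling query along these lines (assuming PostgreSQL 13 or later, where pg_stat_progress_analyze exists) shows which relation is being analyzed at any moment and how far through the sampling it is:

    SELECT pid,
           relid::regclass                      AS analyzing,
           phase,
           sample_blks_scanned,
           sample_blks_total,
           child_tables_done,
           child_tables_total,
           current_child_table_relid::regclass  AS current_child
    FROM   pg_stat_progress_analyze;

In psql, following the query with "\watch 30" re-runs it every 30 seconds.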
> You could probably do some sampling of the pg_stat_progress_analyze
> view to figure out what's taking the most time.

I did another run, sampling pg_stat_progress_analyze every 30s. For the first 4 minutes it was working on the partitioned table; after that it began analyzing all the partitions, which took approx. 29 minutes.

I think you are correct: an ONLY option for ANALYZE would be a huge benefit. In my use case the autovacuum processes are keeping the partitions analyzed, so there would seem to be little benefit in including them in the manual table-level ANALYZE.

Regards
Mike