Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers

From yuzuko
Subject Re: Autovacuum on partitioned table (autoanalyze)
Date
Msg-id CAKkQ508EzAG9ib8rRSNozuYhB1e6p_4FzwrG=YRTW2fENT2+BA@mail.gmail.com
Whole thread Raw
In response to Re: Autovacuum on partitioned table (autoanalyze)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Autovacuum on partitioned table (autoanalyze)  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Hi Tomas,

Thank you for reviewing the patch.

> Firstly, the patch propagates the changes_since_analyze values from
> do_analyze_rel, i.e. from the worker after it analyzes the relation.
> That may easily lead to cases with unnecessary analyzes - consider a
> partitioned with 4 child relations:
>  [ explanation ]
>
I didn't realize that till now.  Indeed, this approach increments parent's
changes_since_analyze counter according to its leaf partition's counter
when the leaf partition is analyzed, so it will cause unnecessary ANALYZE
on partitioned tables as you described.


> I propose a different approach - instead of propagating the counts in
> do_analyze_rel for individual leaf tables, let's do that in bulk in
> relation_needs_vacanalyze. Before the (existing) first pass over
> pg_class, we can add a new one, propagating counts from leaf tables to
> parents.
>
Thank you for your suggestion.  I think it could solve all the issues
you mentioned.  I modified the patch based on this approach:

- Create a new counter, PgStat_Counter changes_since_analyze_reported,
  to track changes_since_analyze we already propagated to ancestors.
  This is used for internal processing and users may not need to know it.
  So this counter is not displayed at pg_stat_all_tables view for now.

- Create a new function, pgstat_propagate_changes() which propagates
  changes_since_analyze counter to all ancestors and saves
  changes_since_analyze_reported.  This function is called in
  do_autovacuum() before relation_needs_vacanalyze().


> Note: I do have some ideas about how to improve that, I've started a
> separate thread about it [1].
>
I'm also interested in merging children's statistics for partitioned tables
because it will make ANALYZE on inheritance trees more efficient.
So I'll check it later.

> I forgot to mention one additional thing yesterday - I wonder if we need
> to do something similar after a partition is attached/detached. That can
> also change the parent's statistics significantly, so maybe we should
> handle all partition's rows as changes_since_analyze? Not necessarily
> something this patch has to handle, but might be related.
>
Regarding attached/detached partitions,  I think we should update statistics
of partitioned tables according to the new inheritance tree.  The latest patch
hasn't handled this case yet, but I'll give it a try soon.

Attach the v13 patch to this email.  Could you please check it again?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Hybrid Hash/Nested Loop joins and caching results from subplans
Next
From: Fujii Masao
Date:
Subject: Re: Failed assertion on standby while shutdown