Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers

From yuzuko
Subject Re: Autovacuum on partitioned table (autoanalyze)
Date
Msg-id CAKkQ50-2RzSR7zuWgnePFYUcr3mzyHGkY+LD3Prf8aii29nzLA@mail.gmail.com
Whole thread Raw
In response to Re: Autovacuum on partitioned table (autoanalyze)  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Autovacuum on partitioned table (autoanalyze)  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers
Hello,

On Sat, Apr 18, 2020 at 2:08 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote:
> > On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > > I don't think that adequately allows what's needed.
> ...(paragraph with my typos elided)...
> > > For example, say a new customer has bunch of partitioned tables which each
> > > currently have only one partition (for the current month), and that's expected
> > > to grow to at least 20+ partitions (2+ years of history).  How does one set the
> > > partitioned table's auto-analyze parameters to analyze whenever any child is
> > > analyzed ?  I don't think it should be needed to update it every month after
> > > computing sum(child tuples).
> > >
> > > Possibly you could allow that behavior for some special values of the
> > > threshold.  Like if autovacuum_analyze_threshold=-2, then analyze the parent
> > > whenever any of its children are analyzed.
> > >
> > > I think that use case and that need would be common, but I'd like to hear what
> > > others think.
> >
> > Having to constantly pay attention to whether a parent's
> > analyze_threshold/scale_factor is working as intended would surely be
> > an annoyance, so I tend to agree that we might need more than just the
> > ability to set analyze_threshold/scale_factor on parent tables.
> > However, I think we can at least start with being able to do
> > *something* here. :)  Maybe others think that this shouldn't be
> > considered committable until we figure out a good analyze threshold
> > calculation formula to apply to parent tables.
> >
> > Considering that, how about having, say, a
> > autovacuum_analyze_partition_parent_frequency, with string values
> > 'default', 'partition'? -- 'default' assumes the same formula as
> > regular tables, whereas with 'partition', parent is analyzed as soon
> > as a partition is.
>
> I assume you mean a reloption to be applied only to partitioned tables,
>
> Your "partition" setting would mean that the scale/threshold values would have
> no effect, which seems kind of unfortunate.
>
> I think it should be called something else, and done differently, like maybe:
> autovacuum_analyze_mode = {off,sum,max,...}
>
The above reloption you suggested will be applied all tables?
Users might not use it for partitions, so I think we should add "parent"
to reloption's name, like Amit's suggestion.

> The threshold would be threshold + scale*tuples, as always, but would be
> compared with f(changes) as determined by the relopt.
>
> sum(changes) would do what you called "default", comparing the sum(changes)
> across all partitions to the threshold, which is itself computed using
> sum(reltuples) AS reltuples.
>
> max(changes) would compute max(changes) compared to the threshold, and the
> threshold would be computed separately for each partition's reltuples:
> threshold_N = parent_threshold + parent_scale * part_N_tuples.  If *any*
> partition exceeds that threshold, the partition itself is analyzed.  This
> allows what I want for time-series.  Maybe this would have an alias called
> "any".
>
I may be wrong but I think the fomula,
> threshold_N = parent_threshold + parent_scale * part_N_tuples
would use orginary table's threshold, not parent's.  If it use parent_threshold,
parent might not be analyzed even if its any partition is analyzed when
parent_threshold is larger than normal threshold.  I'm worried that this case
meets requirements for time-series.

> I'm not sure if there's any other useful modes, like avg(changes)?  I guess we
> can add them later if someone thinks of a good use case.
>
> Also, for me, the v7 patch warns:
> |src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
> |   vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
> ..which seems to be a false positive, but easily avoided.
>
Thank you for testing the patch.
I got it.  I'll update the patch soon.

>
> This patch includes partitioned tables in pg_stat_*_tables, which is great; I
> complained awhile ago that they were missing [0].  It might be useful if that
> part was split out into a separate 0001 patch (?).
>
If partitioned table's statistics is used for other purposes,  I think
it would be
better to split the patch. Does anyone have any opinion?

---
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables
Next
From: Justin Pryzby
Date:
Subject: Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables