Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Autovacuum on partitioned table (autoanalyze)
Date
Msg-id CA+HiwqGggH4JXyQTpUvSROhsi6SSK=O1ZEY1khgTqahLwmuE+g@mail.gmail.com
Whole thread Raw
In response to Re: Autovacuum on partitioned table (autoanalyze)  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Autovacuum on partitioned table (autoanalyze)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > The latest patch lets users set different autovacuum configuration for
> > each partitioned
> > tables like this,
> >   create table p3(i int) partition by range(i) with
> >    (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100);
> > so users can configure those parameters according to partitioning strategies
> > and other requirements.
> >
> > So I think this patch can solve problem you mentioned.
>
> I don't think that adequately allows what's needed.
>
> I think it out to be possible to get the "analyze parent whenever a child is
> analyzed" behavior easily, without having to compute new thershold parameters
> every time one adds partitions, detaches partitions, loades 10x more data into
> one of the partitions, load only 10% as much data into the latest partition,
> etc.
>
> For example, say a new customer has bunch of partitioned tables which each
> currently have only one partition (for the current month), and that's expected
> to grow to at least 20+ partitions (2+ years of history).  How does one set the
> partitioned table's auto-analyze parameters to analyze whenever any child is
> analyzed ?  I don't think it should be needed to update it every month after
> computing sum(child tuples).
>
> Possibly you could allow that behavior for some special values of the
> threshold.  Like if autovacuum_analyze_threshold=-2, then analyze the parent
> whenever any of its children are analyzed.
>
> I think that use case and that need would be common, but I'd like to hear what
> others think.

Having to constantly pay attention to whether a parent's
analyze_threshold/scale_factor is working as intended would surely be
an annoyance, so I tend to agree that we might need more than just the
ability to set analyze_threshold/scale_factor on parent tables.
However, I think we can at least start with being able to do
*something* here. :)  Maybe others think that this shouldn't be
considered committable until we figure out a good analyze threshold
calculation formula to apply to parent tables.

For the cases in which parent's tuple count grows at about the same
rate as partitions (hash mainly), I guess the existing formula more or
less works. That is, we can set the parent's threshold/scale_factor
same as partitions' and the autovacuum's existing formula will ensure
that the parent is auto-analyzed in time and not more than needed. For
time-series partitioning, the same formula won't work, as you have
detailed in your comments. Is there any other partitioning pattern for
which the current formula won't work?

Considering that, how about having, say, a
autovacuum_analyze_partition_parent_frequency, with string values
'default', 'partition'? -- 'default' assumes the same formula as
regular tables, whereas with 'partition', parent is analyzed as soon
as a partition is.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: wrong relkind error messages
Next
From: Peter Eisentraut
Date:
Subject: Re: WAL usage calculation patch