Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers

From yuzuko
Subject Re: Autovacuum on partitioned table (autoanalyze)
Date
Msg-id CAKkQ50_D=1LbYLD9qtWaKxPvRq3T_qGWfQov5eH5rycbMEq-pA@mail.gmail.com
Whole thread Raw
In response to Re: Autovacuum on partitioned table (autoanalyze)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Autovacuum on partitioned table (autoanalyze)
List pgsql-hackers
 Horiguchi-san,

Thank you for reviewing.


On Tue, Sep 15, 2020 at 7:01 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Tue, 25 Aug 2020 14:28:20 +0200, Daniel Gustafsson <daniel@yesql.se> wrote in
> > > I attach the latest patch that solves the above Werror.
> > > Could you please check it again?
> >
> > This version now pass the tests in the Travis pipeline as can be seen in the
> > link below, and is ready to be reviewed in the upcoming commitfest:
> >
> >       http://cfbot.cputube.org/yuzuko-hosoya.html
>
> At Mon, 6 Jul 2020 19:35:37 +0900, yuzuko <yuzukohosoya@gmail.com> wrote in
> > I think there are other approaches like Tom's idea that Justin previously
> > referenced, but this patch works the same way as previous patches.
> > (tracks updated/inserted/deleted tuples and checks whether the partitioned
> > tables needs auto-analyze, same as nonpartitioned tables)
> > Because I wanted to be able to analyze partitioned tables by autovacuum
> > as a first step, and I think this approach is the simplest way to do it.
>
> I'm not sure if anything bad happen if parent and children are not
> agree on statistics.
>
> The requirement suggested here seems to be:
>
> - We want to update parent's stats when any of its children gets its
>   stats updated. This is curucial especially for time-series
>   partitioning.
>
> - However, we don't want analyze the whole-tree every time any of the
>   children was analyzed.
>
> To achieve the both, stats-merging seems to the optimal solution.
>
> Putting that aside, I had a brief look on the latest patch.
>
>         /* We only count stats for things that have storage */
> -       if (!RELKIND_HAS_STORAGE(relkind))
> +       if (!RELKIND_HAS_STORAGE(relkind) ||
> +               relkind == RELKIND_PARTITIONED_TABLE)
>         {
>                 rel->pgstat_info = NULL;
>
> RELKIND_HAS_STORAGE(RELKIND_PARTITIONED_TABLE) is already false.
> Maybe you wanted to do "&& relkind !=" instead:p
>
Oh, indeed.  I'll fix it.

>
> +               /*
> +                * If this relation is partitioned, we store all ancestors' oid
> +                * to propagate its changed_tuples to their parents when this
> +                * transaction is committed.
> +                */
> +               if (rel->rd_rel->relispartition && pgstat_info->ancestors == NULL)
>
> If the relation was detached then attached to another partition within
> a transaction, the ancestor list would get stale and the succeeding
> modification to the relation propagates into wrong ancestors.
>
> I think vacuum time is more appropriate to modify ancestors stats. It
> seems to me that what Alvalo pointed isthe list-order-susceptible
> manner of collecting children's modified tuples.
>
I proposed a patch that modified ancestors stats when vacuuming previously.
In that time, having been pointed out by Alvaro and Amit, I tried to update the
parents' changes_since_analyze in every ANALYZE.  However, in that case,
the problem mentioned in [1] occurred, but I could not find a way to avoid it.
I think that it can be solved by updating the parents' changes_since_analyze
only in the case of auto analyze, but what do you think?

>
> +               ? 0  /* partitioned tables don't have any data, so it's 0 */
>
> If the comment is true, we shouldn't have non-zero t_changed_tuples,
> too. I think the reason for the lines is something different.
>
Yes, surely.  I think updating the values of live_tuples and dead_tuples
is confusing for users.  I'll consider another comment.


[1] https://www.postgresql.org/message-id/CAKkQ50-bwFEDMBGb1JmDXffXsiU8xk-hN6kJK9CKjdBa7r%3DHdw%40mail.gmail.com
--
Best regards,
Yuzuko Hosoya


NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: SQL:2011 PERIODS vs Postgres Ranges?
Next
From: Daniel Gustafsson
Date:
Subject: Re: pgindent vs dtrace on macos