Hello Alvaro,
On Thu, Dec 3, 2020 at 10:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Hello Yuzuko,
>
> On 2020-Dec-02, yuzuko wrote:
>
> > The problem Horiguchi-san mentioned is as follows:
> > [explanation]
>
> Hmm, I see. So the problem is that if some ancestor is analyzed first,
> then analyze of one of its partition will cause a redundant analyze of
> the ancestor, because the number of tuples that is propagated from the
> partition represents a set that had already been included in the
> ancestor's analysis.
>
> If the problem was just that, then I think it would be very simple to
> solve: just make sure to sort the tables to vacuum so that all leaves
> are vacuumed first, and then all ancestors, sorted from the bottom up.
> Problem solved.
>
Indeed. When discussed with Horiguchi-san before, He mentioned
the same way:
> So, to propagate the count properly, we need to analyze relations
> leaf-to-root order, or propagate the counter only to anscestors that
> haven't been processed in the current iteration. It seems a bit too
> complex to sort analyze relations in that order.
but we didn't select it because of its complexity as you also said.
> But I'm not sure that that's the whole story, for two reasons: one, two
> workers can run simultaneously, where one analyzes the partition and the
> other analyzes the ancestor. Then the order is not guaranteed (and
> each process will get no effect from remembering whether it did that one
> or not). Second, manual analyzes can occur in any order.
>
> Maybe it's more useful to think about this in terms of rememebering that
> partition P had changed_tuples set to N when we analyzed ancestor A.
> Then, when we analyze partition P, we send the message listing A as
> ancestor; on receipt of that message, we see M+N changed tuples in P,
> but we know that we had already seen N, so we only record M.
>
> I'm not sure how to implement this idea however, since on analyze of
> ancestor A we don't have the list of partitions, so we can't know the N
> for each partition.
>
I thought about it for a while, but I can't come up with how to implement it.
And also I think the other way Horiguchi-san suggested in [1] would be
more simple to solve the problem we are facing.
Attach the new patch based on his patch. What do you think?
[1] https://www.postgresql.org/message-id/20201110.203557.1420746510378864931.horikyota.ntt%40gmail.com
--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center