Re: Autovacuum on partitioned table - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: Autovacuum on partitioned table |
Date | |
Msg-id | CA+HiwqGgC8O=8qRtZOGFLY-=Z-6ERwtHtxY1u9X9dAsmrAdjJQ@mail.gmail.com Whole thread Raw |
In response to | Re: Autovacuum on partitioned table (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses |
Re: Autovacuum on partitioned table
(Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
|
List | pgsql-hackers |
Hello, On Fri, Dec 27, 2019 at 2:02 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > On Fri, 27 Dec 2019 at 12:37, yuzuko <yuzukohosoya@gmail.com> wrote: > > As Laurenz commented in this thread, I tried adding option > > to update parent's statistics during Autovacuum. To do that, > > I propose supporting 'autovacuum_enabled' option already > > exists on partitioned tables. > > > > In the attached patch, you can use 'autovacuum_enabled' option > > on partitioned table as usual, that is, a default value of this option > > is true. So if you don't need autovacuum on a partitioned table, > > you have to specify the option: > > CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0); > > > > I'm not sure but I wonder if a suitable value as a default of > > 'autovacuum_enabled' for partitioned tables might be false. > > Because autovacuum on *partitioned tables* requires scanning > > all children to make partitioned tables' statistics. > > But if the default value varies according to the relation, > > is it confusing? Any thoughts? > > I don't look at the patch deeply yet but your patch seems to attempt > to vacuum on partitioned table. IIUC partitioned tables don't need to > be vacuumed and its all child tables are vacuumed instead if we pass > the partitioned table to vacuum() function. But autovacuum on child > tables is normally triggered since their statistics are updated. > > I think it's a good idea to have that option but I think that doing > autovacuum on the parent table every time when autovacuum is triggered > on one of its child tables is very high cost especially when there are > a lot of child tables. Instead I thought it's more straight forward if > we compare the summation of the statistics of child tables (e.g. > n_live_tuples, n_dead_tuples etc) to vacuum thresholds when we > consider the needs of autovacuum on the parent table. What do you > think? There's this old email where Tom outlines a few ideas about triggering auto-analyze on inheritance trees: https://www.postgresql.org/message-id/4823.1262132964%40sss.pgh.pa.us If I'm reading that correctly, the idea is to track only changes_since_analyze and none of the finer-grained stats like live/dead tuples for inheritance parents (partitioned tables) using some new pgstat infrastrcture, an idea that Hosoya-san also seems to be considering per an off-list discussion. Besides the complexity of getting that infrastructure in place, an important question is whether the current system of applying threshold and scale factor to changes_since_analyze should be used as-is for inheritance parents (partitioned tables), because if users set those parameters similarly to for regular tables, autovacuum might analyze partitioned tables more than necessary. We'll either need a different formula, or some commentary in the documentation about how partitioned tables might need different setting, or maybe both. By the way, maybe I'm misunderstanding what Sawada-san wrote above, but the only missing piece seems to be a way to trigger an *analyze* on the parent tables -- to collect optimizer statistics for the inheritance trees -- not vacuum, for which the existing system seems enough. Thanks, Amit
pgsql-hackers by date: