Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Berserk Autovacuum (let's save next Mandrill)
Date
Msg-id CA+fd4k4Xf3bf9Yj4wDOti7XwG91n5zL6YD9wSmkb=cjHeETvEQ@mail.gmail.com
Whole thread Raw
In response to Re: Berserk Autovacuum (let's save next Mandrill)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, 20 Mar 2020 at 15:20, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote:
> > On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > > I don't think a default scale factor of 0 is going to be ok. For
> > > large-ish tables this will basically cause permanent vacuums. And it'll
> > > sometimes trigger for tables that actually coped well so far. 10 million
> > > rows could be a few seconds, not more.
> > >
> > > I don't think that the argument that otherwise a table might not get
> > > vacuumed before autovacuum_freeze_max_age is convincing enough.
> > >
> > > a) if that's indeed the argument, we should increase the default
> > >   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> > >   the main argument against that from before isn't valid anymore.
> > >
> > > b) there's not really a good arguments for vacuuming more often than
> > >   autovacuum_freeze_max_age for such tables. It'll not be not frequent
> > >   enough to allow IOS for new data, and you're not preventing
> > >   anti-wraparound vacuums from happening.
> >
> > According to my reckoning, that is the remaining objection to the patch
> > as it is (with ordinary freezing behavior).
> >
> > How about a scale_factor od 0.005?  That will be high enough for large
> > tables, which seem to be the main concern here.
> >
> > I fully agree with your point a) - should that be part of the patch?
> >
> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
>
> Based on two IM conversations I think it might be worth emphasizing how
> vacuum_cleanup_index_scale_factor works:
>
> For btree, even if there is not a single deleted tuple, we can *still*
> end up doing a full index scans at the end of vacuum. As the docs describe
> vacuum_cleanup_index_scale_factor:
>
>        <para>
>         Specifies the fraction of the total number of heap tuples counted in
>         the previous statistics collection that can be inserted without
>         incurring an index scan at the <command>VACUUM</command> cleanup stage.
>         This setting currently applies to B-tree indexes only.
>        </para>
>
> I.e. with the default settings we will perform a whole-index scan
> (without visibility map or such) after every 10% growth of the
> table. Which means that, even if the visibility map prevents repeated
> tables accesses, increasing the rate of vacuuming for insert-only tables
> can cause a lot more whole index scans.  Which means that vacuuming an
> insert-only workload frequently *will* increase the total amount of IO,
> even if there is not a single dead tuple. Rather than just spreading the
> same amount of IO over more vacuums.

Right.

>
> And both gin and gist just always do a full index scan, regardless of
> vacuum_cleanup_index_scale_factor (either during a bulk delete, or
> during the cleanup).  Thus more frequent vacuuming for insert-only
> tables can cause a *lot* of pain (even an approx quadratic increase of
> IO?  O(increased_frequency * peak_index_size)?) if you have large
> indexes - which is very common for gin/gist.

That's right but for gin, more frequent vacuuming for insert-only
tables can help to clean up the pending list, which increases search
speed and better than doing it by a backend process.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [PATCH] Add schema and table names to partition error
Next
From: Pavel Stehule
Date:
Subject: Re: proposal: schema variables