Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers

From	Laurenz Albe
Subject	Re: Berserk Autovacuum (let's save next Mandrill)
Date	March 20, 2020 13:43:20
Msg-id	1a5a8273535742114b776ab60b0cc0a48e5bfaa3.camel@cybertec.at
In response to	Re: Berserk Autovacuum (let's save next Mandrill) (Andres Freund <andres@anarazel.de>)
Responses	Re: Berserk Autovacuum (let's save next Mandrill)
List	pgsql-hackers

On Thu, 2020-03-19 at 23:20 -0700, Andres Freund wrote:
> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
> 
> Based on two IM conversations I think it might be worth emphasizing how
> vacuum_cleanup_index_scale_factor works:
> 
> For btree, even if there is not a single deleted tuple, we can *still*
> end up doing a full index scan at the end of vacuum. As the docs describe
> vacuum_cleanup_index_scale_factor:
> 
>        <para>
>         Specifies the fraction of the total number of heap tuples counted in
>         the previous statistics collection that can be inserted without
>         incurring an index scan at the <command>VACUUM</command> cleanup stage.
>         This setting currently applies to B-tree indexes only.
>        </para>
> 
> I.e. with the default settings we will perform a whole-index scan
> (without visibility map or such) after every 10% growth of the
> table. Which means that, even if the visibility map prevents repeated
> table accesses, increasing the rate of vacuuming for insert-only tables
> can cause a lot more whole index scans.  Which means that vacuuming an
> insert-only workload frequently *will* increase the total amount of IO,
> even if there is not a single dead tuple. Rather than just spreading the
> same amount of IO over more vacuums.
> 
> And both gin and gist just always do a full index scan, regardless of
> vacuum_cleanup_index_scale_factor (either during a bulk delete, or
> during the cleanup).  Thus more frequent vacuuming for insert-only
> tables can cause a *lot* of pain (even an approx quadratic increase of
> IO?  O(increased_frequency * peak_index_size)?) if you have large
> indexes - which is very common for gin/gist.

Ok, ok.  Thanks for the explanation.

In the light of that, I agree that we should increase the scale_factor.
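
A toy model of the cost argument quoted above, as a rough sketch only.
Everything in it is an assumption made for illustration: the function
name total_index_scan_io and its parameters are hypothetical, the table
grows linearly by inserts, the index size is taken to be proportional to
the row count, a vacuum runs after every fixed number of inserted rows,
and the btree rule is simplified from the documentation text quoted above
rather than taken from the PostgreSQL source.

```python
def total_index_scan_io(final_rows, rows_per_vacuum, scale_factor=None):
    """Sum of index sizes scanned by all vacuums while the table grows.

    Sizes are counted in rows as a stand-in for index pages (assumption).
    scale_factor=None models an index that is fully scanned by every
    vacuum (the gin/gist case). A number models the simplified btree rule:
    a cleanup scan runs only once the table has grown by at least that
    fraction since the last scan.
    """
    total = 0
    rows = 0
    rows_at_last_scan = 0
    while rows < final_rows:
        # One vacuum run after another batch of inserts.
        rows = min(rows + rows_per_vacuum, final_rows)
        if scale_factor is None or rows_at_last_scan == 0:
            scan = True
        else:
            growth = (rows - rows_at_last_scan) / rows_at_last_scan
            scan = growth >= scale_factor
        if scan:
            total += rows  # a full scan costs the current index size
            rows_at_last_scan = rows
    return total


# gin/gist-like index, table grows to 1000 rows:
print(total_index_scan_io(1000, 100))   # ten vacuums  -> 5500
print(total_index_scan_io(1000, 1000))  # one vacuum   -> 1000
# btree-like index with the default scale factor of 0.1:
print(total_index_scan_io(1000, 10, scale_factor=0.1))
```

In this model, ten vacuums of a gin/gist-like index scan 5500 row
equivalents where a single final vacuum scans 1000, which is in line with
the O(increased_frequency * peak_index_size) estimate.  The btree-like
case skips some scans once the table is large relative to the vacuum
interval, but still scans far more than a single vacuum would.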

Yours,
Laurenz Albe
