Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | Re: Berserk Autovacuum (let's save next Mandrill)
Date |
Msg-id | 20200320062031.uwagypenawujwajx@alap3.anarazel.de
In response to | Re: Berserk Autovacuum (let's save next Mandrill) (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses | Re: Berserk Autovacuum (let's save next Mandrill)
 | Re: Berserk Autovacuum (let's save next Mandrill)
List | pgsql-hackers
Hi,

On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > I don't think a default scale factor of 0 is going to be ok. For
> > large-ish tables this will basically cause permanent vacuums. And it'll
> > sometimes trigger for tables that actually coped well so far. 10 million
> > rows could be a few seconds, not more.
> >
> > I don't think that the argument that otherwise a table might not get
> > vacuumed before autovacuum_freeze_max_age is convincing enough.
> >
> > a) if that's indeed the argument, we should increase the default
> >    autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> >    the main argument against that from before isn't valid anymore.
> >
> > b) there's not really a good argument for vacuuming more often than
> >    autovacuum_freeze_max_age for such tables. It'll not be frequent
> >    enough to allow IOS for new data, and you're not preventing
> >    anti-wraparound vacuums from happening.
>
> According to my reckoning, that is the remaining objection to the patch
> as it is (with ordinary freezing behavior).
>
> How about a scale_factor of 0.005? That will be high enough for large
> tables, which seem to be the main concern here.
>
> I fully agree with your point a) - should that be part of the patch?
>
> I am not sure about b). In my mind, the objective is not to prevent
> anti-wraparound vacuums, but to see that they have less work to do,
> because previous autovacuum runs already have frozen anything older than
> vacuum_freeze_min_age. So, assuming linear growth, the number of tuples
> to freeze during any run would be at most one fourth of today's number
> when we hit autovacuum_freeze_max_age.

Based on two IM conversations, I think it might be worth emphasizing how
vacuum_cleanup_index_scale_factor works: for btree, even if there is not
a single deleted tuple, we can *still* end up doing a full index scan at
the end of vacuum.
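To make the scale-factor discussion concrete, here is a small sketch of the trigger formula being debated. It mirrors the shape of the existing autovacuum threshold formula (base threshold plus scale factor times table size); the 10-million base threshold is the figure mentioned upthread, and the exact parameter names and defaults in the committed patch may differ.

```python
def insert_vacuum_trigger(reltuples, threshold=10_000_000, scale_factor=0.0):
    """Number of tuples inserted since the last vacuum after which an
    insert-triggered autovacuum would fire, using the usual
    threshold + scale_factor * reltuples shape of autovacuum triggers.
    Defaults here follow the thread's proposal, not necessarily the
    final committed values."""
    return threshold + scale_factor * reltuples

# With scale_factor = 0, a 1-billion-row table gets vacuumed after every
# 10M inserts, i.e. every 1% of growth - effectively permanent vacuuming.
# The proposed 0.005 stretches that to 15M inserts for the same table,
# while barely changing behavior for small tables.
for reltuples in (10_000_000, 1_000_000_000):
    for sf in (0.0, 0.005):
        print(reltuples, sf, insert_vacuum_trigger(reltuples, scale_factor=sf))
```

This shows why a nonzero scale factor mainly matters for large tables: the base threshold dominates until `scale_factor * reltuples` becomes comparable to it.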
As the docs describe vacuum_cleanup_index_scale_factor:

    <para>
     Specifies the fraction of the total number of heap tuples counted in
     the previous statistics collection that can be inserted without
     incurring an index scan at the <command>VACUUM</command> cleanup
     stage. This setting currently applies to B-tree indexes only.
    </para>

I.e. with the default settings we will perform a whole-index scan
(without the visibility map or such) after every 10% growth of the
table. Which means that, even if the visibility map prevents repeated
table accesses, increasing the rate of vacuuming for insert-only tables
can cause a lot more whole-index scans. So vacuuming an insert-only
workload frequently *will* increase the total amount of IO, even if
there is not a single dead tuple, rather than just spreading the same
amount of IO over more vacuums.

And both gin and gist just always do a full index scan, regardless of
vacuum_cleanup_index_scale_factor (either during a bulk delete, or
during the cleanup). Thus more frequent vacuuming of insert-only tables
can cause a *lot* of pain (even an approximately quadratic increase of
IO? O(increased_frequency * peak_index_size)?) if you have large indexes
- which is very common for gin/gist.

Is there something missing in the above description?

Greetings,

Andres Freund
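The O(increased_frequency * peak_index_size) worry can be illustrated with a toy model (my own construction, not from the thread): assume a gin/gist index grows linearly with inserts and every vacuum scans the whole index at its then-current size, since these AMs ignore vacuum_cleanup_index_scale_factor.

```python
def total_index_scan_io(final_index_size, n_vacuums):
    """Toy model: sum the cost of whole-index scans over n evenly spaced
    vacuums while the index grows linearly from 0 to final_index_size.
    The i-th vacuum scans an index of size final_index_size * i / n."""
    return sum(final_index_size * i / n_vacuums
               for i in range(1, n_vacuums + 1))

# Same data ingested, same final index size - only the vacuum frequency
# differs. Doubling the number of vacuums roughly doubles the total IO
# spent on index scans (550 -> 1050 units here), instead of spreading a
# fixed amount of IO over more, smaller runs.
print(total_index_scan_io(100, 10))
print(total_index_scan_io(100, 20))
```

The closed form is `final_index_size * (n + 1) / 2`, so total scan IO grows linearly with vacuum count at a fixed peak size; since the peak size itself grows with the insert volume, the combined effect approaches the quadratic behavior the message gestures at.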