Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Msg-id: 20200319221703.q5ik7azljb5rjeaz@alap3.anarazel.de
In response to: Re: Berserk Autovacuum (let's save next Mandrill) (Laurenz Albe <laurenz.albe@cybertec.at>)
List: pgsql-hackers
Hi,

On 2020-03-19 20:47:40 +0100, Laurenz Albe wrote:
> On Thu, 2020-03-19 at 21:39 +1300, David Rowley wrote:
> > I've attached a small fix which I'd like to apply to your v8 patch.
> > With that, and pending one final look, I'd like to push this during my
> > Monday (New Zealand time). So if anyone strongly objects to that,
> > please state their case before then.

I am doubtful it should be committed with the current settings. See below.

> From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
> From: Laurenz Albe <laurenz.albe@cybertec.at>
> Date: Thu, 19 Mar 2020 20:26:43 +0100
> Subject: [PATCH] Autovacuum tables that have received only inserts
>
> Add "autovacuum_vacuum_insert_threshold" and
> "autovacuum_vacuum_insert_scale_factor" GUC and reloption.
> The default value for the threshold is 10000000;
> the scale factor defaults to 0.01.
>
> Any table that has received more inserts since it was
> last vacuumed (and that is not vacuumed for another
> reason) will be autovacuumed.
>
> This avoids the known problem that insert-only tables
> are never autovacuumed until they need to have their
> anti-wraparound autovacuum, which then can be massive
> and disruptive.

Shouldn't this also mention index only scans? IMO that's at least as big a
problem as the "large vacuum" problem.

> + <varlistentry id="guc-autovacuum-vacuum-insert-threshold" xreflabel="autovacuum_vacuum_insert_threshold">
> +  <term><varname>autovacuum_vacuum_insert_threshold</varname> (<type>integer</type>)
> +  <indexterm>
> +   <primary><varname>autovacuum_vacuum_insert_threshold</varname></primary>
> +   <secondary>configuration parameter</secondary>
> +  </indexterm>
> +  </term>
> +  <listitem>
> +   <para>
> +    Specifies the number of inserted tuples needed to trigger a
> +    <command>VACUUM</command> in any one table.
> +    The default is 10000000 tuples.
> +    This parameter can only be set in the <filename>postgresql.conf</filename>
> +    file or on the server command line;
> +    but the setting can be overridden for individual tables by
> +    changing table storage parameters.
> +   </para>
> +  </listitem>
> + </varlistentry>
> +
>  <varlistentry id="guc-autovacuum-analyze-threshold" xreflabel="autovacuum_analyze_threshold">
>   <term><varname>autovacuum_analyze_threshold</varname> (<type>integer</type>)
>   <indexterm>
> @@ -7342,6 +7362,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
>   </listitem>
>  </varlistentry>
>
> + <varlistentry id="guc-autovacuum-vacuum-insert-scale-factor" xreflabel="autovacuum_vacuum_insert_scale_factor">
> +  <term><varname>autovacuum_vacuum_insert_scale_factor</varname> (<type>floating point</type>)
> +  <indexterm>
> +   <primary><varname>autovacuum_vacuum_insert_scale_factor</varname></primary>
> +   <secondary>configuration parameter</secondary>
> +  </indexterm>
> +  </term>
> +  <listitem>
> +   <para>
> +    Specifies a fraction of the table size to add to
> +    <varname>autovacuum_vacuum_insert_threshold</varname>
> +    when deciding whether to trigger a <command>VACUUM</command>.
> +    The default is 0.01 (1% of table size).
> +    This parameter can only be set in the <filename>postgresql.conf</filename>
> +    file or on the server command line;
> +    but the setting can be overridden for individual tables by
> +    changing table storage parameters.
> +   </para>
> +  </listitem>
> + </varlistentry>
> +
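[As a side note on the storage-parameter override mentioned in the quoted
documentation: a per-table setting would look roughly like the sketch below.
This is illustrative only; it assumes the reloptions carry the same names as
the GUCs (as the commit message indicates) and uses a hypothetical table name.]

  -- Hypothetical per-table override of the new insert-driven autovacuum
  -- settings; reloption names assumed to match the GUC names from the patch.
  ALTER TABLE measurements
      SET (autovacuum_vacuum_insert_threshold = 1000000,
           autovacuum_vacuum_insert_scale_factor = 0.0);

  -- Revert the table to the cluster-wide defaults from postgresql.conf.
  ALTER TABLE measurements
      RESET (autovacuum_vacuum_insert_threshold,
             autovacuum_vacuum_insert_scale_factor);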
I am *VERY* doubtful that the attempt of using a large threshold and a tiny
scale factor is going to work out well. I'm not confident enough in my gut
feeling to full-throatedly object, but confident enough that I'd immediately
change it on any important database I operated.

Independent of how large a constant you set the threshold to, for databases
with substantially bigger tables this will lead to [near] constant vacuuming.
As soon as you hit 1 billion rows - which isn't actually that much - this is
equivalent to setting autovacuum_{vacuum,analyze}_scale_factor to 0.01. There
are cases where that can be a sensible setting, but I don't think anybody
would suggest it as a default.

After thinking about it for a while, I think it's fundamentally flawed to use
large constant thresholds to avoid unnecessary vacuums. It's easy to see cases
where it's bad for common databases of today, but it'll be much worse a few
years down the line, when common table sizes have grown by an order of
magnitude or two. Nor do constant thresholds address the difference between a
table of a given size with, e.g., 2kB-wide rows and a same-sized table with
28-byte-wide rows. The point of constant thresholds, IMO, can only be to avoid
unnecessary work at the *small* (even tiny) end, not the opposite.

I think there's too much "reinventing" autovacuum scheduling in a "local",
insert-only manner happening in this thread, and, as far as I can tell, it
only looks at a somewhat narrow slice of insert-only workloads.

I, again, strongly suggest using much more conservative values here, and then
trying to address the shortcomings - like not freezing aggressively enough -
in separate patches (and by now in separate releases, in all likelihood).

This will have a huge impact on a lot of Postgres installations. Autovacuum is
already perceived as one of the biggest issues around Postgres. If the ratio
of cases where these changes improve things to the cases they regress isn't
huge, it'll be painful (silent improvements are obviously noticed less than
breakages).

Greetings,

Andres Freund
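[For concreteness on the scale-factor point made above, a rough worked
illustration. It assumes the insert-driven trigger has the same shape as the
existing vacuum trigger, i.e. a table is queued once the tuples inserted since
the last vacuum exceed autovacuum_vacuum_insert_threshold +
autovacuum_vacuum_insert_scale_factor * reltuples, and it plugs in the patch's
defaults (10000000 and 0.01) for a few hypothetical table sizes.]

  -- Inserts needed before the insert-driven autovacuum would fire, assuming
  -- the trigger form threshold + scale_factor * reltuples and the defaults.
  SELECT reltuples::bigint AS table_rows,
         (10000000 + 0.01 * reltuples)::bigint AS inserts_to_trigger
  FROM (VALUES (1e6), (1e8), (1e9), (1e10)) AS t(reltuples);

  -- Approximate results:
  --   1,000,000 rows      -> ~10,010,000 inserts (constant threshold dominates)
  --   100,000,000 rows    -> ~11,000,000 inserts
  --   1,000,000,000 rows  -> ~20,000,000 inserts (the 1% term already equals
  --                          the constant)
  --   10,000,000,000 rows -> ~110,000,000 inserts (effectively ~1% of the
  --                          table, the same behaviour as a 0.01 scale factor
  --                          on its own)

Past roughly a billion rows the constant term stops mattering, which is why
the mail above equates the default to a 0.01 scale factor.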