Thread: Why is INSERT-driven autovacuuming based on pg_class.reltuples?
Commit b07642dbcd ("Trigger autovacuum based on number of INSERTs") taught autovacuum to run in response to INSERTs, which is now typically the dominant factor that drives vacuuming for an append-only table -- a very useful feature, certainly. This is driven by the following logic from autovacuum.c: vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples; ... /* Determine if this table needs vacuum or analyze. */ *dovacuum = force_vacuum || (vactuples > vacthresh) || (vac_ins_base_thresh >= 0 && instuples > vacinsthresh); I can see why the nearby, similar vacthresh and anlthresh variables (not shown here) are scaled based on pg_class.reltuples -- that makes sense. But why should we follow that example here, with vacinsthresh? Both VACUUM and ANALYZE update pg_class.reltuples. But this code seems to assume that it's only something that VACUUM can ever do. Why wouldn't we expect a plain ANALYZE to have actually been the last thing to update pg_class.reltuples for an append-only table? Wouldn't that lead to less frequent (perhaps infinitely less frequent) vacuuming for an append-only table, relative to the documented behavior of autovacuum_vacuum_insert_scale_factor? -- Peter Geoghegan
On Thu, Jan 27, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote: > Both VACUUM and ANALYZE update pg_class.reltuples. But this code seems > to assume that it's only something that VACUUM can ever do. Why > wouldn't we expect a plain ANALYZE to have actually been the last > thing to update pg_class.reltuples for an append-only table? Wouldn't > that lead to less frequent (perhaps infinitely less frequent) > vacuuming for an append-only table, relative to the documented > behavior of autovacuum_vacuum_insert_scale_factor? PgStat_StatTabEntry.inserts_since_vacuum will continue to grow and grow as more tuples are inserted, until VACUUM actually runs, no matter what. That largely explains why this bug was missed before now: it's inevitable that inserts_since_vacuum will become large at some point -- even large relative to a bogus scaled pg_class.reltuples-at-ANALYZE threshold (unless ANALYZE hasn't been run since the last VACUUM, in which case pg_class.reltuples will be at the expected value anyway). And so we'll eventually get to the point where so many unvacuumed inserted tuples have accumulated that an insert-driven autovacuum still takes place. In practice these delayed insert-driven autovacuum operations will usually happen without *ludicrous* delay (relative to the documented behavior). Even still, the autovacuum schedule for append-only tables will often be quite wrong. (Anti-wraparound VACUUMs probably made the bug harder to notice as well, of course.) -- Peter Geoghegan
On Thu, 2022-01-27 at 12:20 -0800, Peter Geoghegan wrote: > I can see why the nearby, similar vacthresh and anlthresh variables > (not shown here) are scaled based on pg_class.reltuples -- that makes > sense. But why should we follow that example here, with vacinsthresh? What would you suggest instead? pg_stat_all_tables.n_live_tup? Yours, Laurenz Albe
On Thu, Jan 27, 2022 at 11:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > What would you suggest instead? pg_stat_all_tables.n_live_tup? I'm not sure, except that I assume that it'll have to come from the statistics collector, not from pg_class. I think that this bug stemmed from the fact that vac_update_relstats() is used by both VACUUM and ANALYZE to store information in pg_class, while both *also* use pgstat_report_vacuum()/pgstat_report_analyze() to store closely related information at the same point. There is rather a lot of implicit information here. I recall a few other bugs that also seemed related to this messiness with statistics and pg_class. -- Peter Geoghegan
On Thu, Jan 27, 2022 at 01:59:38PM -0800, Peter Geoghegan wrote: > On Thu, Jan 27, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote: > > Both VACUUM and ANALYZE update pg_class.reltuples. But this code seems > > to assume that it's only something that VACUUM can ever do. Why > > wouldn't we expect a plain ANALYZE to have actually been the last > > thing to update pg_class.reltuples for an append-only table? Wouldn't > > that lead to less frequent (perhaps infinitely less frequent) > > vacuuming for an append-only table, relative to the documented > > behavior of autovacuum_vacuum_insert_scale_factor? > > PgStat_StatTabEntry.inserts_since_vacuum will continue to grow and > grow as more tuples are inserted, until VACUUM actually runs, no > matter what. That largely explains why this bug was missed before now: > it's inevitable that inserts_since_vacuum will become large at some > point -- even large relative to a bogus scaled > pg_class.reltuples-at-ANALYZE threshold (unless ANALYZE hasn't been > run since the last VACUUM, in which case pg_class.reltuples will be at > the expected value anyway). And so we'll eventually get to the point > where so many unvacuumed inserted tuples have accumulated that an > insert-driven autovacuum still takes place. Maybe I'm missed your point, but I think this may not rise to the level of being a "bug". If we just vacuumed an insert-only table, and then insert some more, and autoanalyze runs and updates reltuples, what's wrong with vac_ins_scale_factor * reltuples + vac_ins_base_thresh ? You're saying reltuples should be the number of tuples at the last vacuum instead of the most recent value from either vacuum or analyze ? It's true that the vacuum threshold may be hit later than if reltuples hadn't been updated by ANALYZE. If that's what you're referring to, that's the behavior of scale factor in general. If a table is growing in size at a constant rate, analyze will run at decreasing frequency. With the default 20% scale factor, it'll first run at 1.2x the table's initial size, then at 1.44 (not 1.4), then at 1.728 (not 1.6), then at 2.0736 (not 1.8). That's not necessarily desirable, but it's not necessarily wrong, either. If your table doubles in size, you might have to adjust these things. Maybe there should be another knob allowing perfect, "geometric" (or other) frequency, but the behavior is not new in this patch. We talked about that here. https://www.postgresql.org/message-id/flat/20200305172749.GK684%40telsasoft.com#edac59123843f9f8e1abbc2b570c76f1 With the default values, analyze happens after 10% growth, and vacuum happens after 20% (which ends up being 22% of the initial table size). The goal of this patch was to have inserts trigger autovacuum *at all*. This behavior may be subtle or non-ideal, but not a problem? The opposite thing could also happen - *vacuum* could update reltuples, causing the autoanalyze threshold to be hit a bit later. -- Justin
On Fri, 28 Jan 2022 at 09:20, Peter Geoghegan <pg@bowt.ie> wrote: > Both VACUUM and ANALYZE update pg_class.reltuples. But this code seems > to assume that it's only something that VACUUM can ever do. Like Justin I'm also not quite following what the problem is here. pg_class.reltuples is only used to estimate how many tuples the scale factor is likely to be. It does not matter if that was set by ANALYZE or VACUUM, it's simply an estimate. I quoted the text above as I get the idea that you've gotten the wrong end of the stick about how this works. reltuples is just used to estimate what the number of tuples for the insert threshold is based on the scale factor. It does not matter if that was estimated by VACUUM or ANALYZE. If ANALYZE runs and sets pg_class.reltuples to 1 million, then we insert 500k tuples, assuming a 0 vacuum_ins_threshold and a vacuum_ins_scale_factor of 0.2, then we'll want to perform a vacuum as "vac_ins_base_thresh + vac_ins_scale_factor * reltuples" will come out at 200k. auto-vacuum will then trigger and update reltuples hopefully to some value around 1.5 million, then next time it'll take 300k tuples to trigger an insert vacuum. I'm not quite following where the problem is with that. (Of course with the exception of the fact that ANALYZE and VACUUM have different methods how they decide what to set pg_class.reltuples to. That's not a new problem) David
On Mon, 31 Jan 2022 at 17:28, David Rowley <dgrowleyml@gmail.com> wrote: > If ANALYZE runs and sets pg_class.reltuples to 1 million, then we > insert 500k tuples, assuming a 0 vacuum_ins_threshold and a > vacuum_ins_scale_factor of 0.2, then we'll want to perform a vacuum as > "vac_ins_base_thresh + vac_ins_scale_factor * reltuples" will come out > at 200k. auto-vacuum will then trigger and update reltuples hopefully > to some value around 1.5 million, then next time it'll take 300k > tuples to trigger an insert vacuum. If we wanted a more current estimate for the number of tuples in a relation then we could use reltuples / relpages * RelationGetNumberOfBlocks(r). However, I still don't see why an INSERT driven auto-vacuums are a particularly special case. ANALYZE updating the reltuples estimate had an effect on when auto-vacuum would trigger for tables that generally grow in the number of live tuples but previously only (i.e before insert vacuums existed) received auto-vacuum attention due to UPDATEs/DELETEs. I suppose the question is, what is autovacuum_vacuum_scale_factor meant to represent? Our documents claim: > Specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM.The default is 0.2 (20% of table size). This parameter can only be set in the postgresql.conf file or on the servercommand line; but the setting can be overridden for individual tables by changing table storage parameters. Nothing there seems to indicate the scale is based on the historical table size when the table was last vacuumed/analyzed, so you could claim that the 3 usages of relpages when deciding if the table should be vacuumed and/or analyzed are all wrong and should take into account RelationGetNumberOfBlocks too. I'm not planning on doing anything to change any of this unless I see some compelling argument that what's there is wrong. David
On Tue, Feb 1, 2022 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote: > If we wanted a more current estimate for the number of tuples in a > relation then we could use reltuples / relpages * > RelationGetNumberOfBlocks(r). However, I still don't see why an > INSERT driven auto-vacuums are a particularly special case. ANALYZE > updating the reltuples estimate had an effect on when auto-vacuum > would trigger for tables that generally grow in the number of live > tuples but previously only (i.e before insert vacuums existed) > received auto-vacuum attention due to UPDATEs/DELETEs. Sorry for the delay in my response; I've been travelling. I admit that I jumped the gun here -- I now believe that it's operating as originally designed. And that the design is reasonable. It couldn't hurt to describe the design in a little more detail in the user docs, though. > Nothing there seems to indicate the scale is based on the historical > table size when the table was last vacuumed/analyzed, so you could > claim that the 3 usages of relpages when deciding if the table should > be vacuumed and/or analyzed are all wrong and should take into account > RelationGetNumberOfBlocks too. The route of my confusion might interest you. vacuumlazy.c's call to vac_update_relstats() provides relpages and reltuples values that are very accurate relative to the VACUUM operations view of things (relative to its OldestXmin, which uses the size of the table at the start of the VACUUM, not the end). When the vac_update_relstats() call is made, the same accurate-relative-toOldestXmin values might actually *already* be out of date relative to the physical table as it is at the end of the same VACUUM. The fact that these accurate values could be out of date like this is practically inevitable for a certain kind of table. But that might not matter at all, if they were later *interpreted* in a way that took this into account later on -- context matters. For some reason I made the leap to thinking that everybody else believed the same thing, too, and that the intention with the design of the insert-driven autovacuum stuff was to capture that. But that's just not the case, at all. I apologize for the confusion. BTW, this situation with the relstats only being accurate "relative to the last VACUUM's OldestXmin" is *especially* important with new_dead_tuples values, which are basically recently dead tuples relative to OldestXmin. In the common case where there are very few recently dead tuples, we have an incredibly distorted "sample" of dead tuples (it tells us much more about VACUUM implementation details than the truth of what's going on in the table). So that also recommends "relativistically interpreting" the values later on. This specific issue has even less to do with autovacuum_vacuum_scale_factor than the main point, of course. I agree with you that the insert-driven stuff isn't a special case. -- Peter Geoghegan