Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples? |
Date | |
Msg-id | CAH2-WznkSQ3HsCi1qn3-u6F551Y6CVypT2v950v-CTt0bqwBxg@mail.gmail.com Whole thread Raw |
In response to | Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples? (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
On Tue, Feb 1, 2022 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote: > If we wanted a more current estimate for the number of tuples in a > relation then we could use reltuples / relpages * > RelationGetNumberOfBlocks(r). However, I still don't see why an > INSERT driven auto-vacuums are a particularly special case. ANALYZE > updating the reltuples estimate had an effect on when auto-vacuum > would trigger for tables that generally grow in the number of live > tuples but previously only (i.e before insert vacuums existed) > received auto-vacuum attention due to UPDATEs/DELETEs. Sorry for the delay in my response; I've been travelling. I admit that I jumped the gun here -- I now believe that it's operating as originally designed. And that the design is reasonable. It couldn't hurt to describe the design in a little more detail in the user docs, though. > Nothing there seems to indicate the scale is based on the historical > table size when the table was last vacuumed/analyzed, so you could > claim that the 3 usages of relpages when deciding if the table should > be vacuumed and/or analyzed are all wrong and should take into account > RelationGetNumberOfBlocks too. The route of my confusion might interest you. vacuumlazy.c's call to vac_update_relstats() provides relpages and reltuples values that are very accurate relative to the VACUUM operations view of things (relative to its OldestXmin, which uses the size of the table at the start of the VACUUM, not the end). When the vac_update_relstats() call is made, the same accurate-relative-toOldestXmin values might actually *already* be out of date relative to the physical table as it is at the end of the same VACUUM. The fact that these accurate values could be out of date like this is practically inevitable for a certain kind of table. But that might not matter at all, if they were later *interpreted* in a way that took this into account later on -- context matters. For some reason I made the leap to thinking that everybody else believed the same thing, too, and that the intention with the design of the insert-driven autovacuum stuff was to capture that. But that's just not the case, at all. I apologize for the confusion. BTW, this situation with the relstats only being accurate "relative to the last VACUUM's OldestXmin" is *especially* important with new_dead_tuples values, which are basically recently dead tuples relative to OldestXmin. In the common case where there are very few recently dead tuples, we have an incredibly distorted "sample" of dead tuples (it tells us much more about VACUUM implementation details than the truth of what's going on in the table). So that also recommends "relativistically interpreting" the values later on. This specific issue has even less to do with autovacuum_vacuum_scale_factor than the main point, of course. I agree with you that the insert-driven stuff isn't a special case. -- Peter Geoghegan
pgsql-hackers by date: