Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples? - Mailing list pgsql-hackers

From David Rowley
Subject Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?
Date
Msg-id CAApHDvqaEpwzchs5=k4a38WTySfqdqF2WZCqW8Kz4R00ScSFwg@mail.gmail.com
Whole thread Raw
In response to Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, 31 Jan 2022 at 17:28, David Rowley <dgrowleyml@gmail.com> wrote:
> If ANALYZE runs and sets pg_class.reltuples to 1 million, then we
> insert 500k tuples, assuming a 0 vacuum_ins_threshold and a
> vacuum_ins_scale_factor of 0.2, then we'll want to perform a vacuum as
> "vac_ins_base_thresh + vac_ins_scale_factor * reltuples" will come out
> at 200k.  auto-vacuum will then trigger and update reltuples hopefully
> to some value around 1.5 million, then next time it'll take 300k
> tuples to trigger an insert vacuum.

If we wanted a more current estimate for the number of tuples in a
relation then we could use reltuples / relpages *
RelationGetNumberOfBlocks(r).  However, I still don't see why an
INSERT driven auto-vacuums are a particularly special case. ANALYZE
updating the reltuples estimate had an effect on when auto-vacuum
would trigger for tables that generally grow in the number of live
tuples but previously only (i.e before insert vacuums existed)
received auto-vacuum attention due to UPDATEs/DELETEs.

I suppose the question is, what is autovacuum_vacuum_scale_factor
meant to represent?  Our documents claim:

> Specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a
VACUUM.The default is 0.2 (20% of table size). This parameter can only be set in the postgresql.conf file or on the
servercommand line; but the setting can be overridden for individual tables by changing table storage parameters. 

Nothing there seems to indicate the scale is based on the historical
table size when the table was last vacuumed/analyzed, so you could
claim that the 3 usages of relpages when deciding if the table should
be vacuumed and/or analyzed are all wrong and should take into account
RelationGetNumberOfBlocks too.

I'm not planning on doing anything to change any of this unless I see
some compelling argument that what's there is wrong.

David



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Deparsing rewritten query
Next
From: Andres Freund
Date:
Subject: Re: Latest LLVM breaks our code again