Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Date
Msg-id CA+TgmoZgyUBGXR6VszPJcx5dcdYvd6biOQbBeW4FhGGJs0JbDQ@mail.gmail.com
Whole thread Raw
In response to Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, Dec 6, 2021 at 8:14 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Suppose we believe that not-all-visible pages have 20 LP_DEAD items on
> average, and they turn out to only have 3 or 5. Theoretically we've
> done the wrong thing by launching autovacuum workers sooner -- we
> introduce bias. But we also have lower variance over time, which might
> make it worth it. I also think that it might not really matter at all.
> It's no great tragedy if we clean up and set pages all-visible in the
> visibility map a little earlier on average. It might even be a
> positive thing.

This doesn't seem convincing. Launching autovacuum too soon surely has
costs that someone might not want to pay. Clearly in the degenerate
case where we always autovacuum every table every time an autovacuum
worker is launched, we have gone insane. So arbitrarily large moves in
that direction can't be viewed as unproblematic.

> The fact that the user expresses the dead-tuple-wise threshold using
> autovacuum_vacuum_scale_factor is already somewhat arbitrary -- it is
> based on some pretty iffy assumptions. Even if we greatly overestimate
> dead tuples with the new algorithm, we're only doing so under
> circumstances that might have caused
> autovacuum_vacuum_insert_scale_factor to launch an autovacuum worker
> anyway. Just setting the visibility map bit has considerable value.

Now, on the other hand, I *most definitely* think
autovacuum_vacuum_scale_factor is hogwash. Everything I've seen
indicates that, while you do want to wait for a larger number of dead
tuples in a large table than in a small one, it's sublinear. I don't
know whether it's proportional to sqrt(table_size) or table_size^(1/3)
or lg(table_size) or table_size^(0.729837166538), but just plain old
table_size is definitely not it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: parallel vacuum comments
Next
From: Amit Kapila
Date:
Subject: Re: row filtering for logical replication