Home > mailing lists

Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Date	December 7, 2021 05:11:01
Msg-id	CA+TgmoZgyUBGXR6VszPJcx5dcdYvd6biOQbBeW4FhGGJs0JbDQ@mail.gmail.com Whole thread Raw
In response to	Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? (Peter Geoghegan <pg@bowt.ie>)
List	pgsql-hackers

Tree view

On Mon, Dec 6, 2021 at 8:14 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Suppose we believe that not-all-visible pages have 20 LP_DEAD items on
> average, and they turn out to only have 3 or 5. Theoretically we've
> done the wrong thing by launching autovacuum workers sooner -- we
> introduce bias. But we also have lower variance over time, which might
> make it worth it. I also think that it might not really matter at all.
> It's no great tragedy if we clean up and set pages all-visible in the
> visibility map a little earlier on average. It might even be a
> positive thing.

This doesn't seem convincing. Launching autovacuum too soon surely has
costs that someone might not want to pay. Clearly in the degenerate
case where we always autovacuum every table every time an autovacuum
worker is launched, we have gone insane. So arbitrarily large moves in
that direction can't be viewed as unproblematic.

> The fact that the user expresses the dead-tuple-wise threshold using
> autovacuum_vacuum_scale_factor is already somewhat arbitrary -- it is
> based on some pretty iffy assumptions. Even if we greatly overestimate
> dead tuples with the new algorithm, we're only doing so under
> circumstances that might have caused
> autovacuum_vacuum_insert_scale_factor to launch an autovacuum worker
> anyway. Just setting the visibility map bit has considerable value.

Now, on the other hand, I *most definitely* think
autovacuum_vacuum_scale_factor is hogwash. Everything I've seen
indicates that, while you do want to wait for a larger number of dead
tuples in a large table than in a small one, it's sublinear. I don't
know whether it's proportional to sqrt(table_size) or table_size^(1/3)
or lg(table_size) or table_size^(0.729837166538), but just plain old
table_size is definitely not it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Masahiko Sawada
Date: 07 December 2021, 04:24:14
Subject: Re: parallel vacuum comments

From: Amit Kapila
Date: 07 December 2021, 05:36:12
Subject: Re: row filtering for logical replication

Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? - Mailing list pgsql-hackers

Previous

Next