On Wed, Jan 18, 2023 at 2:22 PM Andres Freund <andres@anarazel.de> wrote:
> The problem with the change is here:
>
> /*
> * Okay, we've covered the corner cases. The normal calculation is to
> * convert the old measurement to a density (tuples per page), then
> * estimate the number of tuples in the unscanned pages using that figure,
> * and finally add on the number of tuples in the scanned pages.
> */
> old_density = old_rel_tuples / old_rel_pages;
> unscanned_pages = (double) total_pages - (double) scanned_pages;
> total_tuples = old_density * unscanned_pages + scanned_tuples;
> return floor(total_tuples + 0.5);
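To put rough numbers on the quoted calculation (these figures are made
up, purely to illustrate the mechanism): if pg_class currently says
reltuples = 1,000,000 and relpages = 10,000, then old_density is 100
tuples/page, and a VACUUM that scans 100 of those 10,000 pages and
counts 5,000 live tuples there reports 100 * 9,900 + 5,000 = 995,000
as the new reltuples.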
My assumption has always been that vac_estimate_reltuples() is prone
to issues like this because it just doesn't have access to very much
information each time it runs. It can only see the delta between what
VACUUM just saw and what the last VACUUM (or possibly the last
ANALYZE) saw according to pg_class. You're always going to find
weaknesses in a model like that if you go looking for them. You're
always going to find a way to salami-slice your way from good
information to total nonsense if you pick the right (or wrong) test
case: one that runs VACUUM in a way that allows whatever bias there
may be to accumulate. It's much like the way floating-point values can
become very inaccurate through a process that lets many small errors
accumulate over time.
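To make that accumulation concrete, here is a rough standalone
simulation (not the real vac_estimate_reltuples(), and the table shape
is invented): 9,900 pages hold 100 tuples each, a 100-page range holds
only 20 tuples per page (992,000 tuples in total), pg_class starts out
exactly right, and every VACUUM happens to scan only that sparse
range, feeding each estimate back in as the next run's old reltuples:

#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		total_pages = 10000.0;
	double		reltuples = 992000.0;		/* exactly right to start with */
	double		scanned_pages = 100.0;		/* same sparse slice every time */
	double		scanned_tuples = 2000.0;	/* 20 tuples/page in that slice */

	for (int i = 1; i <= 10; i++)
	{
		double		old_density = reltuples / total_pages;
		double		unscanned_pages = total_pages - scanned_pages;

		/* same shape as the quoted calculation, applied repeatedly */
		reltuples = floor(old_density * unscanned_pages + scanned_tuples + 0.5);
		printf("after VACUUM %2d: reltuples = %.0f\n", i, reltuples);
	}
	return 0;
}

Each individual run only moves the estimate by a little under one
percent, which looks harmless in isolation, but ten runs in it has
already drifted by almost 8%, and if you keep going it converges on
200,000, even though the table never changed.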
Maybe you're right to be as concerned as you are -- I'm not sure. I'm
just adding what I see as important context.
--
Peter Geoghegan