On Wed, Jan 18, 2023 at 5:49 PM Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-18 15:28:19 -0800, Peter Geoghegan wrote:
> > Perhaps we should make vac_estimate_reltuples focus on the pages that
> > VACUUM newly set all-visible each time (not including all-visible
> > pages that got scanned despite being all-visible) -- only that subset
> > of scanned_pages seems to be truly relevant. That way you wouldn't be
> > able to do stuff like this. We'd have to start explicitly tracking the
> > number of pages that were newly set in the VM in vacuumlazy.c to be
> > able to do that, but that seems like a good idea anyway.
>
> Can you explain a bit more what you mean with "focus on the pages" means?
We don't say anything about pages we didn't scan right now -- only
scanned_pages have new information, so we just extrapolate. Why not go
even further than that, by only saying something about the pages that
were both scanned and newly set all-visible?
Under this scheme, the pages that were scanned but couldn't be newly
set all-visible are treated just like the pages that weren't scanned
at all -- they get a generic estimate from the existing reltuples.
> I don't think it's hard to see this causing problems. Set
> autovacuum_vacuum_scale_factor to something smaller than 2% or somewhat
> frequently vacuum manually. Incrementally delete old data. Unless analyze
> saves you - which might not be run or might have a different scale factor or
> not be run manually - reltuples will stay exactly the same, despite data
> changing substantially.
You seem to be saying that it's a problem if we don't update reltuples
-- an estimate -- when less than 2% of the table is scanned by VACUUM.
But why? Why can't we just do nothing sometimes? I mean in general,
leaving aside the heuristics I came up with for a moment?
It will cause problems if we remove the heuristics. Much less
theoretical problems. What about those?
How often does VACUUM scan so few pages, anyway? We've been talking
about how ineffective autovacuum_vacuum_scale_factor is, at great
length, but now you're saying that it *can* meaningfully trigger not
just one VACUUM, but many VACUUMs, where no more than 2% of rel_pages
are not all-visible (pages, not tuples)? Not just once, mind you, but
many times? And in the presence of some kind of highly variable tuple
size, where it actually could matter to the planner at some point?
I would be willing to just avoid even these theoretical problems if
there was some way to do so, that didn't also create new problems. I
have my doubts that that is possible, within the constraints of
updating pg_class. Or the present constraints, at least. I am not a
miracle worker -- I can only do so much with the information that's
available to vac_update_relstats (and/or the information that can
easily be made available).
--
Peter Geoghegan