On Thu, Jan 19, 2023 at 12:56 PM Andres Freund <andres@anarazel.de> wrote:
> > In other words, ANALYZE sometimes (but not always) produces wrong answers.
>
> For dead tuples, but not live tuples.
> > In other words, VACUUM sometimes (but not always) produces wrong answers.
>
> For live tuples, but not badly so for dead tuples.
Agreed. More generally, there is a need to think about the whole table
in some cases (like for regular optimizer statistics including
reltuples/live tuples), and the subset of pages that will be scanned
by VACUUM in other cases (for dead tuples accounting). Random sampling
at the table level is appropriate and works well enough for the
former, though not for the latter.
> We are, but perhaps not too badly so, because we can choose to believe analyze
> more for live tuples, and vacuum for dead tuples. Analyze doesn't compute
> reltuples incrementally and vacuum doesn't compute deadtuples incrementally.
Good points.
> But in contrast to dead_tuples, where I think we can just stop analyze from
> updating it unless we crashed recently, I do think we need to update reltuples
> in vacuum. So computing an accurate value seems like the least unreasonable
> thing I can see.
I agree, but there is no reasonable basis for treating scanned_pages
as a random sample, especially if it's only a small fraction of all of
rel_pages -- treating it as a random sample is completely
unjustifiable. And so it seems to me that the only thing that can be
done is to either make VACUUM behave somewhat like ANALYZE in at least
some cases, or to have it invoke ANALYZE directly (or indirectly) in
those same cases.
--
Peter Geoghegan