Hi,
On 2023-01-18 17:00:48 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 4:37 PM Andres Freund <andres@anarazel.de> wrote:
> > I can, it should be just about trivial code-wise. A bit queasy about trying to
> > foresee the potential consequences.
>
> That's always going to be true, though.
>
> > A somewhat related issue is that pgstat_report_vacuum() sets dead_tuples to
> > what VACUUM itself observed, ignoring any concurrently reported dead
> > tuples. As far as I can tell, when vacuum takes a long time, that can lead to
> > severely under-accounting dead tuples.
>
> Did I not mention that one? There are so many that it can be hard to
> keep track! That's why I catalog them.
I don't recall you doing so, but there are lots of emails and holes in my head.
> This creates an awkward but logical question, though: what if
> dead_tuples doesn't go down at all? What if VACUUM actually has to
> increase it, because VACUUM runs so slowly relative to the workload?
Sure, that can happen - but it's not made better by having wrong stats :)
> > I do think this is an argument for splitting up dead_tuples into separate
> > "components" that we track differently. I.e. tracking the number of dead
> > items, not-yet-removable rows, and the number of dead tuples reported from DML
> > statements via pgstats.
>
> Is it? Why?
We have reasonably sophisticated accounting in pgstats for what newly live/dead
rows a transaction "creates". So an obvious (and wrong) idea is to just
decrement reltuples by the number of tuples removed by autovacuum. But we can't
do that, because inserted/deleted tuples reported by backends can be removed by
on-access pruning, and vacuumlazy doesn't know about all the changes made by
its call to heap_page_prune().
But I think that if we added a
  pgstat_count_heap_prune(nredirected, ndead, nunused)
around heap_page_prune() and a
  pgstat_count_heap_vacuum(nunused)
in lazy_vacuum_heap_page(), we'd likely end up with a better approximation
than what vac_estimate_reltuples() produces in the "partially scanned" case.
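
As a sketch only, modeled on the shape of the existing
pgstat_count_heap_insert()/_delete() counters - the items_dead and
items_unused fields are made up here and don't exist in PgStat_TableCounts
today, and the exact signatures are placeholders:

    /* sketch: called from the heap_page_prune() call sites */
    void
    pgstat_count_heap_prune(Relation rel, int nredirected, int ndead,
                            int nunused)
    {
        if (pgstat_should_count_relation(rel))
        {
            PgStat_TableStatus *tabstat = rel->pgstat_info;

            /* items newly marked LP_DEAD stick around as dead items */
            tabstat->counts.items_dead += ndead;
            /* items marked LP_UNUSED are gone entirely */
            tabstat->counts.items_unused += nunused;
            /* nredirected doesn't change the counts; kept for symmetry */
        }
    }

    /* sketch: called from lazy_vacuum_heap_page() */
    void
    pgstat_count_heap_vacuum(Relation rel, int nunused)
    {
        if (pgstat_should_count_relation(rel))
        {
            PgStat_TableStatus *tabstat = rel->pgstat_info;

            /* dead items just reclaimed by vacuum's second heap pass */
            tabstat->counts.items_dead -= nunused;
            tabstat->counts.items_unused += nunused;
        }
    }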
> I'm all in favor of doing that, of course. I just don't particularly
> think that it's related to this other problem. One problem is that we
> count dead tuples incorrectly because we don't account for the fact
> that things change while VACUUM runs. The other problem is that the
> thing that is counted isn't broken down into distinct subcategories of
> things -- things are bunched together that shouldn't be.
If we adjusted the counters incrementally, as we go, we wouldn't need to update
them at the end of vacuum. And I think it'd be a lot easier to update the
counters incrementally if we split ->dead_tuples into sub-counters.
So I don't think it's entirely unrelated.
You probably could get close without splitting the counters, by just pushing
down the counting and only counting redirected and unused items during heap
pruning. But I think it's likely to be more accurate with split counters, as
sketched below.
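
To make that split concrete, roughly something like this (the struct and
field names are made up, not a concrete proposal for the layout):

    /*
     * Sketch: components of what's currently the single dead_tuples
     * counter, each changing at a different point in time.
     */
    typedef struct PgStat_DeadTupleCounts
    {
        PgStat_Counter items_dead;    /* LP_DEAD items, set by pruning,
                                       * cleared by vacuum's second pass */
        PgStat_Counter recently_dead; /* dead but not-yet-removable rows */
        PgStat_Counter reported_dead; /* dead tuples reported from DML
                                       * statements via pgstats */
    } PgStat_DeadTupleCounts;

Pruning would then roughly convert reported_dead into items_dead, and
vacuuming would drain items_dead, without any wholesale reset at the end.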
> Oh wait, you were thinking of what I said before -- my "awkward but
> logical question". Is that it?
I'm not quite following? The "awkward but logical" bit is in the email I'm
just replying to, right?
Greetings,
Andres Freund