Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date
Msg-id 20230119021053.xn7c5aczln5scen3@awork3.anarazel.de
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

On 2023-01-18 17:00:48 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 4:37 PM Andres Freund <andres@anarazel.de> wrote:
> > I can, it should be just about trivial code-wise. A bit queasy about trying to
> > foresee the potential consequences.
> 
> That's always going to be true, though.
> 
> > A somewhat related issue is that pgstat_report_vacuum() sets dead_tuples to
> > what VACUUM itself observed, ignoring any concurrently reported dead
> > tuples. As far as I can tell, when vacuum takes a long time, that can lead to
> > severely under-accounting dead tuples.
> 
> Did I not mention that one? There are so many that it can be hard to
> keep track! That's why I catalog them.

I don't recall you doing so, but there are a lot of emails and holes in my head.


> This creates an awkward but logical question, though: what if
> dead_tuples doesn't go down at all? What if VACUUM actually has to
> increase it, because VACUUM runs so slowly relative to the workload?

Sure, that can happen - but it's not made better by having wrong stats :)
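
To make the under-accounting concrete, here is a minimal toy model of it. It is
not the pgstat code: the struct and the toy_* functions are made up for
illustration, and the only thing taken from the discussion is that the
end-of-vacuum report overwrites dead_tuples with what VACUUM itself observed.

```c
#include <stdint.h>

/* Toy stand-in for a per-table stats entry; not PostgreSQL's real struct. */
typedef struct ToyTabStats
{
    int64_t dead_tuples;
} ToyTabStats;

/* Backends add newly dead tuples as their transactions are reported. */
void
toy_report_dml_dead(ToyTabStats *stats, int64_t new_dead)
{
    stats->dead_tuples += new_dead;
}

/*
 * Models the behaviour described above: the counter is overwritten with
 * what VACUUM itself saw, so anything reported while it ran is lost.
 */
void
toy_report_vacuum(ToyTabStats *stats, int64_t dead_seen_by_vacuum)
{
    stats->dead_tuples = dead_seen_by_vacuum;
}

/*
 * Scenario (assumed): concurrent DML kills tuples on pages VACUUM has
 * already scanned. Returns how many dead tuples the stats fail to count.
 */
int64_t
toy_undercount(int64_t dead_before, int64_t dead_during_vacuum)
{
    ToyTabStats stats = {dead_before};

    /* VACUUM is running; backends keep reporting dead tuples. */
    toy_report_dml_dead(&stats, dead_during_vacuum);

    /* VACUUM removed everything it saw and observed nothing left over. */
    toy_report_vacuum(&stats, 0);

    /* The concurrently killed tuples still exist but are not counted. */
    return dead_during_vacuum - stats.dead_tuples;
}
```

With 1000 dead tuples before the run and 500 more killed during it, this model
ends up reporting 0 dead tuples while 500 are actually still there.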


> > I do think this is an argument for splitting up dead_tuples into separate
> > "components" that we track differently. I.e. tracking the number of dead
> > items, not-yet-removable rows, and the number of dead tuples reported from DML
> > statements via pgstats.
> 
> Is it? Why?

We have reasonably sophisticated accounting in pgstats of which newly
live/dead rows a transaction "creates". So an obvious (and wrong) idea is to
just decrement reltuples by the number of tuples removed by autovacuum. But we
can't do that, because inserted/deleted tuples reported by backends can be
removed by on-access pruning, and vacuumlazy doesn't know about all changes
made by its call to heap_page_prune().

But I think that if we add a
  pgstat_count_heap_prune(nredirected, ndead, nunused)
around heap_page_prune() and a
  pgstat_count_heap_vacuum(nunused)
in lazy_vacuum_heap_page(), we'd likely end up with a better approximation
than what vac_estimate_reltuples() does, in the "partially scanned" case.
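
A rough sketch of what that kind of counting could look like, as a toy model
rather than a patch. Only the shape of the two calls comes from the proposal
above; the struct, its fields, the toy_* names, and the update rules are
assumptions for illustration. In particular it assumes each redirected, dead,
or unused item set during pruning corresponds to roughly one reclaimed tuple,
and it leaves out the "not-yet-removable rows" component entirely.

```c
#include <stdint.h>

/* Hypothetical split of ->dead_tuples into sub-counters. */
typedef struct ToySplitStats
{
    int64_t dml_dead_tuples;    /* reported by DML, storage not yet pruned */
    int64_t dead_items;         /* LP_DEAD items left behind by pruning */
} ToySplitStats;

/* Subtract without going negative; the counters are only estimates. */
int64_t
toy_sub_clamped(int64_t value, int64_t delta)
{
    return (value > delta) ? value - delta : 0;
}

/* Backends report dead tuples created by their transactions. */
void
toy_count_dml_dead(ToySplitStats *stats, int64_t ndead)
{
    stats->dml_dead_tuples += ndead;
}

/*
 * Would be called around heap_page_prune(), both for on-access pruning
 * and for VACUUM. Assumed rule: every item touched reclaimed one tuple,
 * and items marked dead still need a later heap vacuum pass.
 */
void
toy_count_heap_prune(ToySplitStats *stats, int64_t nredirected,
                     int64_t ndead, int64_t nunused)
{
    stats->dml_dead_tuples = toy_sub_clamped(stats->dml_dead_tuples,
                                             nredirected + ndead + nunused);
    stats->dead_items += ndead;
}

/* Would be called from lazy_vacuum_heap_page() once items become unused. */
void
toy_count_heap_vacuum(ToySplitStats *stats, int64_t nunused)
{
    stats->dead_items = toy_sub_clamped(stats->dead_items, nunused);
}

/* Combined figure comparable to today's single dead_tuples counter. */
int64_t
toy_dead_estimate(const ToySplitStats *stats)
{
    return stats->dml_dead_tuples + stats->dead_items;
}
```

Because every update here is a delta applied when the work happens, dead
tuples reported by backends while VACUUM is running are never overwritten.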



> I'm all in favor of doing that, of course. I just don't particularly
> think that it's related to this other problem. One problem is that we
> count dead tuples incorrectly because we don't account for the fact
> that things change while VACUUM runs. The other problem is that the
> thing that is counted isn't broken down into distinct subcategories of
> things -- things are bunched together that shouldn't be.

If we only adjust the counters incrementally, as we go, we'd not update them
at the end of vacuum. I think it'd be a lot easier to only update the counters
incrementally if we split ->dead_tuples into sub-counters.

So I don't think it's entirely unrelated.

You probably could get close without splitting the counters, by just pushing
down the counting, and only counting redirected and unused during heap
pruning. But I think it's likely to be more accurate with the split counter.



> Oh wait, you were thinking of what I said before -- my "awkward but
> logical question". Is that it?

I'm not quite following? The "awkward but logical" bit is in the email I'm
just replying to, right?

Greetings,

Andres Freund


