Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers
| From | Peter Geoghegan |
|---|---|
| Subject | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
| Date | |
| Msg-id | CAH2-WzmrJvi+mBnNLYxo18TAGNC78kKb5706bUfiYhZ3JKVfpg@mail.gmail.com |
| In response to | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation (Andres Freund <andres@anarazel.de>) |
| Responses | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
| List | pgsql-hackers |
On Wed, Jan 18, 2023 at 4:37 PM Andres Freund <andres@anarazel.de> wrote:
> I can, it should be just about trivial code-wise. A bit queasy about
> trying to foresee the potential consequences.

That's always going to be true, though.

> A somewhat related issue is that pgstat_report_vacuum() sets dead_tuples
> to what VACUUM itself observed, ignoring any concurrently reported dead
> tuples. As far as I can tell, when vacuum takes a long time, that can
> lead to severely under-accounting dead tuples.

Did I not mention that one? There are so many that it can be hard to
keep track! That's why I catalog them.

As you point out, it's the dead tuples equivalent of my ins_since_vacuum
complaint. The problem is exactly analogous to my recent complaint about
insert-driven autovacuums.

> I wonder if we ought to remember the dead_tuples value at the start of
> the heap scan and use that to adjust the final dead_tuples value. I'd
> lean towards over-counting rather than under-counting and thus probably
> would go for something like
>
> tabentry->dead_tuples = livetuples + Min(0, tabentry->dead_tuples - deadtuples_at_start);
>
> i.e. assuming we might have missed all concurrently reported dead tuples.

This is exactly what I was thinking of doing for both issues (the
ins_since_vacuum one and this similar dead tuples one). It's completely
logical.

This creates an awkward but logical question, though: what if dead_tuples
doesn't go down at all? What if VACUUM actually has to increase it,
because VACUUM runs so slowly relative to the workload? Of course the
whole point is to make it more likely that VACUUM will keep up with the
workload. I'm just not quite sure that the consequences of doing it that
way are strictly a good thing, bearing in mind that we don't
differentiate between recently dead and dead here.

Fun fact: autovacuum can spin with pgbench because of recently dead
tuples, even absent an old snapshot/long running xact, if you set things
aggressively enough:

https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=ww@mail.gmail.com

I think that we probably need to do something like always make sure that
dead_items goes down by a small amount at the end of each VACUUM, even
when that's a lie. Maybe we also have a log message about autovacuum not
keeping up, so as to not feel too guilty about it. You know, to give the
user a chance to reconfigure autovacuum so that it stops happening.

> Of course we could instead move to something like ins_since_vacuum and
> reset it at the *start* of the vacuum. But that'd make the error case
> harder, without giving us more accuracy, I think?

It would. It seems illogical to me.

> I do think this is an argument for splitting up dead_tuples into
> separate "components" that we track differently. I.e. tracking the
> number of dead items, not-yet-removable rows, and the number of dead
> tuples reported from DML statements via pgstats.

Is it? Why? I'm all in favor of doing that, of course. I just don't
particularly think that it's related to this other problem.

One problem is that we count dead tuples incorrectly because we don't
account for the fact that things change while VACUUM runs. The other
problem is that the thing that is counted isn't broken down into distinct
subcategories of things -- things are bunched together that shouldn't be.

Oh wait, you were thinking of what I said before -- my "awkward but
logical question". Is that it?

--
Peter Geoghegan
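For concreteness, here is a rough, self-contained sketch of the two
adjustments discussed above: crediting dead tuples reported concurrently
while VACUUM ran (biased toward over-counting, following the intent of the
quoted one-liner rather than its exact expression), plus forcing
dead_tuples to drop at least slightly at the end of each VACUUM, with a
log message when that clamp kicks in. All of the names here
(report_vacuum_sketch, dead_tuples_at_scan_start, MIN_DECREASE_FRACTION)
are invented for illustration -- this is not the actual
pgstat_report_vacuum() code or any proposed patch, just one reading of the
ideas in this message.

```c
/*
 * Hypothetical sketch only -- not PostgreSQL's pgstat code.  All names
 * besides live_tuples/dead_tuples are invented for illustration.
 */
#include <stdint.h>
#include <stdio.h>

typedef int64_t PgStat_Counter;

typedef struct PgStat_StatTabEntry
{
    PgStat_Counter live_tuples;
    PgStat_Counter dead_tuples;
} PgStat_StatTabEntry;

/* Fraction by which dead_tuples must drop, even when that's a lie */
#define MIN_DECREASE_FRACTION 0.05

static void
report_vacuum_sketch(PgStat_StatTabEntry *tabentry,
                     PgStat_Counter livetuples,  /* live tuples VACUUM saw */
                     PgStat_Counter deadtuples,  /* dead tuples left behind */
                     PgStat_Counter dead_tuples_at_scan_start)
{
    /*
     * Dead tuples reported by DML while VACUUM was running.  Assume
     * (pessimistically) that VACUUM missed all of them, biasing toward
     * over-counting rather than under-counting.
     */
    PgStat_Counter concurrent =
        tabentry->dead_tuples - dead_tuples_at_scan_start;

    PgStat_Counter adjusted = deadtuples + (concurrent > 0 ? concurrent : 0);

    /*
     * Make sure the counter still goes down a little at the end of each
     * VACUUM, even when it didn't really, so autovacuum scheduling can't
     * spin on a table it just processed.
     */
    PgStat_Counter cap =
        (PgStat_Counter) (tabentry->dead_tuples * (1.0 - MIN_DECREASE_FRACTION));

    if (adjusted > cap)
    {
        adjusted = cap;
        /* stand-in for an ereport(LOG, ...) about autovacuum falling behind */
        fprintf(stderr, "autovacuum may not be keeping up with this table\n");
    }

    tabentry->live_tuples = livetuples;
    tabentry->dead_tuples = adjusted;
}

int
main(void)
{
    PgStat_StatTabEntry t = { .live_tuples = 0, .dead_tuples = 500000 };

    /*
     * VACUUM started when dead_tuples was 400000; 100000 more were
     * reported concurrently while it ran; it left 50000 dead behind.
     */
    report_vacuum_sketch(&t, 1000000, 50000, 400000);
    printf("dead_tuples is now %lld\n", (long long) t.dead_tuples);
    return 0;
}
```

Under these assumptions the example ends with dead_tuples = 150000: the
50000 VACUUM couldn't remove plus all 100000 concurrent reports, which is
well under the 5% floor, so the clamp (and its log message) never fires.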