Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Date:
Msg-id: CA+TgmoYemQxRN7EC+2UD6Lp0g9qu02g9SFZ_wECsBaB1NibFuQ@mail.gmail.com
In response to: Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
List: pgsql-hackers
On Mon, Dec 6, 2021 at 9:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Dec 6, 2021 at 6:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > This doesn't seem convincing. Launching autovacuum too soon surely has
> > costs that someone might not want to pay. Clearly in the degenerate
> > case where we always autovacuum every table every time an autovacuum
> > worker is launched, we have gone insane.
>
> Unfortunately we already sometimes behave insanely in exactly the way
> that you describe:
>
> https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=ww@mail.gmail.com

In the same ballpark is http://rhaas.blogspot.com/2020/02/useless-vacuuming.html

> It's hard to define a break-even point for launching an autovacuum
> worker. I think it would be more productive to come up with a design
> that at least doesn't go completely off the rails in various specific
> ways.

I think that's a good observation. The current autovacuum algorithm works pretty well when things are normal, but in extreme scenarios it does not degrade gracefully. The vacuuming-over-and-over-without-accomplishing-anything phenomenon is one example of that. Another example is the user who creates 10,000 databases: we're happy to divide the 60s autovacuum_naptime by 10,000 and try to launch a worker every 6ms. A third example is vacuuming the tables from pg_class in physical order on disk, so that a table that is 1 XID past the wraparound limit can result in a long delay vacuuming a table that is bloating quickly, or, conversely, a table that is bloating very slowly but has just crawled over the threshold for a regular vacuum gets processed before one that is threatening an imminent wraparound shutdown. I think these are all pathological cases that a well-informed human can easily recognize and handle in an intelligent manner, and it doesn't seem crazy to program those responses into the computer in some way.
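The ordering pathology in particular seems mechanically fixable. As a rough illustration only (this is not PostgreSQL's actual scheduler; the field names and the threshold constant here are hypothetical stand-ins), a worker could sort candidate tables by wraparound urgency first and bloat second, rather than by physical order in pg_class:

```python
# Toy sketch of an urgency-ordered autovacuum table queue. All names
# are hypothetical; real autovacuum scheduling works differently.
from dataclasses import dataclass

@dataclass
class Table:
    name: str
    relfrozenxid_age: int   # age of the table's oldest unfrozen XID
    dead_tuples: int        # estimated dead tuples (bloat proxy)

WRAPAROUND_DANGER = 200_000_000  # stand-in for autovacuum_freeze_max_age

def schedule(tables):
    # Tables past the wraparound threshold go first, most urgent first;
    # everything else is ordered by how badly it is bloating.
    urgent = [t for t in tables if t.relfrozenxid_age >= WRAPAROUND_DANGER]
    normal = [t for t in tables if t.relfrozenxid_age < WRAPAROUND_DANGER]
    urgent.sort(key=lambda t: t.relfrozenxid_age, reverse=True)
    normal.sort(key=lambda t: t.dead_tuples, reverse=True)
    return urgent + normal

tables = [
    Table("slow_bloat_over_threshold", 10_000, 1_000_000),
    Table("near_wraparound", 200_000_001, 5),
    Table("fast_bloat", 50_000, 50_000_000),
]
order = [t.name for t in schedule(tables)]
print(order)  # near_wraparound is vacuumed before either bloat case
```

Under this kind of ordering, the table threatening a wraparound shutdown jumps the queue regardless of where it happens to sit in pg_class.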
> I also think that our problem is not so much that we don't have
> accurate statistics about dead tuples (though we surely don't have
> much accuracy). The main problem seems to be that there are various
> specific, important ways in which the distribution of dead tuples may
> matter (leading to various harmful biases). And so it seems reasonable
> to fudge how we interpret dead tuples with the intention of capturing
> some of that, as a medium term solution. Something that considers the
> concentration of dead tuples in heap pages seems promising.

I am less convinced about this part. It sort of sounds like you're saying that it doesn't really matter whether the numbers we gather are accurate, just that they produce the correct results. If the information we currently gather wouldn't produce the right results even if it were fully accurate, that suggests to me that we're gathering the wrong information, and we should gather something else.

For example, we could attack the useless-vacuuming problem by having each vacuum figure out, and store, the oldest XID that could potentially be worth using as a cutoff for vacuuming that table, and refusing to autovacuum that table again until we'd be able to use a cutoff >= that value. I suppose this would be the oldest of (a) any XID that exists in the table on a tuple that we judged recently dead, (b) any XID that is currently running, and (c) the next XID.

I also accept that knowing the concentration of dead tuples on pages that contain at least one dead tuple could be interesting. I've felt for a while that it's a mistake to know how many dead tuples there are but not how many pages contain them, because that affects both the amount of I/O required to vacuum and how much need we have to set VM bits. I'm not sure I would have approached gathering that information in the way that you're proposing here, but I'm not deeply against it, either.
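The refuse-to-revacuum idea could be sketched roughly like this. This is a toy model only: plain integers stand in for XIDs (real XID comparison must handle wraparound), and every function name here is hypothetical:

```python
# Illustrative sketch: after a vacuum, record the oldest XID that could
# have been worth using as a cutoff; skip the next autovacuum of that
# table until the cutoff we could use now has caught up to it.

def oldest_useful_cutoff(recently_dead_xids, running_xids, next_xid):
    # Oldest of (a) XIDs on tuples judged recently dead, (b) any
    # currently-running XID, and (c) the next XID to be assigned.
    candidates = list(recently_dead_xids) + list(running_xids) + [next_xid]
    return min(candidates)

def worth_vacuuming(stored_cutoff, running_xids, next_xid):
    # The cutoff usable now is bounded by the oldest running XID
    # (or next_xid, if nothing is running).
    usable = min(running_xids) if running_xids else next_xid
    return usable >= stored_cutoff

# Last vacuum saw a recently-dead tuple at XID 100 while XID 120 ran:
cutoff = oldest_useful_cutoff([100], [120], 130)
print(cutoff)                               # 100
print(worth_vacuuming(cutoff, [90], 130))   # False: an old snapshot remains
print(worth_vacuuming(cutoff, [], 130))     # True: the cutoff can advance
```

The point of the sketch is just the invariant: while some transaction older than the stored cutoff is still running, vacuuming the table again cannot remove anything the last vacuum couldn't, so launching a worker for it is wasted work.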
I do think that we should try to keep it as a principle that whatever we do gather, we should try our best to make accurate. If that doesn't work well, then gather different stuff instead.

--
Robert Haas
EDB: http://www.enterprisedb.com