Re: autovacuum prioritization - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: autovacuum prioritization
Date:
Msg-id: CAFiTN-tw9uo+pdKmkdotc_rKXYbL=gDz7Aw00_m2+iSE4Dt10A@mail.gmail.com
In response to: autovacuum prioritization (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: autovacuum prioritization
List: pgsql-hackers
On Fri, Jan 21, 2022 at 12:54 AM Robert Haas <robertmhaas@gmail.com> wrote:

> So at a high level, I think that what we ought to do is, first, for
> each table, estimate the time at which we think something bad will
> occur. There are several bad events: too much bloat, XID wraparound,
> MXID wraparound. We need to estimate the time at which we think each
> of those things will occur, and then take the earliest of those
> estimates. That's the time by which we need to have finished vacuuming
> the table. Then, make an estimate of how long it will take to complete
> a vacuum of the table, subtract that from the time at which we need to
> be done, and that's the time by which we need to start. The earliest
> need-to-start time is the highest priority table.

I think we need some more parameters to compare bloat vs. wraparound.
In your example in the second paragraph, the need-to-start time of
table A is simply earlier than that of table B, so that case is easy.
But when it comes to wraparound vs. bloat, we need some weighting to
decide how much bloat is considered as bad as wraparound. I also think
the amount of bloat cannot be an absolute number; it should be relative
to the total database size or something like that. I don't think it can
be computed relative to the table size, because a table that is, say,
just 1 GB and bloated 5x is not as bad as a 1 TB table that is bloated
only 2x. (There is a rough sketch of this idea below.)

> A second problem is that, if the earliest need-to-start time is in the
> past, then we definitely are in trouble and had better get to work at
> once, but if it's in the future, that doesn't necessarily mean we're
> safe. If there are three tables with a need-to-finish time that is 12
> hours in the future and each of them will take 11 hours to vacuum,
> then every need-to-start time computed according to the algorithm
> above is in the future, but in fact we're in a lot of trouble. If the
> estimates are accurate, we need 3 autovacuum workers to be available
> to start within 1 hour, or we're doomed. The very last thing we want
> to do is wait another hour before doing anything. It's not impossible
> to factor this into the calculation of need-to-start times, assuming
> we know how many workers we have. For instance, if we've got tables
> whose need-to-finish times are 30, 50, and 70 minutes in the future,
> we can see that if each one takes 20 minutes or less to vacuum, then
> the need-to-start times can just be computed by subtraction. But if
> the tables with 50 or 70 minute deadlines are going to take more than
> 20 minutes to vacuum, then we've got to back up the need-to-start
> times so that we finish each table in time to start on the next one. I
> haven't looked into what algorithms exist for this kind of scheduling
> problem, but I feel like a literature search, or pulling out my
> college textbooks, would probably turn up some useful ideas.

I think we should also be thinking about dynamically adjusting
priorities. When autovacuum starts, we prioritize tables based on the
statistics and estimates available at that moment, but vacuuming can
take a long time, and some priorities may change while it is running.
So if we push all the tables into a priority queue at the start and
simply vacuum in that order, we might go wrong somewhere. I think we
need separate priority queues based on different factors, for example
one queue for wraparound risk and another for bloat risk.

Even with multiple queues, each item would carry its need_to_start
time, so we would know exactly which queue to pick the next item from;
each queue would be kept ordered by need_to_start time, and whenever we
pick an item we would recheck the priority of the item at the head of
that queue against current statistics. But what if an item further back
in the queue has become more important than the item at the head? I
don't think it is wise to recompute the need_to_start time for every
item before picking each new one. Instead, I think we need multiple
queues based on different factors (not just wraparound and bloat) to
reduce the risk of an item at the back of a queue becoming higher
priority than the items in front of it. That risk cannot be avoided
completely, but it can be reduced by creating more work queues based on
more factors that can change dynamically.

Some rough sketches of what I mean are below.
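First, to make sure I am reading the proposal correctly, here is a
rough Python sketch of the need-to-start computation and a crude check
for the not-enough-workers case. This is just my illustration, not real
code; all the names (TableEstimate, vacuum_duration, and so on) and the
earliest-deadline-first style of the check are my own assumptions:

import heapq
from dataclasses import dataclass

@dataclass
class TableEstimate:
    name: str
    bloat_deadline: float   # seconds until "too much bloat"
    xid_deadline: float     # seconds until XID wraparound trouble
    mxid_deadline: float    # seconds until MXID wraparound trouble
    vacuum_duration: float  # estimated seconds to vacuum this table

def need_to_finish(t: TableEstimate) -> float:
    # Earliest of the projected bad events.
    return min(t.bloat_deadline, t.xid_deadline, t.mxid_deadline)

def need_to_start(t: TableEstimate) -> float:
    # We must start early enough for the vacuum to complete in time.
    return need_to_finish(t) - t.vacuum_duration

def feasible(tables, n_workers):
    # Hand each table, in need-to-finish order, to whichever worker
    # frees up first and check that it still finishes before its bad
    # event.  The 3-tables/12-hours example above fails this check
    # with one worker but passes with three.
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    for t in sorted(tables, key=need_to_finish):
        finish = heapq.heappop(free_at) + t.vacuum_duration
        if finish > need_to_finish(t):
            return False
        heapq.heappush(free_at, finish)
    return True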
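Second, for the bloat-vs-wraparound weighting, something along these
lines is what I have in mind: treat "too much bloat" as bloat exceeding
some fraction of the whole database, so a projected bloat deadline
falls out of the current bloat and its growth rate. The threshold and
the growth-rate input here are of course made up for the example:

def bloat_deadline(bloat_bytes, bloat_growth_per_sec,
                   database_bytes, max_fraction=0.2):
    # Weigh bloat against the whole database rather than the table:
    # a 1 GB table bloated 5x carries ~4 GB of garbage, which is far
    # less alarming than a 1 TB table bloated 2x carrying ~1 TB.
    allowed = max_fraction * database_bytes
    if bloat_growth_per_sec <= 0:
        return float("inf")   # not growing; no projected deadline
    return max(0.0, (allowed - bloat_bytes) / bloat_growth_per_sec)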
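And third, a rough sketch of the multiple-queue idea with the
recheck-at-head step. Each queue is a heap ordered by the need_to_start
value recorded when the item was pushed; before trusting a head we
re-evaluate it against fresh statistics and demote it if it has become
less urgent, instead of recomputing every entry. Again, all the names
here are invented for illustration:

import heapq

def pick_next(queues, fresh_need_to_start):
    # queues: list of heaps of (need_to_start, table) tuples, e.g. one
    # heap for wraparound risk and one for bloat risk.  (Ties in
    # need_to_start would need a tiebreaker in real code.)
    # fresh_need_to_start: recomputes an item's priority from current
    # statistics.
    for q in queues:
        while q:
            key, table = q[0]
            new_key = fresh_need_to_start(table)
            if new_key <= key:
                break   # head is still at least as urgent as recorded
            # Head became less urgent; push it back with its new key.
            heapq.heapreplace(q, (new_key, table))
    candidates = [q for q in queues if q]
    if not candidates:
        return None
    best = min(candidates, key=lambda q: q[0][0])
    return heapq.heappop(best)[1]

Note that an item further back that has become more urgent still won't
surface until the items above it are consumed; that is exactly the
residual risk I mentioned, which more queues keyed on different factors
would shrink but not eliminate.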
> I know that this email is kind of a giant wall of text, so my thanks
> if you've read this far, and even more if you feel inspired to write
> back with your own thoughts.

Yeah, it is a long email, but quite interesting.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com