Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing - Mailing list pgsql-hackers
From | Ryan Booz |
---|---|
Subject | Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing |
Date | |
Msg-id | CAMjhCZqArvBmctrzdRt62VvenBHV=QXai+NGrXLGFnPHJjLTbA@mail.gmail.com Whole thread Raw |
In response to | Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing (Ryan Booz <ryan@softwareandbooz.com>) |
List | pgsql-hackers |
And, of course, I forgot that I switch to text-mode after writing most of this, so the carriage returns were unnecessary. (facepalm... sigh) -- Ryan On Fri, May 12, 2023 at 1:36 PM Ryan Booz <ryan@softwareandbooz.com> wrote: > > Thanks for the continued work, Peter. I hate to be the guy that starts this way, > but this is my first ever response on pgsql-hackers. (insert awkward > smile face). > Hopefully I've followed etiquette well, but please forgive any > missteps, and I'm > happy for any help in making better contributions in the future. > > On Thu, May 11, 2023 at 9:19 PM Peter Geoghegan <pg@bowt.ie> wrote: > > > > On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote: > > > What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of allocatableXIDs"? > > > > I use the term "transaction ID exhaustion" in the attached revision, > > v4. Overall, v4 builds on the work that went into v2 and v3, by > > continuing to polish the overhaul of everything related to freezing, > > relfrozenxid advancement, and anti-wraparound autovacuum. > > Just to say on the outset, as has been said earlier in the tread by others, > that this is herculean work. Thank you for putting the effort you have thus far. > There's a lot of good from where I sit in the modification efforts. > It's a heavy, > dense topic, so there's probably never going to be a perfect way to > get it all in, > but some of the context early on, especially, is helpful for framing. > > > > > It would be nice if it was possible to add an animation/diagram a > > little like this one: https://tuple-freezing-demo.angusd.com (this is > > how I tend to think about the "transaction ID space".) > > Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But, > this or something akin to the "clock" image I've seen elsewhere when > describing the transaction ID space would probably be helpful if it were ever > possible. In fact, there's just a lot about the MVCC stuff in general that > would benefit from diagrams. But alas, I guess that's why we have some > good go-to community talks/slide decks. :-) > > > v4 also limits use of the term "wraparound" to places that directly > > discuss anti-wraparound autovacuums (plus one place in xact.sgml, > > where discussion of "true unsigned integer wraparound" and related > > implementation details has been moved). Otherwise we use the term > > "transaction ID exhaustion", which is pretty much the user-facing name > > for "xidStopLimit". I feel that this is a huge improvement, for the > > reason given to Greg earlier. I'm flexible on the details, but I feel > > strongly that we should minimize use of the term wraparound wherever > > it might have the connotation of "the past becoming the future". This > > is not a case of inventing a new terminology for its own sake. If > > anybody is skeptical I ask that they take a look at what I came up > > with before declaring it a bad idea. I have made that as easy as > > possible, by once again attaching a prebuilt routine-vacuuming.html. > > Thanks again for doing this. Really helpful for doc newbies like me that > want to help but are still working through the process. Really helpful > and appreciated. > > > > > > > Other changes in v4, compared to v3: > > > > * Improved discussion of the differences between non-aggressive and > > aggressive VACUUM. > > This was helpful for me and not something I've previously put much thought > into. Helpful context that is missing from the current docs. > > > * Explains "catch-up freezing" performed by aggressive VACUUMs directly. > > > > "Catch-up" freezing is the really important "consequence" -- something > > that emerges from how each type of VACUUM behaves over time. It is an > > indirect consequence of the behaviors. I would like to counter the > > perception that some users have about freezing only happening during > > aggressive VACUUMs (or anti-wraparound autovacuums). But more than > > that, talking about catch-up freezing seems essential because it is > > the single most important difference. > > > > Similarly, this was helpful overall context of various things > happening with freezing. > > > * Much improved handling of the discussion of anti-wraparound > > autovacuum, and how it relates to aggressive VACUUMs, following > > feedback from Samay. > > > > There is now only fairly minimal overlap in the discussion of > > aggressive VACUUM and anti-wraparound autovacuuming. We finish the > > discussion of aggressive VACUUM just after we start discussing > > anti-wraparound autovacuum. This transition works well, because it > > enforces the idea that anti-wraparound autovacuum isn't really special > > compared to any other aggressive autovacuum. This was something that > > Samay expressed particularly concern about: making anti-wraparound > > autovacuums sound less scary. Though it's also a concern I had from > > the outset, based on practical experience and interactions with people > > that have much less knowledge of Postgres than I do. > > Agree. This flows fairly well and helps the user understand that each > "next step" > in the vacuum/freezing process has a distinct job based on previous work. > > > > > * Anti-wraparound autovacuum is now mostly discussed as something that > > happens to static or mostly-static tables.... > > ...This moves discussion of anti-wraparound av in the direction of: > > "Anti-wraparound autovacuum is a special type of autovacuum. Its > > purpose is to ensure that relfrozenxid advances when no earlier VACUUM > > could advance it in passing — often because no VACUUM has run against > > the table for an extended period." > > > > Again, learned something new here, at least in how I think about it and talk > with others. In total, I do think these changes make wraparound/exhaustion > seem less "the sky is falling". > > > * Added a couple of "Tips" about instrumentation that appears in the > > server log whenever autovacuum reports on a VACUUM operation. > > > > * Much improved "Truncating Transaction Status Information" subsection. > > > > My explanation of the ways in which autovacuum_freeze_max_age can > > affect the storage overhead of commit/abort status in pg_xact is much > > clearer than it was in v3 -- pg_xact truncation is now treated as > > something loosely related to the global config of anti-wraparound > > autovacuum, which makes most sense. > > > This one isn't totally sinking in with me yet. Need another read. > > > It took a great deal of effort to find a structure that covered > > everything, and that highlighted all of the important relationships > > without going too far, while at the same time not being a huge mess. > > That's what I feel I've arrived at with v4. > > In most respects I agree with the overall flow of changes w.r.t the current doc. > Focusing on all of this as something that should normally just be happening > as part of autovacuum is helpful. Working through it as an order of operations > (and I'm just assuming this is the general order) feels like it ties > things together > a lot more. I honestly come away from this document with more of a "I understand > the process" feel than I did previously. > > For now, I'd add the following few comments on the intro section, > 2.5.1 and 2.5.2. I > haven't gotten to the bottom sections yet for much feedback. > > Intro Comments: > 1) "The autovacuum daemon automatically schedules maintenance tasks based on > workload requirements." feels at tension with "Autovacuum scheduling > is controlled > via threshold settings." > > Owing to the lingering belief that many users have whereby hosting providers > have magically enabled Postgres to do all of this for you, there is > still a need to > actively deal with these thresholds based on load. That is, as far as > I understand, > Postgres doesn't automatically adjust based on load. Someone/thing > still has to modify > the thresholds as load and data size changes. > > If the "workload requirements" is pointing towards aggressive > freezing/wraparound > tasks that happen regardless of thresholds, then for me at least that > isn't clear > in that sentence and it feels like there's an implication that > Postgres/autovacuum > is going to magically adjust overall vacuum work based on database workload. > > 2) "The intended audience is database administrators that wish to > perform more advanced > autovacuum tuning, with any of the following goals in mind:" > > I love calling out the audience in some way. That's really helpful, as are the > stated goals in the bullet list. However, as someone feeling pretty novice > after reading all of this, I can't honestly connect how the content on this page > helps me to more advanced tuning. I have a much better idea how freezing, > in particular, works (yay!), but I'm feeling a bit dense how almost anything > here helps me tune vacuum, at least as it relates to the bullets. > > I'm sure you have a connection in mind for each, and certainly understanding the > inner workings of what's happening under the covers is tremendously beneficial, > but when I search for "response" or "performance" in this document, it refers > back to another page (not included in this patch) that talks about the > thresholds. > > It might be as simple as adding something to the end of each bullet to draw > that relationship, but as is, it's hard for me to do it mentally (although I can > conjecture a few things on my own) > > That said, I definitely appreciate the callout that tuning is an > iterative process > and the minor switch from "creates a substantial amount of I/O > traffic" to "may create...". > > ** Section 2.5.1 - Recovering Disk Space ** > > 3). "The space dead tuples occupy must eventually be reclaimed for reuse > by new rows, to avoid unbounded growth of disk space requirements. Reclaiming > space from dead rows is VACUUM's main responsibility." > > It feels like one connection you could make to the bullet list above > is in this area > and not mentioned. By freeing up space and reducing the number of pages that > need to be read for satisfying a query, vacuum and recovering disk space > (theoretically) improves query performance. Not 100% how to add it in context > of these first two paragraphs. > > 4) Caution: "It may be a good idea to add monitoring to alert you about this." > I hate to be pedantic about it, but I think we should spell out > "this". Do we have > a pointer in documentation to what kinds of things to monitor for? Am monitoring > long-running transactions or some metric that shows me that VACUUM is being > "held back"? I know what you mean, but it's not clear to me how to do the right > thing in my environment here. > > 5) The plethora of tips/notes/warnings. > As you and others have mentioned, as presented these really have no context > for me. Individually they are good/helpful information, but it's > really hard to make > a connection to what I should "do" about it. > > It seems to me that this would be a good place to put a subsection which is > something like, "A note about reclaiming disk space" or something. In my > experience, most people hear about and end up using VACUUM FULL because > things got out of control and they want to get into a better spot (I have been > in that boat). I think with a small section that says, in essence, > "hey, now that > you understand why/how vacuum reclaims disk resources normally, if you're > in a position where things aren't in a good state, this is what you need to know > if you want to reclaim space from a really inefficient table" > > For me, at least, I think it would be easier to read/grok what you're sharing in > these callouts. > > 6) One last missing piece that very well might be in another page not referenced > (I obviously need to get the PG16 docs pulled and built locally so > that I can have > better overall reference. My apologies). > > In my experience, one of the biggest issues with the thresholds and recovering > space is the idea of tuning individual tables, not just the entire > database. 5/10/20% > might be fine for most tables, but it's usually the really active ones > that need the > tuning, specifically lowering the thresholds. That doesn't come across to me in > this section at all. Again, maybe I've missed something on another page and > it's all good, but it felt worth calling out. > > Plus, it may provide an opportunity to bring in the threshold formulas again if > they aren't referenced elsewhere (although they probably are). > > Hope that makes sense. > > ** Section 2.5.2: Freezing to manage... ** > As stated above, the effort here overall is great IMO. I like the flow > and reduction > in alarmist tone for things like wraparound, etc. I understand more > about freezing, > aggressive and otherwise, than I did before. > > 7) That said, totally speaking as a non-contributor, this section is > obviously very long > for good reason. But, by the time I've gotten down to 25.2.2.3, my > brain is a bit > bewildered on where we've gotten to. That's more a comment on my capability > to process it all, but I wonder if a slightly more explicit intro > could help set the > stage at least. > > "One side-effect of vacuum and transaction ID management at the row level is > that PostgreSQL would normally need to inspect each row for every query to > ensure it is visible to each requesting transaction. In order to > reduce the need to > read and inspect excessive amounts of data at query time or when normal vacuum > maintenance kicks in, VACUUM has a second job called freezing, which > accomplishes three goals: (attempting to tie in the three sections) > * speeding up queries and vacuum operations by... > * advancing the transaction ID space on generally static tables... > * ensure there are always free transaction IDs available for normal > operation... > " > > Maybe totally worthless and too much, but something like that might set a reader > up for just a bit more context. Then you could take most of what comes before > "2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below) > with something like "2.5.2.2.1 Normal Freezing Activity" > > 8) Note "In PostgreSQL versions before 16..." > Showing my naivety, somehow this isn't connecting with me totally. If > it's important > to call out, then maybe we need a connecting sentence. Based on the content > above, I think you're pointing to "It's also why VACUUM will freeze all eligible > tuples from a heap page once the decision to freeze at least one tuple > is taken:" > If that's it, it's just not clear to me what's totally changed. Sorry, > more learning. :-) > > --- > Hope something in there is helpful. > > Ryan Booz > > > > > -- > > Peter Geoghegan
pgsql-hackers by date: