Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing - Mailing list pgsql-hackers
From | Ryan Booz |
---|---|
Subject | Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing |
Date | |
Msg-id | CAMjhCZoPiOyGMmhGnxBHLgvqnJjJnEDY+-jJ3z2WNwUhfcGbjA@mail.gmail.com Whole thread Raw |
In response to | Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing |
List | pgsql-hackers |
Thanks for the continued work, Peter. I hate to be the guy that starts this way, but this is my first ever response on pgsql-hackers. (insert awkward smile face). Hopefully I've followed etiquette well, but please forgive any missteps, and I'm happy for any help in making better contributions in the future. On Thu, May 11, 2023 at 9:19 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote: > > What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of allocatableXIDs"? > > I use the term "transaction ID exhaustion" in the attached revision, > v4. Overall, v4 builds on the work that went into v2 and v3, by > continuing to polish the overhaul of everything related to freezing, > relfrozenxid advancement, and anti-wraparound autovacuum. Just to say on the outset, as has been said earlier in the tread by others, that this is herculean work. Thank you for putting the effort you have thus far. There's a lot of good from where I sit in the modification efforts. It's a heavy, dense topic, so there's probably never going to be a perfect way to get it all in, but some of the context early on, especially, is helpful for framing. > > It would be nice if it was possible to add an animation/diagram a > little like this one: https://tuple-freezing-demo.angusd.com (this is > how I tend to think about the "transaction ID space".) Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But, this or something akin to the "clock" image I've seen elsewhere when describing the transaction ID space would probably be helpful if it were ever possible. In fact, there's just a lot about the MVCC stuff in general that would benefit from diagrams. But alas, I guess that's why we have some good go-to community talks/slide decks. :-) > v4 also limits use of the term "wraparound" to places that directly > discuss anti-wraparound autovacuums (plus one place in xact.sgml, > where discussion of "true unsigned integer wraparound" and related > implementation details has been moved). Otherwise we use the term > "transaction ID exhaustion", which is pretty much the user-facing name > for "xidStopLimit". I feel that this is a huge improvement, for the > reason given to Greg earlier. I'm flexible on the details, but I feel > strongly that we should minimize use of the term wraparound wherever > it might have the connotation of "the past becoming the future". This > is not a case of inventing a new terminology for its own sake. If > anybody is skeptical I ask that they take a look at what I came up > with before declaring it a bad idea. I have made that as easy as > possible, by once again attaching a prebuilt routine-vacuuming.html. Thanks again for doing this. Really helpful for doc newbies like me that want to help but are still working through the process. Really helpful and appreciated. > > > Other changes in v4, compared to v3: > > * Improved discussion of the differences between non-aggressive and > aggressive VACUUM. This was helpful for me and not something I've previously put much thought into. Helpful context that is missing from the current docs. > * Explains "catch-up freezing" performed by aggressive VACUUMs directly. > > "Catch-up" freezing is the really important "consequence" -- something > that emerges from how each type of VACUUM behaves over time. It is an > indirect consequence of the behaviors. I would like to counter the > perception that some users have about freezing only happening during > aggressive VACUUMs (or anti-wraparound autovacuums). But more than > that, talking about catch-up freezing seems essential because it is > the single most important difference. > Similarly, this was helpful overall context of various things happening with freezing. > * Much improved handling of the discussion of anti-wraparound > autovacuum, and how it relates to aggressive VACUUMs, following > feedback from Samay. > > There is now only fairly minimal overlap in the discussion of > aggressive VACUUM and anti-wraparound autovacuuming. We finish the > discussion of aggressive VACUUM just after we start discussing > anti-wraparound autovacuum. This transition works well, because it > enforces the idea that anti-wraparound autovacuum isn't really special > compared to any other aggressive autovacuum. This was something that > Samay expressed particularly concern about: making anti-wraparound > autovacuums sound less scary. Though it's also a concern I had from > the outset, based on practical experience and interactions with people > that have much less knowledge of Postgres than I do. Agree. This flows fairly well and helps the user understand that each "next step" in the vacuum/freezing process has a distinct job based on previous work. > > * Anti-wraparound autovacuum is now mostly discussed as something that > happens to static or mostly-static tables.... > ...This moves discussion of anti-wraparound av in the direction of: > "Anti-wraparound autovacuum is a special type of autovacuum. Its > purpose is to ensure that relfrozenxid advances when no earlier VACUUM > could advance it in passing — often because no VACUUM has run against > the table for an extended period." > Again, learned something new here, at least in how I think about it and talk with others. In total, I do think these changes make wraparound/exhaustion seem less "the sky is falling". > * Added a couple of "Tips" about instrumentation that appears in the > server log whenever autovacuum reports on a VACUUM operation. > > * Much improved "Truncating Transaction Status Information" subsection. > > My explanation of the ways in which autovacuum_freeze_max_age can > affect the storage overhead of commit/abort status in pg_xact is much > clearer than it was in v3 -- pg_xact truncation is now treated as > something loosely related to the global config of anti-wraparound > autovacuum, which makes most sense. > This one isn't totally sinking in with me yet. Need another read. > It took a great deal of effort to find a structure that covered > everything, and that highlighted all of the important relationships > without going too far, while at the same time not being a huge mess. > That's what I feel I've arrived at with v4. In most respects I agree with the overall flow of changes w.r.t the current doc. Focusing on all of this as something that should normally just be happening as part of autovacuum is helpful. Working through it as an order of operations (and I'm just assuming this is the general order) feels like it ties things together a lot more. I honestly come away from this document with more of a "I understand the process" feel than I did previously. For now, I'd add the following few comments on the intro section, 2.5.1 and 2.5.2. I haven't gotten to the bottom sections yet for much feedback. Intro Comments: 1) "The autovacuum daemon automatically schedules maintenance tasks based on workload requirements." feels at tension with "Autovacuum scheduling is controlled via threshold settings." Owing to the lingering belief that many users have whereby hosting providers have magically enabled Postgres to do all of this for you, there is still a need to actively deal with these thresholds based on load. That is, as far as I understand, Postgres doesn't automatically adjust based on load. Someone/thing still has to modify the thresholds as load and data size changes. If the "workload requirements" is pointing towards aggressive freezing/wraparound tasks that happen regardless of thresholds, then for me at least that isn't clear in that sentence and it feels like there's an implication that Postgres/autovacuum is going to magically adjust overall vacuum work based on database workload. 2) "The intended audience is database administrators that wish to perform more advanced autovacuum tuning, with any of the following goals in mind:" I love calling out the audience in some way. That's really helpful, as are the stated goals in the bullet list. However, as someone feeling pretty novice after reading all of this, I can't honestly connect how the content on this page helps me to more advanced tuning. I have a much better idea how freezing, in particular, works (yay!), but I'm feeling a bit dense how almost anything here helps me tune vacuum, at least as it relates to the bullets. I'm sure you have a connection in mind for each, and certainly understanding the inner workings of what's happening under the covers is tremendously beneficial, but when I search for "response" or "performance" in this document, it refers back to another page (not included in this patch) that talks about the thresholds. It might be as simple as adding something to the end of each bullet to draw that relationship, but as is, it's hard for me to do it mentally (although I can conjecture a few things on my own) That said, I definitely appreciate the callout that tuning is an iterative process and the minor switch from "creates a substantial amount of I/O traffic" to "may create...". ** Section 2.5.1 - Recovering Disk Space ** 3). "The space dead tuples occupy must eventually be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. Reclaiming space from dead rows is VACUUM's main responsibility." It feels like one connection you could make to the bullet list above is in this area and not mentioned. By freeing up space and reducing the number of pages that need to be read for satisfying a query, vacuum and recovering disk space (theoretically) improves query performance. Not 100% how to add it in context of these first two paragraphs. 4) Caution: "It may be a good idea to add monitoring to alert you about this." I hate to be pedantic about it, but I think we should spell out "this". Do we have a pointer in documentation to what kinds of things to monitor for? Am monitoring long-running transactions or some metric that shows me that VACUUM is being "held back"? I know what you mean, but it's not clear to me how to do the right thing in my environment here. 5) The plethora of tips/notes/warnings. As you and others have mentioned, as presented these really have no context for me. Individually they are good/helpful information, but it's really hard to make a connection to what I should "do" about it. It seems to me that this would be a good place to put a subsection which is something like, "A note about reclaiming disk space" or something. In my experience, most people hear about and end up using VACUUM FULL because things got out of control and they want to get into a better spot (I have been in that boat). I think with a small section that says, in essence, "hey, now that you understand why/how vacuum reclaims disk resources normally, if you're in a position where things aren't in a good state, this is what you need to know if you want to reclaim space from a really inefficient table" For me, at least, I think it would be easier to read/grok what you're sharing in these callouts. 6) One last missing piece that very well might be in another page not referenced (I obviously need to get the PG16 docs pulled and built locally so that I can have better overall reference. My apologies). In my experience, one of the biggest issues with the thresholds and recovering space is the idea of tuning individual tables, not just the entire database. 5/10/20% might be fine for most tables, but it's usually the really active ones that need the tuning, specifically lowering the thresholds. That doesn't come across to me in this section at all. Again, maybe I've missed something on another page and it's all good, but it felt worth calling out. Plus, it may provide an opportunity to bring in the threshold formulas again if they aren't referenced elsewhere (although they probably are). Hope that makes sense. ** Section 2.5.2: Freezing to manage... ** As stated above, the effort here overall is great IMO. I like the flow and reduction in alarmist tone for things like wraparound, etc. I understand more about freezing, aggressive and otherwise, than I did before. 7) That said, totally speaking as a non-contributor, this section is obviously very long for good reason. But, by the time I've gotten down to 25.2.2.3, my brain is a bit bewildered on where we've gotten to. That's more a comment on my capability to process it all, but I wonder if a slightly more explicit intro could help set the stage at least. "One side-effect of vacuum and transaction ID management at the row level is that PostgreSQL would normally need to inspect each row for every query to ensure it is visible to each requesting transaction. In order to reduce the need to read and inspect excessive amounts of data at query time or when normal vacuum maintenance kicks in, VACUUM has a second job called freezing, which accomplishes three goals: (attempting to tie in the three sections) * speeding up queries and vacuum operations by... * advancing the transaction ID space on generally static tables... * ensure there are always free transaction IDs available for normal operation... " Maybe totally worthless and too much, but something like that might set a reader up for just a bit more context. Then you could take most of what comes before "2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below) with something like "2.5.2.2.1 Normal Freezing Activity" 8) Note "In PostgreSQL versions before 16..." Showing my naivety, somehow this isn't connecting with me totally. If it's important to call out, then maybe we need a connecting sentence. Based on the content above, I think you're pointing to "It's also why VACUUM will freeze all eligible tuples from a heap page once the decision to freeze at least one tuple is taken:" If that's it, it's just not clear to me what's totally changed. Sorry, more learning. :-) --- Hope something in there is helpful. Ryan Booz > > -- > Peter Geoghegan
pgsql-hackers by date: