Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing - Mailing list pgsql-hackers

From Ryan Booz
Subject Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
Date
Msg-id CAMjhCZqArvBmctrzdRt62VvenBHV=QXai+NGrXLGFnPHJjLTbA@mail.gmail.com
Whole thread Raw
In response to Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing  (Ryan Booz <ryan@softwareandbooz.com>)
List pgsql-hackers
And, of course, I forgot that I switch to text-mode after writing most
of this, so the carriage returns were unnecessary. (facepalm... sigh)

--
Ryan

On Fri, May 12, 2023 at 1:36 PM Ryan Booz <ryan@softwareandbooz.com> wrote:
>
> Thanks for the continued work, Peter. I hate to be the guy that starts this way,
> but this is my first ever response on pgsql-hackers. (insert awkward
> smile face).
> Hopefully I've followed etiquette well, but please forgive any
> missteps, and I'm
> happy for any help in making better contributions in the future.
>
> On Thu, May 11, 2023 at 9:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
> > > What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of
allocatableXIDs"? 
> >
> > I use the term "transaction ID exhaustion" in the attached revision,
> > v4. Overall, v4 builds on the work that went into v2 and v3, by
> > continuing to polish the overhaul of everything related to freezing,
> > relfrozenxid advancement, and anti-wraparound autovacuum.
>
> Just to say on the outset, as has been said earlier in the tread by others,
> that this is herculean work. Thank you for putting the effort you have thus far.
> There's a lot of good from where I sit in the modification efforts.
> It's a heavy,
> dense topic, so there's probably never going to be a perfect way to
> get it all in,
> but some of the context early on, especially, is helpful for framing.
>
> >
> > It would be nice if it was possible to add an animation/diagram a
> > little like this one: https://tuple-freezing-demo.angusd.com (this is
> > how I tend to think about the "transaction ID space".)
>
> Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But,
> this or something akin to the "clock" image I've seen elsewhere when
> describing the transaction ID space would probably be helpful if it were ever
> possible. In fact, there's just a lot about the MVCC stuff in general that
> would benefit from diagrams. But alas, I guess that's why we have some
> good go-to community talks/slide decks. :-)
>
> > v4 also limits use of the term "wraparound" to places that directly
> > discuss anti-wraparound autovacuums (plus one place in xact.sgml,
> > where discussion of "true unsigned integer wraparound" and related
> > implementation details has been moved). Otherwise we use the term
> > "transaction ID exhaustion", which is pretty much the user-facing name
> > for "xidStopLimit". I feel that this is a huge improvement, for the
> > reason given to Greg earlier. I'm flexible on the details, but I feel
> > strongly that we should minimize use of the term wraparound wherever
> > it might have the connotation of "the past becoming the future". This
> > is not a case of inventing a new terminology for its own sake. If
> > anybody is skeptical I ask that they take a look at what I came up
> > with before declaring it a bad idea. I have made that as easy as
> > possible, by once again attaching a prebuilt routine-vacuuming.html.
>
> Thanks again for doing this. Really helpful for doc newbies like me that
> want to help but are still working through the process. Really helpful
> and appreciated.
>
> >
> >
> > Other changes in v4, compared to v3:
> >
> > * Improved discussion of the differences between non-aggressive and
> > aggressive VACUUM.
>
> This was helpful for me and not something I've previously put much thought
> into. Helpful context that is missing from the current docs.
>
> > * Explains "catch-up freezing" performed by aggressive VACUUMs directly.
> >
> > "Catch-up" freezing is the really important "consequence" -- something
> > that emerges from how each type of VACUUM behaves over time. It is an
> > indirect consequence of the behaviors. I would like to counter the
> > perception that some users have about freezing only happening during
> > aggressive VACUUMs (or anti-wraparound autovacuums). But more than
> > that, talking about catch-up freezing seems essential because it is
> > the single most important difference.
> >
>
> Similarly, this was helpful overall context of various things
> happening with freezing.
>
> > * Much improved handling of the discussion of anti-wraparound
> > autovacuum, and how it relates to aggressive VACUUMs, following
> > feedback from Samay.
> >
> > There is now only fairly minimal overlap in the discussion of
> > aggressive VACUUM and anti-wraparound autovacuuming. We finish the
> > discussion of aggressive VACUUM just after we start discussing
> > anti-wraparound autovacuum. This transition works well, because it
> > enforces the idea that anti-wraparound autovacuum isn't really special
> > compared to any other aggressive autovacuum. This was something that
> > Samay expressed particularly concern about: making anti-wraparound
> > autovacuums sound less scary. Though it's also a concern I had from
> > the outset, based on practical experience and interactions with people
> > that have much less knowledge of Postgres than I do.
>
> Agree. This flows fairly well and helps the user understand that each
> "next step"
> in the vacuum/freezing process has a distinct job based on previous work.
>
> >
> > * Anti-wraparound autovacuum is now mostly discussed as something that
> > happens to static or mostly-static tables....
> > ...This moves discussion of anti-wraparound av in the direction of:
> > "Anti-wraparound autovacuum is a special type of autovacuum. Its
> > purpose is to ensure that relfrozenxid advances when no earlier VACUUM
> > could advance it in passing — often because no VACUUM has run against
> > the table for an extended period."
> >
>
> Again, learned something new here, at least in how I think about it and talk
> with others. In total, I do think these changes make wraparound/exhaustion
> seem less "the sky is falling".
>
> > * Added a couple of "Tips" about instrumentation that appears in the
> > server log whenever autovacuum reports on a VACUUM operation.
> >
> > * Much improved "Truncating Transaction Status Information" subsection.
> >
> > My explanation of the ways in which autovacuum_freeze_max_age can
> > affect the storage overhead of commit/abort status in pg_xact is much
> > clearer than it was in v3 -- pg_xact truncation is now treated as
> > something loosely related to the global config of anti-wraparound
> > autovacuum, which makes most sense.
> >
> This one isn't totally sinking in with me yet. Need another read.
>
> > It took a great deal of effort to find a structure that covered
> > everything, and that highlighted all of the important relationships
> > without going too far, while at the same time not being a huge mess.
> > That's what I feel I've arrived at with v4.
>
> In most respects I agree with the overall flow of changes w.r.t the current doc.
> Focusing on all of this as something that should normally just be happening
> as part of autovacuum is helpful. Working through it as an order of operations
> (and I'm just assuming this is the general order) feels like it ties
> things together
> a lot more. I honestly come away from this document with more of a "I understand
> the process" feel than I did previously.
>
> For now, I'd add the following few comments on the intro section,
> 2.5.1 and 2.5.2. I
> haven't gotten to the bottom sections yet for much feedback.
>
> Intro Comments:
> 1) "The autovacuum daemon automatically schedules maintenance tasks based on
> workload requirements." feels at tension with "Autovacuum scheduling
> is controlled
> via threshold settings."
>
> Owing to the lingering belief that many users have whereby hosting providers
> have magically enabled Postgres to do all of this for you, there is
> still a need to
> actively deal with these thresholds based on load. That is, as far as
> I understand,
> Postgres doesn't automatically adjust based on load. Someone/thing
> still has to modify
> the thresholds as load and data size changes.
>
> If the "workload requirements" is pointing towards aggressive
> freezing/wraparound
> tasks that happen regardless of thresholds, then for me at least that
> isn't clear
> in that sentence and it feels like there's an implication that
> Postgres/autovacuum
> is going to magically adjust overall vacuum work based on database workload.
>
> 2) "The intended audience is database administrators that wish to
> perform more advanced
>  autovacuum tuning, with any of the following goals in mind:"
>
> I love calling out the audience in some way. That's really helpful, as are the
> stated goals in the bullet list. However, as someone feeling pretty novice
> after reading all of this, I can't honestly connect how the content on this page
> helps me to more advanced tuning. I have a much better idea how freezing,
> in particular, works (yay!), but I'm feeling a bit dense how almost anything
> here helps me tune vacuum, at least as it relates to the bullets.
>
> I'm sure you have a connection in mind for each, and certainly understanding the
> inner workings of what's happening under the covers is tremendously beneficial,
> but when I search for "response" or "performance" in this document, it refers
> back to another page (not included in this patch) that talks about the
> thresholds.
>
> It might be as simple as adding something to the end of each bullet to draw
> that relationship, but as is, it's hard for me to do it mentally (although I can
> conjecture a few things on my own)
>
> That said, I definitely appreciate the callout that tuning is an
> iterative process
> and the minor switch from "creates a substantial amount of I/O
> traffic" to "may create...".
>
> ** Section 2.5.1 - Recovering Disk Space **
>
> 3). "The space dead tuples occupy must eventually be reclaimed for reuse
> by new rows, to avoid unbounded growth of disk space requirements. Reclaiming
> space from dead rows is VACUUM's main responsibility."
>
> It feels like one connection you could make to the bullet list above
> is in this area
> and not mentioned. By freeing up space and reducing the number of pages that
> need to be read for satisfying a query, vacuum and recovering disk space
> (theoretically) improves query performance. Not 100% how to add it in context
> of these first two paragraphs.
>
> 4) Caution: "It may be a good idea to add monitoring to alert you about this."
> I hate to be pedantic about it, but I think we should spell out
> "this". Do we have
> a pointer in documentation to what kinds of things to monitor for? Am monitoring
> long-running transactions or some metric that shows me that VACUUM is being
> "held back"? I know what you mean, but it's not clear to me how to do the right
> thing in my environment here.
>
> 5) The plethora of tips/notes/warnings.
> As you and others have mentioned, as presented these really have no context
> for me. Individually they are good/helpful information, but it's
> really hard to make
> a connection to what I should "do" about it.
>
> It seems to me that this would be a good place to put a subsection which is
> something like, "A note about reclaiming disk space" or something. In my
> experience, most people hear about and end up using VACUUM FULL because
> things got out of control and they want to get into a better spot (I have been
> in that boat). I think with a small section that says, in essence,
> "hey, now that
> you understand why/how vacuum reclaims disk resources normally, if you're
> in a position where things aren't in a good state, this is what you need to know
> if you want to reclaim space from a really inefficient table"
>
> For me, at least, I think it would be easier to read/grok what you're sharing in
> these callouts.
>
> 6) One last missing piece that very well might be in another page not referenced
> (I obviously need to get the PG16 docs pulled and built locally so
> that I can have
> better overall reference. My apologies).
>
> In my experience, one of the biggest issues with the thresholds and recovering
> space is the idea of tuning individual tables, not just the entire
> database. 5/10/20%
> might be fine for most tables, but it's usually the really active ones
> that need the
> tuning, specifically lowering the thresholds. That doesn't come across to me in
> this section at all. Again, maybe I've missed something on another page and
> it's all good, but it felt worth calling out.
>
> Plus, it may provide an opportunity to bring in the threshold formulas again if
> they aren't referenced elsewhere (although they probably are).
>
> Hope that makes sense.
>
> ** Section 2.5.2: Freezing to manage... **
> As stated above, the effort here overall is great IMO. I like the flow
> and reduction
> in alarmist tone for things like wraparound, etc. I understand more
> about freezing,
> aggressive and otherwise, than I did before.
>
> 7) That said, totally speaking as a non-contributor, this section is
> obviously very long
> for good reason. But, by the time I've gotten down to 25.2.2.3, my
> brain is a bit
> bewildered on where we've gotten to. That's more a comment on my capability
> to process it all, but I wonder if a slightly more explicit intro
> could help set the
> stage at least.
>
> "One side-effect of vacuum and transaction ID management at the row level is
> that PostgreSQL would normally need to inspect each row for every query to
> ensure it is visible to each requesting transaction. In order to
> reduce the need to
> read and inspect excessive amounts of data at query time or when normal vacuum
> maintenance kicks in, VACUUM has a second job called freezing, which
> accomplishes three goals: (attempting to tie in the three sections)
>  * speeding up queries and vacuum operations by...
>  * advancing the transaction ID space on generally static tables...
>  * ensure there are always free transaction IDs available for normal
> operation...
> "
>
> Maybe totally worthless and too much, but something like that might set a reader
> up for just a bit more context. Then you could take most of what comes before
> "2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below)
> with something like "2.5.2.2.1 Normal Freezing Activity"
>
> 8) Note "In PostgreSQL versions before 16..."
> Showing my naivety, somehow this isn't connecting with me totally. If
> it's important
> to call out, then maybe we need a connecting sentence. Based on the content
> above, I think you're pointing to "It's also why VACUUM will freeze all eligible
> tuples from a heap page once the decision to freeze at least one tuple
> is taken:"
> If that's it, it's just not clear to me what's totally changed. Sorry,
> more learning. :-)
>
> ---
> Hope something in there is helpful.
>
> Ryan Booz
>
> >
> > --
> > Peter Geoghegan



pgsql-hackers by date:

Previous
From: Ryan Booz
Date:
Subject: Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
Next
From: Greg Stark
Date:
Subject: Re: smgrzeroextend clarification