Re: Eagerly scan all-visible pages to amortize aggressive vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date
Msg-id CAD21AoBTWQ6AiaJ2GZMfPvDg86_2_hP1H7pXctggH3UoBrgeBg@mail.gmail.com
Whole thread Raw
In response to Re: Eagerly scan all-visible pages to amortize aggressive vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Jan 27, 2025 at 9:45 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Jan 24, 2025 at 3:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Fri, Jan 24, 2025 at 3:02 PM Melanie Plageman
> > <melanieplageman@gmail.com> wrote:
> > > I think you're right. I would go with a percentage. I don't see many
> > > other GUCs that are percents. What would you call it? Perhaps
> > > vacuum_eager_scan_fail_threshold? The % of the blocks in the table
> > > vacuum may scan and fail to freeze.
> > >
> > > There is an issue I see with making it a percentage. The current
> > > default vacuum_eager_scan_max_fails is 128 out of 4096. That means you
> > > are allowed to scan about 3% of the blocks in the table even if you
> > > fail to freeze every one. I don't think there are very many reasonable
> > > values above 256, personally. So, it might be weird to add a
> > > percentage GUC value with only very low acceptable values. Part of
> > > this is that we are not making the success cap configurable, so that
> > > means that you might have lots of extra I/O if you are both failing
> > > and succeeding. Someone configuring this GUC might think this is
> > > controlling the amount of extra I/O they are incurring.
> >
> > Why would it be unreasonable to set this value to 25% or even 100%? I
> > grant that it doesn't sound like the most prudent possible value, but
> > if I'm willing to fritter away my VACUUM resources to have the best
> > possible chance of eagerly freezing stuff, isn't that up to me? I
> > think setting this value to 100% would be WAY less damaging than
> > work_mem='1TB' or autovacuum_naptime='7d', both of which are allowed.
> > In fact, setting this to 100% could theoretically have no negative
> > consequences at all, if it so happens that no freeze failures occur.
> > Couldn't it even be a win, if freeze failures are common but
> > minimizing the impact of aggressive vacuum is of overwhelming
> > importance?
> >
> > Looking at ConfigureNamesReal[], some existing words we use to talk
> > about real-valued GUCs include: cost, fraction, factor, bias,
> > multiplier, target, rate. Of those, I like "fraction" and "rate" best
> > here, because they imply a range of 0-1 rather than anything broader.
> > vacuum_max_eager_freeze_failure_rate? Kinda long...
>
> attached v11 uses a fraction with this name. It follows the
> conventions and I find it descriptive.
>
> Changing the configuration to a fraction does mean that quite a few of
> the comments are different in this version. I think the description of
> the guc in
> config.sgml could use some review in particular. I tried to make it
> clear that the percentage is not the maximum number of blocks that
> could be eager scanned, because you may also eagerly scan blocks and
> succeed.
>
> Other note: I noticed AutoVacOpts that are floating point numbers (like
> vacuum_cost_delay) are float8s but their associated GUCs (like
> autovacuum_vacuum_cost_delay) are doubles. These are going to be equivalent,
> but I wanted to note the inconsistency in case I was missing something.
>
> > > """
> > > <command>VACUUM</command> typically scans pages that have been
> > > modified since the last vacuum. However, some all-visible but not
> > > all-frozen pages are eagerly scanned to try and freeze them. Note that
> > > the <structfield>relfrozenxid</structfield> can only be advanced when
> > > every page of the table that might contain unfrozen XIDs is scanned.
> > > """
> >
> > "to try and freeze them" seems a little awkward to me. The rest seems fine.
> >
> > "in the hope that we can freeze them"?
> >
> > That still doesn't seem great to me. Maybe it's worth expending more
> > text here, not sure.
>
> Given this and Treat's input from a different mail:
>
> > Yeah, this feels like a net negative. As I see it, we're trying to
> > connect three not-obviously related ideas for users who are trying to
> > understand how this system works especially with regards to
> > relfrozenxid advanced (based on the section of the docs we are in),
> > loosely
> > 1. you've probably heard vacuum mainly scans modified pages for cleanup
> > 2. you might notice it also scans non-modified pages, because
> > sometimes it wants to freeze stuff
> > 3. you might think, if we are freezing stuff, why doesn't relfrozenxid advance?
> >
> > So that middle bit is trying to act as glue that pulls this all
> > together. I thought the previous version was closer, with Haas's
> > feedback I might go with something more like this:
>
> I've mostly used his suggested wording in attached v11.
>
> > > I rewrote the comment using some of your input and trying to clarify
> > > my initial point about opportunistic vs guaranteed freezing.
> > >
> > > """
> > > If FreezeLimit has not advanced past the relfrozenxid or
> > > MultiXactCutoff has not advanced past relminmxid, we are unlikely to
> > > be able to freeze anything new.
> > > Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact
> > > are technically freezable, but we won't freeze them unless some other
> > > criteria for opportunistic freezing is met.
> >
> > It's not super clear to me what "some other criteria" means here, but
> > maybe it's fine.
>
> Given this thought and Treat's feedback below
>
> > Yeah, I also think the "some other criteria" part seems poor, although
> > tbh this all feels like a net negative to the previous wording or
> > Haas's suggested change, like you're describing what the code below it
> > does rather than what the intentions of the code are. For example, I
> > like the "freeze horizon" language vs explicit FreezeLimit since the
> > code could change even if the goal doesn't. But in lue of going back
> > to that wording (modulo your 1 line explanation in your email), if I
> > were to attempt to split the difference:
> >
> > We only want to
> > enable eager scanning if we are likely to be able to freeze some of
> > the pages in the relation, which is unlikely if FreezeLimit has not
> > advanced past
> > relfrozenxid or if MultiXactCutoff has not advanced passed
> > relminmxid. Granted, there may be pages we didn't try to freeze
> > before, or some previously blocking XID greater than FreezeLimit may
> > have now ended (allowing for freezing), but as a heuristic we wait
> > until the
> > FreezeLimit advances to increase our chances of successful freezing.
>
> I've updated the proposed text to mostly be in line with what he suggested.
>
> > > ended, rendering additional tuples freezable.
> > > As a heuristic, however, it makes sense to wait until the FreezeLimit
> > > or MultiXactCutoff has advanced before eager scanning.
> > > """
> > >
> > > > + ereport(INFO,
> > > > + (errmsg("Vacuum successfully froze %u eager scanned blocks of
> > > > \"%s.%s.%s\". Now disabling eager scanning.",
> > > >
> > > > I predict that if Tom sees this, you will get complaints about both
> > > > the wording of the message, which pretty clearly does not conform to
> > > > style guidelines for a primary error message, and also about the use
> > > > of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling
> > > > eager scanning after freezing %u eagerly scanned blocks".
> > >
> > > I've used your wording. Just for future reference, are the style
> > > guidelines I was violating the capitalization and punctuation? Are
> > > these documented somewhere? Also, what is a primary error message?
> > > INFO level? Or ones that use ereport and are translated? I looked at
> > > other messages and saw that they don't capitalize the first word or
> > > use punctuation, so I assume that those were problems.
> >
> > Yes. Primary error messages, i.e. errmsg(), are not capitalized and
> > punctuated, unlike errdetail() and errhint() messages, which are.
> >
> > See https://www.postgresql.org/docs/current/error-style-guide.html
> >
> > INFO level is used for VERY few things. I can't tell you off the top
> > of my head when it's appropriate, but I think the answer is "almost
> > never".
>
> I have changed it to DEBUG2. I started it as INFO because the other vacuum
> messages about whether or not the vacuum is an aggressive vacuum were at INFO
> level.I don't know what a user might prefer. Treat said this downthread:
>
> > Maybe, but one of the areas that INFO is used for is providing
> > additional details in VACUUM VERBOSE output, and this seems like it
> > would be pretty useful information to have if you are trying to
> > discern changes in i/o rate during a vacuum, or trying to tune the
> > failure rate setting, or several other related fields (including
> > automated capture by tools like pganalyze), so I believe INFO is the
> > right choice for this.
>
> I'm happy to go either way. I don't want users mad about verbosity, but if they
> think it's helpful, then that seems good.
>

Thank you for updating the patch. I was reviewing the v10 patch and
had some comments. I believe these comments are still valid for v11,
but please ignore them if outdated.

+       if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
+               TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
+
   vacrel->cutoffs.FreezeLimit))
+               oldest_unfrozen_before_cutoff = true;
+
+       if (!oldest_unfrozen_before_cutoff &&
+               MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
+               MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
+
 vacrel->cutoffs.MultiXactCutoff))
+               oldest_unfrozen_before_cutoff = true;

Given that our freeze check strictly checks if an xid is older than
the cutoff (exclusive bound), I think we should check if the
relfrozenxid and relminmxid strictly precede the cutoff values.

---
        if (*was_eager_scanned)
            vacrel->eager_scanned_pages++;

How about incrementing this counter near the place where incrementing
scanned_pages (i.e., at the beginning of the loop of
heap_vac_scan_next_block())? It would make it easy to understand the
difference between eager_scanned_pages and scanned_pages.

---
+ * No vm_page_frozen output parameter (like what is passed to
+ * lazy_scan_prune()) is passed here because empty pages are always frozen and
+ * thus could never be eagerly scanned.

The last part "thus could never be eagerly scanned" confused me a bit;
IIUC we count all pages that are scanned because of this new eager
scan feature as "eagerly scanned pages". We increment
eager_scanned_pages counter even if the page is either new or empty.
This fact seems to contradict the sentence "empty pages could never be
eagerly scanned".

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: POC: track vacuum/analyze cumulative time per relation
Next
From: Nathan Bossart
Date:
Subject: Re: Adding extension default version to \dx