Re: Eagerly scan all-visible pages to amortize aggressive vacuum - Mailing list pgsql-hackers

From Robert Treat
Subject Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date
Msg-id CAJSLCQ2YARY0oJcv+ic=nQ1KNKpjrMRKQm9eoc0QDgvPX66e7Q@mail.gmail.com
In response to Re: Eagerly scan all-visible pages to amortize aggressive vacuum  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Eagerly scan all-visible pages to amortize aggressive vacuum
List pgsql-hackers
On Fri, Jan 24, 2025 at 3:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 24, 2025 at 3:02 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > This thought exercise made me realize something is wrong with my
> > current patch, though. If you set the failure tolerance
> > (vacuum_eager_scan_max_fails) to 0 right now, it disables eager
> > scanning altogether. That might be unexpected. You would probably
> > expect setting that to 0 to still allow eager scanning if it is only
> > succeeding. That made me think that perhaps I should have a -1 value
> > that disables eager scanning altogether and 0 just sets the failure
> > tolerance to 0.
>
> I would not expect that. I would expect setting a value of N to mean
> "try until we have N failures". "Try until you have 0 failures" is
> logically equivalent to "don't try".
>

Hmm... isn't "don't try" really more logically equivalent to "stop
after zero failures", with "try until zero failures" more of the
inverse... more like "never stop never stopping"?
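To make the two readings concrete, here is a small Python sketch of the competing semantics for a failure cap of 0 (all names are illustrative stand-ins, not PostgreSQL's actual vacuum code):

```python
def keep_scanning_haas(max_fails, fails_so_far):
    # Haas's reading: "try until we have N failures".
    # With N = 0 this is logically equivalent to "don't try".
    return fails_so_far < max_fails

def keep_scanning_proposed(max_fails, fails_so_far):
    # Proposed reading: -1 disables eager scanning entirely;
    # 0 means a failure tolerance of zero -- keep eagerly scanning
    # as long as every attempt so far has succeeded.
    if max_fails == -1:
        return False
    return fails_so_far <= max_fails
```

Under the first reading a setting of 0 never scans eagerly; under the second, 0 scans eagerly until the first freeze failure.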

> > I think you're right. I would go with a percentage. I don't see many
> > other GUCs that are percents. What would you call it? Perhaps
> > vacuum_eager_scan_fail_threshold? The % of the blocks in the table
> > vacuum may scan and fail to freeze.
> >
> > There is an issue I see with making it a percentage. The current
> > default vacuum_eager_scan_max_fails is 128 out of 4096. That means you
> > are allowed to scan about 3% of the blocks in the table even if you
> > fail to freeze every one. I don't think there are very many reasonable
> > values above 256, personally. So, it might be weird to add a
> > percentage GUC value with only very low acceptable values. Part of
> > this is that we are not making the success cap configurable, so that
> > means that you might have lots of extra I/O if you are both failing
> > and succeeding. Someone configuring this GUC might think this is
> > controlling the amount of extra I/O they are incurring.
>
> Why would it be unreasonable to set this value to 25% or even 100%? I
> grant that it doesn't sound like the most prudent possible value, but
> if I'm willing to fritter away my VACUUM resources to have the best
> possible chance of eagerly freezing stuff, isn't that up to me? I
> think setting this value to 100% would be WAY less damaging than
> work_mem='1TB' or autovacuum_naptime='7d', both of which are allowed.
> In fact, setting this to 100% could theoretically have no negative
> consequences at all, if it so happens that no freeze failures occur.
> Couldn't it even be a win, if freeze failures are common but
> minimizing the impact of aggressive vacuum is of overwhelming
> importance?
>

Yeah, I don't see much reason to be concerned about this being a foot
gun; I don't think there is a way to configure it to be more
disruptive than the folks I have seen running vacuum freeze in cron
jobs :-)

And I do think there is potential upside; we've mostly talked about
this in cases that use settings close to the defaults, but I think
there is a significant number of people who use numbers quite
different from the defaults (specifically, increasing
autovacuum_freeze_max_age / vacuum_freeze_table_age considerably)
where a higher failure rate is probably worth it, especially as vacuum
time increases. (It feels very analogous to checkpoint smoothing when
put this way.)
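For reference, the default cap quoted upthread works out to a fairly small percentage, and converting a percentage-style GUC back into a per-region block cap is simple arithmetic (a back-of-envelope sketch; the region size of 4096 blocks and the cap of 128 come from the quoted discussion):

```python
REGION_BLOCKS = 4096      # region size from the discussion upthread
DEFAULT_MAX_FAILS = 128   # current default failure cap per region

# the current default expressed as a percentage of the region
default_pct = 100 * DEFAULT_MAX_FAILS / REGION_BLOCKS  # 3.125

def fail_cap_blocks(threshold_pct, region_blocks=REGION_BLOCKS):
    """Convert a hypothetical percentage threshold into a block cap."""
    return int(region_blocks * threshold_pct / 100)
```

So the status quo is roughly a 3% threshold, while 25% or 100% would allow caps of 1024 or 4096 blocks per region.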

> > The while was meant to contrast the sentence before it about typically
> > only scanning pages that have been modified since the previous vacuum.
> > But, I see how that didn't work out for me. How about this:
> >
> > """
> > <command>VACUUM</command> typically scans pages that have been
> > modified since the last vacuum. However, some all-visible but not
> > all-frozen pages are eagerly scanned to try and freeze them. Note that
> > the <structfield>relfrozenxid</structfield> can only be advanced when
> > every page of the table that might contain unfrozen XIDs is scanned.
> > """
>
> "to try and freeze them" seems a little awkward to me. The rest seems fine.
>
> "in the hope that we can freeze them"?
>
> That still doesn't seem great to me. Maybe it's worth expending more
> text here, not sure.
>

Yeah, this feels like a net negative. As I see it, we're trying to
connect three not-obviously related ideas for users who are trying to
understand how this system works, especially with regard to
relfrozenxid advancement (based on the section of the docs we are in),
loosely:
1. you've probably heard vacuum mainly scans modified pages for cleanup
2. you might notice it also scans non-modified pages, because sometimes it wants to freeze stuff
3. you might think, if we are freezing stuff, why doesn't relfrozenxid advance?

So that middle bit is trying to act as glue that pulls this all
together. I thought the previous version was closer, with Haas's
feedback I might go with something more like this:

    <para>
-    <command>VACUUM</command> normally only scans pages that have been modified
-    since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
-    advanced when every page of the table
-    that might contain unfrozen XIDs is scanned.  This happens when
+    While <command>VACUUM</command> typically scans pages that have been
+    modified since the last vacuum, it may also eagerly scan some all-visible
+    but not all-frozen pages in an attempt to freeze them, but the
+    <structfield>relfrozenxid</structfield> will only be advanced when
+    every page of the table that might contain unfrozen XIDs is scanned.
+    This happens when

> > I rewrote the comment using some of your input and trying to clarify
> > my initial point about opportunistic vs guaranteed freezing.
> >
> > """
> > If FreezeLimit has not advanced past the relfrozenxid or
> > MultiXactCutoff has not advanced past relminmxid, we are unlikely to
> > be able to freeze anything new.
> > Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact
> > are technically freezable, but we won't freeze them unless some other
> > criteria for opportunistic freezing is met.
>
> It's not super clear to me what "some other criteria" means here, but
> maybe it's fine.
>
> > It is also possible that a transaction newer than the FreezeLimit has
> > ended, rendering additional tuples freezable.
> > As a heuristic, however, it makes sense to wait until the FreezeLimit
> > or MultiXactCutoff has advanced before eager scanning.
> > """
> >

Yeah, I also think the "some other criteria" part seems poor, although
tbh this all feels like a net negative compared to the previous
wording or Haas's suggested change, like you're describing what the
code below it does rather than what the intentions of the code are.
For example, I like the "freeze horizon" language vs the explicit
FreezeLimit, since the code could change even if the goal doesn't. But
in lieu of going back to that wording (modulo your one-line
explanation in your email), if I were to attempt to split the
difference:

We only want to enable eager scanning if we are likely to be able to
freeze some of the pages in the relation, which is unlikely if
FreezeLimit has not advanced past relfrozenxid or if MultiXactCutoff
has not advanced past relminmxid. Granted, there may be pages we
didn't try to freeze before, or some previously blocking XID greater
than FreezeLimit may have now ended (allowing for freezing), but as a
heuristic we wait until the FreezeLimit advances to increase our
chances of successful freezing.
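Mechanically, the gating heuristic that comment describes might be sketched as follows (plain integer comparison stands in for PostgreSQL's wraparound-aware XID comparisons, and all names here are illustrative, not the actual vacuum code):

```python
def eager_scan_worthwhile(freeze_limit, relfrozenxid,
                          multixact_cutoff, relminmxid):
    """Skip eager scanning unless at least one freeze horizon has
    advanced past the table's current oldest value; otherwise we are
    unlikely to be able to freeze anything new on the pages we scan."""
    return relfrozenxid < freeze_limit or relminmxid < multixact_cutoff
```

The point of the heuristic is just that when neither horizon has moved, the extra I/O of eager scanning is unlikely to pay off, even though a few stragglers might technically be freezable.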


Robert Treat
https://xzilla.net


