Thread: Decoupling antiwraparound autovacuum from special rules around auto cancellation

Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

19 October 2022, 21:58:37

I think that we should decouple the PROC_VACUUM_FOR_WRAPAROUND
autocancellation behavior in ProcSleep() from antiwraparound
autovacuum itself. In other words I think that it should be possible
to cancel an autovacuum that happens to be an antiwraparound
autovacuum, just as if were any other autovacuum -- because it usually
is no different in any real practical sense. Or at least it shouldn't
be seen as fundamentally different to other autovacuums at first,
before relfrozenxid attains an appreciably greater age (definition of
"appreciably greater" is TBD).

Why should the PROC_VACUUM_FOR_WRAPAROUND behavior happen on *exactly*
the same timeline as the one used to launch an antiwraparound
autovacuum, though? There is no inherent reason why we have to do both
things at exactly the same XID-age-wise time. But there is reason to
think that doing so could make matters worse rather than better [1].

More generally I think that it'll be useful to perform "aggressive
behaviors" on their own timeline, with no two distinct aggressive
behaviors applied at exactly the same time. In general we ought to
give a less aggressive approach some room to succeed before escalating
to a more aggressive approach -- we should see if a less aggressive
approach will work on its own. The failsafe is the most aggressive
intervention of all. The PROC_VACUUM_FOR_WRAPAROUND behavior is almost
as aggressive, and should happen sooner. Antiwraparound autovacuum
itself (which is really a separate thing to
PROC_VACUUM_FOR_WRAPAROUND) is less aggressive still. Then you have
things like the cutoffs in vacuumlazy.c that control things like
freezing.

In short, having an "escalatory" approach that applies each behavior
at different times. The exact timelines we'd want are of course
debatable, but the value of having multiple distinct timelines (one
per aggressive behavior) is far less debatable. We should give
problems a chance to "resolve themselves", at least up to a point.

The latest version of my in progress VACUUM patch series [2]
completely removes the concept of aggressive VACUUM as a discrete mode
of operation inside vacuumlazy.c. Every existing "aggressive-ish
behavior" will be retained in some form or other, but they'll be
applied on separate timelines, in proportion to the problem at hand.
For example, we'll have a separate XID cutoff for waiting for a
cleanup lock the hard way -- we will no longer use FreezeLimit for
that, since that doesn't give freezing a chance to happen in the next
VACUUM. The same VACUUM operation that is the first one that is
capable of freezing should ideally not *also* be the first one that
has to wait for a cleanup lock. We should be willing to put off
waiting for a cleanup lock for much longer than we're willing to put
off freezing. Reusing the same cutoff just makes life harder.

Clearly the idea of decoupling the PROC_VACUUM_FOR_WRAPAROUND behavior
from antiwraparound autovacuum is conceptually related to my patch
series, but it can be treated as separate work. That's why I'm
starting another thread now.

There is another idea in that patch series that also seems worth
mentioning as relevant (but not essential) to this discussion on this
thread: it would be better if antiwraparound autovacuum was simply
another way to launch an autovacuum, which isn't fundamentally
different to any other. I believe that users will find this conceptual
model a lot easier, especially in a world where antiwraparound
autovacuums naturally became rare (which is the world that the big
patch series seeks to bring about). It'll make antiwraparound
autovacuum "the threshold of last resort", only needed when
conventional tuple-based thresholds don't trigger at all for an
extended period of time (e.g., for static tables).

Perhaps it won't be trivial to fix autovacuum.c in the way I have in
mind (which is to split PROC_VACUUM_FOR_WRAPAROUND into two flags that
serve two separate purposes). I haven't considered if we're
accidentally relying on the coupling to avoid confusion within
autovacuum.c. That doesn't seem important right now, though.

[1] https://www.tritondatacenter.com/blog/manta-postmortem-7-27-2015
[2] https://postgr.es/m/CAH2-WzkU42GzrsHhL2BiC1QMhaVGmVdb5HR0_qczz0Gu2aSn=A@mail.gmail.com
--
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Jeff Davis

Date:

20 October 2022, 18:09:00

On Wed, 2022-10-19 at 14:58 -0700, Peter Geoghegan wrote:
> Why should the PROC_VACUUM_FOR_WRAPAROUND behavior happen on
> *exactly*
> the same timeline as the one used to launch an antiwraparound
> autovacuum, though?

The terminology is getting slightly confusing here: by
"antiwraparound", you mean that it's not skipping unfrozen pages, and
therefore is able to advance relfrozenxid. Whereas the
PROC_VACUUM_FOR_WRAPAROUND is the same thing, except done with greater
urgency because wraparound is imminent. Right?

> There is no inherent reason why we have to do both
> things at exactly the same XID-age-wise time. But there is reason to
> think that doing so could make matters worse rather than better [1].

Can you explain?

Regards,
    Jeff Davis

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

20 October 2022, 18:52:03

On Thu, Oct 20, 2022 at 11:09 AM Jeff Davis <pgsql@j-davis.com> wrote:
> The terminology is getting slightly confusing here: by
> "antiwraparound", you mean that it's not skipping unfrozen pages, and
> therefore is able to advance relfrozenxid. Whereas the
> PROC_VACUUM_FOR_WRAPAROUND is the same thing, except done with greater
> urgency because wraparound is imminent. Right?

Not really.

I started this thread to discuss a behavior in autovacuum.c and proc.c
(the autocancellation behavior), which is, strictly speaking, not
related to the current vacuumlazy.c behavior we call aggressive mode
VACUUM. Various hackers have in the past described antiwraparound
autovacuum as "implying aggressive", which makes sense; what's the
point in doing an antiwraparound autovacuum that can almost never
advance relfrozenxid?

It is nevertheless true that antiwraparound autovacuum is an
independent behavior to aggressive VACUUM. The former is an autovacuum
thing, and the latter is a VACUUM thing. That's just how it works,
mechanically.

If this division seems artificial or pedantic to you, then consider
the fact that you can quite easily get a non-aggressive antiwraparound
autovacuum by using the storage option called
autovacuum_freeze_max_age (instead of the GUC):

https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=deg@mail.gmail.com

This is even a case where we'll output a distinct description in the
server log when autovacuum logging is enabled and gets triggered. So
while there may be no point in an antiwraparound autovacuum that is
non-aggressive, that doesn't stop them from happening. Regardless of
whether or not that's an intended behavior, that's just how the
mechanism has been constructed.

> > There is no inherent reason why we have to do both
> > things at exactly the same XID-age-wise time. But there is reason to
> > think that doing so could make matters worse rather than better [1].
>
> Can you explain?

Why should the special autocancellation behavior for antiwraparound
autovacuums kick in at exactly the same point that we first launch an
antiwraparound autovacuum? Maybe that aggressive intervention will be
needed, in the end, but why start there?

With my patch series, antiwraparound autovacuums still occur, but
they're confined to things like static tables -- things that are
pretty much edge cases. They still need to behave sensibly (i.e.
reliably advance relfrozenxid based on some principled approach), but
now they're more like "an autovacuum that happens because no other
condition triggered an autovacuum". To some degree this is already the
case, but I'd like to be more deliberate about it.

Leaving my patch series aside, I still don't think that it makes sense
to make it impossible to auto-cancel antiwraparound autovacuums,
across the board, regardless of the underlying table age. We still
need something like that, but why not give a still-cancellable
autovacuum worker a chance to resolve the problem? Why take a risk of
causing much bigger problems (e.g., blocking automated DDL that blocks
simple SELECT queries) before the point that that starts to look like
the lesser risk (compared to hitting xidStopLimit)?

--
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

22 October 2022, 00:39:55

On Thu, Oct 20, 2022 at 11:52 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Leaving my patch series aside, I still don't think that it makes sense
> to make it impossible to auto-cancel antiwraparound autovacuums,
> across the board, regardless of the underlying table age.

One small thought on the presentation/docs side of this: maybe it
would be better to invent a new kind of autovacuum that has the same
purpose as antiwraparound autovacuum, but goes by a different name,
and doesn't have the special behavior around cancellations. We
wouldn't have to change anything about the behavior of antiwraparound
autovacuum once we reached the point of needing one.

Maybe we wouldn't even need to invent a new user-visible name for this
other kind of autovacuum. While even this so-called "new kind of
autovacuum" will be rare once my main patch series gets in, it'll
still be a totally normal occurrence. Whereas antiwraparound
autovacuums are sometimes described as an emergency mechanism.

That way we wouldn't be fighting against the widely held perception
that antiwraparound autovacuums are scary. In fact that reputation
would be fully deserved, for the first time. There are lots of
problems with the idea that antiwraparound autovacuum is kind of an
emergency thing right now, but this would make things fit the
perception, "fixing" the perception. Antiwraparound autovacuums would
become far far rarer under this scheme, but when they did happen
they'd be clear cause for concern. A useful signal for users, who
should ideally aim to never see *any* antiwraparound autovacuums.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Jeff Davis

Date:

24 October 2022, 04:32:39

On Fri, 2022-10-21 at 17:39 -0700, Peter Geoghegan wrote:
> One small thought on the presentation/docs side of this: maybe it
> would be better to invent a new kind of autovacuum

It's possible this would be easier for users to understand: one process
that does cleanup work over time in a way that minimizes interference;
and another process that activates in more urgent situations (perhaps
due to misconfiguration of the first process).

But we should be careful that we don't end up with more confusion. For
something like that to work, we'd probably want the second process to
not be configurable at all, and we'd want it to be issuing WARNINGs
pointing to what might be misconfigured, and otherwise just be
invisible.

>  That way we wouldn't be fighting against the widely held perception
> that antiwraparound autovacuums are scary.

There's certainly a terminology problem there. Just to brainstorm on
some new names, we might want to call it something like "xid
reclamation" or "xid horizon advancement".

When it starts to run out, we can use words like "wraparound" or
"exhaustion".

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

24 October 2022, 14:25:06

On Sun, Oct 23, 2022 at 9:32 PM Jeff Davis <pgsql@j-davis.com> wrote:
> It's possible this would be easier for users to understand: one process
> that does cleanup work over time in a way that minimizes interference;
> and another process that activates in more urgent situations (perhaps
> due to misconfiguration of the first process).

I think that the new "early" version of antiwraparound autovacuum
(that can still be autocancelled) would simply be called autovacuum.
It wouldn't appear as "autovacuum to prevent wraparound" in places
like pg_stat_activity. For the most part users wouldn't have to care
about the difference between these autovacuums and traditional
non-antiwraparound autovacuums. They really would be exactly the same
thing, so it would make sense if users typically noticed no difference
whatsoever (at least in contexts like pg_stat_activity).

> But we should be careful that we don't end up with more confusion. For
> something like that to work, we'd probably want the second process to
> not be configurable at all, and we'd want it to be issuing WARNINGs
> pointing to what might be misconfigured, and otherwise just be
> invisible.

There should be some simple scheme for determining when an
antiwraparound autovacuum (non-cancellable autovacuum to advance
relfrozenxid/relminmxid) should run (applied by the autovacuum.c
scheduling logic). Something like "table has attained an age that's
now 2x autovacuum_freeze_max_age, or 1/2 of vacuum_failsafe_age,
whichever is less".

The really important thing is giving a regular/early autocancellable
autovacuum triggered by age(relfrozenxid) *some* opportunity to run. I
strongly suspect that the exact details won't matter too much,
provided we manage to launch at least one such autovacuum before
escalating to traditional antiwraparound autovacuum (which cannot be
autocancelled). Even if regular/early autovacuum had just one
opportunity to run to completion, we'd already be much better off. The
hazards from blocking automated DDL in a way that leads to a very
disruptive traffic jam (like in the Joyent Manta postmortem) would go
way down.

> >  That way we wouldn't be fighting against the widely held perception
> > that antiwraparound autovacuums are scary.
>
> There's certainly a terminology problem there. Just to brainstorm on
> some new names, we might want to call it something like "xid
> reclamation" or "xid horizon advancement".

I think that we should simply call it autovacuum. Under this scheme,
antiwraparound autovacuum would be a qualitatively different kind of
operation to users (though not to vacuumlazy.c), because it would not
be autocancellable in the standard way. And because users should take
it as a signal that things aren't really working well (otherwise we
wouldn't have reached the point of requiring a scary antiwraparound
autovacuum in the first place). Right now antiwraparound autovacuums
are both an emergency thing (or at least described as such in one or
two areas of the source code), and a completely routine occurrence.
This is deeply confusing.

Separately, I plan on breaking out insert-triggered autovacuums from
traditional dead tuple triggered autovacuums [1], which creates a need
to invent some kind of name to differentiate the new table age
triggering criteria from both insert-driven and dead tuple driven
autovacuums. These are all fundamentally the same operations with the
same urgency to users, though. We'd only need to describe the
*criteria* that *triggered* the autovacuum in our autovacuum log
report (actually we'd still report autovacuums aš antiwraparound
autovacuum in cases where that still happened, which won't be
presented as just another triggering criteria in the report).

[1] https://www.postgresql.org/message-id/flat/CAH2-WznEqmkmry8feuDK8XdpH37-4anyGF7a04bWXOc1GKd0Yg@mail.gmail.com
--
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Jeff Davis

Date:

24 October 2022, 15:42:45

On Mon, 2022-10-24 at 07:25 -0700, Peter Geoghegan wrote:
> The really important thing is giving a regular/early autocancellable
> autovacuum triggered by age(relfrozenxid) *some* opportunity to run.

+1. That principle seems both reasonable from a system standpoint and
understandable to a user.

> Even if regular/early autovacuum had just one
> opportunity to run to completion, we'd already be much better off.

By "opportunity", you mean that, regardless of configuration, the
cancellable autovacuum would at least start; though it still might be
cancelled by DDL. Right?

> These are all fundamentally the same operations with the
> same urgency to users, though. We'd only need to describe the
> *criteria* that *triggered* the autovacuum in our autovacuum log
> report

Hmm... I'm worried that could be a bit confusing depending on how it's
done. Let's be clear that it was merely the triggering criteria and
doesn't necessarily represent the work that is being done.

There are enough cases that it would be good to start a document and
outline the end behavior that your patch series is designed to
accomplish. In other words, a before/after of the interesting cases.

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

24 October 2022, 16:00:42

On Mon, Oct 24, 2022 at 8:43 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > Even if regular/early autovacuum had just one
> > opportunity to run to completion, we'd already be much better off.
>
> By "opportunity", you mean that, regardless of configuration, the
> cancellable autovacuum would at least start; though it still might be
> cancelled by DDL. Right?

Yes, exactly.

It might be difficult as a practical matter to make sure that we
*reliably* give autovacuum.c the opportunity to launch a "standard"
autovacuum tasked with advancing relfrozenxid (just after
autovacuum_freeze_max_age is first crossed) before the point that a
scary antiwraparound autovacuum is launched. So we might end up giving
it more XID slack than it's likely to ever need (say by only launching
a traditional antiwraparound autovacuum against tables that attain an
age that is twice the value of autovacuum_freeze_max_age). These are
all just details, though -- the important principle is that we try our
utmost to give the less disruptive strategy a chance to succeed before
concluding that it has failed, and then "escalating" to a traditional
antiwraparound autovacuum.

> > These are all fundamentally the same operations with the
> > same urgency to users, though. We'd only need to describe the
> > *criteria* that *triggered* the autovacuum in our autovacuum log
> > report
>
> Hmm... I'm worried that could be a bit confusing depending on how it's
> done. Let's be clear that it was merely the triggering criteria and
> doesn't necessarily represent the work that is being done.

Maybe it could be broken out into a separate "autovacuum triggered by:
" line, seen only in the autovacuum log instrumentation (and not in
the similar report output by a manual VACUUM VERBOSE). When we still
end up "escalating" to an antiwraparound autovacuum, the "triggered
by:" line would match whatever we'd show in the benign the
non-cancellable-but-must-advance-relfrozenxid autovacuum case. The
difference would be that we'd now be reporting on a different
operation entirely (not just a regular autovacuum, a scary
antiwraparound autovacuum).

(Again, even these distinctions wouldn't be meaningful to vacuumlazy.c
itself -- it would just need to handle the details around logging in a
way that gave users the right idea. There wouldn't be any special
discrete aggressive mode of operation anymore, assuming my big patch
set gets into Postgres 16 too.)

> There are enough cases that it would be good to start a document and
> outline the end behavior that your patch series is designed to
> accomplish. In other words, a before/after of the interesting cases.

That's on my TODO list. Mostly it's an independent thing to this
(antiwraparound) autovacuum stuff, despite the fact that both projects
share the same underlying philosophy.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

25 November 2022, 22:47:57

On Mon, Oct 24, 2022 at 9:00 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Maybe it could be broken out into a separate "autovacuum triggered by:
> " line, seen only in the autovacuum log instrumentation (and not in
> the similar report output by a manual VACUUM VERBOSE). When we still
> end up "escalating" to an antiwraparound autovacuum, the "triggered
> by:" line would match whatever we'd show in the benign the
> non-cancellable-but-must-advance-relfrozenxid autovacuum case. The
> difference would be that we'd now be reporting on a different
> operation entirely (not just a regular autovacuum, a scary
> antiwraparound autovacuum).

Attached WIP patch invents the idea of a regular autovacuum that is
tasked with advancing relfrozenxid -- which is really just another
trigger criteria, reported on in the server log in its autovacuum
reports. Of course we retain the idea of antiwraparound autovacuums.
The only difference is that they are triggered when table age has
advanced by twice the usual amount, which is presumably only possible
because a regular autovacuum couldn't start or couldn't complete in
time (most likely due to continually being auto-cancelled).

As I said before, I think that the most important thing is to give
regular autovacuuming a chance to succeed. The exact approach taken
has a relatively large amount of slack, but that probably isn't
needed. So the new antiwraparound cutoffs were chosen because they're
easy to understand and remember, which is fairly arbitrary.

Adding this to the upcoming CF.

-- 
Peter Geoghegan

Attachment

v1-0001-Add-table-age-trigger-concept-to-autovacuum.patch

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Laurenz Albe

Date:

26 November 2022, 17:58:20

On Fri, 2022-11-25 at 14:47 -0800, Peter Geoghegan wrote:
> Attached WIP patch invents the idea of a regular autovacuum that is
> tasked with advancing relfrozenxid -- which is really just another
> trigger criteria, reported on in the server log in its autovacuum
> reports. Of course we retain the idea of antiwraparound autovacuums.
> The only difference is that they are triggered when table age has
> advanced by twice the usual amount, which is presumably only possible
> because a regular autovacuum couldn't start or couldn't complete in
> time (most likely due to continually being auto-cancelled).
> 
> As I said before, I think that the most important thing is to give
> regular autovacuuming a chance to succeed. The exact approach taken
> has a relatively large amount of slack, but that probably isn't
> needed. So the new antiwraparound cutoffs were chosen because they're
> easy to understand and remember, which is fairly arbitrary.

The target is a table that receives no DML at all, right?
I think that is a good idea.
Wouldn't it make sense to trigger that at *half* "autovacuum_freeze_max_age"?

Yours,
Laurenz Albe

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

26 November 2022, 19:00:22

On Sat, Nov 26, 2022 at 9:58 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> The target is a table that receives no DML at all, right?

Sort of, but not really. The target is a table that doesn't get
vacuumed for any other reason -- I don't make any claims beyond that
one.

It seems a little too optimistic to suppose that such a table really
didn't need any vacuuming to deal with bloat just because autovacuum.c
didn't seem to think that it did.

> I think that is a good idea.
> Wouldn't it make sense to trigger that at *half* "autovacuum_freeze_max_age"?

That would be equivalent to what I've done here, provided you also
double the autovacuum_freeze_max_age setting. I did it this way
because I believe that it has fewer problems. The approach I took
makes the general perception that antiwraparound autovacuum are a
scary thing (really just needed for emergencies) become true, for the
first time.

We should expect to see very few antiwraparound autovacuums with the
patch, but when we do see even one it'll be after a less aggressive
approach was given the opportunity to succeed, but (for whatever
reason) failed. Just seeing any antiwraparound autovacuums will become
a useful signal of something being amiss in a way that it just isn't
at the moment. The docs can be updated to talk about antiwraparound
autovacuum as a thing that you should encounter approximately never.
This is possible even though the patch isn't invasive at all.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Laurenz Albe

Date:

27 November 2022, 16:54:26

On Sat, 2022-11-26 at 11:00 -0800, Peter Geoghegan wrote:

> > I think that is a good idea.
> > Wouldn't it make sense to trigger that at *half* "autovacuum_freeze_max_age"?
> 
> That would be equivalent to what I've done here, provided you also
> double the autovacuum_freeze_max_age setting.

That's exactly what I was trying to debate.  Wouldn't it make sense to
trigger VACUUM earlier so that it has a chance of being less heavy?
On the other hand, if there are not sufficiently many modifications
on the table to trigger autovacuum, perhaps it doesn't matter in many
cases.

> I did it this way
> because I believe that it has fewer problems. The approach I took
> makes the general perception that antiwraparound autovacuum are a
> scary thing (really just needed for emergencies) become true, for the
> first time.
> 
> We should expect to see very few antiwraparound autovacuums with the
> patch, but when we do see even one it'll be after a less aggressive
> approach was given the opportunity to succeed, but (for whatever
> reason) failed.

Is that really so much less aggressive?  Will that autovacuum run want
to process all pages that are not all-frozen?  If not, it probably won't
do much good.  If yes, it will be just as heavy as an anti-wraparound
autovacuum (except that it won't block other sessions).

> Just seeing any antiwraparound autovacuums will become
> a useful signal of something being amiss in a way that it just isn't
> at the moment. The docs can be updated to talk about antiwraparound
> autovacuum as a thing that you should encounter approximately never.
> This is possible even though the patch isn't invasive at all.

True.  On the other hand, it might happen that after this, people start
worrying about normal autovacuum runs because they occasionally experience
a table age autovacuum that is much heavier than the other ones.  And
they can no longer tell the reason, because it doesn't show up anywhere.

Yours,
Laurenz Albe

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

27 November 2022, 18:34:28

On Sun, Nov 27, 2022 at 8:54 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> That's exactly what I was trying to debate.  Wouldn't it make sense to
> trigger VACUUM earlier so that it has a chance of being less heavy?
> On the other hand, if there are not sufficiently many modifications
> on the table to trigger autovacuum, perhaps it doesn't matter in many
> cases.

Maybe. There is a deeper problem here, though: table age is a really
terrible proxy for whether or not it's appropriate for VACUUM to
freeze preexisting all-visible pages. It's not obvious that half
autovacuum_freeze_max_age is much better than
autovacuum_freeze_max_age if your concern is avoiding getting too far
into debt on freezing. Afterall, this is debt that must be paid back
by freezing some number of physical heap pages, which in general has
approximately zero relationship with table age (we need physical units
for this, not logical units).

This is a long standing problem that I hope and expect will be fixed
in 16, by my ongoing work to completely remove the concept of
aggressive mode VACUUM:

https://commitfest.postgresql.org/40/3843/

This makes VACUUM care about both table age and the number of unfrozen
heap pages (mostly the latter). It weighs everything at the start of
each VACUUM, and decides on how it must advance relfrozenxid based on
the conditions in the table and the picture over time. Note that
performance stability is the main goal; we will not just keep
accumulating unfrozen pages for no good reason. All of the behaviors
previously associated with aggressive mode are retained, but are
individually applied on a timeline that is attuned to the needs of the
table (we can still wait for a cleanup lock, but that happens much
later than the point that the same page first becomes eligible for
freezing, not at exactly the same time).

In short, "aggressiveness" becomes a continuous thing, rather than a
discrete mode of operation, improving performance stability. We go
back to having only one kind of lazy vacuum, which is how things
worked prior to the introduction of the visibility map. (We did have
antiwraparound autovacuums in 8.3, but we did not have
aggressive/scan_all VACUUMs at the time.)

> Is that really so much less aggressive?  Will that autovacuum run want
> to process all pages that are not all-frozen?  If not, it probably won't
> do much good.  If yes, it will be just as heavy as an anti-wraparound
> autovacuum (except that it won't block other sessions).

Even if we assume that my much bigger patch set won't make it into 16,
it'll probably still be a good idea to do this in 16. I admit that I
haven't really given that question enough thought to be sure of that,
though. Naturally my goal is to get everything in. Hopefully I'll
never have to make that call.

It is definitely true that this patch is "the autovacuum side" of the
work from the other much larger patchset (which handles "the VACUUM
side" of things). This antiwraparound patch should probably be
considered in that context, even though it's theoretically independent
work. It just worked out that way.

> True.  On the other hand, it might happen that after this, people start
> worrying about normal autovacuum runs because they occasionally experience
> a table age autovacuum that is much heavier than the other ones.  And
> they can no longer tell the reason, because it doesn't show up anywhere.

But you can tell the reason, just by looking at the autovacuum log
reports. The only thing you can't do is see "(to prevent wraparound)"
in pg_stat_activity. That (and the autocancellation behavioral change)
are the only differences.

The big picture is that users really will have no good reason to care
very much about autovacuums that were triggered to advance
relfrozenxid (at least in the common case where we haven't needed to
make them antiwraparound autovacuums). They could almost (though not
quite) now be explained as "an autovacuum that takes place because
it's been a while since we did an autovacuum to deal with bloat and/or
tuple inserts". That will at least be reasonable if you assume all of
the patches get in.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

30 December 2022, 03:01:35

On Fri, Nov 25, 2022 at 2:47 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached WIP patch invents the idea of a regular autovacuum that is
> tasked with advancing relfrozenxid -- which is really just another
> trigger criteria, reported on in the server log in its autovacuum
> reports.

Attached is v2, which is just to fix bitrot. Well, mostly. I did make
one functional change in v2: the autovacuum server log reports now
separately report on table XID age and table MultiXactId age, each as
its own distinct triggering condition.

I've heard informal reports that the difference between antiwraparound
autovacuums triggered by table XID age versus table MXID age can
matter a great deal. It isn't difficult to break out that detail
anyway, so even if the distinction isn't interesting all that often we
might as well surface it to users.

I still haven't made a start on the docs for this. I'm still not sure
how much work I should do on the docs in the scope of this project
versus my project that deals with related issues in VACUUM itself. The
existing material from the "Routine Vacuuming" docs has lots of
problems, and figuring out how to approach fixing those problems seems
kind of daunting right now.

--
Peter Geoghegan

Attachment

v2-0001-Add-table-age-trigger-concept-to-autovacuum.patch

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

09 January 2023, 01:49:20

On Thu, Dec 29, 2022 at 7:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v2, which is just to fix bitrot.

Attached is v3. We no longer apply vacuum_failsafe_age when
determining the cutoff for antiwraparound autovacuuming -- the new
approach is a bit simpler.

This is a fairly small change overall. Now any "table age driven"
autovacuum will also be antiwraparound when its
relfrozenxid/relminmxid attains an age that's either double the
relevant setting (either autovacuum_freeze_max_age or
effective_multixact_freeze_max_age), or 1 billion XIDs/MXIDs --
whichever is less.

That makes it completely impossible to disable antiwraparound
protections (the special antiwrap autocancellation behavior) for
table-age-driven autovacuums once table age exceeds 1 billion
XIDs/MXIDs. It's still possible to increase autovacuum_freeze_max_age
to well over a billion, of course. It just won't be possible to do
that while also avoiding the no-auto-cancellation behavior for those
autovacuums that are triggered due to table age crossing the
autovacuum_freeze_max_age/effective_multixact_freeze_max_age
threshold.

--
Peter Geoghegan

Attachment

v3-0001-Add-table-age-trigger-concept-to-autovacuum.patch

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

10 January 2023, 01:22:15

Hi,

On 2023-01-08 17:49:20 -0800, Peter Geoghegan wrote:
> Teach autovacuum.c to launch "table age" autovacuums at the same point
> that it previously triggered antiwraparound autovacuums.  Antiwraparound
> autovacuums are retained, but are only used as a true option of last
> resort, when regular autovacuum has presumably tried and failed to
> advance relfrozenxid (likely because the auto-cancel behavior kept
> cancelling regular autovacuums triggered based on table age).

I've also seen the inverse, with recent versions of postgres: Autovacuum can
only ever make progress if it's an anti-wraparound vacuum, because it'll
always get cancelled otherwise.  I'm worried that substantially increasing the
time until an anti-wraparound autovacuum happens will lead to more users
running into out-of-xid shutdowns.

I don't think it's safe to just increase the time at which anti-wrap vacuums
happen to a hardcoded 1 billion.

I'm also doubtful that it's ok to just make all autovacuums on relations with
an age > 1 billion anti-wraparound ones. For people that use a large
autovacuum_freeze_max_age that will be a rude awakening.

I am all in favor for adding logic to trigger autovacuum based on the table
age, without needing to reach autovacuum_freeze_max_age. It never made sense
to me that we get to the "emergency mode" in entirely normal operation. But
I'm not in favor of just entirely reinterpreting existing GUCs and adding
important thresholds as hardcoded numbers.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

10 January 2023, 01:40:06

On Mon, Jan 9, 2023 at 5:22 PM Andres Freund <andres@anarazel.de> wrote:
> I've also seen the inverse, with recent versions of postgres: Autovacuum can
> only ever make progress if it's an anti-wraparound vacuum, because it'll
> always get cancelled otherwise.  I'm worried that substantially increasing the
> time until an anti-wraparound autovacuum happens will lead to more users
> running into out-of-xid shutdowns.
>
> I don't think it's safe to just increase the time at which anti-wrap vacuums
> happen to a hardcoded 1 billion.

That's not what the patch does. It doubles the time that the anti-wrap
no-autocancellation behaviors kick in, up to a maximum of 1 billion
XIDs/MXIDs. So it goes from autovacuum_freeze_max_age to
autovacuum_freeze_max_age x 2, without changing the basic fact that we
initially launch autovacuums that advance relfrozenxid/relminmxid when
the autovacuum_freeze_max_age threshold is first crossed.

These heuristics are totally negotiable -- and likely should be
thought out in more detail. It's likely that most of the benefit of
the patch comes from simply trying to advance relfrozenxid without the
special auto-cancellation behavior one single time. The main problem
right now is that the threshold that launches most antiwraparound
autovacuums is exactly the same as the threshold that activates the
auto-cancellation protections. Even doing the latter very slightly
later than the former could easily make things much better, while
adding essentially no risk of the kind you're concerned about.

> I'm also doubtful that it's ok to just make all autovacuums on relations with
> an age > 1 billion anti-wraparound ones. For people that use a large
> autovacuum_freeze_max_age that will be a rude awakening.

Actually, users that have autovacuum_freeze_max_age set to over 1
billion will get exactly the same behavior as before (except that the
instrumentation of autovacuum will be better). It'll be identical.

If you set autovacuum_freeze_max_age to 2 billion, and a "standard"
autovacuum is launched on a table whose relfrozenxid age is 1.5
billion, it'll just be a regular dead tuples/inserted tuples
autovacuum, with the same old familiar locking characteristics as
today.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Robert Haas

Date:

12 January 2023, 17:12:15

On Mon, Jan 9, 2023 at 8:40 PM Peter Geoghegan <pg@bowt.ie> wrote:
> That's not what the patch does. It doubles the time that the anti-wrap
> no-autocancellation behaviors kick in, up to a maximum of 1 billion
> XIDs/MXIDs. So it goes from autovacuum_freeze_max_age to
> autovacuum_freeze_max_age x 2, without changing the basic fact that we
> initially launch autovacuums that advance relfrozenxid/relminmxid when
> the autovacuum_freeze_max_age threshold is first crossed.

I'm skeptical about this kind of approach.

I do agree that it's good to slowly increase the aggressiveness of
VACUUM as we get further behind, rather than having big behavior
changes all at once, but I think that should happen by smoothly
varying various parameters rather than by making discrete behavior
changes at a whole bunch of different times. For instance, when VACUUM
goes into emergency mode, it stops respecting the vacuum delay. I
think that's great, but it happens all at once, and maybe it would be
better if it didn't. We could consider gradually ramping the vacuum
delay from 100% down to 0% instead of having it happen all at once.
Maybe that's not the right idea, I don't know, and a naive
implementation might be worse than nothing, but I think it has some
chance of being worth consideration.

But what the kind of change you're proposing here does is create
another threshold where the behavior changes suddenly, and I think
that's challenging from the point of view of understanding the
behavior of the system. The behavior already changes when you hit
vacuum_freeze_min_age and then again when you hit
vacuum_freeze_table_age and then there's also
autoovacuum_freeze_max_age and xidWarnLimit and xidStopLimit and a few
others, and these setting all interact in pretty complex ways. The
more conditional logic we add to that, the harder it becomes to
understand what's actually happening. You see a system where
age(relfrozenxid) = 673m and you need a calculator and a spreadsheet
to figure out what the vacuum behavior is at that point. Honestly, I
think we already have a problem with the behaviors here being too
complex for normal human beings to understand them, and I think that
the kinds of changes you are proposing here could make that quite a
bit worse.

Now, you might reply to the above by saying, well, some behaviors
can't vary continuously. vacuum_cost_limit can perhaps be phased out
gradually, but autocancellation seems like something that you must
either do, or not do. I would agree with that. But what I'm saying is
that we ought to favor having those kinds of behaviors all engage at
the same point rather than at different times. I'm not saying that
there can't ever be good reasons to separate out different behaviors
and have the engage at different times, but I think we will end up
better off if we minimize that sort of thing as much as we reasonably
can. In your opening email you write "Why should the
PROC_VACUUM_FOR_WRAPAROUND behavior happen on *exactly* the same
timeline as the one used to launch an antiwraparound autovacuum,
though?" and my answer is "because that's easier to understand and I
don't see that it has much of a downside."

I did take a look at the post-mortem to which you linked, but I am not
quite sure how that bears on the behavior change under discussion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

12 January 2023, 19:22:14

On Thu, Jan 12, 2023 at 9:12 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I do agree that it's good to slowly increase the aggressiveness of
> VACUUM as we get further behind, rather than having big behavior
> changes all at once, but I think that should happen by smoothly
> varying various parameters rather than by making discrete behavior
> changes at a whole bunch of different times.

In general I tend to agree, but, as you go on to acknowledge yourself,
this particular behavior is inherently discrete. Either the
PROC_VACUUM_FOR_WRAPAROUND behavior is in effect, or it isn't.

In many important cases the only kind of autovacuum that ever runs
against a certain big table is antiwraparound autovacuum. And
therefore every autovacuum that runs against the table must
necessarily not be auto cancellable. These are the cases where we see
disastrous interactions with automated DDL, such as a TRUNCATE run by
a cron job (to stop those annoying antiwraparound autovacuums) -- a
heavyweight lock traffic jam that causes the application to lock up.

All that I really want to do here is give an autovacuum that *can* be
auto cancelled *some* non-zero chance to succeed with these kinds of
tables. TRUNCATE completes immediately, so the AEL is no big deal.
Except when it's blocked behind an antiwraparound autovacuum. That
kind of interaction is occasionally just disastrous. Even just the
tiniest bit of wiggle room could avoid it in most cases, possibly even
almost all cases.

> Maybe that's not the right idea, I don't know, and a naive
> implementation might be worse than nothing, but I think it has some
> chance of being worth consideration.

It's a question of priorities. The failsafe isn't supposed to be used
(when it is it is a kind of a failure), and so presumably only kicks
in on very rare occasions, where nobody was paying attention anyway.
So far I've heard no complaints about this, but I've heard lots of
complaints about the antiwrap autocancellation behavior.

> The behavior already changes when you hit
> vacuum_freeze_min_age and then again when you hit
> vacuum_freeze_table_age and then there's also
> autoovacuum_freeze_max_age and xidWarnLimit and xidStopLimit and a few
> others, and these setting all interact in pretty complex ways. The
> more conditional logic we add to that, the harder it becomes to
> understand what's actually happening.

In general I strongly agree. In fact that's a big part of what
motivates my ongoing work on VACUUM. The user experience is central.

As Andres pointed out, presenting antiwraparound autovacuums as kind
of an emergency thing but also somehow a routine thing is just
horribly confusing. I want to make them into an emergency thing in
every sense -- something that you as a user can reasonably expect to
never see (like the failsafe). But if you do see one, then that's a
useful signal of an underlying problem with contention, say from
automated DDL that pathologically cancels autovacuums again and again.

> Now, you might reply to the above by saying, well, some behaviors
> can't vary continuously. vacuum_cost_limit can perhaps be phased out
> gradually, but autocancellation seems like something that you must
> either do, or not do. I would agree with that. But what I'm saying is
> that we ought to favor having those kinds of behaviors all engage at
> the same point rather than at different times.

Right now aggressive VACUUMs do just about all freezing at the same
time, to the extent that many users seem to think that it's a totally
different thing with totally different responsibilities to regular
VACUUM. Doing everything at the same time like that causes huge
practical problems, and is very confusing.

I think that users will really appreciate having only one kind of
VACUUM/autovacuum (since the other patch gets rid of discrete
aggressive mode VACUUMs). I want "table age autovacuuming" (as I
propose to call it) come to be seen as not any different to any other
autovacuum, such as an "insert tuples" autovacuum or a "dead tuples"
autovacuum. The difference is only in how autovacuum.c triggers the
VACUUM, not in any runtime behavior. That's an important goal here.

> I did take a look at the post-mortem to which you linked, but I am not
> quite sure how that bears on the behavior change under discussion.

The post-mortem involved a single "DROP TRIGGER" that caused chaos
when it interacted with the auto cancellation behavior. It would
usually completely instantly, so the AEL wasn't actually disruptive,
but one day antiwraparound autovacuum made the cron job effectively
block all reads and writes for hours.

The similar outages I was called in to help with personally had either
an automated TRUNCATE or an automated CREATE INDEX. Had autovacuum
only been willing to yield once or twice, then it probably would have
been fine -- the situation probably would have worked itself out
naturally. That's the best outcome you can hope for.

--
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Robert Haas

Date:

12 January 2023, 21:08:06

On Thu, Jan 12, 2023 at 2:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> All that I really want to do here is give an autovacuum that *can* be
> auto cancelled *some* non-zero chance to succeed with these kinds of
> tables. TRUNCATE completes immediately, so the AEL is no big deal.
> Except when it's blocked behind an antiwraparound autovacuum. That
> kind of interaction is occasionally just disastrous. Even just the
> tiniest bit of wiggle room could avoid it in most cases, possibly even
> almost all cases.

I doubt it. Wiggle room that's based on the XID threshold being
different for one behavior vs. another can easily fail to produce any
benefit, because there's no guarantee that the autovacuum launcher
will ever try to launch a worker against that table while the XID is
in the range where you'd get one behavior and not the other.  I've
long thought that the fact that vacuum_freeze_table_age is documented
as capped at 0.95 * autovacuum_freeze_max_age is silly for just this
reason. The interval that you're proposing is much wider so the
chances of getting a benefit are greater, but supposing that it's
going to solve it in most cases seems like an exercise in unwarranted
optimism.

In fact, I would guess that in fact it will very rarely solve the
problem. Normally, the XID age of a table never reaches
autovacuum_freeze_max_age in the first place. If it does, there's some
reason. Maybe there's a really old open transaction or an abandon
replication slot or an unresolved 2PC transaction. Maybe the
autovacuum system is overloaded and no table is getting visited
regularly because the system just can't keep up. Or maybe there are
regular AELs being taken on the table at issue. If there's only an AEL
taken against a table once in blue moon, some autovacuum attempt ought
to succeed before we reach autovacuum_freeze_max_age. Flipping that
around, if we reach autovacuum_freeze_max_age without advancing
relfrozenxid, and an AEL shows up behind us in the lock queue, it's
really likely that the reason *why* we've reached
autovacuum_freeze_max_age is that this same thing has happened to
every previous autovacuum attempt and they all cancelled themselves.
If we cancel ourselves too, we're just postponing resolution of the
problem to some future point when we decide to stop cancelling
ourselves. That's not a win.

> I think that users will really appreciate having only one kind of
> VACUUM/autovacuum (since the other patch gets rid of discrete
> aggressive mode VACUUMs). I want "table age autovacuuming" (as I
> propose to call it) come to be seen as not any different to any other
> autovacuum, such as an "insert tuples" autovacuum or a "dead tuples"
> autovacuum. The difference is only in how autovacuum.c triggers the
> VACUUM, not in any runtime behavior. That's an important goal here.

I don't agree with that goal. I think that having different kinds of
autovacuums with different, identifiable names and corresponding,
easily-identifiable behaviors is really important for troubleshooting.
Trying to remove those distinctions and make everything look the same
will not keep autovacuum from getting itself into trouble. It will
just make it harder to understand what's happening when it does.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

12 January 2023, 22:12:31

On Thu, Jan 12, 2023 at 1:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I doubt it. Wiggle room that's based on the XID threshold being
> different for one behavior vs. another can easily fail to produce any
> benefit, because there's no guarantee that the autovacuum launcher
> will ever try to launch a worker against that table while the XID is
> in the range where you'd get one behavior and not the other.

Of course it's true that in general it might not succeed in
forestalling the auto cancellation behavior. You can say something
similar about approximately anything like this. For example, there is
no absolute guarantee that any autovacuum will ever complete. But we
still try!

> I've long thought that the fact that vacuum_freeze_table_age is documented
> as capped at 0.95 * autovacuum_freeze_max_age is silly for just this
> reason. The interval that you're proposing is much wider so the
> chances of getting a benefit are greater, but supposing that it's
> going to solve it in most cases seems like an exercise in unwarranted
> optimism.

I don't claim to be dealing in certainties, especially about the final
outcome. Whether or not you accept my precise claim is perhaps not
important, in the end. What is important is that we give things a
chance to succeed, based on the information that we have available,
with a constant eye towards avoiding disaster scenarios.

Some of the problems with VACUUM seem to be cases where VACUUM takes
on a potentially ruinous obligation, that it cannot possibly meet in
some rare cases that do come up sometimes -- like the cleanup lock
behavior. Is a check for $1000 written by me really worth less than a
check written by me for a billion dollars? They're both nominally
equivalent guarantees about an outcome, after all, though one has a
far greater monetary value. Which would you value more, subjectively?

Nothing is guaranteed -- even (and perhaps especially) strong guarantees.

> In fact, I would guess that in fact it will very rarely solve the
> problem. Normally, the XID age of a table never reaches
> autovacuum_freeze_max_age in the first place. If it does, there's some
> reason.

Probably, but none of this matters at all if the table age never
reaches autovacuum_freeze_max_age in the first place. We're only
talking about tables where that isn't the case, by definition.
Everything else is out of scope here.

> Maybe there's a really old open transaction or an abandon
> replication slot or an unresolved 2PC transaction. Maybe the
> autovacuum system is overloaded and no table is getting visited
> regularly because the system just can't keep up. Or maybe there are
> regular AELs being taken on the table at issue.

Maybe an asteroid hits the datacenter, making all of these
considerations irrelevant. But perhaps it won't!

> If there's only an AEL
> taken against a table once in blue moon, some autovacuum attempt ought
> to succeed before we reach autovacuum_freeze_max_age. Flipping that
> around, if we reach autovacuum_freeze_max_age without advancing
> relfrozenxid, and an AEL shows up behind us in the lock queue, it's
> really likely that the reason *why* we've reached
> autovacuum_freeze_max_age is that this same thing has happened to
> every previous autovacuum attempt and they all cancelled themselves.

Why do you assume that a previous autovacuum ever got launched in the
first place? There is always going to be a certain kind of table that
can only get an autovacuum when its table age crosses
autovacuum_freeze_max_age. And it's not just static tables -- there is
very good reason to have doubts about the statistics that drive
autovacuum. Plus vacuum_freeze_table_age works very unreliably (which
is why my big VACUUM patch more or less relegates it to a
compatibility option, while retaining a more sophisticated notion of
table age creating pressure to advance relfrozenxid).

Under the scheme from this autovacuum patch, it really does become
reasonable to make a working assumption that there was a previous
autovacuum, that failed (likely due to the autocancellation behavior,
as you said). We must have tried and failed in an earlier autovacuum,
once we reach the point of needing an antiwraparound autovacuum
(meaning a table age autovacuum which cannot be autocancelled) --
which is not the case today at all. If nothing else, table age
autovacuums will have been scheduled much earlier on -- they will have
at least started up, barring pathological cases.

That's a huge difference in the strength of the signal, compared to today.

The super aggressive autocancellation behavior is actually
proportionate to the problem at hand. Kind of like how if you go to
the doctor and tell them you have a headache, they don't schedule you
for emergency brain surgery. What they do is tell you to take an
aspirin, and make sure that you stay well hydrated -- if the problem
doesn't go away after a few days, then call back, reassess. Perhaps it
really will be a brain tumor, but there is nothing to gain and
everything to lose by taking such drastic action at the first sign of
trouble.

> If we cancel ourselves too, we're just postponing resolution of the
> problem to some future point when we decide to stop cancelling
> ourselves. That's not a win.

It's also only a very minor loss, relative to what would have happened
without any of this. This is something that we can be relatively sure
of (unlike anything about final outcomes). It's clear that we have a
lot to gain. What do we have to lose, really?

> > I think that users will really appreciate having only one kind of
> > VACUUM/autovacuum (since the other patch gets rid of discrete
> > aggressive mode VACUUMs). I want "table age autovacuuming" (as I
> > propose to call it) come to be seen as not any different to any other
> > autovacuum, such as an "insert tuples" autovacuum or a "dead tuples"
> > autovacuum. The difference is only in how autovacuum.c triggers the
> > VACUUM, not in any runtime behavior. That's an important goal here.
>
> I don't agree with that goal. I think that having different kinds of
> autovacuums with different, identifiable names and corresponding,
> easily-identifiable behaviors is really important for troubleshooting.

You need to distinguish between different types of autovacuums and
different types of VACUUMs here. Sure, it's valuable to have
information about why autovacuum launched a VACUUM, and the patch
greatly improves that. But runtime behavior is another story.

It's not really generic behavior -- more like generic policies that
produce different behavior under different runtime conditions. VACUUM
has always had generic policies about how to do things, at least up
until the introduction of the visibility map, which added
scan_all/aggressive VACUUMs, and the vacuum_freeze_table_age GUC. The
policy should be the same in every VACUUM, which the behavior itself
emerges from.

> Trying to remove those distinctions and make everything look the same
> will not keep autovacuum from getting itself into trouble. It will
> just make it harder to understand what's happening when it does.

The point isn't to have every VACUUM behave in the same way. The point
is to make decisions dynamically, based on the observed conditions in
the table. And to delay committing to things until there really is no
alternative, to maximize our opportunities to avoid disaster. In
short: loose, springy behavior.

Imposing absolute obligations on VACUUM has the potential to create
lots of problems. It is sometimes necessary, but can easily be
overused, making a bad situation much worse.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

13 January 2023, 21:59:56

Hi,

On 2023-01-12 16:08:06 -0500, Robert Haas wrote:
> Normally, the XID age of a table never reaches autovacuum_freeze_max_age in
> the first place.

That's not at all my experience. I often see it being the primary reason for
autovacuum vacuuming large tables on busy OLTP systems. Even without any
longrunning transactions or such, with available autovac workers and without
earlier autovacuums getting interrupted by locks.  Once a table is large,
reasonable scale factors require a lot of changes to accumulate to trigger an
autovacuum, and during a vacuum a lot of transactions complete, leading to
large tables having a significant age by the time autovac finishes.

The most common "bad" reason for reaching autovacuum_freeze_max_age that I see
is cost limits not allowing vacuum to complete on time.

Perhaps we should track how often autovacuum was triggered by what reason in
a relation's pgstats? That'd make it a lot easier to collect data, both for
tuning the thresholds on a system, and for hacking on postgres.

Tracking the number of times autovacuum was interruped due to a lock request
might be a good idea as well?

I think it'd be a good idea to split off the part of the patch that introduces
AutoVacType / adds logging for what triggered. That's independently useful,
likely uncontroversial and makes the remaining patch smaller.

I'd also add the trigger to the pg_stat_activity entry for the autovac
worker. Additionally I think we should add information about using failsafe
mode to the p_s_a entry.

I've wished for a non-wraparound, xid age based, "autovacuum trigger" many
times, FWIW. And I've seen plenty of places write their own userspace version
of it, because without it they run into trouble.  However, I don't like that
the patch infers the various thresholds using magic constants / multipliers.

autovacuum_freeze_max_age is really a fairly random collection of things:
1) triggers autovacuum on tables based on age, in addition to the dead tuple /
   inserted tuples triggers
2) prevents auto-cancellation of autovacuum
3) starts autovacuum, even if autovacuum is disabled

IME hitting 1) isn't a reason for concern, it's perfectly normal. Needing 2)
to make progress is a bit more concerning. 3) should rarely be needed, but is
a good safety mechanism.

I doubt that controlling all of them via one GUC is sensible.

If I understand the patch correctly, we now have the following age based
thresholds for av:

- force-enable autovacuum:
  oldest_datfrozenxid + autovacuum_freeze_max_age < nextXid
- autovacuum based on age:
  freeze_max_age = Min(autovacuum_freeze_max_age, table_freeze_max_age)
    tableagevac = relfrozenxid < recentXid - freeze_max_age
- prevent auto-cancellation:
  freeze_max_age = Min(autovacuum_freeze_max_age, table_freeze_max_age)
  prevent_auto_cancel_age = Min(freeze_max_age * 2, 1 billion)
  prevent_auto_cancel = reflrozenxid < recentXid - prevent_auto_cancel_age

Is that right?

One thing I just noticed: Isn't it completely bonkers that we compute
recentXid/recentMulti once at the start of a worker in
relation_needs_vacanalyze()?  That's fine for the calls in do_autovacuum()'s
initial loops over all tables. But seems completely wrong for the later calls
via table_recheck_autovac() -> recheck_relation_needs_vacanalyze() ->
relation_needs_vacanalyze()?

These variables really shouldn't be globals. It makes sense to cache them
locally in do_autovacuum(), but reusing them
recheck_relation_needs_vacanalyze() and sharing it between launcher and worker
is bad.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

14 January 2023, 00:13:45

On Fri, Jan 13, 2023 at 2:00 PM Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-12 16:08:06 -0500, Robert Haas wrote:
> > Normally, the XID age of a table never reaches autovacuum_freeze_max_age in
> > the first place.
>
> That's not at all my experience. I often see it being the primary reason for
> autovacuum vacuuming large tables on busy OLTP systems. Even without any
> longrunning transactions or such, with available autovac workers and without
> earlier autovacuums getting interrupted by locks.  Once a table is large,
> reasonable scale factors require a lot of changes to accumulate to trigger an
> autovacuum, and during a vacuum a lot of transactions complete, leading to
> large tables having a significant age by the time autovac finishes.

I've definitely seen this. I've also noticed that TPC-C's stock and
customer tables (both of which are subject to many HOT updates) only
ever receive antiwraparound autovacuums, even with my very aggressive
autovacuum settings.

Overall, I think that it's quite common among the largest tables, even
when things are running normally.

> The most common "bad" reason for reaching autovacuum_freeze_max_age that I see
> is cost limits not allowing vacuum to complete on time.

There are many problems with the statistics driving this whole
process, that I won't rehash right now. I actually think that the
whole idea of relying on statistical sampling for dead tuples is
fundamentally just bogus (that's not how statistics work in general),
though I have quite a few less fundamental and more concrete
complaints about the statistics just being wrong on their own terms.

> Perhaps we should track how often autovacuum was triggered by what reason in
> a relation's pgstats? That'd make it a lot easier to collect data, both for
> tuning the thresholds on a system, and for hacking on postgres.

That would definitely be useful.

> Tracking the number of times autovacuum was interruped due to a lock request
> might be a good idea as well?

Also useful information worth having.

> I think it'd be a good idea to split off the part of the patch that introduces
> AutoVacType / adds logging for what triggered. That's independently useful,
> likely uncontroversial and makes the remaining patch smaller.

I like that idea.

Attached revision v4 breaks things up mechanically, along those lines
(no real changes to the code inself, though). The controversial parts
of the patch are indeed a fairly small proportion of the total
changes.

> I'd also add the trigger to the pg_stat_activity entry for the autovac
> worker. Additionally I think we should add information about using failsafe
> mode to the p_s_a entry.

I agree that that's all useful, but it seems like it can be treated as
later work.

> I've wished for a non-wraparound, xid age based, "autovacuum trigger" many
> times, FWIW. And I've seen plenty of places write their own userspace version
> of it, because without it they run into trouble.  However, I don't like that
> the patch infers the various thresholds using magic constants / multipliers.

As I said, these details are totally negotiable, and likely could be a
lot better without much effort.

What are your concerns about the thresholds? For example, is it that
you can't configure the behavior directly at all? Something else?

> autovacuum_freeze_max_age is really a fairly random collection of things:
> 1) triggers autovacuum on tables based on age, in addition to the dead tuple /
>    inserted tuples triggers
> 2) prevents auto-cancellation of autovacuum
> 3) starts autovacuum, even if autovacuum is disabled
>
> IME hitting 1) isn't a reason for concern, it's perfectly normal. Needing 2)
> to make progress is a bit more concerning. 3) should rarely be needed, but is
> a good safety mechanism.
>
> I doubt that controlling all of them via one GUC is sensible.

I agree, of course, but just to be clear: I don't think it matters
that we couple together 1 and 3. In fact it's good that we do that,
because the point to the user is that they cannot disable table-age
(i.e. what we current call antiwraparound) autovacuums -- that just
makes sense.

The only problem that I see is that item 2 is tied to the other items
from your list.

> If I understand the patch correctly, we now have the following age based
> thresholds for av:
>
> - force-enable autovacuum:
>   oldest_datfrozenxid + autovacuum_freeze_max_age < nextXid
> - autovacuum based on age:
>   freeze_max_age = Min(autovacuum_freeze_max_age, table_freeze_max_age)
>     tableagevac = relfrozenxid < recentXid - freeze_max_age
> - prevent auto-cancellation:
>   freeze_max_age = Min(autovacuum_freeze_max_age, table_freeze_max_age)
>   prevent_auto_cancel_age = Min(freeze_max_age * 2, 1 billion)
>   prevent_auto_cancel = reflrozenxid < recentXid - prevent_auto_cancel_age
>
> Is that right?

That summary looks accurate, but I'm a bit confused about why you're
asking the question this way. I thought that it was obvious that the
patch doesn't change most of these things.

The only mechanism that the patch changes is related to "prevent
auto-cancellation" behaviors -- which is now what the term
"antiwraparound" refers to. It does change the name of "autovacuum
based on age", though -- the name is now "table age autovacuum" (the
old name was antiwraparound autovacuum, of course). As I pointed out
to you already, it's mechanically impossible for any autovacuum to be
antiwraparound unless it's an XID table age/MXID table age autovacuum.

The naming convention I propose here makes it a little confusing for
us to discuss, but it seems like the best thing for users. Users'
basic intuitions about antiwraparound autovacuums (that they're scary
things needed because wraparound is starting to become a real concern)
don't need to change. If anything they become more accurate, because
antiwraparound autovacuums become non-routine -- which is really how
it should have been when autovacuum was first added IMV. Users have
rather good reasons to find antiwraparound autovacuums scary, even
though that's kind of wrong (it's really our fault for making it so
confusing for them, not their fault for being wrong).

> One thing I just noticed: Isn't it completely bonkers that we compute
> recentXid/recentMulti once at the start of a worker in
> relation_needs_vacanalyze()?  That's fine for the calls in do_autovacuum()'s
> initial loops over all tables. But seems completely wrong for the later calls
> via table_recheck_autovac() -> recheck_relation_needs_vacanalyze() ->
> relation_needs_vacanalyze()?
>
> These variables really shouldn't be globals. It makes sense to cache them
> locally in do_autovacuum(), but reusing them
> recheck_relation_needs_vacanalyze() and sharing it between launcher and worker
> is bad.

I am not sure. I do hope that there isn't some subtle way in which the
design relies on that. It seems obviously weird, and so I have to
wonder if there is a reason behind it that isn't immediately apparent.

--
Peter Geoghegan

Hi,

On 2023-01-18 12:15:17 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 11:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > pgstat_report_analyze() will totally override the
> > > tabentry->dead_tuples information that drives autovacuum.c, based on
> > > an estimate derived from a random sample -- which seems to me to be an
> > > approach that just doesn't have any sound theoretical basis.
> >
> > Yikes. I think we don't have a choice but to have a method to correct
> > the information somehow, because AFAIK the statistics system is not
> > crash-safe. But that approach does seem to carry significant risk of
> > overwriting correct information with wrong information.

I suggested nearby to only have ANALYZE dead_tuples it if there's been no
[auto]vacuum since the stats entry was created. That allows recovering from
stats resets, be it via crashes or explicitly. What do you think?

To add insult to injury, we overwrite accurate information gathered by VACUUM
with bad information gathered by ANALYZE if you do VACUUM ANALYZE.



One complicating factor is that VACUUM sometimes computes an incrementally
more bogus n_live_tup when it skips pages due to the VM, whereas ANALYZE
computes something sane. I unintentionally encountered one when I was trying
something while writing this email, reproducer attached.


VACUUM (DISABLE_PAGE_SKIPPING) foo;
SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relid = 'foo'::regclass;
┌────────────┬────────────┐
│ n_live_tup │ n_dead_tup │
├────────────┼────────────┤
│    9000001 │     500000 │
└────────────┴────────────┘

after one VACUUM:
┌────────────┬────────────┐
│ n_live_tup │ n_dead_tup │
├────────────┼────────────┤
│    8549905 │     500000 │
└────────────┴────────────┘

after 9 more VACUUMs:
┌────────────┬────────────┐
│ n_live_tup │ n_dead_tup │
├────────────┼────────────┤
│    5388421 │     500000 │
└────────────┴────────────┘
(1 row)


I briefly tried it out, and it does *not* reproduce in 11, but does in
12. Haven't dug into what the cause is, but we probably use the wrong
denominator somewhere...

Greetings,

Andres Freund

Attachment

vactest.sql

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

18 January 2023, 21:12:27

On Wed, Jan 18, 2023 at 1:02 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Some of what I'm proposing arguably amounts to deliberately adding a
> bias. But that's not an unreasonable thing in itself. I think of it as
> related to the bias-variance tradeoff, which is a concept that comes
> up a lot in machine learning and statistical inference.

To be clear, I was thinking of unreservedly trusting what
pgstat_report_analyze() says about dead_tuples in the event of its
estimate increasing our dead_tuples estimate, while being skeptical
(to a varying degree) when it's the other way around.

But now I need to go think about what Andres just brought up...

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

18 January 2023, 21:42:40

Hi,

On 2023-01-18 13:08:44 -0800, Andres Freund wrote:
> One complicating factor is that VACUUM sometimes computes an incrementally
> more bogus n_live_tup when it skips pages due to the VM, whereas ANALYZE
> computes something sane. I unintentionally encountered one when I was trying
> something while writing this email, reproducer attached.
> 
> 
> VACUUM (DISABLE_PAGE_SKIPPING) foo;
> SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relid = 'foo'::regclass;
> ┌────────────┬────────────┐
> │ n_live_tup │ n_dead_tup │
> ├────────────┼────────────┤
> │    9000001 │     500000 │
> └────────────┴────────────┘
> 
> after one VACUUM:
> ┌────────────┬────────────┐
> │ n_live_tup │ n_dead_tup │
> ├────────────┼────────────┤
> │    8549905 │     500000 │
> └────────────┴────────────┘
> 
> after 9 more VACUUMs:
> ┌────────────┬────────────┐
> │ n_live_tup │ n_dead_tup │
> ├────────────┼────────────┤
> │    5388421 │     500000 │
> └────────────┴────────────┘
> (1 row)
> 
> 
> I briefly tried it out, and it does *not* reproduce in 11, but does in
> 12. Haven't dug into what the cause is, but we probably use the wrong
> denominator somewhere...

Oh, it does actually reproduce in 11 too - my script just didn't see it
because it was "too fast". For some reason < 12 it takes longer for the new
pgstat snapshot to be available. If I add a few sleeps, it shows in 11.

The real point of change appears to be 10->11.

There's a relevant looking difference in the vac_estimate_reltuples call:
10:
    /* now we can compute the new value for pg_class.reltuples */
    vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
                                                         nblocks,
                                                         vacrelstats->tupcount_pages,
                                                         num_tuples);

11:
    /* now we can compute the new value for pg_class.reltuples */
    vacrelstats->new_live_tuples = vac_estimate_reltuples(onerel,
                                                          nblocks,
                                                          vacrelstats->tupcount_pages,
                                                          live_tuples);
which points to:

commit 7c91a0364fcf5d739a09cc87e7adb1d4a33ed112
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2018-03-22 15:47:29 -0400

    Sync up our various ways of estimating pg_class.reltuples.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

18 January 2023, 21:45:19

On Wed, Jan 18, 2023 at 1:08 PM Andres Freund <andres@anarazel.de> wrote:
> I suggested nearby to only have ANALYZE dead_tuples it if there's been no
> [auto]vacuum since the stats entry was created. That allows recovering from
> stats resets, be it via crashes or explicitly. What do you think?

I like that idea. It's far simpler than the kind of stuff I was
thinking about, and probably just as effective. Even if it introduces
some unforeseen problem (which seems unlikely), we can still rely on
pgstat_report_vacuum() to set things straight before too long.

Are you planning on writing a patch for this? I'd be very interested
in seeing this through. Could definitely review it.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

18 January 2023, 22:22:39

Hi,

On 2023-01-18 13:42:40 -0800, Andres Freund wrote:
> The real point of change appears to be 10->11.
>
> There's a relevant looking difference in the vac_estimate_reltuples call:
> 10:
>     /* now we can compute the new value for pg_class.reltuples */
>     vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
>                                                          nblocks,
>                                                          vacrelstats->tupcount_pages,
>                                                          num_tuples);
>
> 11:
>     /* now we can compute the new value for pg_class.reltuples */
>     vacrelstats->new_live_tuples = vac_estimate_reltuples(onerel,
>                                                           nblocks,
>                                                           vacrelstats->tupcount_pages,
>                                                           live_tuples);
> which points to:
>
> commit 7c91a0364fcf5d739a09cc87e7adb1d4a33ed112
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   2018-03-22 15:47:29 -0400
>
>     Sync up our various ways of estimating pg_class.reltuples.

The problem with the change is here:

    /*
     * Okay, we've covered the corner cases.  The normal calculation is to
     * convert the old measurement to a density (tuples per page), then
     * estimate the number of tuples in the unscanned pages using that figure,
     * and finally add on the number of tuples in the scanned pages.
     */
    old_density = old_rel_tuples / old_rel_pages;
    unscanned_pages = (double) total_pages - (double) scanned_pages;
    total_tuples = old_density * unscanned_pages + scanned_tuples;
    return floor(total_tuples + 0.5);


Because we'll re-scan the pages for not-yet-removable rows in subsequent
vacuums, the next vacuum will process the same pages again. By using
scanned_tuples = live_tuples, we basically remove not-yet-removable tuples
from reltuples, each time.

The commit *did* try to account for that to some degree:

+    /* also compute total number of surviving heap entries */
+    vacrelstats->new_rel_tuples =
+        vacrelstats->new_live_tuples + vacrelstats->new_dead_tuples;


but new_rel_tuples isn't used for pg_class.reltuples or pgstat.


This is pretty nasty. We use reltuples for a lot of things. And while analyze
might fix it sometimes, that won't reliably be the case, particularly when
there are repeated autovacuums due to a longrunning transaction - there's no
cause for auto-analyze to trigger again soon, while autovacuum will go at it
again and again.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

18 January 2023, 22:37:20

On Wed, Jan 18, 2023 at 2:22 PM Andres Freund <andres@anarazel.de> wrote:
> The problem with the change is here:
>
>         /*
>          * Okay, we've covered the corner cases.  The normal calculation is to
>          * convert the old measurement to a density (tuples per page), then
>          * estimate the number of tuples in the unscanned pages using that figure,
>          * and finally add on the number of tuples in the scanned pages.
>          */
>         old_density = old_rel_tuples / old_rel_pages;
>         unscanned_pages = (double) total_pages - (double) scanned_pages;
>         total_tuples = old_density * unscanned_pages + scanned_tuples;
>         return floor(total_tuples + 0.5);

My assumption has always been that vac_estimate_reltuples() is prone
to issues like this because it just doesn't have access to very much
information each time it runs. It can only see the delta between what
VACUUM just saw, and what the last VACUUM (or possibly the last
ANALYZE) saw according to pg_class. You're always going to find
weaknesses in such a model if you go looking for them. You're always
going to find a way to salami slice your way from good information to
total nonsense, if you pick the right/wrong test case, which runs
VACUUM in a way that allows whatever bias there may be to accumulate.
It's sort of like the way floating point values can become very
inaccurate through a process that allows many small inaccuracies to
accumulate over time.

Maybe you're right to be concerned to the degree that you're concerned
-- I'm not sure. I'm just adding what I see as important context.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

18 January 2023, 23:28:19

On Wed, Jan 18, 2023 at 2:37 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Maybe you're right to be concerned to the degree that you're concerned
> -- I'm not sure. I'm just adding what I see as important context.

The problems in this area tend to be that vac_estimate_reltuples()
behaves as if it sees a random sample, when in fact it's far from
random -- it's the same scanned_pages as last time, and the ten other
times before that. That's a feature of your test case, and other
similar vac_estimate_reltuples test cases that I came up with in the
past. Both scenarios involved skipping using the visibility map in
multiple successive VACUUM operations.

Perhaps we should make vac_estimate_reltuples focus on the pages that
VACUUM newly set all-visible each time (not including all-visible
pages that got scanned despite being all-visible) -- only that subset
of scanned_pages seems to be truly relevant. That way you wouldn't be
able to do stuff like this. We'd have to start explicitly tracking the
number of pages that were newly set in the VM in vacuumlazy.c to be
able to do that, but that seems like a good idea anyway.

This probably has consequences elsewhere, but maybe that's okay. We
know when the existing pg_class has no information, since that is
explicitly encoded by a reltuples of -1. Obviously we'd need to be
careful about stuff like that. Overall, the danger from assuming that
"unsettled" pages (pages that couldn't be newly set all-visible by
VACUUM) have a generic tuple density seems less than the danger of
assuming that they're representative. We know that we're bound to scan
these same pages in the next VACUUM anyway, so they'll have another
chance to impact our view of the table's tuple density at that time.

--
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

18 January 2023, 23:59:00

On Wed, Jan 18, 2023 at 3:28 PM Peter Geoghegan <pg@bowt.ie> wrote:
> The problems in this area tend to be that vac_estimate_reltuples()
> behaves as if it sees a random sample, when in fact it's far from
> random -- it's the same scanned_pages as last time, and the ten other
> times before that. That's a feature of your test case, and other
> similar vac_estimate_reltuples test cases that I came up with in the
> past. Both scenarios involved skipping using the visibility map in
> multiple successive VACUUM operations.

FWIW, the problem in your test case just goes away if you just change this line:

DELETE FROM foo WHERE i < (10000000 * 0.1)

To this:

DELETE FROM foo WHERE i < (10000000 * 0.065)

Which is not a huge difference, overall. This effect is a consequence
of the heuristics I added in commit 74388a1a, so it's only present on
Postgres 15+.

Whether or not this is sufficient protection is of course open to
interpretation. One problem with those heuristics (as far as your test
case is concerned) is that they either work, or they don't work. For
example they're conditioned on "old_rel_pages == total_page".

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

19 January 2023, 00:02:14

Hi,

On 2023-01-18 14:37:20 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 2:22 PM Andres Freund <andres@anarazel.de> wrote:
> > The problem with the change is here:
> >
> >         /*
> >          * Okay, we've covered the corner cases.  The normal calculation is to
> >          * convert the old measurement to a density (tuples per page), then
> >          * estimate the number of tuples in the unscanned pages using that figure,
> >          * and finally add on the number of tuples in the scanned pages.
> >          */
> >         old_density = old_rel_tuples / old_rel_pages;
> >         unscanned_pages = (double) total_pages - (double) scanned_pages;
> >         total_tuples = old_density * unscanned_pages + scanned_tuples;
> >         return floor(total_tuples + 0.5);
> 
> My assumption has always been that vac_estimate_reltuples() is prone
> to issues like this because it just doesn't have access to very much
> information each time it runs. It can only see the delta between what
> VACUUM just saw, and what the last VACUUM (or possibly the last
> ANALYZE) saw according to pg_class. You're always going to find
> weaknesses in such a model if you go looking for them. You're always
> going to find a way to salami slice your way from good information to
> total nonsense, if you pick the right/wrong test case, which runs
> VACUUM in a way that allows whatever bias there may be to accumulate.
> It's sort of like the way floating point values can become very
> inaccurate through a process that allows many small inaccuracies to
> accumulate over time.

Sure. To start with, there's always going to be some inaccuracies when you
assume an even distribution across a table. But I think this goes beyond
that.

This problem occurs with a completely even distribution, exactly the same
inputs to the estimation function every time.  My example under-sold the
severity, because I had only 5% non-deletable tuples. Here's it with 50%
non-removable tuples (I've seen way worse than 50% in many real-world cases),
and a bunch of complexity removed (attched).

vacuum-no    reltuples/n_live_tup    n_dead_tup
1        4999976            5000000
2        2500077            5000000
3        1250184            5000000
4         625266            5000000
5         312821            5000000
10          10165            5000000

Each vacuum halves reltuples.  That's going to screw badly with all kinds of
things. Planner costs completely out of whack etc.

I wonder if this is part of the reason for the distortion you addressed with
74388a1a / 3097bde7dd1d. I am somewhat doubtful they're right as is. For a
large relation 2% of blocks is a significant number of rows, and simply never
adjusting reltuples seems quite problematic. At the very least we ought to
account for dead tids we removed or such, instead of just freezing reltuples.

Greetings,

Andres Freund

Attachment

vactest_more_extreme.sql

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

19 January 2023, 00:19:02

On Wed, Jan 18, 2023 at 4:02 PM Andres Freund <andres@anarazel.de> wrote:
> vacuum-no       reltuples/n_live_tup    n_dead_tup
> 1               4999976                 5000000
> 2               2500077                 5000000
> 3               1250184                 5000000
> 4                625266                 5000000
> 5                312821                 5000000
> 10                10165                 5000000
>
> Each vacuum halves reltuples.  That's going to screw badly with all kinds of
> things. Planner costs completely out of whack etc.

I get that that could be a big problem, even relative to the more
immediate problem of VACUUM just spinning like it does in your test
case. What do you think we should do about it? What do you think about
my idea of focussing on the subset of pages newly set all-visible in
the VM?

> I wonder if this is part of the reason for the distortion you addressed with
> 74388a1a / 3097bde7dd1d. I am somewhat doubtful they're right as is. For a
> large relation 2% of blocks is a significant number of rows, and simply never
> adjusting reltuples seems quite problematic. At the very least we ought to
> account for dead tids we removed or such, instead of just freezing reltuples.

As I mentioned, it only kicks in when relpages is *precisely* the same
as last time (not one block more or one block less), *and* we only
scanned less than 2% of rel_pages. It's quite possible that that's
insufficient, but I can't imagine it causing any new problems.

I think that we need to be realistic about what's possible while
storing a small, fixed amount of information. There is always going to
be some distortion of this kind. We can do something about the
obviously pathological cases, where errors can grow without bound. But
you have to draw the line somewhere, unless you're willing to replace
the whole approach with something that stores historic metadata.

What kind of tradeoff do you want to make here? I think that you have
to make one.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

19 January 2023, 00:37:18

Hi,

On 2023-01-18 13:45:19 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 1:08 PM Andres Freund <andres@anarazel.de> wrote:
> > I suggested nearby to only have ANALYZE dead_tuples it if there's been no
> > [auto]vacuum since the stats entry was created. That allows recovering from
> > stats resets, be it via crashes or explicitly. What do you think?
>
> I like that idea. It's far simpler than the kind of stuff I was
> thinking about, and probably just as effective. Even if it introduces
> some unforeseen problem (which seems unlikely), we can still rely on
> pgstat_report_vacuum() to set things straight before too long.
>
> Are you planning on writing a patch for this? I'd be very interested
> in seeing this through. Could definitely review it.

I can, it should be just about trivial code-wise. A bit queasy about trying to
forsee the potential consequences.

A somewhat related issue is that pgstat_report_vacuum() sets dead_tuples to
what VACUUM itself observed, ignoring any concurrently reported dead
tuples. As far as I can tell, when vacuum takes a long time, that can lead to
severely under-accounting dead tuples.

We probably loose track of a bit more than 50% of the dead tuples reported
since vacuum started. During the heap scan phase we don't notice all the
tuples reported before the current scan point, and then we don't notice them
at all during the index/heap vacuuming.

The only saving grace is that it'll be corrected at the next VACUUM. But the
next vacuum might very well be delayed noticably due to this.

This is similar to the issue around ins_since_vacuum that Peter pointed out.

I wonder if we ought to remember the dead_tuples value at the start of the
heap scan and use that to adjust the final dead_tuples value. I'd lean towards
over-counting rather than under-counting and thus probably would go for
something like

  tabentry->dead_tuples = livetuples + Min(0, tabentry->dead_tuples - deadtuples_at_start);

i.e. assuming we might have missed all concurrently reported dead tuples.

Of course we could instead move to something like ins_since_vacuum and reset
it at the *start* of the vacuum. But that'd make the error case harder,
without giving us more accuracy, I think?

I do think this is an argument for splitting up dead_tuples into separate
"components" that we track differently. I.e. tracking the number of dead
items, not-yet-removable rows, and the number of dead tuples reported from DML
statements via pgstats.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

19 January 2023, 01:00:48

On Wed, Jan 18, 2023 at 4:37 PM Andres Freund <andres@anarazel.de> wrote:
> I can, it should be just about trivial code-wise. A bit queasy about trying to
> forsee the potential consequences.

That's always going to be true, though.

> A somewhat related issue is that pgstat_report_vacuum() sets dead_tuples to
> what VACUUM itself observed, ignoring any concurrently reported dead
> tuples. As far as I can tell, when vacuum takes a long time, that can lead to
> severely under-accounting dead tuples.

Did I not mention that one? There are so many that it can be hard to
keep track! That's why I catalog them.

As you point out, it's the dead tuples equivalent of my
ins_since_vacuum complaint. The problem is exactly analogous to my
recent complaint about insert-driven autovacuums.

> I wonder if we ought to remember the dead_tuples value at the start of the
> heap scan and use that to adjust the final dead_tuples value. I'd lean towards
> over-counting rather than under-counting and thus probably would go for
> something like
>
>   tabentry->dead_tuples = livetuples + Min(0, tabentry->dead_tuples - deadtuples_at_start);
>
> i.e. assuming we might have missed all concurrently reported dead tuples.

This is exactly what I was thinking of doing for both issues (the
ins_since_vacuum one and this similar dead tuples one). It's
completely logical.

This creates an awkward but logical question, though: what if
dead_tuples doesn't go down at all? What if VACUUM actually has to
increase it, because VACUUM runs so slowly relative to the workload?
Of course the whole point is to make it more likely that VACUUM will
keep up with the workload. I'm just not quite sure that the
consequences of doing it that way are strictly a good thing. Bearing
in mind that we don't differentiate between recently dead and dead
here.

Fun fact: autovacuum can spin with pgbench because of recently dead
tuples, even absent an old snapshot/long running xact, if you set
things aggressively enough:

https://postgr.es/m/CAH2-Wz=sJm3tm+FpXbyBhEhX5tbz1trQrhG6eOhYk4-+5uL=ww@mail.gmail.com

I think that we probably need to do something like always make sure
that dead_items goes down by a small amount at the end of each VACUUM,
even when that's a lie. Maybe we also have a log message about
autovacuum not keeping up, so as to not feel too guilty about it. You
know, to give the user a chance to reconfigure autovacuum so that it
stops happening.

> Of course we could instead move to something like ins_since_vacuum and reset
> it at the *start* of the vacuum. But that'd make the error case harder,
> without giving us more accuracy, I think?

It would. It seems illogical to me.

> I do think this is an argument for splitting up dead_tuples into separate
> "components" that we track differently. I.e. tracking the number of dead
> items, not-yet-removable rows, and the number of dead tuples reported from DML
> statements via pgstats.

Is it? Why?

I'm all in favor of doing that, of course. I just don't particularly
think that it's related to this other problem. One problem is that we
count dead tuples incorrectly because we don't account for the fact
that things change while VACUUM runs. The other problem is that the
thing that is counted isn't broken down into distinct subcategories of
things -- things are bunched together that shouldn't be.

Oh wait, you were thinking of what I said before -- my "awkward but
logical question". Is that it?

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

19 January 2023, 01:49:05

Hi,

On 2023-01-18 16:19:02 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 4:02 PM Andres Freund <andres@anarazel.de> wrote:
> > vacuum-no       reltuples/n_live_tup    n_dead_tup
> > 1               4999976                 5000000
> > 2               2500077                 5000000
> > 3               1250184                 5000000
> > 4                625266                 5000000
> > 5                312821                 5000000
> > 10                10165                 5000000
> >
> > Each vacuum halves reltuples.  That's going to screw badly with all kinds of
> > things. Planner costs completely out of whack etc.
>
> I get that that could be a big problem, even relative to the more
> immediate problem of VACUUM just spinning like it does in your test
> case. What do you think we should do about it?

The change made in 7c91a0364fc imo isn't right. We need to fix it. I think
it's correct that now pgstat_report_vacuum() doesn't include recently dead
tuples in livetuples - they're still tracked via deadtuples.  But it's wrong
for vacuum's call to vac_update_relstats() to not include recently dead
tuples, at least when we only scanned part of the relation.

I think the right thing would be to not undo the semantic change of
7c91a0364fc as a whole, but instead take recently-dead tuples into account
only in the "Okay, we've covered the corner cases." part, to avoid the
spiraling seen above.

Not super clean, but it seems somewhat fundamental that we'll re-scan pages
full of recently-dead tuples in the near future. If we, in a way, subtract
the recently dead tuples from reltuples in this cycle, we shouldn't do so
again in the next - but not taking recently dead into account, does so.

It's a bit complicated because of the APIs involved. vac_estimate_reltuples()
computes vacrel->new_live_tuples and contains the logic for how to compute the
new reltuples. But we use the ->new_live_tuples both vac_update_relstats(),
where we, imo, should take recently-dead into account for partial scans and
pgstat_report_vacuum where we shouldn't.  I guess we would need to add an
output paramter both for "reltuples" and "new_live_tuples".

> What do you think about my idea of focussing on the subset of pages newly
> set all-visible in the VM?

I don't understand it yet.

On 2023-01-18 15:28:19 -0800, Peter Geoghegan wrote:
> Perhaps we should make vac_estimate_reltuples focus on the pages that
> VACUUM newly set all-visible each time (not including all-visible
> pages that got scanned despite being all-visible) -- only that subset
> of scanned_pages seems to be truly relevant. That way you wouldn't be
> able to do stuff like this. We'd have to start explicitly tracking the
> number of pages that were newly set in the VM in vacuumlazy.c to be
> able to do that, but that seems like a good idea anyway.

Can you explain a bit more what you mean with "focus on the pages" means?

> > I wonder if this is part of the reason for the distortion you addressed with
> > 74388a1a / 3097bde7dd1d. I am somewhat doubtful they're right as is. For a
> > large relation 2% of blocks is a significant number of rows, and simply never
> > adjusting reltuples seems quite problematic. At the very least we ought to
> > account for dead tids we removed or such, instead of just freezing reltuples.
>
> As I mentioned, it only kicks in when relpages is *precisely* the same
> as last time (not one block more or one block less), *and* we only
> scanned less than 2% of rel_pages. It's quite possible that that's
> insufficient, but I can't imagine it causing any new problems.

In OLTP workloads relpages will often not change, even if there's lots of
write activity, because there's plenty free space in the relation, and there's
something not-removable on the last page. relpages also won't change if data
is deleted anywhere but the end.

I don't think it's hard to see this causing problems. Set
autovacuum_vacuum_scale_factor to something smaller than 2% or somewhat
frequently vacuum manually. Incrementally delete old data.  Unless analyze
saves you - which might not be run or might have a different scale factor or
not be run manually - reltuples will stay exactly the same, despite data
changing substantially.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

19 January 2023, 01:54:27

On Fri, Jan 13, 2023 at 9:55 PM Andres Freund <andres@anarazel.de> wrote:
> How about a float autovacuum_no_auto_cancel_age where positive values are
> treated as absolute values, and negative values are a multiple of
> autovacuum_freeze_max_age? And where the "computed" age is capped at
> vacuum_failsafe_age? A "failsafe" autovacuum clearly shouldn't be cancelled.
>
> And maybe a default setting of -1.8 or so?

Attached is a new revision, v5. I'm not happy with this, but thought
it would be useful to show you where I am with it.

It's a bit awkward that we have a GUC (autovacuum_no_auto_cancel_age)
that can sometimes work as a cutoff that works similarly to both
freeze_max_age and multixact_freeze_max_age, but usually works as a
multiplier. It's both an XID age value, an MXID age value, and a
multiplier on XID/MXID age values.

What if it was just a simple multiplier on
freeze_max_age/multixact_freeze_max_age, without changing any other
detail?

-- 
Peter Geoghegan

Hi,

On 2023-01-18 18:21:33 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 5:49 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2023-01-18 15:28:19 -0800, Peter Geoghegan wrote:
> > > Perhaps we should make vac_estimate_reltuples focus on the pages that
> > > VACUUM newly set all-visible each time (not including all-visible
> > > pages that got scanned despite being all-visible) -- only that subset
> > > of scanned_pages seems to be truly relevant. That way you wouldn't be
> > > able to do stuff like this. We'd have to start explicitly tracking the
> > > number of pages that were newly set in the VM in vacuumlazy.c to be
> > > able to do that, but that seems like a good idea anyway.
> >
> > Can you explain a bit more what you mean with "focus on the pages" means?
>
> We don't say anything about pages we didn't scan right now -- only
> scanned_pages have new information, so we just extrapolate. Why not go
> even further than that, by only saying something about the pages that
> were both scanned and newly set all-visible?
>
> Under this scheme, the pages that were scanned but couldn't be newly
> set all-visible are treated just like the pages that weren't scanned
> at all -- they get a generic estimate from the existing reltuples.

I don't think that'd work well either. If we actually removed a large chunk of
the tuples in the table it should be reflected in reltuples, otherwise costing
and autovac scheduling suffers. And you might not be able to set all those
page to all-visible, because of more recent rows.


> > I don't think it's hard to see this causing problems. Set
> > autovacuum_vacuum_scale_factor to something smaller than 2% or somewhat
> > frequently vacuum manually. Incrementally delete old data.  Unless analyze
> > saves you - which might not be run or might have a different scale factor or
> > not be run manually - reltuples will stay exactly the same, despite data
> > changing substantially.
>
> You seem to be saying that it's a problem if we don't update reltuples
> -- an estimate -- when less than 2% of the table is scanned by VACUUM.
> But why? Why can't we just do nothing sometimes? I mean in general,
> leaving aside the heuristics I came up with for a moment?

The problem isn't that we might apply the heuristic once, that'd be fine. But
that there's nothing preventing it from applying until there basically are no
tuples left, as long as the vacuum is frequent enough.

As a demo: The attached sql script ends up with a table containing 10k rows,
but relpages being set 1 million.

VACUUM foo;
EXPLAIN (ANALYZE) SELECT * FROM foo;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                QUERY PLAN                                                 │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Seq Scan on foo  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=3.251..4.693 rows=10000 loops=1) │
│ Planning Time: 0.056 ms                                                                                   │
│ Execution Time: 5.491 ms                                                                                  │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(3 rows)


> It will cause problems if we remove the heuristics. Much less
> theoretical problems. What about those?

I don't think what I describe is a theoretical problem.


> How often does VACUUM scan so few pages, anyway? We've been talking
> about how ineffective autovacuum_vacuum_scale_factor is, at great
> length, but now you're saying that it *can* meaningfully trigger not
> just one VACUUM, but many VACUUMs, where no more than 2% of  rel_pages
> are not all-visible (pages, not tuples)? Not just once, mind you, but
> many times?

I've seen autovacuum_vacuum_scale_factor set to 0.01 repeatedly. But that's
not even needed - you just need a longrunning transaction preventing at least
one dead row from getting removed and hit autovacuum_freeze_max_age. There'll
be continuous VACUUMs of the table, all only processing a small fraction of
the table.

And I have many times seen bulk loading / deletion scripts that do VACUUM on a
regular schedule, which also could easily trigger this.


> And in the presence of some kind of highly variable tuple
> size, where it actually could matter to the planner at some point?

I don't see how a variable tuple size needs to be involved? As the EXPLAIN
ANALYZE above shows, we'll end up with wrong row count estimates etc.


> I would be willing to just avoid even these theoretical problems if
> there was some way to do so, that didn't also create new problems. I
> have my doubts that that is possible, within the constraints of
> updating pg_class. Or the present constraints, at least. I am not a
> miracle worker -- I can only do so much with the information that's
> available to vac_update_relstats (and/or the information that can
> easily be made available).

I'm worried they might cause new problems.

Greetings,

Andres Freund

Attachment

vac_relpages.sql

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

19 January 2023, 03:26:22

On Wed, Jan 18, 2023 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:
> > You seem to be saying that it's a problem if we don't update reltuples
> > -- an estimate -- when less than 2% of the table is scanned by VACUUM.
> > But why? Why can't we just do nothing sometimes? I mean in general,
> > leaving aside the heuristics I came up with for a moment?
>
> The problem isn't that we might apply the heuristic once, that'd be fine. But
> that there's nothing preventing it from applying until there basically are no
> tuples left, as long as the vacuum is frequent enough.
>
> As a demo: The attached sql script ends up with a table containing 10k rows,
> but relpages being set 1 million.

I saw that too. But then I checked again a few seconds later, and
autoanalyze had run, so reltuples was 10k. Just like it would have if
there was no VACUUM statements in your script.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Robert Haas

Date:

19 January 2023, 20:12:12

On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> pgstat_report_analyze() will totally override the
> tabentry->dead_tuples information that drives autovacuum.c, based on
> an estimate derived from a random sample -- which seems to me to be an
> approach that just doesn't have any sound theoretical basis.

In other words, ANALYZE sometimes (but not always) produces wrong answers.

On Wed, Jan 18, 2023 at 4:08 PM Andres Freund <andres@anarazel.de> wrote:
> One complicating factor is that VACUUM sometimes computes an incrementally
> more bogus n_live_tup when it skips pages due to the VM, whereas ANALYZE
> computes something sane. I unintentionally encountered one when I was trying
> something while writing this email, reproducer attached.

In other words, VACUUM sometimes (but not always) produces wrong answers.

TL;DR: We're screwed.

I refuse to believe that any amount of math you can do on numbers that
can be arbitrarily inaccurate will result in an accurate answer
popping out the other end. Trying to update the reltuples estimate
incrementally based on an estimate derived from a non-random,
likely-to-be-skewed subset of the table is always going to produce
distortion that gets worse and worse the more times you do it. If
could say, well, the existing estimate of let's say 100 tuples per
page is based on the density being 200 tuples per page in the pages I
just scanned and 50 tuples per page in the rest of the table, then you
could calculate a new estimate that keeps the value of 50 tuples per
page for the remainder of the table intact and just replaces the
estimate for the part you just scanned. But we have no way of doing
that, so we just make some linear combination of the old estimate with
the new one. That overweights the repeatedly-sampled portions of the
table more and more, making the estimate wronger and wronger.

Now, that is already quite bad. But if we accept the premise that
neither VACUUM nor ANALYZE is guaranteed to ever produce a new
actually-reliable estimate, then not only will we go progressively
more wrong as time goes by, but we have no way of ever fixing
anything. If you get a series of unreliable data points followed by a
reliable data point, you can at least get back on track when the
reliable data shows up. But it sounds like you guys are saying that
there's no guarantee that will ever happen, which is a bit like
discovering that not only do you have a hole in your gas tank but
there is no guarantee that you will arrive at a gas station ever again
regardless of distance travelled.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

19 January 2023, 20:56:57

Hi,

On 2023-01-19 15:12:12 -0500, Robert Haas wrote:
> On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > pgstat_report_analyze() will totally override the
> > tabentry->dead_tuples information that drives autovacuum.c, based on
> > an estimate derived from a random sample -- which seems to me to be an
> > approach that just doesn't have any sound theoretical basis.
> 
> In other words, ANALYZE sometimes (but not always) produces wrong answers.

For dead tuples, but not live tuples.

> On Wed, Jan 18, 2023 at 4:08 PM Andres Freund <andres@anarazel.de> wrote:
> > One complicating factor is that VACUUM sometimes computes an incrementally
> > more bogus n_live_tup when it skips pages due to the VM, whereas ANALYZE
> > computes something sane. I unintentionally encountered one when I was trying
> > something while writing this email, reproducer attached.
> 
> In other words, VACUUM sometimes (but not always) produces wrong answers.

For live tuples, but not badly so for dead tuples.

> TL;DR: We're screwed.

We are, but perhaps not too badly so, because we can choose to believe analyze
more for live tuples, and vacuum for dead tuples. Analyze doesn't compute
reltuples incrementally and vacuum doesn't compute deadtuples incrementally.

> I refuse to believe that any amount of math you can do on numbers that
> can be arbitrarily inaccurate will result in an accurate answer
> popping out the other end. Trying to update the reltuples estimate
> incrementally based on an estimate derived from a non-random,
> likely-to-be-skewed subset of the table is always going to produce
> distortion that gets worse and worse the more times you do it. If
> could say, well, the existing estimate of let's say 100 tuples per
> page is based on the density being 200 tuples per page in the pages I
> just scanned and 50 tuples per page in the rest of the table, then you
> could calculate a new estimate that keeps the value of 50 tuples per
> page for the remainder of the table intact and just replaces the
> estimate for the part you just scanned. But we have no way of doing
> that, so we just make some linear combination of the old estimate with
> the new one. That overweights the repeatedly-sampled portions of the
> table more and more, making the estimate wronger and wronger.

Perhaps we should, at least occasionally, make vacuum do a cheaper version of
analyze's sampling to compute an updated reltuples. This could even happen
during the heap scan phase.

I don't like relying on analyze to fix vacuum's bogus reltuples, because
there's nothing forcing an analyze run soon after vacuum [incrementally]
screwed it up. Vacuum can be forced to run a lot of times due to xid horizons
preventing cleanup, after which there isn't anything forcing analyze to run
again.

But in contrast to dead_tuples, where I think we can just stop analyze from
updating it unless we crashed recently, I do think we need to update reltuples
in vacuum. So computing an accurate value seems like the least unreasonable
thing I can see.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

19 January 2023, 20:58:24

Hi

On 2023-01-18 19:26:22 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:
> > > You seem to be saying that it's a problem if we don't update reltuples
> > > -- an estimate -- when less than 2% of the table is scanned by VACUUM.
> > > But why? Why can't we just do nothing sometimes? I mean in general,
> > > leaving aside the heuristics I came up with for a moment?
> >
> > The problem isn't that we might apply the heuristic once, that'd be fine. But
> > that there's nothing preventing it from applying until there basically are no
> > tuples left, as long as the vacuum is frequent enough.
> >
> > As a demo: The attached sql script ends up with a table containing 10k rows,
> > but relpages being set 1 million.
> 
> I saw that too. But then I checked again a few seconds later, and
> autoanalyze had run, so reltuples was 10k. Just like it would have if
> there was no VACUUM statements in your script.

There's absolutely no guarantee that autoanalyze is triggered
there. Particularly with repeated vacuums triggered due to an relfrozenxid age
that can't be advanced that very well might not happen within days on a large
relation.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

19 January 2023, 21:22:28

On Thu, Jan 19, 2023 at 12:56 PM Andres Freund <andres@anarazel.de> wrote:
> > In other words, ANALYZE sometimes (but not always) produces wrong answers.
>
> For dead tuples, but not live tuples.

> > In other words, VACUUM sometimes (but not always) produces wrong answers.
>
> For live tuples, but not badly so for dead tuples.

Agreed. More generally, there is a need to think about the whole table
in some cases (like for regular optimizer statistics including
reltuples/live tuples), and the subset of pages that will be scanned
by VACUUM in other cases (for dead tuples accounting). Random sampling
at the table level is appropriate and works well enough for the
former, though not for the latter.

> We are, but perhaps not too badly so, because we can choose to believe analyze
> more for live tuples, and vacuum for dead tuples. Analyze doesn't compute
> reltuples incrementally and vacuum doesn't compute deadtuples incrementally.

Good points.

> But in contrast to dead_tuples, where I think we can just stop analyze from
> updating it unless we crashed recently, I do think we need to update reltuples
> in vacuum. So computing an accurate value seems like the least unreasonable
> thing I can see.

I agree, but there is no reasonable basis for treating scanned_pages
as a random sample, especially if it's only a small fraction of all of
rel_pages -- treating it as a random sample is completely
unjustifiable. And so it seems to me that the only thing that can be
done is to either make VACUUM behave somewhat like ANALYZE in at least
some cases, or to have it invoke ANALYZE directly (or indirectly) in
those same cases.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

19 January 2023, 21:36:41

On Thu, Jan 19, 2023 at 12:58 PM Andres Freund <andres@anarazel.de> wrote:
> There's absolutely no guarantee that autoanalyze is triggered
> there. Particularly with repeated vacuums triggered due to an relfrozenxid age
> that can't be advanced that very well might not happen within days on a large
> relation.

Arguments like that work far better as arguments in favor of the
vac_estimate_reltuples heuristics.

That doesn't mean that the heuristics are good in any absolute sense,
of course. They were just a band aid intended to ameliorate some of
the negative impact that came from treating scanned_pages as a random
sample. I think that we both agree that the real problem is that
scanned_pages just isn't a random sample, at least not as far as
reltuples/live tuples is concerned (for dead tuples it kinda isn't a
sample, but is rather something close to an exact count).

I now understand that you're in favor of addressing the root problem
directly. I am also in favor of that approach. I'd be more than happy
to get rid of the band aid as part of that whole effort.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

19 January 2023, 22:51:18

Hi,

On 2023-01-19 13:36:41 -0800, Peter Geoghegan wrote:
> On Thu, Jan 19, 2023 at 12:58 PM Andres Freund <andres@anarazel.de> wrote:
> > There's absolutely no guarantee that autoanalyze is triggered
> > there. Particularly with repeated vacuums triggered due to an relfrozenxid age
> > that can't be advanced that very well might not happen within days on a large
> > relation.
> 
> Arguments like that work far better as arguments in favor of the
> vac_estimate_reltuples heuristics.

I don't agree. But mainly my issue is that the devil you know (how this has
worked for a while) is preferrable to introducing an unknown quantity (your
patch that hasn't yet seen real world exposure).

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

19 January 2023, 22:54:52

Hi,

On 2023-01-19 13:22:28 -0800, Peter Geoghegan wrote:
> On Thu, Jan 19, 2023 at 12:56 PM Andres Freund <andres@anarazel.de> wrote:
> > But in contrast to dead_tuples, where I think we can just stop analyze from
> > updating it unless we crashed recently, I do think we need to update reltuples
> > in vacuum. So computing an accurate value seems like the least unreasonable
> > thing I can see.
> 
> I agree, but there is no reasonable basis for treating scanned_pages
> as a random sample, especially if it's only a small fraction of all of
> rel_pages -- treating it as a random sample is completely
> unjustifiable.

Agreed.


> And so it seems to me that the only thing that can be done is to either make
> VACUUM behave somewhat like ANALYZE in at least some cases, or to have it
> invoke ANALYZE directly (or indirectly) in those same cases.

Yea. Hence my musing about potentially addressing this by choosing to visit
additional blocks during the heap scan using vacuum's block sampling logic.

IME most of the time in analyze isn't spent doing IO for the sample blocks
themselves, but CPU time and IO for toasted columns. A trimmed down version
that just computes relallvisible should be a good bit faster.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

19 January 2023, 23:10:38

On Thu, Jan 19, 2023 at 2:54 PM Andres Freund <andres@anarazel.de> wrote:
> Yea. Hence my musing about potentially addressing this by choosing to visit
> additional blocks during the heap scan using vacuum's block sampling logic.

I'd rather just invent a way for vacuumlazy.c to tell the top-level
vacuum.c caller "I didn't update reltuples, but you ought to go
ANALYZE the table now that I'm done, even if you weren't already
planning to do so". This wouldn't have to happen every time, but it
would happen fairly often.

> IME most of the time in analyze isn't spent doing IO for the sample blocks
> themselves, but CPU time and IO for toasted columns. A trimmed down version
> that just computes relallvisible should be a good bit faster.

I worry about that from a code maintainability point of view. I'm
concerned that it won't be very cut down at all, in the end.

Presumably you'll want to add the same I/O prefetching logic to this
cut-down version, just for example. Since without that there will be
no competition between it and ANALYZE proper. Besides which, isn't it
kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
detoasting overhead that way. But even still.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

19 January 2023, 23:38:57

Hi,

On 2023-01-19 15:10:38 -0800, Peter Geoghegan wrote:
> On Thu, Jan 19, 2023 at 2:54 PM Andres Freund <andres@anarazel.de> wrote:
> > Yea. Hence my musing about potentially addressing this by choosing to visit
> > additional blocks during the heap scan using vacuum's block sampling logic.
> 
> I'd rather just invent a way for vacuumlazy.c to tell the top-level
> vacuum.c caller "I didn't update reltuples, but you ought to go
> ANALYZE the table now that I'm done, even if you weren't already
> planning to do so".

I'm worried about increasing the number of analyzes that much - on a subset of
workloads it's really quite slow.

Another version of this could be to integrate analyze.c's scan more closely
with vacuum all the time. It's a bit bonkers that we often sequentially read
blocks, evict them from shared buffers if we read them, just to then
afterwards do random IO for blocks we've already read. That's imo what we
eventually should do, but clearly it's not a small project.

> This wouldn't have to happen every time, but it would happen fairly often.

Do you have a mechanism for that in mind? Just something vacuum_count % 10 ==
0 like? Or remember scanned_pages in pgstats and re-computing

> > IME most of the time in analyze isn't spent doing IO for the sample blocks
> > themselves, but CPU time and IO for toasted columns. A trimmed down version
> > that just computes relallvisible should be a good bit faster.
> 
> I worry about that from a code maintainability point of view. I'm
> concerned that it won't be very cut down at all, in the end.

I think it'd be fine to just use analyze.c and pass in an option to not
compute column and inheritance stats.

> Presumably you'll want to add the same I/O prefetching logic to this
> cut-down version, just for example. Since without that there will be
> no competition between it and ANALYZE proper. Besides which, isn't it
> kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
> detoasting overhead that way. But even still.

It's not just that analyze is expensive, I think it'll also be confusing if
the column stats change after a manual VACUUM without ANALYZE.

It shouldn't be too hard to figure out whether we're going to do an analyze
anyway and not do the rowcount-estimate version when doing VACUUM ANALYZE or
if autovacuum scheduled an analyze as well.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

20 January 2023, 00:17:00

On Thu, Jan 19, 2023 at 3:38 PM Andres Freund <andres@anarazel.de> wrote:
> Another version of this could be to integrate analyze.c's scan more closely
> with vacuum all the time. It's a bit bonkers that we often sequentially read
> blocks, evict them from shared buffers if we read them, just to then
> afterwards do random IO for blocks we've already read. That's imo what we
> eventually should do, but clearly it's not a small project.

Very often, what we're really missing in VACUUM is high level context.
That's true of what you say here, about analyze.c, as well as
complaints like your vac_estimate_reltuples complaint. The problem
scenarios involving vac_estimate_reltuples all involve repeating the
same silly action again and again, never realizing that that's what's
going on. I've found it very useful to think of one VACUUM as picking
up where the last one left off for my work on freezing.

This seems related to pre-autovacuum historical details. VACUUM
shouldn't really be a command in the same way that CREATE INDEX is a
command. I do think that we need to retain a VACUUM command in some
form, but it should be something pretty close to a command that just
enqueues off-schedule autovacuums. That can do things like coalesce
duplicate requests into one.

Anyway, I am generally in favor of a design that makes VACUUM and
ANALYZE things that are more or less owned by autovacuum. It should be
less and less of a problem to blur the distinction between VACUUM and
ANALYZE under this model, in each successive release. These
distinctions are quite unhelpful, overall, because they make it hard
for autovacuum scheduling to work at the whole-system level.

> > This wouldn't have to happen every time, but it would happen fairly often.
>
> Do you have a mechanism for that in mind? Just something vacuum_count % 10 ==
> 0 like? Or remember scanned_pages in pgstats and re-computing

I was thinking of something very simple like that, yes.

> I think it'd be fine to just use analyze.c and pass in an option to not
> compute column and inheritance stats.

That could be fine. Just as long as it's not duplicative in an obvious way.

> > Presumably you'll want to add the same I/O prefetching logic to this
> > cut-down version, just for example. Since without that there will be
> > no competition between it and ANALYZE proper. Besides which, isn't it
> > kinda wasteful to not just do a full ANALYZE? Sure, you can avoid
> > detoasting overhead that way. But even still.
>
> It's not just that analyze is expensive, I think it'll also be confusing if
> the column stats change after a manual VACUUM without ANALYZE.

Possibly, but it doesn't have to happen there. It's not like the rules
aren't a bit different compared to autovacuum already. For example,
the way TOAST tables are handled by the VACUUM command versus
autovacuum.

Even if it's valuable to maintain this kind of VACUUM/autovacuum
parity (which I tend to doubt), doesn't the same argument work almost
as well with whatever stripped down version you come up with? It's
also confusing that a manual VACUUM command will be doing an
ANALYZE-like thing. Especially in cases where it's really expensive
relative to the work of VACUUM, because VACUUM scanned so few pages.
You just have to make some kind of trade-off.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Robert Haas

Date:

20 January 2023, 13:47:18

On Thu, Jan 19, 2023 at 5:51 PM Andres Freund <andres@anarazel.de> wrote:
> I don't agree. But mainly my issue is that the devil you know (how this has
> worked for a while) is preferrable to introducing an unknown quantity (your
> patch that hasn't yet seen real world exposure).

Yeah, this is a major reason why I'm very leery about changes in this
area. A lot of autovacuum behavior is emergent, in the sense that it
wasn't directly intended by whoever wrote the code. It's just a
consequence of other decisions that probably seemed very reasonable at
the time they were made but turned out to have surprising and
unpleasant consequences.

In this particular case, I think that there is a large risk that
postponing auto-cancellation will make things significantly worse,
possibly drastically worse, for a certain class of users -
specifically, those whose vacuums often get auto-cancelled. I think
that it's actually pretty common for people to have workloads where
something pretty close to all of the autovacuums get auto-cancelled on
certain tables, and those people are always hard up against
autovacuum_freeze_max_age because they *have* to hit that in order to
get any vacuuming done on the affected tables. If the default
threshold for auto-cancellation goes up, those people will be
vacuuming even less often than they are now.

That's why I really liked your idea of decoupling auto-cancellation
from XID age. Such an approach can still avoid disabling
auto-cancellation just because autovacuum_freeze_max_age has been hit,
but it can also disable it much earlier when it detects that doing so
is necessary to make progress.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

20 January 2023, 20:42:47

On Fri, Jan 20, 2023 at 5:47 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, this is a major reason why I'm very leery about changes in this
> area. A lot of autovacuum behavior is emergent, in the sense that it
> wasn't directly intended by whoever wrote the code. It's just a
> consequence of other decisions that probably seemed very reasonable at
> the time they were made but turned out to have surprising and
> unpleasant consequences.

I certainly agree with your general description of the ways things
are. To a large degree we're living in a world where DBAs have already
compensated for some of the autovacuum shortcomings discussed on this
thread. For example, by setting autovacuum_vacuum_scale_factor (and
even autovacuum_vacuum_insert_scale_factor) to very low values, to
compensate for the issues with random sampling of dead tuples by
analyze, and to compensate for the way that VACUUM doesn't reason
correctly about how the number of dead tuples changes as VACUUM runs.
They might not have thought of it that way -- it could have happened
as a byproduct of tuning a production system through trial and error
-- but it still counts as compensating for a defect in autovacuum
scheduling IMV.

It's actually quite likely that even a strict improvement to (say)
autovacuum scheduling will cause some number of regressions, since now
what were effectively mitigations become unnecessary. This is somewhat
similar to the dynamic with optimizer improvements, where (say) a
selectivity estimate function that's better by every available metric
can still easily cause regressions that really cannot be blamed on the
improvement itself. I personally believe that it's a price worth
paying when it comes to the issues with autovacuum statistics,
particularly the dead tuple count issues. Since much of the behavior
that we sometimes see is just absurdly bad. We have both water tight
theoretical arguments and practical experiences pointing in that
direction.

> In this particular case, I think that there is a large risk that
> postponing auto-cancellation will make things significantly worse,
> possibly drastically worse, for a certain class of users -
> specifically, those whose vacuums often get auto-cancelled.

I agree that that's a real concern for the autocancellation side of
things. That seems quite different to the dead tuples accounting
issues, in that nobody would claim that the existing behavior is
flagrantly wrong (just that it sometimes causes problems instead of
preventing them).

> That's why I really liked your idea of decoupling auto-cancellation
> from XID age. Such an approach can still avoid disabling
> auto-cancellation just because autovacuum_freeze_max_age has been hit,
> but it can also disable it much earlier when it detects that doing so
> is necessary to make progress.

To be clear, I didn't think that that's what Andres was proposing, and
my recent v5 doesn't actually do that. Even in v5, it's still
fundamentally impossible for autovacuums that are triggered by the
tuples inserted or dead tuples thresholds to not be autocancellable.

ISTM that it doesn't make sense to condition the autocancellation
behavior on table XID age in the context of dead tuple VACUUMs. It
could either be way too early or way too late at that point. I was
rather hoping to not have to build the infrastructure required for
fully decoupling the autocancellation behavior from the triggering
condition (dead tuples vs table XID age) in the scope of this
thread/patch, though I can see the appeal of that.

The only reason why I'm using table age at all is because that's how
it works already, rightly or wrongly. If nothing else, t's pretty
clear that there is no theoretical or practical reason why it has to
be exactly the same table age as the one for launching autovacuums to
advance relfrozenxid/relminmxid. In v5 of the patch, the default is to
use 1.8x of the threshold that initially makes autovacuum.c want to
launch autovacuums to deal with table age. That's based on a
suggestion from Andres, but I'd be almost as happy with a default as
low as 1.1x or even 1.05x. That's going to make very little difference
to those users that really rely on the no-auto-cancellation behavior,
while at the same time making things a lot safer for scenarios like
the Joyent/Manta "DROP TRIGGER" outage (not perfectly safe, by any
means, but meaningfully safer).

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Robert Haas

Date:

20 January 2023, 20:57:21

On Fri, Jan 20, 2023 at 3:43 PM Peter Geoghegan <pg@bowt.ie> wrote:
> The only reason why I'm using table age at all is because that's how
> it works already, rightly or wrongly. If nothing else, t's pretty
> clear that there is no theoretical or practical reason why it has to
> be exactly the same table age as the one for launching autovacuums to
> advance relfrozenxid/relminmxid. In v5 of the patch, the default is to
> use 1.8x of the threshold that initially makes autovacuum.c want to
> launch autovacuums to deal with table age. That's based on a
> suggestion from Andres, but I'd be almost as happy with a default as
> low as 1.1x or even 1.05x. That's going to make very little difference
> to those users that really rely on the no-auto-cancellation behavior,
> while at the same time making things a lot safer for scenarios like
> the Joyent/Manta "DROP TRIGGER" outage (not perfectly safe, by any
> means, but meaningfully safer).

It doesn't seem that way to me. What am I missing? In that case, the
problem was a DROP TRIGGER command waiting behind autovacuum's lock
and thus causing all new table locks to wait behind DROP TRIGGER's
lock request. But it does not sound like that was a one-off event. It
sounds like they used DROP TRIGGER pretty regularly. So I think this
sounds like exactly the kind of case I was talking about, where
autovacuums keep getting cancelled until we decide to stop cancelling
them.

If so, then they were going to have a problem whenever that happened.
Delaying the point at which we stop cancelling them would not help at
all, as your patch would do. What about stopping cancelling them
sooner, as with the proposal to switch to that behavior after a
certain number of auto-cancels? That doesn't prevent the problem
either. If it's aggressive enough, it has some chance of making the
problem visible in a well-run test environment, which could
conceivably prevent you from hitting it in production, but certainly
there are no guarantees.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

20 January 2023, 21:23:45

On Fri, Jan 20, 2023 at 12:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
> It doesn't seem that way to me. What am I missing? In that case, the
> problem was a DROP TRIGGER command waiting behind autovacuum's lock
> and thus causing all new table locks to wait behind DROP TRIGGER's
> lock request. But it does not sound like that was a one-off event.

It's true that I cannot categorically state that it would have made
the crucial difference in this particular case. It comes down to two
factors:

1. How many attempts would any given amount of additional XID space
head room have bought them in practice? We can be all but certain that
the smallest possible number is 1, which is something.

2. Would that have been enough for relfrozenxid to be advanced in practice?

I think that it's likely that the answer to 2 is yes, since there was
no mention of bloat as a relevant factor at any point in the
postmortem. It was all about locking characteristics of antiwraparound
autovacuuming in particular, and its interaction with their
application. I think that they were perfectly okay with the autovacuum
cancellation behavior most of the time. In fact, I don't think that
there was any bloat in the table at all -- it was a really huge table
(likely an events table), and those tend to be append-only.

Even if I'm wrong about this specific case (we'll never know for
sure), the patch as written would be virtually guaranteed to make the
crucial differences in cases that I have seen up close. For example, a
case with TRUNCATE.

> It sounds like they used DROP TRIGGER pretty regularly. So I think this
> sounds like exactly the kind of case I was talking about, where
> autovacuums keep getting cancelled until we decide to stop cancelling
> them.

I don't know how you can reach that conclusion. The chances of a
non-aggressive VACUUM advancing relfrozenxid right now are virtually
zero, at least for a big table like this one. It seems quite likely
that plenty of non-aggressive autovacuums completed, or would have had
the insert-driven autovacuum feature been available.

The whole article was about how this DROP TRIGGER pattern worked just
fine most of the time, because most of the time autovacuum was just
autocancelled. They say this at one point:

"The normal autovacuum mechanism is skipped when locks are held in
order to minimize service disruption. However, because transaction
wraparound is such a severe problem, if the system gets too close to
wraparound, an autovacuum is launched that does not back off under
lock contention."

At another point:

"When the outage was resolved, we still had a number of questions: is
a wraparound autovacuum always so disruptive? Given that it was
blocking all table operations, why does it throttle itself?"

ISTM that it was a combination of aggressive vacuuming taking far
longer than usual (especially likely because this was pre freeze map),
and the no-auto-cancel behavior. Aggressive/antiwraparound VACUUMs are
naturally much more likely to coincide with periodic DDL, just because
they take so much longer. That is a dangerous combination.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

22 January 2023, 02:37:59

On Tue, Jan 17, 2023 at 10:02 AM Andres Freund <andres@anarazel.de> wrote:
> I think with a bit of polish "Add autovacuum trigger instrumentation." ought
> to be quickly mergeable.

Any thoughts on v5-0001-*?

It would be nice to get the uncontentious part of all this (which is
the instrumentation patch) out of the way soon.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

22 January 2023, 02:54:35

Hi,

On 2023-01-21 18:37:59 -0800, Peter Geoghegan wrote:
> On Tue, Jan 17, 2023 at 10:02 AM Andres Freund <andres@anarazel.de> wrote:
> > I think with a bit of polish "Add autovacuum trigger instrumentation." ought
> > to be quickly mergeable.
> 
> Any thoughts on v5-0001-*?
> 
> It would be nice to get the uncontentious part of all this (which is
> the instrumentation patch) out of the way soon.

Is https://www.postgresql.org/message-id/CAH2-WzmytCuSpaMEhv8H-jt8x_9whTi0T5bjNbH2gvaR0an2Pw%40mail.gmail.com
the last / relevant version of the patch to look at?

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

22 January 2023, 03:19:58

On Sat, Jan 21, 2023 at 6:54 PM Andres Freund <andres@anarazel.de> wrote:
> Is https://www.postgresql.org/message-id/CAH2-WzmytCuSpaMEhv8H-jt8x_9whTi0T5bjNbH2gvaR0an2Pw%40mail.gmail.com
> the last / relevant version of the patch to look at?

Yes. I'm mostly just asking about v5-0001-* right now.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

24 January 2023, 02:56:36

Hi,

On 2023-01-18 17:54:27 -0800, Peter Geoghegan wrote:
> From 0afaf310255a068d3c1ca9d2ce6f00118cbff890 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <pg@bowt.ie>
> Date: Fri, 25 Nov 2022 11:23:20 -0800
> Subject: [PATCH v5 1/2] Add autovacuum trigger instrumentation.
> 
> Add new instrumentation that lists a triggering condition in the server
> log whenever an autovacuum is logged.  This reports "table age" as the
> triggering criteria when antiwraparound autovacuum runs (the XID age
> trigger case and the MXID age trigger case are represented separately).
> Other cases are reported as autovacuum trigger when the tuple insert
> thresholds or the dead tuple thresholds were crossed.
> 
> Author: Peter Geoghegan <pg@bowt.ie>
> Reviewed-By: Andres Freund <andres@anarazel.de>
> Reviewed-By: Jeff Davis <pgsql@j-davis.com>
> Discussion: https://postgr.es/m/CAH2-Wz=S-R_2rO49Hm94Nuvhu9_twRGbTm6uwDRmRu-Sqn_t3w@mail.gmail.com
> ---
>  src/include/commands/vacuum.h        |  19 +++-
>  src/backend/access/heap/vacuumlazy.c |   5 ++
>  src/backend/commands/vacuum.c        |  31 ++++++-
>  src/backend/postmaster/autovacuum.c  | 124 ++++++++++++++++++---------
>  4 files changed, 137 insertions(+), 42 deletions(-)
> 
> diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
> index 689dbb770..13f70a1f6 100644
> --- a/src/include/commands/vacuum.h
> +++ b/src/include/commands/vacuum.h
> @@ -191,6 +191,21 @@ typedef struct VacAttrStats
>  #define VACOPT_SKIP_DATABASE_STATS 0x100    /* skip vac_update_datfrozenxid() */
>  #define VACOPT_ONLY_DATABASE_STATS 0x200    /* only vac_update_datfrozenxid() */
>  
> +/*
> + * Values used by autovacuum.c to tell vacuumlazy.c about the specific
> + * threshold type that triggered an autovacuum worker.
> + *
> + * AUTOVACUUM_NONE is used when VACUUM isn't running in an autovacuum worker.
> + */
> +typedef enum AutoVacType
> +{
> +    AUTOVACUUM_NONE = 0,
> +    AUTOVACUUM_TABLE_XID_AGE,
> +    AUTOVACUUM_TABLE_MXID_AGE,
> +    AUTOVACUUM_DEAD_TUPLES,
> +    AUTOVACUUM_INSERTED_TUPLES,
> +} AutoVacType;

Why is there TABLE_ in AUTOVACUUM_TABLE_XID_AGE but not
AUTOVACUUM_DEAD_TUPLES? Both are on tables.


What do you think about naming this VacuumTriggerType and adding an
VAC_TRIG_MANUAL or such?


>  /*
>   * Values used by index_cleanup and truncate params.
>   *
> @@ -222,7 +237,8 @@ typedef struct VacuumParams
>                                               * use default */
>      int            multixact_freeze_table_age; /* multixact age at which to scan
>                                               * whole table */
> -    bool        is_wraparound;    /* force a for-wraparound vacuum */
> +    bool        is_wraparound;    /* antiwraparound autovacuum? */
> +    AutoVacType trigger;        /* autovacuum trigger condition, if any */

The comment change for is_wraparound seems a bit pointless, but whatever.


> @@ -2978,7 +2995,10 @@ relation_needs_vacanalyze(Oid relid,
>                            bool *doanalyze,
>                            bool *wraparound)
>  {

The changes here are still bigger than I'd like, but ...


> -    bool        force_vacuum;
> +    TransactionId relfrozenxid = classForm->relfrozenxid;
> +    MultiXactId relminmxid = classForm->relminmxid;
> +    AutoVacType trigger = AUTOVACUUM_NONE;
> +    bool        tableagevac;

Here + below we end up with three booleans that just represent the choices in
our fancy new enum. That seems awkward to me.



> @@ -3169,14 +3212,15 @@ autovacuum_do_vac_analyze(autovac_table *tab, BufferAccessStrategy bstrategy)
>  static void
>  autovac_report_activity(autovac_table *tab)
>  {
> -#define MAX_AUTOVAC_ACTIV_LEN (NAMEDATALEN * 2 + 56)
> +#define MAX_AUTOVAC_ACTIV_LEN (NAMEDATALEN * 2 + 100)
>      char        activity[MAX_AUTOVAC_ACTIV_LEN];
>      int            len;
>  
>      /* Report the command and possible options */
>      if (tab->at_params.options & VACOPT_VACUUM)
>          snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
> -                 "autovacuum: VACUUM%s",
> +                 "autovacuum for %s: VACUUM%s",
> +                 vac_autovacuum_trigger_msg(tab->at_params.trigger),
>                   tab->at_params.options & VACOPT_ANALYZE ? " ANALYZE" : "");
>      else
>          snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,

Somehow the added "for ..." sounds a bit awkward. "autovacuum for table XID
age". Maybe "autovacuum due to ..."?

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

24 January 2023, 03:22:18

On Mon, Jan 23, 2023 at 6:56 PM Andres Freund <andres@anarazel.de> wrote:
> Why is there TABLE_ in AUTOVACUUM_TABLE_XID_AGE but not
> AUTOVACUUM_DEAD_TUPLES? Both are on tables.

Why does vacuum_freeze_table_age contain the word "table", while
autovacuum_vacuum_scale_factor does not?

To me, "table XID age" is a less awkward term for "relfrozenxid
advancing", useful in contexts where it's probably more important to
be understood by non-experts than it is to be unambiguous. Besides,
relfrozenxid works at the level of the pg_class metadata. Nothing
whatsoever needs to have changed about the table itself, nor will
anything necessarily be changed by VACUUM (except the relfrozenxid
field from pg_class).

> What do you think about naming this VacuumTriggerType and adding an
> VAC_TRIG_MANUAL or such?

But we're not doing anything with it in the context of manual VACUUMs.
I'd prefer to keep this about what autovacuum.c thought needed to
happen, at least for as long as manual VACUUMs are something that
autovacuum.c knows nothing about.

> > -     bool            force_vacuum;
> > +     TransactionId relfrozenxid = classForm->relfrozenxid;
> > +     MultiXactId relminmxid = classForm->relminmxid;
> > +     AutoVacType trigger = AUTOVACUUM_NONE;
> > +     bool            tableagevac;
>
> Here + below we end up with three booleans that just represent the choices in
> our fancy new enum. That seems awkward to me.

I don't follow. It's true that "wraparound" is still a synonym of
"tableagevac" in 0001, but that changes in 0002. And even if you
assume that 0002 won't get in, I think that it still makes sense to
structure it in a way that shows that in principle the "wraparound"
behaviors don't necessarily have to be used whenever "tableagevac" is
in use.

> > @@ -3169,14 +3212,15 @@ autovacuum_do_vac_analyze(autovac_table *tab, BufferAccessStrategy bstrategy)
> >  static void
> >  autovac_report_activity(autovac_table *tab)

> Somehow the added "for ..." sounds a bit awkward. "autovacuum for table XID
> age". Maybe "autovacuum due to ..."?

That works just as well IMV. I'll change it to that.

Anything else for 0001? Would be nice to get it committed tomorrow.

-- 
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Robert Haas

Date:

24 January 2023, 19:21:15

On Fri, Jan 20, 2023 at 4:24 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > It sounds like they used DROP TRIGGER pretty regularly. So I think this
> > sounds like exactly the kind of case I was talking about, where
> > autovacuums keep getting cancelled until we decide to stop cancelling
> > them.
>
> I don't know how you can reach that conclusion.

I can accept that there might be some way I'm wrong about this in
theory, but your email then seems to go on to say that I'm right just
a few sentences later:

> The whole article was about how this DROP TRIGGER pattern worked just
> fine most of the time, because most of the time autovacuum was just
> autocancelled. They say this at one point:
>
> "The normal autovacuum mechanism is skipped when locks are held in
> order to minimize service disruption. However, because transaction
> wraparound is such a severe problem, if the system gets too close to
> wraparound, an autovacuum is launched that does not back off under
> lock contention."

If this isn't arguing in favor of exactly what I'm saying, I don't
know what that would look like.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

24 January 2023, 20:32:56

On Tue, Jan 24, 2023 at 11:21 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > The whole article was about how this DROP TRIGGER pattern worked just
> > fine most of the time, because most of the time autovacuum was just
> > autocancelled. They say this at one point:
> >
> > "The normal autovacuum mechanism is skipped when locks are held in
> > order to minimize service disruption. However, because transaction
> > wraparound is such a severe problem, if the system gets too close to
> > wraparound, an autovacuum is launched that does not back off under
> > lock contention."
>
> If this isn't arguing in favor of exactly what I'm saying, I don't
> know what that would look like.

I'm happy to clear that up. What you said was:

"So I think this sounds like exactly the kind of case I was talking about, where
autovacuums keep getting cancelled until we decide to stop cancelling them.
If so, then they were going to have a problem whenever that happened."

Just because *some* autovacuums get cancelled doesn't mean they *all*
get cancelled. And, even if the rate is quite high, that may not be
much of a problem in itself (especially now that we have the freeze
map). 200 million XIDs usually amounts to a lot of wall clock time.
Even if it is rather difficult to finish up, we only have to get lucky
once.

The fact that autovacuum eventually got to the point of requiring an
antiwraparound autovacuum on the problematic table does indeed
strongly suggest that any other, earlier autovacuums were relatively
unlikely to have advanced relfrozenxid in the end -- or at least
couldn't on this one occasion. But that in itself is just not relevant
to our current discussion, since even the tiniest perturbation would
have been enough to prevent a non-aggressive VACUUM from being able to
advance relfrozenxid. Before 15, non-aggressive VACUUMs would throw
away the opportunity to do so just because they couldn't immediately
get a cleanup lock on one single heap page.

It's quite possible that most or all prior aggressive VACUUMs were not
antiwraparound autovacuums, because the dead tuples accounting was
enough to launch an autovacuum at some point after age(relfrozenxid)
exceeded vacuum_freeze_table_age that was still before it could reach
autovacuum_freeze_max_age. That would give you a cancellable
aggressive VACUUM -- a VACUUM that actually has a non-zero chance of
advancing relfrozenxid.

Sure, it's possible that such a cancellable aggressive autovacuum was
indeed cancelled, and that that factor made the crucial difference.
But I find it far easier to believe that there simply was no such
aggressive autovacuum in the first place (not this time), since it
could have only happened when autovacuum thinks that there are
sufficiently many dead tuples to justify launching an autovacuum in
the first place. Which, as we now all accept, is based on highly
dubious sampling by ANALYZE. So I think it's much more likely to be
that factor (dead tuple accounting is bad), as well as the absurd
false dichotomy between aggressive and non-aggressive -- plus the
issue at hand, the auto-cancellation behavior.

I don't claim to know what is inevitable, or what is guaranteed to
work or not work. I only claim that we can meaningfully reduce the
absolute risk by using a fairly simple approach, principally by not
needlessly coupling the auto-cancellation behavior to *all*
autovacuums that are specifically triggered by age(relfrozenxid). As
Andres said at one point, doing those two things at exactly the same
time is just arbitrary.

--
Peter Geoghegan

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

25 January 2023, 04:59:04

Hi,

On 2023-01-23 19:22:18 -0800, Peter Geoghegan wrote:
> On Mon, Jan 23, 2023 at 6:56 PM Andres Freund <andres@anarazel.de> wrote:
> > Why is there TABLE_ in AUTOVACUUM_TABLE_XID_AGE but not
> > AUTOVACUUM_DEAD_TUPLES? Both are on tables.
> 
> Why does vacuum_freeze_table_age contain the word "table", while
> autovacuum_vacuum_scale_factor does not?

I don't know. But that's not really a reason to introduce more oddities.


> To me, "table XID age" is a less awkward term for "relfrozenxid
> advancing", useful in contexts where it's probably more important to
> be understood by non-experts than it is to be unambiguous. Besides,
> relfrozenxid works at the level of the pg_class metadata. Nothing
> whatsoever needs to have changed about the table itself, nor will
> anything necessarily be changed by VACUUM (except the relfrozenxid
> field from pg_class).

I'd just go for "xid age", I don't see a point in adding 'table', particularly
when you don't for dead tuples.


> > What do you think about naming this VacuumTriggerType and adding an
> > VAC_TRIG_MANUAL or such?
> 
> But we're not doing anything with it in the context of manual VACUUMs.

It's a member of a struct passed to the routines handling both manual and
interactive vacuum.  And we could e.g. eventually start replace
IsAutoVacuumWorkerProcess() checks with it - which aren't e.g. going to work
well if we add parallel index vacuuming support to autovacuum.



> I'd prefer to keep this about what autovacuum.c thought needed to
> happen, at least for as long as manual VACUUMs are something that
> autovacuum.c knows nothing about.

It's an enum defined in a general header, not something in autovacuum.c - so I
don't really buy this.


> > > -     bool            force_vacuum;
> > > +     TransactionId relfrozenxid = classForm->relfrozenxid;
> > > +     MultiXactId relminmxid = classForm->relminmxid;
> > > +     AutoVacType trigger = AUTOVACUUM_NONE;
> > > +     bool            tableagevac;
> >
> > Here + below we end up with three booleans that just represent the choices in
> > our fancy new enum. That seems awkward to me.
> 
> I don't follow. It's true that "wraparound" is still a synonym of
> "tableagevac" in 0001, but that changes in 0002. And even if you
> assume that 0002 won't get in, I think that it still makes sense to
> structure it in a way that shows that in principle the "wraparound"
> behaviors don't necessarily have to be used whenever "tableagevac" is
> in use.

You have booleans tableagevac, deadtupvac, inserttupvac. Particularly the
latter ones really are just a rephrasing of the trigger:

+    tableagevac = true;
+    *wraparound = false;
+    /* See header comments about trigger precedence */
+    if (TransactionIdIsNormal(relfrozenxid) &&
+        TransactionIdPrecedes(relfrozenxid, xidForceLimit))
+        trigger = AUTOVACUUM_TABLE_XID_AGE;
+    else if (MultiXactIdIsValid(relminmxid) &&
+             MultiXactIdPrecedes(relminmxid, multiForceLimit))
+        trigger = AUTOVACUUM_TABLE_MXID_AGE;
+    else
+        tableagevac = false;
+
+    /* User disabled non-table-age autovacuums in pg_class.reloptions? */
+    if (!av_enabled && !tableagevac)

...

+        deadtupvac = (vactuples > vacthresh);
+        inserttupvac = (vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
+        /* See header comments about trigger precedence */
+        if (!tableagevac)
+        {
+            if (deadtupvac)
+                trigger = AUTOVACUUM_DEAD_TUPLES;
+            else if (inserttupvac)
+                trigger = AUTOVACUUM_INSERTED_TUPLES;
+        }
+
         /* Determine if this table needs vacuum or analyze. */
-        *dovacuum = force_vacuum || (vactuples > vacthresh) ||
-            (vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
+        *dovacuum = (tableagevac || deadtupvac || inserttupvac);


I find this to be awkward code. The booleans are kinda pointless, and the
tableagevac case is hard to follow because trigger is set elsewhere.

I can give reformulating it a go. Need to make some food first.


I suspect that the code would look better if we didn't continue to have
"bool *dovacuum" and the trigger. They're redundant.


> Anything else for 0001? Would be nice to get it committed tomorrow.

Sorry, today was busy with meetings and bashing my head against AIX.

Greetings,

Andres Freund

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Andres Freund

Date:

25 January 2023, 09:19:04

Hi,

On 2023-01-24 20:59:04 -0800, Andres Freund wrote:
> I find this to be awkward code. The booleans are kinda pointless, and the
> tableagevac case is hard to follow because trigger is set elsewhere.
>
> I can give reformulating it a go. Need to make some food first.

Here's a draft of what I am thinking of. Not perfect yet, but I think it looks
better.

The pg_stat_activity output looks like this right now:

autovacuum due to table XID age: VACUUM public.pgbench_accounts (to prevent wraparound)

Why don't we drop the "(to prevent wraparound)" now?

And I still think removing the "table " bit would be an improvement.

Greetings,

Andres Freund

Attachment

v3-autovac-trigger-wip.diff

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Robert Haas

Date:

25 January 2023, 15:28:22

On Tue, Jan 24, 2023 at 3:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Sure, it's possible that such a cancellable aggressive autovacuum was
> indeed cancelled, and that that factor made the crucial difference.
> But I find it far easier to believe that there simply was no such
> aggressive autovacuum in the first place (not this time), since it
> could have only happened when autovacuum thinks that there are
> sufficiently many dead tuples to justify launching an autovacuum in
> the first place. Which, as we now all accept, is based on highly
> dubious sampling by ANALYZE. So I think it's much more likely to be
> that factor (dead tuple accounting is bad), as well as the absurd
> false dichotomy between aggressive and non-aggressive -- plus the
> issue at hand, the auto-cancellation behavior.

In my opinion, this is too speculative to justify making changes to
the behavior. I'm not here to defend the way we do dead tuple
accounting. I think it's a hot mess. But whether or not it played any
role in this catastrophe is hard to say. The problems with dead tuple
accounting are, as I understand it, all about statistical
independence. That is, we make assumptions that what's true of the
sample is likely to be true of the whole table when in reality it may
not be true at all. Perhaps it's even unlikely to be true. But the
kinds of problems you get from assuming statistical independence tend
to hit users very unevenly. We make similar assumptions about
selectivity estimation: unless there are extended statistics, we take
P(a=1 and b=1) = P(a=1)*P(b = 1), which can be vastly and dramatically
wrong. People can and do get extremely bad query plans as a result of
that assumption. However, other people run PostgreSQL for years and
years and never really have big problems with it. I'd say that it's
completely fair to describe this as a big problem, but we can't
therefore conclude that some particular user has this problem, not
even if we know that they have a slow query and we know that it's due
to a bad plan. And similarly here, I don't see a particular reason to
think that your theory about what happened is more likely than mine. I
freely admit that yours could be right, just as you admitted that mine
could be right. But I think we really just don't know.

It feels unlikely to me that there was ONE cancellable aggressive
autovacuum and that it got cancelled. I think that it's probably
either ZERO or LOTS, depending on whether the dead tuple threshold was
ever reached. If it wasn't, then it must have been zero. But if it
was, and the first autovacuum worker to visit that table got
cancelled, then the next one would try again. And
autovacuum_naptime=1m, so if we're not out of autovacuum workers,
we're going to retry that table every minute. If we do run out of
autovacuum workers, which is pretty likely, we'll still launch new
workers in that database as often as we can given when other workers
exit. If the system is very tight on autovacuum capacity, probably
because the cost limit is too low, then you could have a situation
where only one try gets made before we hit autovacuum_freeze_max_age.
Otherwise, though, a single failed try would probably lead to trying a
whole lot more times after that, and you only hit
autovacuum_freeze_max_age if all those attempts fail.

At the risk of repeating myself, here's what bugs me. If we suppose
that your intuition is right and no aggressive autovacuum happened
before autovacuum_freeze_max_age was reached, then what you are
proposing will make things better. But if we suppose that my intuition
is right and many aggressive autovacuums happened before
autovacuum_freeze_max_age was reached, then what you are proposing
will make things worse, because if we've been auto-cancelling
repeatedly we're probably going to keep doing so until we shut that
behavior off, and we want a vacuum to succeed sooner rather than
later. So it doesn't feel obvious to me that we should change
anything. Even if we knew what had happened for certain in this
particular case, I don't know how we can possibly know what is typical
in similar cases.

My personal support experience has been that cases where autovacuum
runs a lot but doesn't solve the problem for some reason are a lot
more common than cases where it doesn't run when it should have done.
That probably accounts for my intuition about what is likely to have
happened here. But as my colleagues are often at pains to point out to
me, my experiences aren't representative of what happens to PostgreSQL
users generally for all kinds of reasons, and therefore sometimes my
intuition is wrong. But since I have nobody else's experiences to use
in lieu of my own, I don't know what else I can use to judge anything.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From

Peter Geoghegan

Date:

26 January 2023, 06:41:34

On Wed, Jan 25, 2023 at 1:19 AM Andres Freund <andres@anarazel.de> wrote:
> Here's a draft of what I am thinking of. Not perfect yet, but I think it looks
> better.

I'm afraid that I will be unable to do any more work on this project.
I have withdrawn it from the CF app.

If you would like to complete some or all of the patches yourself, in
part or in full, I have no objections.

-- 
Peter Geoghegan