Thread: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

From
Peter Geoghegan
Date:
My work on page-level freezing for PostgreSQL 16 has some remaining
loose ends to tie up with the documentation. The "Routine Vacuuming"
section of the docs has no mention of page-level freezing. It also
doesn't mention the FPI optimization added by commit 1de58df4. This
isn't a small thing to leave out; I fully expect that the FPI
optimization will very significantly alter when and how VACUUM
freezes. The cadence will look quite a lot different.

It seemed almost impossible to fit in discussion of page-level
freezing to the existing structure. In part this is because the
existing documentation emphasizes the worst case scenario, rather than
talking about freezing as a maintenance task that affects physical
heap pages in roughly the same way as pruning does. There isn't a
clean separation of things that would allow me to just add a paragraph
about the FPI thing.

Obviously it's important that the system never enters xidStopLimit
mode -- not being able to allocate new XIDs is a huge problem. But it
seems unhelpful to define that as the only goal of freezing, or even
the main goal. To me this seems similar to defining the goal of
cleaning up bloat as avoiding completely running out of disk space;
while it may be "the single most important thing" in some general
sense, it isn't all that important in most individual cases. There are
many very bad things that will happen before that extreme worst case
is hit, which are far more likely to be the real source of pain.

There are also very big structural problems with "Routine Vacuuming",
that I also propose to do something about. Honestly, it's a huge mess
at this point. It's nobody's fault in particular; there has been
accretion after accretion added, over many years. It is time to
finally bite the bullet and do some serious restructuring. I'm hoping
that I don't get too much push back on this, because it's already very
difficult work.

Attached patch series shows what I consider to be a much better
overall structure. To make this convenient to take a quick look at, I
also attach a prebuilt version of routine-vacuuming.html (not the only
page that I've changed, but the most important set of changes by far).

This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?

The following list is a summary of the major changes that I propose:

1. Restructures the order of items to match the actual processing
order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
ANALYZE and then back to VACUUM.

This flows a lot better, which helps with later items that deal with
freezing/wraparound.

2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).

3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.

It seems crazy to me that the second sentence in our discussion of
wraparound/freezing is still:

"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future"

Here we start the whole discussion of wraparound (a particularly
delicate topic) by describing how VACUUM used to work 20 years ago,
before the invention of freezing. That was the last time that a
PostgreSQL cluster could run for 4 billion XIDs without freezing. The
invariant is that we activate xidStopLimit mode protections to avoid a
"distance" between any two unfrozen XIDs that exceeds about 2 billion
XIDs. So why on earth are we talking about 4 billion XIDs? This is the
most confusing, least useful way of describing freezing that I can
think of.

4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.

Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).

5. The top-level list of maintenance tasks has a new addition: "To
truncate obsolescent transaction status information, when possible".

It makes a lot of sense to talk about this as something that happens
last (or last among those steps that take place during VACUUM). It's
far less important than avoiding xidStopLimit outages, obviously
(using some extra disk space is almost certainly the least of your
worries when you're near to xidStopLimit). The current documentation
seems to take precisely the opposite view, when it says the following:

"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more
space"

This sentence is dangerously bad advice. It is precisely backwards. At
the same time, we'd better say something about the need to truncate
pg_xact/clog here. Besides all this, the new section for this is a far
more accurate reflection of what's really going on: most individual
VACUUMs (even most aggressive VACUUMs) won't ever truncate
pg_xact/clog (or the other relevant SLRUs). Truncation only happens
after a VACUUM that advances the relfrozenxid of the table which
previously had the oldest relfrozenxid among all tables in the entire
cluster -- so we need to talk about it as an issue with the high
watermark storage for pg_xact.

6. Rename the whole "Routine Vacuuming" section to "Autovacuum
Maintenance Tasks".

This is what we should be emphasizing over manually run VACUUMs.
Besides, the current title just seems wrong -- we're talking about
ANALYZE just as much as VACUUM.

Thoughts?

-- 
Peter Geoghegan

Attachment
On Tue, Apr 25, 2023 at 4:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> There are also very big structural problems with "Routine Vacuuming",
> that I also propose to do something about. Honestly, it's a huge mess
> at this point. It's nobody's fault in particular; there has been
> accretion after accretion added, over many years. It is time to
> finally bite the bullet and do some serious restructuring. I'm hoping
> that I don't get too much push back on this, because it's already very
> difficult work.

Now is a great time to revise this section, in my view. (I myself am about ready to get back to testing and writing for the task of removing that "obnoxious hint".)

> Attached patch series shows what I consider to be a much better
> overall structure. To make this convenient to take a quick look at, I
> also attach a prebuilt version of routine-vacuuming.html (not the only
> page that I've changed, but the most important set of changes by far).
>
> This initial version is still quite lacking in overall polish, but I
> believe that it gets the general structure right. That's what I'd like
> to get feedback on right now: can I get agreement with me about the
> general nature of the problem? Does this high level direction seem
> like the right one?

I believe the high-level direction is sound, and some details have been discussed before.

> The following list is a summary of the major changes that I propose:
>
> 1. Restructures the order of items to match the actual processing
> order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
> ANALYZE and then back to VACUUM.
>
> This flows a lot better, which helps with later items that deal with
> freezing/wraparound.

Seems logical.

> 2. Renamed "Preventing Transaction ID Wraparound Failures" to
> "Freezing to manage the transaction ID space". Now we talk about
> wraparound as a subtopic of freezing, not vice-versa. (This is a
> complete rewrite, as described by later items in this list).

+1

> 3. All of the stuff about modulo-2^32 arithmetic is moved to the
> storage chapter, where we describe the heap tuple header format.

It does seem to be an excessive level of detail for this chapter, so +1. Speaking of excessive detail, however...(skipping ahead)

+    <note>
+     <para>
+      There is no fundamental difference between a
+      <command>VACUUM</command> run during anti-wraparound
+      autovacuuming and a <command>VACUUM</command> that happens to
+      use the aggressive strategy (whether run by autovacuum or
+      manually issued).
+     </para>
+    </note>

I don't see the value of this, from the user's perspective, of mentioning this at all, much less for it to be called out as a Note. Imagine a user who has been burnt by non-cancellable vacuums. How would they interpret this statement?

> It seems crazy to me that the second sentence in our discussion of
> wraparound/freezing is still:
>
> "But since transaction IDs have limited size (32 bits) a cluster that
> runs for a long time (more than 4 billion transactions) would suffer
> transaction ID wraparound: the XID counter wraps around to zero, and
> all of a sudden transactions that were in the past appear to be in the
> future"

Hah!

> 4. No more separate section for MultiXactID freezing -- that's
> discussed as part of the discussion of page-level freezing.
>
> Page-level freezing takes place without regard to the trigger
> condition for freezing. So the new approach to freezing has a fixed
> idea of what it means to freeze a given page (what physical
> modifications it entails). This means that having a separate sect3
> subsection for MultiXactIds now makes no sense (if it ever did).

I have no strong opinion on that.

> 5. The top-level list of maintenance tasks has a new addition: "To
> truncate obsolescent transaction status information, when possible".

+1

> 6. Rename the whole "Routine Vacuuming" section to "Autovacuum
> Maintenance Tasks".
>
> This is what we should be emphasizing over manually run VACUUMs.
> Besides, the current title just seems wrong -- we're talking about
> ANALYZE just as much as VACUUM.

Seems more accurate. On top of that, "Routine vacuuming" slightly implies manual vacuums.

I've only taken a cursory look, but will look more closely as time permits.

(Side note: My personal preference for rough doc patches would be to leave out spurious whitespace changes. That not only includes indentation, but also paragraphs where many of the words haven't changed at all, but every line has changed to keep the paragraph tidy. Seems like more work for both the author and the reviewer.)

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

From
Peter Geoghegan
Date:
On Wed, Apr 26, 2023 at 12:16 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
> Now is a great time to revise this section, in my view. (I myself am about ready to get back to testing and writing
forthe task of removing that "obnoxious hint".) 

Although I didn't mention the issue with single user mode in my
introductory email (the situation there is just appalling IMV), it
seems like I might not be able to ignore that problem while I'm
working on this patch. Declaring that as out of scope for this doc
patch series (on pragmatic grounds) feels awkward. I have to work
around something that is just wrong. For now, the doc patch just has
an "XXX" item about it. (Hopefully I'll think of a more natural way of
not fixing it.)

> > This initial version is still quite lacking in overall polish, but I
> > believe that it gets the general structure right. That's what I'd like
> > to get feedback on right now: can I get agreement with me about the
> > general nature of the problem? Does this high level direction seem
> > like the right one?
>
> I believe the high-level direction is sound, and some details have been discussed before.

I'm relieved that you think so. I was a bit worried that I'd get
bogged down, having already invested a lot of time in this.

Attached is v2. It has the same high level direction as v1, but is a
lot more polished. Still not committable, to be sure. But better than
v1.

I'm also attaching a prebuilt copy of routine-vacuuming.html, as with
v1 -- hopefully that's helpful.

> > 3. All of the stuff about modulo-2^32 arithmetic is moved to the
> > storage chapter, where we describe the heap tuple header format.
>
> It does seem to be an excessive level of detail for this chapter, so +1. Speaking of excessive detail,
however...(skippingahead) 

My primary objection to talking about modulo-2^32 stuff first is not
that it's an excessive amount of detail (though it definitely is). My
objection is that it places emphasis on exactly the thing that *isn't*
supposed to matter, under the design of freezing -- greatly confusing
the reader (even sophisticated readers). Discussion of so-called
wraparound should start with logical concepts, such as xmin XIDs being
treated as "infinitely far in the past" once frozen. The physical data
structures do matter too, but even there the emphasis should be on
heap pages being "self-contained", in the sense that SQL queries won't
need to access pg_xact to read the rows from the pages going forward
(even on standbys).

Why do we call wraparound wraparound, anyway? The 32-bit XID space is
circular! The whole point of the design is that unsigned integer
wraparound is meaningless -- there isn't really a point in "the
circle" that you should think of as the start point or end point.
(We're probably stuck with the term "wraparound" for now, so I'm not
proposing that it be changed here, purely on pragmatic grounds.)

> +    <note>
> +     <para>
> +      There is no fundamental difference between a
> +      <command>VACUUM</command> run during anti-wraparound
> +      autovacuuming and a <command>VACUUM</command> that happens to
> +      use the aggressive strategy (whether run by autovacuum or
> +      manually issued).
> +     </para>
> +    </note>
>
> I don't see the value of this, from the user's perspective, of mentioning this at all, much less for it to be called
outas a Note. Imagine a user who has been burnt by non-cancellable vacuums. How would they interpret this statement? 

I meant that it isn't special from the point of view of vacuumlazy.c.
I do see your point, though. I've taken that out in v2.

(I happen to believe that the antiwraparound autocancellation behavior
is very unhelpful as currently implemented, which biased my view of
this.)

> > 4. No more separate section for MultiXactID freezing -- that's
> > discussed as part of the discussion of page-level freezing.
> >
> > Page-level freezing takes place without regard to the trigger
> > condition for freezing. So the new approach to freezing has a fixed
> > idea of what it means to freeze a given page (what physical
> > modifications it entails). This means that having a separate sect3
> > subsection for MultiXactIds now makes no sense (if it ever did).
>
> I have no strong opinion on that.

Most of the time, when antiwraparound autovacuums are triggered by
autovacuum_multixact_freeze_max_age, in a way that is noticeable (say
a large table), VACUUM will in all likelihood end up processing
exactly 0 multis. What you'll get is pretty much an "early" aggressive
VACUUM, which isn't such a big deal (especially with page-level
freezing). You can already get an "early" aggressive VACUUM due to
hitting vacuum_freeze_table_age before autovacuum_freeze_max_age is
ever reached (in fact it's the common case, now that we have
insert-driven autovacuums).

So I'm trying to suggest that an aggressive VACUUM is the same
regardless of the trigger condition. To a lesser extent, I'm trying to
make the user aware that the mechanical difference between aggressive
and non-aggressive is fairly minor, even if the consequences of that
difference are quite noticeable. (Though maybe they're less noticeable
with the v16 work in place.)

> I've only taken a cursory look, but will look more closely as time permits.

I would really appreciate that. This is not easy work.

I suspect that the docs talk about wraparound using extremely alarming
language possible because at one point it really was necessary to
scare users into running VACUUM to avoid data loss. This was before
autovacuum, and before the invention of vxids, and even before the
invention of freezing. It was up to you as a user to VACUUM your
database using cron, and if you didn't then eventually data loss could
result.

Obviously these docs were updated many times over the years, but I
maintain that the basic structure from 20 years ago is still present
in a way that it really shouldn't be.

> (Side note: My personal preference for rough doc patches would be to leave out spurious whitespace changes.

I've tried to keep them out (or at least break the noisy whitespace
changes out into their own commit). I might have missed a few of them
in v1, which are fixed in v2.

Thanks
--
Peter Geoghegan

Attachment

On Thu, Apr 27, 2023 at 12:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Apr 26, 2023 at 12:16 AM John Naylor
> <john.naylor@enterprisedb.com> wrote:
> > Now is a great time to revise this section, in my view. (I myself am about ready to get back to testing and writing for the task of removing that "obnoxious hint".)
>
> Although I didn't mention the issue with single user mode in my
> introductory email (the situation there is just appalling IMV), it
> seems like I might not be able to ignore that problem while I'm
> working on this patch. Declaring that as out of scope for this doc
> patch series (on pragmatic grounds) feels awkward. I have to work
> around something that is just wrong. For now, the doc patch just has
> an "XXX" item about it. (Hopefully I'll think of a more natural way of
> not fixing it.)

If it helps, I've gone ahead with some testing and polishing on that, and it's close to ready, I think (CC'd you). I'd like that piece to be separate and small enough to be backpatchable (at least in theory).

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

From
Peter Geoghegan
Date:
On Sat, Apr 29, 2023 at 1:17 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
> > Although I didn't mention the issue with single user mode in my
> > introductory email (the situation there is just appalling IMV), it
> > seems like I might not be able to ignore that problem while I'm
> > working on this patch. Declaring that as out of scope for this doc
> > patch series (on pragmatic grounds) feels awkward. I have to work
> > around something that is just wrong. For now, the doc patch just has
> > an "XXX" item about it. (Hopefully I'll think of a more natural way of
> > not fixing it.)
>
> If it helps, I've gone ahead with some testing and polishing on that, and it's close to ready, I think (CC'd you).
I'dlike that piece to be separate and small enough to be backpatchable (at least in theory). 

That's great news. Not least because it unblocks this patch series of mine.

--
Peter Geoghegan




On Thu, Apr 27, 2023 at 12:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> [v2]

I've done a more careful read-through, but I'll need a couple more, I imagine.

I'll first point out some things I appreciate, and I'm glad are taken care of as part of this work:

- Pushing the talk of scheduled manual vacuums to the last, rather than first, para in the intro
- No longer pretending that turning off autovacuum is somehow normal
- Removing the egregiously outdated practice of referring to VACUUM FULL as a "variant" of VACUUM
- Removing the mention of ALTER TABLE that has no earthly business in this chapter -- for that, rewriting the table is a side effect to try to avoid, not a tool in our smorgasbord for removing severe bloat.

Some suggestions:

- The section "Recovering Disk Space" now has 5 tips/notes/warnings in a row. This is good information, but I wonder about:

"Note: Although VACUUM FULL is technically an option of the VACUUM command, VACUUM FULL uses a completely different implementation. VACUUM FULL is essentially a variant of CLUSTER. (The name VACUUM FULL is historical; the original implementation was somewhat closer to standard VACUUM.)"

...maybe move this to a second paragraph in the warning about VACUUM FULL and CLUSTER?

- The sentence "The XID cutoff point that VACUUM uses..." reads a bit abruptly and unmotivated (although it is important). Part of the reason for this is that the hyperlink "transaction ID number (XID)" which points to the glossary is further down the page than this first mention.

- "VACUUM often marks certain pages frozen, indicating that all eligible rows on the page were inserted by a transaction that committed sufficiently far in the past that the effects of the inserting transaction are certain to be visible to all current and future transactions."
    -> This sentence is much harder to understand than the one it replaces. Also, this is the first time "eligible" is mentioned. It may not need a separate definition, but in this form it's rather circular.

- "freezing plays a crucial role in enabling _management of the XID address_ space by VACUUM"
    -> "management of the XID address space" links to the aggressive-strategy sub-section below, but it's a strange link title because the section we're in is itself titled "Freezing to manage the transaction ID space".

- "The maximum “distance” that the system can tolerate..."
    -> The next sentence goes on to show the "age" function, so using different terms is a bit strange. Mixing the established age term with an in-quotes "distance" could perhaps be done once in a definition, but then all uses should stick to age.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

From
Peter Geoghegan
Date:
On Sat, Apr 29, 2023 at 8:54 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
> I've done a more careful read-through, but I'll need a couple more, I imagine.

Yeah, it's tough to get this stuff right.

> I'll first point out some things I appreciate, and I'm glad are taken care of as part of this work:
>
> - Pushing the talk of scheduled manual vacuums to the last, rather than first, para in the intro
> - No longer pretending that turning off autovacuum is somehow normal
> - Removing the egregiously outdated practice of referring to VACUUM FULL as a "variant" of VACUUM
> - Removing the mention of ALTER TABLE that has no earthly business in this chapter -- for that, rewriting the table
isa side effect to try to avoid, not a tool in our smorgasbord for removing severe bloat. 
>
> Some suggestions:
>
> - The section "Recovering Disk Space" now has 5 tips/notes/warnings in a row.

It occurs to me that all of this stuff (TRUNCATE, VACUUM FULL, and so
on) isn't "routine" at all. And so maybe this is the wrong chapter for
this entirely. The way I dealt with it in v2 wasn't very worked out --
I just knew that I had to do something, but hadn't given much thought
to what actually made sense.

I wonder if it would make sense to move all of that stuff into its own
new sect1 of "Chapter 29. Monitoring Disk Usage" -- something along
the lines of "what to do about bloat when all else fails, when the
problem gets completely out of hand". Naturally we'd link to this new
section from "Routine Vacuuming". What do you think of that general
approach?

> This is good information, but I wonder about:
> (Various points)

That's good feedback. I'll get to this in a couple of days.

--
Peter Geoghegan



On Wed, Apr 26, 2023 at 1:58 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Why do we call wraparound wraparound, anyway? The 32-bit XID space is
> circular! The whole point of the design is that unsigned integer
> wraparound is meaningless -- there isn't really a point in "the
> circle" that you should think of as the start point or end point.
> (We're probably stuck with the term "wraparound" for now, so I'm not
> proposing that it be changed here, purely on pragmatic grounds.)

To me, the fact that the XID space is circular is the whole point of
talking about wraparound. If the XID space were non-circular, it could
never try to reuse the XID values that have previously been used, and
this entire class of problems would go away. Because it is circular,
it's possible for the XID counter to arrive back at a place that it's
been before i.e. it can wrap around.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Mon, May 1, 2023 at 8:03 AM Robert Haas <robertmhaas@gmail.com> wrote:
> To me, the fact that the XID space is circular is the whole point of
> talking about wraparound.

The word wraparound is ambiguous. It's not the same thing as
xidStopLimit in my view. It's literal integer wraparound.

If you think of XIDs as having a native 64-bit representation, while
using a truncated 32-bit on-disk representation in tuple headers
(which is the view promoted by the doc patch), then XIDs cannot wrap
around. There is still no possibility of "the future becoming the
past" (assuming no use of single user mode), either, because even in
the worst case we have xidStopLimit to make sure that the database
doesn't become corrupt. Why talk about what's *not* happening in a
place of prominence?

We'll still talk about literal integer wraparound with the doc patch,
but it's part of a discussion of the on-disk format in a distant
chapter. It's just an implementation detail, which is of no practical
consequence. The main discussion need only say something succinct and
vague about the use of a truncated representation (lacking a separate
epoch) in tuple headers eventually forcing freezing.

> If the XID space were non-circular, it could
> never try to reuse the XID values that have previously been used, and
> this entire class of problems would go away. Because it is circular,
> it's possible for the XID counter to arrive back at a place that it's
> been before i.e. it can wrap around.

But integer wrap around isn't really aligned with anything important.
xidStopLimit will kick in when we're only halfway towards literal
integer wrap around. Users have practical concerns about avoiding
xidStopLimit -- what a world without xidStopLimit looks like just
doesn't matter. Just having some vague awareness of truncated XIDs
being insufficient at some point is all you really need, even if
you're an advanced user.

--
Peter Geoghegan



On Mon, May 1, 2023 at 12:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > If the XID space were non-circular, it could
> > never try to reuse the XID values that have previously been used, and
> > this entire class of problems would go away. Because it is circular,
> > it's possible for the XID counter to arrive back at a place that it's
> > been before i.e. it can wrap around.
>
> But integer wrap around isn't really aligned with anything important.
> xidStopLimit will kick in when we're only halfway towards literal
> integer wrap around. Users have practical concerns about avoiding
> xidStopLimit -- what a world without xidStopLimit looks like just
> doesn't matter. Just having some vague awareness of truncated XIDs
> being insufficient at some point is all you really need, even if
> you're an advanced user.

I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.

I'm not trying to debate the details of the patch, which I have not
read. I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either. I
don't think it's accurate to imagine that this is a 64-bit counter
where we only store 32 bits on disk. We're trying to retcon that into
being true, but we'd have to work significantly harder to actually
make it true.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I disagree. If you start the cluster in single-user mode, you can
> actually wrap it around, unless something has changed that I don't
> know about.

This patch relies on John's other patch which strongly discourages the
use of single-user mode. Were it not for that, I might agree.

> I'm not trying to debate the details of the patch, which I have not
> read. I am saying that, while wraparound is perhaps not a perfect term
> for what's happening, it is not, in my opinion, a bad term either. I
> don't think it's accurate to imagine that this is a 64-bit counter
> where we only store 32 bits on disk. We're trying to retcon that into
> being true, but we'd have to work significantly harder to actually
> make it true.

The purpose of this documentation section is to give users practical
guidance, obviously. The main reason to frame it this way is because
it seems to make the material easier to understand.

--
Peter Geoghegan



On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > I disagree. If you start the cluster in single-user mode, you can
> > actually wrap it around, unless something has changed that I don't
> > know about.
>
> This patch relies on John's other patch which strongly discourages the
> use of single-user mode. Were it not for that, I might agree.

Also, it's not clear that the term "wraparound" even describes what
happens when you corrupt the database by violating the "no more than
~2.1 billion XIDs distance between any two unfrozen XIDs" invariant in
single-user mode. What specific thing will have wrapped around? It's
possible (and very likely) that every unfrozen XID in the database is
from the same 64-XID-wise epoch.

I don't think that we need to say very much about this scenario (and
nothing at all about the specifics in "Routine Vacuuming"), so maybe
it doesn't matter much. But I maintain that it makes most sense to
describe this scenario as a violation of the "no more than ~2.1
billion XIDs distance between any two unfrozen XIDs" invariant, while
leaving the term "wraparound" out of it completely. That terms has way
too much baggage.

--
Peter Geoghegan



On Mon, May 1, 2023, 18:08 Robert Haas <robertmhaas@gmail.com> wrote:
I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either.

I don't want to put words into Peter's mouth, but I think that he's arguing that the term "wraparound" suggests that there is something special about the transition between xid 2^32 and xid 0 (or, well, 3). There isn't. There's only something special about the transition, as your current xid advances, between the xid that's half the xid space ahead of your current xid and the xid that's half the xid space behind the current xid, if the latter is not frozen. I don't think that's what most users think of when they hear "wraparound".
On Mon, May 1, 2023 at 12:03 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
> I don't want to put words into Peter's mouth, but I think that he's arguing that the term "wraparound" suggests that
thereis something special about the transition between xid 2^32 and xid 0 (or, well, 3). There isn't. 

Yes, that's exactly what I mean. There are two points that seem to be
very much in tension here:

1. The scenario where you corrupt the database in single user mode by
unsafely allocating XIDs (you need single user mode to bypass the
xidStopLimit protections) generally won't involve unsigned integer
wraparound (and if it does it's *entirely* incidental to the data
corruption).

2. Actual unsigned integer wraparound is 100% harmless and routine, by design.

So why do we use the term wraparound as a synonym of "the end of the
world"? I assume that it's just an artefact of how the system worked
before the invention of freezing. Back then, you had to do a dump and
restore when the system reached about 4 billion XIDs. Wraparound
really did mean "the end of the world" over 20 years ago.

This is related to my preference for explaining the issues with
reference to a 64-bit XID space. Today we compare 64-bit XIDs using
simple unsigned integer comparisons. That's the same way that 32-bit
XID comparisons worked before freezing was invented in 2001. So it
really does seem like the natural way to explain it.

--
Peter Geoghegan



On Tue, May 2, 2023 at 12:09 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > > I disagree. If you start the cluster in single-user mode, you can
> > > actually wrap it around, unless something has changed that I don't
> > > know about.

+1 Pretending otherwise is dishonest.

> > This patch relies on John's other patch which strongly discourages the
> > use of single-user mode. Were it not for that, I might agree.

Oh that's rich. I'll note that 5% of your review was actually helpful (actual correction), the other 95% was needless distraction trying to enlist me in your holy crusade against the term "wraparound". It had the opposite effect.

> Also, it's not clear that the term "wraparound" even describes what
> happens when you corrupt the database by violating the "no more than
> ~2.1 billion XIDs distance between any two unfrozen XIDs" invariant in
> single-user mode. What specific thing will have wrapped around?

In your first message you said "I'm hoping that I don't get too much push back on this, because it's already very difficult work."

Here's some advice on how to avoid pushback:

1. Insist that all terms can only be interpreted in the most pig-headedly literal sense possible.
2. Use that premise to pretend basic facts are a complete mystery.
3. Claim that others are holding you back, and then try to move the goalposts in their work.

--
John Naylor
EDB: http://www.enterprisedb.com
On Mon, May 1, 2023 at 8:04 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> Here's some advice on how to avoid pushback:
>
> 1. Insist that all terms can only be interpreted in the most pig-headedly literal sense possible.
> 2. Use that premise to pretend basic facts are a complete mystery.

I can't imagine why you feel it necessary to communicate with me like
this. This is just vitriol, lacking any substance.

How we use words like wraparound is actually something of great
consequence to the Postgres project. We've needlessly scared users
with the way this information has been presented up until now -- that
much is clear. To have you talk to me like this when I'm working on
such a difficult, thankless task is a real slap in the face.

> 3. Claim that others are holding you back, and then try to move the goalposts in their work.

When did I say that? When did I even suggest it?

--
Peter Geoghegan



On Mon, May 1, 2023 at 8:04 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> Oh that's rich. I'll note that 5% of your review was actually helpful (actual correction), the other 95% was needless
distractiontrying to enlist me in your holy crusade against the term "wraparound". It had the opposite effect. 

I went back and checked. There were exactly two short paragraphs about
wraparound terminology on the thread associated with the patch you're
working on, towards the end of this one email:

https://postgr.es/m/CAH2-Wzm2fpPQ_=pXpRvkNiuTYBGTAUfxRNW40kLitxj9T3Ny7w@mail.gmail.com

In what world does that amount to 95% of my review, or anything like it?

--
Peter Geoghegan



On Mon, May 1, 2023 at 11:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I can't imagine why you feel it necessary to communicate with me like
> this. This is just vitriol, lacking any substance.

John's email is pretty harsh, but I can understand why he's frustrated.

I told you that I did not agree with your dislike for the term
wraparound and I explained why. You sent a couple more emails telling
me that I was wrong and, frankly, saying a lot of things that seem
only tangentially related to the point that I was actually making. You
seem to expect other people to spend a LOT OF TIME trying to
understand what you're trying to say, but you don't seem to invest
similar effort in trying to understand what they're trying to say. I
couldn't even begin to grasp what your point was until Maciek stepped
in to explain, and I still don't really agree with it, and I expect
that no matter how many emails I write about that, your position won't
budge an iota.

It's really demoralizing. If I just vote -1 on the patch set, then I'm
a useless obstruction. If I actually try to review it, we'll exchange
100 emails and I won't get anything else done for the next two weeks
and I probably won't feel much better about the patch at the end of
that process than at the beginning. I don't see that I have any
winning options here.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Tue, May 2, 2023 at 1:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I told you that I did not agree with your dislike for the term
> wraparound and I explained why. You sent a couple more emails telling
> me that I was wrong and, frankly, saying a lot of things that seem
> only tangentially related to the point that I was actually making.

I agree that that's what I did. You're perfectly entitled to find that
annoying (though I maintain that my point about the 64-bit XID space
was a good one, assuming the general subject matter was of interest).
However, you're talking about this as if I dug my feet in on a
substantive issue affecting the basic shape of the patch -- I don't
believe that that conclusion is justified by anything I've said or
done. I'm not even sure that we disagree on some less important point
that will directly affect the patch (it's quite possible, but I'm not
even sure of it).

I've already said that I don't think that the term wraparound is going
anywhere anytime soon (granted, that was on the other thread). So it's
not like I'm attempting to banish all existing use of that terminology
within the scope of this patch series -- far from it. At most I tried
to avoid inventing new terms that contain the word "wraparound" (also
on the other thread).

The topic originally came up in the context of moving talk about
physical wraparound to an entirely different chapter. Which is, I
believe (based in part on previous discussions), something that all
three of us already agree on! So again, I must ask: is there actually
a substantive disagreement at all?

> It's really demoralizing. If I just vote -1 on the patch set, then I'm
> a useless obstruction. If I actually try to review it, we'll exchange
> 100 emails and I won't get anything else done for the next two weeks
> and I probably won't feel much better about the patch at the end of
> that process than at the beginning. I don't see that I have any
> winning options here.

I've already put a huge amount of work into this. It is inherently a
very difficult thing to get right -- it's not hard to understand why
it was put off for so long. Why shouldn't I have opinions, given all
that? I'm frustrated too.

Despite all this, John basically agreed with my high level direction
-- all of the important points seemed to have been settled without any
arguments whatsoever (also based in part on previous discussions).
John's volley of abuse seemed to come from nowhere at all.

--
Peter Geoghegan



Hi,

On Mon, Apr 24, 2023 at 2:58 PM Peter Geoghegan <pg@bowt.ie> wrote:
My work on page-level freezing for PostgreSQL 16 has some remaining
loose ends to tie up with the documentation. The "Routine Vacuuming"
section of the docs has no mention of page-level freezing. It also
doesn't mention the FPI optimization added by commit 1de58df4. This
isn't a small thing to leave out; I fully expect that the FPI
optimization will very significantly alter when and how VACUUM
freezes. The cadence will look quite a lot different.

It seemed almost impossible to fit in discussion of page-level
freezing to the existing structure. In part this is because the
existing documentation emphasizes the worst case scenario, rather than
talking about freezing as a maintenance task that affects physical
heap pages in roughly the same way as pruning does. There isn't a
clean separation of things that would allow me to just add a paragraph
about the FPI thing.

Obviously it's important that the system never enters xidStopLimit
mode -- not being able to allocate new XIDs is a huge problem. But it
seems unhelpful to define that as the only goal of freezing, or even
the main goal. To me this seems similar to defining the goal of
cleaning up bloat as avoiding completely running out of disk space;
while it may be "the single most important thing" in some general
sense, it isn't all that important in most individual cases. There are
many very bad things that will happen before that extreme worst case
is hit, which are far more likely to be the real source of pain.

There are also very big structural problems with "Routine Vacuuming",
that I also propose to do something about. Honestly, it's a huge mess
at this point. It's nobody's fault in particular; there has been
accretion after accretion added, over many years. It is time to
finally bite the bullet and do some serious restructuring. I'm hoping
that I don't get too much push back on this, because it's already very
difficult work.

Thanks for taking the time to do this. It is indeed difficult work. I'll give my perspective as someone who has not read the vacuum code but have learnt most of what I know about autovacuum / vacuuming by reading the "Routine Vacuuming" page 10s of times. 
 

Attached patch series shows what I consider to be a much better
overall structure. To make this convenient to take a quick look at, I
also attach a prebuilt version of routine-vacuuming.html (not the only
page that I've changed, but the most important set of changes by far).

This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?

There are things I like about the changes you've proposed and some where I feel that the previous section was easier to understand. I'll comment inline on the summary below and will put in a few points about things I think can be improved at the end.
 

The following list is a summary of the major changes that I propose:

1. Restructures the order of items to match the actual processing
order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
ANALYZE and then back to VACUUM.

This flows a lot better, which helps with later items that deal with
freezing/wraparound.

+1
 

2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).

+1 on this too. Freezing is a normal part of vacuuming and while the aggressive vacuums are different, I think just talking about the worst case scenario while referring to it is alarmist.
 

3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.

It seems crazy to me that the second sentence in our discussion of
wraparound/freezing is still:

"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future"

Here we start the whole discussion of wraparound (a particularly
delicate topic) by describing how VACUUM used to work 20 years ago,
before the invention of freezing. That was the last time that a
PostgreSQL cluster could run for 4 billion XIDs without freezing. The
invariant is that we activate xidStopLimit mode protections to avoid a
"distance" between any two unfrozen XIDs that exceeds about 2 billion
XIDs. So why on earth are we talking about 4 billion XIDs? This is the
most confusing, least useful way of describing freezing that I can
think of.

4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.

Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).

5. The top-level list of maintenance tasks has a new addition: "To
truncate obsolescent transaction status information, when possible".

It makes a lot of sense to talk about this as something that happens
last (or last among those steps that take place during VACUUM). It's
far less important than avoiding xidStopLimit outages, obviously
(using some extra disk space is almost certainly the least of your
worries when you're near to xidStopLimit). The current documentation
seems to take precisely the opposite view, when it says the following:

"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more
space"

This sentence is dangerously bad advice. It is precisely backwards. At
the same time, we'd better say something about the need to truncate
pg_xact/clog here. Besides all this, the new section for this is a far
more accurate reflection of what's really going on: most individual
VACUUMs (even most aggressive VACUUMs) won't ever truncate
pg_xact/clog (or the other relevant SLRUs). Truncation only happens
after a VACUUM that advances the relfrozenxid of the table which
previously had the oldest relfrozenxid among all tables in the entire
cluster -- so we need to talk about it as an issue with the high
watermark storage for pg_xact.

6. Rename the whole "Routine Vacuuming" section to "Autovacuum
Maintenance Tasks".

This is what we should be emphasizing over manually run VACUUMs.
Besides, the current title just seems wrong -- we're talking about
ANALYZE just as much as VACUUM.

+1 on this. Talking about autovacuum as the default and how to get the most out of it seems like the right way to go.

I read through the new version a couple times and here is some of my feedback. I haven't yet reviewed individual patches or done a very detailed comparison with the previous version.

1) While I agree that bundling VACUUM and VACUUM FULL is not the right way, moving all VACUUM FULL references into tips and warnings also seems excessive. I think it's probably best to just have a single paragraph which talks about VACUUM FULL as I do think it should be mentioned in the reclaiming disk space section.
2) I felt that the new section, "Freezing to manage the transaction ID space" could be made simpler to understand. As an example, I understood what the parameters (autovacuum_freeze_max_age, vacuum_freeze_table_age) do and how they interact better in the previous version of the docs.
3) In the "VACUUMs aggressive strategy" section, we should first introduce what an aggressive VACUUM is before going into when it's triggered, where metadata is stored etc. It's only several paragraphs later that I get to know what we are referring to as an "aggressive" autovacuum.
4) I think we should explicitly call out that seeing an anti-wraparound VACUUM or "VACUUM table (to prevent wraparound)" is normal and that it's just a VACUUM triggered due to the table having unfrozen rows with an XID older than autovacuum_freeze_max_age. I've seen many users panicking on seeing this and feeling that they are close to a wraparound. Also, we should be more clear about how it's different from VACUUMs triggered due to the scale factors (cancellation behavior, being triggered when autovacuum is disabled etc.). I think you do some of this but given the panic around transactionid wraparounds, being more clear about this is better.
5) Can we use a better name for the XidStopLimit mode? It seems like a very implementation centric name. Maybe a better version of "Running out of the XID space" or something like that?
6) In the XidStopLimit mode section, it would be good to explain briefly why you could get to this scenario. It's not something which should happen in a normal running system unless you have a long running transaction or inactive replication slots or a badly configured system or something of that sort. If you got to this point, other than running VACUUM to get out of the situation, it's also important to figure out what got you there in the first place as many VACUUMs should have attempted to advance the relfrozenxid and failed.

There are a few other small things I noticed along the way but my goal was to look at the overall structure. As we address some of these, I'm happy to do more detailed review of individual patches.

Regards,
Samay

Thoughts?

--
Peter Geoghegan
Hi Samay,

On Tue, May 2, 2023 at 11:40 PM samay sharma <smilingsamay@gmail.com> wrote:
> Thanks for taking the time to do this. It is indeed difficult work.

Thanks for the review! I think that this is something that would
definitely benefit from a perspective such as yours.

> There are things I like about the changes you've proposed and some where I feel that the previous section was easier
tounderstand. 

That makes sense, and I think that I agree with every point you've
raised, bar none. I'm pleased to see that you basically agree with the
high level direction.

I would estimate that the version you looked at (v2) is perhaps 35%
complete. So some of the individual problems you noticed were a direct
consequence of the work just not being anywhere near complete. I'll
try to do a better job of tracking the relative maturity of each
commit/patch in each commit message, going forward.

Anything that falls under "25.2.1. Recovering Disk Space" is
particularly undeveloped in v2. The way that I broke that up into a
bunch of WARNINGs/NOTEs/TIPs was just a short term way of breaking it
up into pieces, so that the structure was very approximately what I
wanted. I actually think that the stuff about CLUSTER and VACUUM FULL
belongs in a completely different chapter. Since it is not "Routine
Vacuuming" at all.

>> 2. Renamed "Preventing Transaction ID Wraparound Failures" to
>> "Freezing to manage the transaction ID space". Now we talk about
>> wraparound as a subtopic of freezing, not vice-versa. (This is a
>> complete rewrite, as described by later items in this list).
>
> +1 on this too. Freezing is a normal part of vacuuming and while the aggressive vacuums are different, I think just
talkingabout the worst case scenario while referring to it is alarmist. 

Strangely enough, Postgres 16 is the first version that instruments
freezing in its autovacuum log reports. I suspect that some long term
users will find it quite surprising to see how much (or how little)
freezing takes place in non-aggressive VACUUMs.

The introduction of page-level freezing will make it easier and more
natural to tune settings like vacuum_freeze_min_age, with the aim of
smoothing out the burden of freezing over time (particularly by making
non-aggressive VACUUMs freeze more). Page-level freezing removes any
question of not freezing every tuple on a page (barring cases where
"removable cutoff" is noticeably held back by an old MVCC snapshot).
This makes it more natural to think of freezing as a process that
makes it okay to store data in individual physical heap pages, long
term.

> 1) While I agree that bundling VACUUM and VACUUM FULL is not the right way, moving all VACUUM FULL references into
tipsand warnings also seems excessive. I think it's probably best to just have a single paragraph which talks about
VACUUMFULL as I do think it should be mentioned in the reclaiming disk space section. 

As I mentioned briefly already, my intention is to move it to another
chapter entirely. I was thinking of "Chapter 29. Monitoring Disk
Usage". The "Routine Vacuuming" docs would then link to this sect1 --
something along the lines of "non-routine commands to reclaim a lot of
disk space in the event of extreme bloat".

> 2) I felt that the new section, "Freezing to manage the transaction ID space" could be made simpler to understand. As
anexample, I understood what the parameters (autovacuum_freeze_max_age, vacuum_freeze_table_age) do and how they
interactbetter in the previous version of the docs. 

Agreed. I'm going to split it up some more. I think that the current
"25.2.2.1. VACUUM's Aggressive Strategy" should be split in two, so we
go from talking about aggressive VACUUMs to Antiwraparound
autovacuums. Finding the least confusing way of explaining it has been
a focus of mine in the last few days.

> 4) I think we should explicitly call out that seeing an anti-wraparound VACUUM or "VACUUM table (to prevent
wraparound)"is normal and that it's just a VACUUM triggered due to the table having unfrozen rows with an XID older
thanautovacuum_freeze_max_age. I've seen many users panicking on seeing this and feeling that they are close to a
wraparound.

That has also been my exact experience. Users are terrified, usually
for no good reason at all. I'll make sure that this comes across in
the next revision of the patch series.

> Also, we should be more clear about how it's different from VACUUMs triggered due to the scale factors (cancellation
behavior,being triggered when autovacuum is disabled etc.). 

Right. Though I think that the biggest point of confusion for users is
how *few* differences there really are between antiwraparound
autovacuum, and any other kind of autovacuum that happens to use
VACUUM's aggressive strategy. There is really only one important
difference: the autocancellation behavior. This is an autovacuum
behavior, not a VACUUM behavior -- so the "VACUUM side" doesn't know
anything about that at all.

> 5) Can we use a better name for the XidStopLimit mode? It seems like a very implementation centric name. Maybe a
betterversion of "Running out of the XID space" or something like that? 

Coming up with a new user-facing name for xidStopLimit is already on
my TODO list (it's surprisingly hard). I have used that name so far
because it unambiguously refers to the exact thing that I want to talk
about when discussing the worst case. Other than that, it's a terrible
name.

> 6) In the XidStopLimit mode section, it would be good to explain briefly why you could get to this scenario. It's not
somethingwhich should happen in a normal running system unless you have a long running transaction or inactive
replicationslots or a badly configured system or something of that sort. 

I agree that that's important. Note that there is already something
about "removable cutoff" being held back at the start of the
discussion of freezing -- that will prevent freezing in exactly the
same way as it prevents cleanup of dead tuples.

That will become a WARNING box in the next revision. There should also
be a similar, analogous WARNING box (about "removable cutoff" being
held back) much earlier on in the docs -- this should appear in
"25.2.1. Recovering Disk Space". Obviously this structure suggests
that there is an isomorphism between freezing and removing bloat. For
example, if you cannot "freeze" an XID that appears in some tuple's
xmax, then you also cannot remove that tuple because VACUUM only sees
it as a recently dead tuple (if xmax is >= OldestXmin/removable
cutoff, and from a deleter that already committed).

I don't think that we need to spell the "isomorphism" point out to the
reader directly, but having a subtle cue that that's how it works
seems like a good idea.

> If you got to this point, other than running VACUUM to get out of the situation, it's also important to figure out
whatgot you there in the first place as many VACUUMs should have attempted to advance the relfrozenxid and failed. 

It's also true that problems that can lead to the system entering
xidStopLimit mode aren't limited to cases where doing required
freezing is fundamentally impossible due to something holding back
"removable cutoff". It's also possible that VACUUM simply can't keep
up (though the failsafe has helped with that problem a lot).

I tend to agree that there needs to be more about this in the
xidStopLimit subsection (discussion of freezing being held back by
"removable cutoff" is insufficient), but FWIW that seems like it
should probably be treated as out of scope for this patch. It is more
the responsibility of the other patch [1] that aims to put the
xidStopLimit documentation on a better footing (and remove that
terrible HINT about single user mode).

Of course, that other patch is closely related to this patch -- the
precise boundaries are unclear at this point. In any case I think that
this should happen, because I think that it's a good idea.

> There are a few other small things I noticed along the way but my goal was to look at the overall structure.

Thanks again! This is very helpful.

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg%40mail.gmail.com
--
Peter Geoghegan



On Wed, May 3, 2023 at 2:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Coming up with a new user-facing name for xidStopLimit is already on
> my TODO list (it's surprisingly hard). I have used that name so far
> because it unambiguously refers to the exact thing that I want to talk
> about when discussing the worst case. Other than that, it's a terrible
> name.

What about "XID allocation overload"? The implication that I'm going
for here is that the system was misconfigured, or there was otherwise
some kind of imbalance between XID supply and demand. It also seems to
convey the true gravity of the situation -- it's *bad*, to be sure,
but in many environments it's a survivable condition.

One possible downside of this name is that it could suggest that all
that needs to happen is for autovacuum to catch up on vacuuming. In
reality the user *will* probably have to do more than just wait before
the system's ability to allocate new XIDs returns, because (in all
likelihood) autovacuum just won't be able to catch up unless and until
the user (say) drops a replication slot. Even still, the name seems to
work; it describes the conceptual model of the system accurately. Even
before the user drops the replication slot, autovacuum will at least
*try* to get the system back to being able to allocate new XIDs once
more.

--
Peter Geoghegan



Hi,

On Wed, May 3, 2023 at 2:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
Hi Samay,

On Tue, May 2, 2023 at 11:40 PM samay sharma <smilingsamay@gmail.com> wrote:
> Thanks for taking the time to do this. It is indeed difficult work.

Thanks for the review! I think that this is something that would
definitely benefit from a perspective such as yours.

Glad to hear that my feedback was helpful.
 

> There are things I like about the changes you've proposed and some where I feel that the previous section was easier to understand.

That makes sense, and I think that I agree with every point you've
raised, bar none. I'm pleased to see that you basically agree with the
high level direction.

I would estimate that the version you looked at (v2) is perhaps 35%
complete. So some of the individual problems you noticed were a direct
consequence of the work just not being anywhere near complete. I'll
try to do a better job of tracking the relative maturity of each
commit/patch in each commit message, going forward.

Anything that falls under "25.2.1. Recovering Disk Space" is
particularly undeveloped in v2. The way that I broke that up into a
bunch of WARNINGs/NOTEs/TIPs was just a short term way of breaking it
up into pieces, so that the structure was very approximately what I
wanted. I actually think that the stuff about CLUSTER and VACUUM FULL
belongs in a completely different chapter. Since it is not "Routine
Vacuuming" at all.

>> 2. Renamed "Preventing Transaction ID Wraparound Failures" to
>> "Freezing to manage the transaction ID space". Now we talk about
>> wraparound as a subtopic of freezing, not vice-versa. (This is a
>> complete rewrite, as described by later items in this list).
>
> +1 on this too. Freezing is a normal part of vacuuming and while the aggressive vacuums are different, I think just talking about the worst case scenario while referring to it is alarmist.

Strangely enough, Postgres 16 is the first version that instruments
freezing in its autovacuum log reports. I suspect that some long term
users will find it quite surprising to see how much (or how little)
freezing takes place in non-aggressive VACUUMs.

The introduction of page-level freezing will make it easier and more
natural to tune settings like vacuum_freeze_min_age, with the aim of
smoothing out the burden of freezing over time (particularly by making
non-aggressive VACUUMs freeze more). Page-level freezing removes any
question of not freezing every tuple on a page (barring cases where
"removable cutoff" is noticeably held back by an old MVCC snapshot).
This makes it more natural to think of freezing as a process that
makes it okay to store data in individual physical heap pages, long
term.

> 1) While I agree that bundling VACUUM and VACUUM FULL is not the right way, moving all VACUUM FULL references into tips and warnings also seems excessive. I think it's probably best to just have a single paragraph which talks about VACUUM FULL as I do think it should be mentioned in the reclaiming disk space section.

As I mentioned briefly already, my intention is to move it to another
chapter entirely. I was thinking of "Chapter 29. Monitoring Disk
Usage". The "Routine Vacuuming" docs would then link to this sect1 --
something along the lines of "non-routine commands to reclaim a lot of
disk space in the event of extreme bloat".

> 2) I felt that the new section, "Freezing to manage the transaction ID space" could be made simpler to understand. As an example, I understood what the parameters (autovacuum_freeze_max_age, vacuum_freeze_table_age) do and how they interact better in the previous version of the docs.

Agreed. I'm going to split it up some more. I think that the current
"25.2.2.1. VACUUM's Aggressive Strategy" should be split in two, so we
go from talking about aggressive VACUUMs to Antiwraparound
autovacuums. Finding the least confusing way of explaining it has been
a focus of mine in the last few days.

To be honest, this was not super simple to understand even in the previous version. However, as our goal is to simplify this and make it easier to understand, I'll hold this patch-set to a higher standard :).

I wish there was a simple representation (maybe even a table or something) which would explain the differences between a VACUUM which is not aggressive, a VACUUM which ends up being aggressive due to vacuum_freeze_table_age and an antiwraparound autovacuum.
 

> 4) I think we should explicitly call out that seeing an anti-wraparound VACUUM or "VACUUM table (to prevent wraparound)" is normal and that it's just a VACUUM triggered due to the table having unfrozen rows with an XID older than autovacuum_freeze_max_age. I've seen many users panicking on seeing this and feeling that they are close to a wraparound.

That has also been my exact experience. Users are terrified, usually
for no good reason at all. I'll make sure that this comes across in
the next revision of the patch series.

Thinking about it a bit more, I wonder if there's value in changing the "(to prevent wraparound)" to something else. It's understandable why people who just see that in pg_stat_activity and don't read docs might assume they are close to a wraparound.

Regards,
Samay
 

> Also, we should be more clear about how it's different from VACUUMs triggered due to the scale factors (cancellation behavior, being triggered when autovacuum is disabled etc.).

Right. Though I think that the biggest point of confusion for users is
how *few* differences there really are between antiwraparound
autovacuum, and any other kind of autovacuum that happens to use
VACUUM's aggressive strategy. There is really only one important
difference: the autocancellation behavior. This is an autovacuum
behavior, not a VACUUM behavior -- so the "VACUUM side" doesn't know
anything about that at all.  

> 5) Can we use a better name for the XidStopLimit mode? It seems like a very implementation centric name. Maybe a better version of "Running out of the XID space" or something like that?

Coming up with a new user-facing name for xidStopLimit is already on
my TODO list (it's surprisingly hard). I have used that name so far
because it unambiguously refers to the exact thing that I want to talk
about when discussing the worst case. Other than that, it's a terrible
name.

> 6) In the XidStopLimit mode section, it would be good to explain briefly why you could get to this scenario. It's not something which should happen in a normal running system unless you have a long running transaction or inactive replication slots or a badly configured system or something of that sort.

I agree that that's important. Note that there is already something
about "removable cutoff" being held back at the start of the
discussion of freezing -- that will prevent freezing in exactly the
same way as it prevents cleanup of dead tuples.

That will become a WARNING box in the next revision. There should also
be a similar, analogous WARNING box (about "removable cutoff" being
held back) much earlier on in the docs -- this should appear in
"25.2.1. Recovering Disk Space". Obviously this structure suggests
that there is an isomorphism between freezing and removing bloat. For
example, if you cannot "freeze" an XID that appears in some tuple's
xmax, then you also cannot remove that tuple because VACUUM only sees
it as a recently dead tuple (if xmax is >= OldestXmin/removable
cutoff, and from a deleter that already committed).

I don't think that we need to spell the "isomorphism" point out to the
reader directly, but having a subtle cue that that's how it works
seems like a good idea.

> If you got to this point, other than running VACUUM to get out of the situation, it's also important to figure out what got you there in the first place as many VACUUMs should have attempted to advance the relfrozenxid and failed.

It's also true that problems that can lead to the system entering
xidStopLimit mode aren't limited to cases where doing required
freezing is fundamentally impossible due to something holding back
"removable cutoff". It's also possible that VACUUM simply can't keep
up (though the failsafe has helped with that problem a lot).

I tend to agree that there needs to be more about this in the
xidStopLimit subsection (discussion of freezing being held back by
"removable cutoff" is insufficient), but FWIW that seems like it
should probably be treated as out of scope for this patch. It is more
the responsibility of the other patch [1] that aims to put the
xidStopLimit documentation on a better footing (and remove that
terrible HINT about single user mode).

Of course, that other patch is closely related to this patch -- the
precise boundaries are unclear at this point. In any case I think that
this should happen, because I think that it's a good idea.

> There are a few other small things I noticed along the way but my goal was to look at the overall structure.

Thanks again! This is very helpful.

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg%40mail.gmail.com
--
Peter Geoghegan
Hi,

On Wed, May 3, 2023 at 3:48 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, May 3, 2023 at 2:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Coming up with a new user-facing name for xidStopLimit is already on
> my TODO list (it's surprisingly hard). I have used that name so far
> because it unambiguously refers to the exact thing that I want to talk
> about when discussing the worst case. Other than that, it's a terrible
> name.

What about "XID allocation overload"? The implication that I'm going
for here is that the system was misconfigured, or there was otherwise
some kind of imbalance between XID supply and demand. It also seems to
convey the true gravity of the situation -- it's *bad*, to be sure,
but in many environments it's a survivable condition.

My concern with the term "overload" is similar to what you expressed below. It indicates that the situation is due to extra load on the system (or due to too many XIDs being allocated) and people might assume that the situation will resolve itself if the load were to be reduced / removed. However, it's due to that along with some misconfiguration or some other thing holding back the "removable cutoff".

What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of allocatable XIDs"? The term indicates that we are running out of XIDs to allocate without necessarily pointing towards a reason.

Regards,
Samay
 

One possible downside of this name is that it could suggest that all
that needs to happen is for autovacuum to catch up on vacuuming. In
reality the user *will* probably have to do more than just wait before
the system's ability to allocate new XIDs returns, because (in all
likelihood) autovacuum just won't be able to catch up unless and until
the user (say) drops a replication slot. Even still, the name seems to
work; it describes the conceptual model of the system accurately. Even
before the user drops the replication slot, autovacuum will at least
*try* to get the system back to being able to allocate new XIDs once
more.

--
Peter Geoghegan
On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
> What do you think about the term "Exhaustion"?

I'm really not sure.

Attached is v3, which (as with v1 and v2) comes with a prebuilt html
"Routine Vacuuming", for the convenience of reviewers.

v3 does have some changes based on your feedback (and feedback from
John), but overall v3 can be thought of as v2 with lots and lots of
additional copy-editing -- though still not enough, I'm sure.

v3 does add some (still incomplete) introductory remarks about the
intended audience and goals for "Routine Vacuuming". But most of the
changes are to the way the docs describe freezing and aggressive
VACUUM, which continued to be my focus for v3.

--
Peter Geoghegan

Attachment
On Wed, 3 May 2023 at 18:50, Peter Geoghegan <pg@bowt.ie> wrote:
>
> What about "XID allocation overload"? The implication that I'm going
> for here is that the system was misconfigured, or there was otherwise
> some kind of imbalance between XID supply and demand.

Fwiw while "wraparound" has pitfalls I think changing it for a new
word isn't really helpful. Especially if it's a mostly meaningless
word like "overload" or "exhaustion". It suddenly makes every existing
doc hard to find and confusing to read.

I say "exhaustion" or "overload" are meaningless because their meaning
is entirely dependent on context. It's not like memory exhaustion or
i/o overload where it's a finite resource and it's just the sheer
amount in use that matters. One way or another the user needs to
understand that it's two numbers marching through a sequence  and the
distance between them matters.

I feel like "wraparound" while imperfect is not  any worse than any
other word. It still requires context to understand but it's context
that there are many docs online that already explain and are
googleable.

If we wanted a new word it would be "overrun" but like I say, it would
just create a new context dependent technical term that users would
need to find docs that explain and give context. I don't think that
really helps users at all

-- 
greg



On Thu, May 11, 2023 at 1:04 PM Greg Stark <stark@mit.edu> wrote:
> Fwiw while "wraparound" has pitfalls I think changing it for a new
> word isn't really helpful. Especially if it's a mostly meaningless
> word like "overload" or "exhaustion". It suddenly makes every existing
> doc hard to find and confusing to read.

Just to be clear, I am not proposing changing the name of
anti-wraparound autovacuum at all. What I'd like to do is use a term
like "XID exhaustion" to refer to the state that we internally refer
to as xidStopLimit. My motivation is simple: we've completely
terrified users by emphasizing wraparound, which is something that is
explicitly and prominently presented as a variety of data corruption.
The docs say this:

"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future — which means their output become invisible. In short,
catastrophic data loss."

> I say "exhaustion" or "overload" are meaningless because their meaning
> is entirely dependent on context. It's not like memory exhaustion or
> i/o overload where it's a finite resource and it's just the sheer
> amount in use that matters.

But transaction IDs are a finite resource, in the sense that you can
never have more than about 2.1 billion distinct unfrozen XIDs at any
one time. "Transaction ID exhaustion" is therefore a lot more
descriptive of the underlying problem. It's a lot better than
wraparound, which, as I've said, is inaccurate in two major ways:

1. Most cases involving xidStopLimit (or even single-user mode data
corruption) won't involve any kind of physical integer wraparound.

2. Most physical integer wraparound is harmless and perfectly routine.

But even this is fairly secondary to me. I don't actually think it's
that important that the name describe exactly what's going on here --
that's expecting rather a lot from a name. That's not really the goal.
The goal is to undo the damage of documentation that heavily implies
that data corruption is the eventual result of not doing enough
vacuuming, in its basic introductory remarks to freezing stuff.

Like Samay, my consistent experience (particularly back in my Heroku
days) has been that people imagine that data corruption would happen
when the system reached what we'd call xidStopLimit. Can you blame
them for thinking that? Almost any name for xidStopLimit that doesn't
have that historical baggage seems likely to be a vast improvement.

--
Peter Geoghegan



On Thu, May 11, 2023 at 1:40 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Just to be clear, I am not proposing changing the name of
> anti-wraparound autovacuum at all. What I'd like to do is use a term
> like "XID exhaustion" to refer to the state that we internally refer
> to as xidStopLimit. My motivation is simple: we've completely
> terrified users by emphasizing wraparound, which is something that is
> explicitly and prominently presented as a variety of data corruption.
> The docs say this:
>
> "But since transaction IDs have limited size (32 bits) a cluster that
> runs for a long time (more than 4 billion transactions) would suffer
> transaction ID wraparound: the XID counter wraps around to zero, and
> all of a sudden transactions that were in the past appear to be in the
> future — which means their output become invisible. In short,
> catastrophic data loss."

Notice that this says that "catastrophic data loss" occurs when "the
XID counter wraps around to zero". I think that this was how it worked
before the invention of freezing, over 20 years ago -- the last time
the system would allocate about 4 billion XIDs without doing any
freezing.

While it is still possible to corrupt the database in single user
mode, it has precisely nothing to do with the point that "the XID
counter wraps around to zero". I believe that this wording has done
not insignificant damage to the project's reputation. But let's assume
for a moment that there's only a tiny chance that I'm right about all
of this -- let's assume I'm probably just being alarmist about how
this has been received in the wider world. Even then: why take even a
small chance?

--
Peter Geoghegan



On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
> What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of
allocatableXIDs"? 

I use the term "transaction ID exhaustion" in the attached revision,
v4. Overall, v4 builds on the work that went into v2 and v3, by
continuing to polish the overhaul of everything related to freezing,
relfrozenxid advancement, and anti-wraparound autovacuum.

It would be nice if it was possible to add an animation/diagram a
little like this one: https://tuple-freezing-demo.angusd.com (this is
how I tend to think about the "transaction ID space".)

I feel that the patch that deals with freezing is really coming
together in v4. The main problem now is lack of detailed review --
though the freezing related patch is still not committable, it's
getting close now. (The changes to the docs covering freezing should
be committed separately from any further work on "25.2.1. Recovering
Disk Space". I still haven't done much there in v4, and those parts
clearly aren't anywhere near being committable. So, for now, they can
mostly be ignored.)

v4 also limits use of the term "wraparound" to places that directly
discuss anti-wraparound autovacuums (plus one place in xact.sgml,
where discussion of "true unsigned integer wraparound" and related
implementation details has been moved). Otherwise we use the term
"transaction ID exhaustion", which is pretty much the user-facing name
for "xidStopLimit". I feel that this is a huge improvement, for the
reason given to Greg earlier. I'm flexible on the details, but I feel
strongly that we should minimize use of the term wraparound wherever
it might have the connotation of "the past becoming the future". This
is not a case of inventing a new terminology for its own sake. If
anybody is skeptical I ask that they take a look at what I came up
with before declaring it a bad idea. I have made that as easy as
possible, by once again attaching a prebuilt routine-vacuuming.html.

I no longer believe that committing this patch series needs to block
on the patch that seeks to put things straight with single user mode
and xidStopLimit/transaction ID exhaustion (the one that John Naylor
is currently working on getting in shape), either (I'll explain my
reasoning if somebody wants to hear it).

Other changes in v4, compared to v3:

* Improved discussion of the differences between non-aggressive and
aggressive VACUUM.

Now mentions the issue of aggressive VACUUMs waiting for a cleanup
lock, including mention of the BufferPin wait event. This is the
second, minor difference between each kind of VACUUM. It matters much
less than the first difference, but it does merit a mention.

The discussion of aggressive VACUUM seems to be best approached by
starting with the mechanical differences, and only later going into
the consequences of those differences. (Particularly catch-up
freezing.)

* Explains "catch-up freezing" performed by aggressive VACUUMs directly.

"Catch-up" freezing is the really important "consequence" -- something
that emerges from how each type of VACUUM behaves over time. It is an
indirect consequence of the behaviors. I would like to counter the
perception that some users have about freezing only happening during
aggressive VACUUMs (or anti-wraparound autovacuums). But more than
that, talking about catch-up freezing seems essential because it is
the single most important difference.

* Much improved handling of the discussion of anti-wraparound
autovacuum, and how it relates to aggressive VACUUMs, following
feedback from Samay.

There is now only fairly minimal overlap in the discussion of
aggressive VACUUM and anti-wraparound autovacuuming. We finish the
discussion of aggressive VACUUM just after we start discussing
anti-wraparound autovacuum. This transition works well, because it
enforces the idea that anti-wraparound autovacuum isn't really special
compared to any other aggressive autovacuum. This was something that
Samay expressed particularly concern about: making anti-wraparound
autovacuums sound less scary. Though it's also a concern I had from
the outset, based on practical experience and interactions with people
that have much less knowledge of Postgres than I do.

* Anti-wraparound autovacuum is now mostly discussed as something that
happens to static or mostly-static tables.

This is related to the goal of making anti-wraparound autovacuums
sound less scary. Larger tables don't necessarily require any
anti-wraparound autovacuums these days -- we have the insert-driven
autovacuum trigger condition these days, so it's plausible (perhaps
even likely) that all aggressive VACUUMs against the largest
append-only tables can happen when autovacuum triggers VACUUMs to
processed recently inserted tuples.

This moves discussion of anti-wraparound av in the direction of:
"Anti-wraparound autovacuum is a special type of autovacuum. Its
purpose is to ensure that relfrozenxid advances when no earlier VACUUM
could advance it in passing — often because no VACUUM has run against
the table for an extended period."

* Added a couple of "Tips" about instrumentation that appears in the
server log whenever autovacuum reports on a VACUUM operation.

* Much improved "Truncating Transaction Status Information" subsection.

My explanation of the ways in which autovacuum_freeze_max_age can
affect the storage overhead of commit/abort status in pg_xact is much
clearer than it was in v3 -- pg_xact truncation is now treated as
something loosely related to the global config of anti-wraparound
autovacuum, which makes most sense.

It took a great deal of effort to find a structure that covered
everything, and that highlighted all of the important relationships
without going too far, while at the same time not being a huge mess.
That's what I feel I've arrived at with v4.

--
Peter Geoghegan

Attachment
Thanks for the continued work, Peter. I hate to be the guy that starts this way,
but this is my first ever response on pgsql-hackers. (insert awkward
smile face).
Hopefully I've followed etiquette well, but please forgive any
missteps, and I'm
happy for any help in making better contributions in the future.

On Thu, May 11, 2023 at 9:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
> > What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of
allocatableXIDs"? 
>
> I use the term "transaction ID exhaustion" in the attached revision,
> v4. Overall, v4 builds on the work that went into v2 and v3, by
> continuing to polish the overhaul of everything related to freezing,
> relfrozenxid advancement, and anti-wraparound autovacuum.

Just to say on the outset, as has been said earlier in the tread by others,
that this is herculean work. Thank you for putting the effort you have thus far.
There's a lot of good from where I sit in the modification efforts.
It's a heavy,
dense topic, so there's probably never going to be a perfect way to
get it all in,
but some of the context early on, especially, is helpful for framing.

>
> It would be nice if it was possible to add an animation/diagram a
> little like this one: https://tuple-freezing-demo.angusd.com (this is
> how I tend to think about the "transaction ID space".)

Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But,
this or something akin to the "clock" image I've seen elsewhere when
describing the transaction ID space would probably be helpful if it were ever
possible. In fact, there's just a lot about the MVCC stuff in general that
would benefit from diagrams. But alas, I guess that's why we have some
good go-to community talks/slide decks. :-)

> v4 also limits use of the term "wraparound" to places that directly
> discuss anti-wraparound autovacuums (plus one place in xact.sgml,
> where discussion of "true unsigned integer wraparound" and related
> implementation details has been moved). Otherwise we use the term
> "transaction ID exhaustion", which is pretty much the user-facing name
> for "xidStopLimit". I feel that this is a huge improvement, for the
> reason given to Greg earlier. I'm flexible on the details, but I feel
> strongly that we should minimize use of the term wraparound wherever
> it might have the connotation of "the past becoming the future". This
> is not a case of inventing a new terminology for its own sake. If
> anybody is skeptical I ask that they take a look at what I came up
> with before declaring it a bad idea. I have made that as easy as
> possible, by once again attaching a prebuilt routine-vacuuming.html.

Thanks again for doing this. Really helpful for doc newbies like me that
want to help but are still working through the process. Really helpful
and appreciated.

>
>
> Other changes in v4, compared to v3:
>
> * Improved discussion of the differences between non-aggressive and
> aggressive VACUUM.

This was helpful for me and not something I've previously put much thought
into. Helpful context that is missing from the current docs.

> * Explains "catch-up freezing" performed by aggressive VACUUMs directly.
>
> "Catch-up" freezing is the really important "consequence" -- something
> that emerges from how each type of VACUUM behaves over time. It is an
> indirect consequence of the behaviors. I would like to counter the
> perception that some users have about freezing only happening during
> aggressive VACUUMs (or anti-wraparound autovacuums). But more than
> that, talking about catch-up freezing seems essential because it is
> the single most important difference.
>

Similarly, this was helpful overall context of various things
happening with freezing.

> * Much improved handling of the discussion of anti-wraparound
> autovacuum, and how it relates to aggressive VACUUMs, following
> feedback from Samay.
>
> There is now only fairly minimal overlap in the discussion of
> aggressive VACUUM and anti-wraparound autovacuuming. We finish the
> discussion of aggressive VACUUM just after we start discussing
> anti-wraparound autovacuum. This transition works well, because it
> enforces the idea that anti-wraparound autovacuum isn't really special
> compared to any other aggressive autovacuum. This was something that
> Samay expressed particularly concern about: making anti-wraparound
> autovacuums sound less scary. Though it's also a concern I had from
> the outset, based on practical experience and interactions with people
> that have much less knowledge of Postgres than I do.

Agree. This flows fairly well and helps the user understand that each
"next step"
in the vacuum/freezing process has a distinct job based on previous work.

>
> * Anti-wraparound autovacuum is now mostly discussed as something that
> happens to static or mostly-static tables....
> ...This moves discussion of anti-wraparound av in the direction of:
> "Anti-wraparound autovacuum is a special type of autovacuum. Its
> purpose is to ensure that relfrozenxid advances when no earlier VACUUM
> could advance it in passing — often because no VACUUM has run against
> the table for an extended period."
>

Again, learned something new here, at least in how I think about it and talk
with others. In total, I do think these changes make wraparound/exhaustion
seem less "the sky is falling".

> * Added a couple of "Tips" about instrumentation that appears in the
> server log whenever autovacuum reports on a VACUUM operation.
>
> * Much improved "Truncating Transaction Status Information" subsection.
>
> My explanation of the ways in which autovacuum_freeze_max_age can
> affect the storage overhead of commit/abort status in pg_xact is much
> clearer than it was in v3 -- pg_xact truncation is now treated as
> something loosely related to the global config of anti-wraparound
> autovacuum, which makes most sense.
>
This one isn't totally sinking in with me yet. Need another read.

> It took a great deal of effort to find a structure that covered
> everything, and that highlighted all of the important relationships
> without going too far, while at the same time not being a huge mess.
> That's what I feel I've arrived at with v4.

In most respects I agree with the overall flow of changes w.r.t the current doc.
Focusing on all of this as something that should normally just be happening
as part of autovacuum is helpful. Working through it as an order of operations
(and I'm just assuming this is the general order) feels like it ties
things together
a lot more. I honestly come away from this document with more of a "I understand
the process" feel than I did previously.

For now, I'd add the following few comments on the intro section,
2.5.1 and 2.5.2. I
haven't gotten to the bottom sections yet for much feedback.

Intro Comments:
1) "The autovacuum daemon automatically schedules maintenance tasks based on
workload requirements." feels at tension with "Autovacuum scheduling
is controlled
via threshold settings."

Owing to the lingering belief that many users have whereby hosting providers
have magically enabled Postgres to do all of this for you, there is
still a need to
actively deal with these thresholds based on load. That is, as far as
I understand,
Postgres doesn't automatically adjust based on load. Someone/thing
still has to modify
the thresholds as load and data size changes.

If the "workload requirements" is pointing towards aggressive
freezing/wraparound
tasks that happen regardless of thresholds, then for me at least that
isn't clear
in that sentence and it feels like there's an implication that
Postgres/autovacuum
is going to magically adjust overall vacuum work based on database workload.

2) "The intended audience is database administrators that wish to
perform more advanced
 autovacuum tuning, with any of the following goals in mind:"

I love calling out the audience in some way. That's really helpful, as are the
stated goals in the bullet list. However, as someone feeling pretty novice
after reading all of this, I can't honestly connect how the content on this page
helps me to more advanced tuning. I have a much better idea how freezing,
in particular, works (yay!), but I'm feeling a bit dense how almost anything
here helps me tune vacuum, at least as it relates to the bullets.

I'm sure you have a connection in mind for each, and certainly understanding the
inner workings of what's happening under the covers is tremendously beneficial,
but when I search for "response" or "performance" in this document, it refers
back to another page (not included in this patch) that talks about the
thresholds.

It might be as simple as adding something to the end of each bullet to draw
that relationship, but as is, it's hard for me to do it mentally (although I can
conjecture a few things on my own)

That said, I definitely appreciate the callout that tuning is an
iterative process
and the minor switch from "creates a substantial amount of I/O
traffic" to "may create...".

** Section 2.5.1 - Recovering Disk Space **

3). "The space dead tuples occupy must eventually be reclaimed for reuse
by new rows, to avoid unbounded growth of disk space requirements. Reclaiming
space from dead rows is VACUUM's main responsibility."

It feels like one connection you could make to the bullet list above
is in this area
and not mentioned. By freeing up space and reducing the number of pages that
need to be read for satisfying a query, vacuum and recovering disk space
(theoretically) improves query performance. Not 100% how to add it in context
of these first two paragraphs.

4) Caution: "It may be a good idea to add monitoring to alert you about this."
I hate to be pedantic about it, but I think we should spell out
"this". Do we have
a pointer in documentation to what kinds of things to monitor for? Am monitoring
long-running transactions or some metric that shows me that VACUUM is being
"held back"? I know what you mean, but it's not clear to me how to do the right
thing in my environment here.

5) The plethora of tips/notes/warnings.
As you and others have mentioned, as presented these really have no context
for me. Individually they are good/helpful information, but it's
really hard to make
a connection to what I should "do" about it.

It seems to me that this would be a good place to put a subsection which is
something like, "A note about reclaiming disk space" or something. In my
experience, most people hear about and end up using VACUUM FULL because
things got out of control and they want to get into a better spot (I have been
in that boat). I think with a small section that says, in essence,
"hey, now that
you understand why/how vacuum reclaims disk resources normally, if you're
in a position where things aren't in a good state, this is what you need to know
if you want to reclaim space from a really inefficient table"

For me, at least, I think it would be easier to read/grok what you're sharing in
these callouts.

6) One last missing piece that very well might be in another page not referenced
(I obviously need to get the PG16 docs pulled and built locally so
that I can have
better overall reference. My apologies).

In my experience, one of the biggest issues with the thresholds and recovering
space is the idea of tuning individual tables, not just the entire
database. 5/10/20%
might be fine for most tables, but it's usually the really active ones
that need the
tuning, specifically lowering the thresholds. That doesn't come across to me in
this section at all. Again, maybe I've missed something on another page and
it's all good, but it felt worth calling out.

Plus, it may provide an opportunity to bring in the threshold formulas again if
they aren't referenced elsewhere (although they probably are).

Hope that makes sense.

** Section 2.5.2: Freezing to manage... **
As stated above, the effort here overall is great IMO. I like the flow
and reduction
in alarmist tone for things like wraparound, etc. I understand more
about freezing,
aggressive and otherwise, than I did before.

7) That said, totally speaking as a non-contributor, this section is
obviously very long
for good reason. But, by the time I've gotten down to 25.2.2.3, my
brain is a bit
bewildered on where we've gotten to. That's more a comment on my capability
to process it all, but I wonder if a slightly more explicit intro
could help set the
stage at least.

"One side-effect of vacuum and transaction ID management at the row level is
that PostgreSQL would normally need to inspect each row for every query to
ensure it is visible to each requesting transaction. In order to
reduce the need to
read and inspect excessive amounts of data at query time or when normal vacuum
maintenance kicks in, VACUUM has a second job called freezing, which
accomplishes three goals: (attempting to tie in the three sections)
 * speeding up queries and vacuum operations by...
 * advancing the transaction ID space on generally static tables...
 * ensure there are always free transaction IDs available for normal
operation...
"

Maybe totally worthless and too much, but something like that might set a reader
up for just a bit more context. Then you could take most of what comes before
"2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below)
with something like "2.5.2.2.1 Normal Freezing Activity"

8) Note "In PostgreSQL versions before 16..."
Showing my naivety, somehow this isn't connecting with me totally. If
it's important
to call out, then maybe we need a connecting sentence. Based on the content
above, I think you're pointing to "It's also why VACUUM will freeze all eligible
tuples from a heap page once the decision to freeze at least one tuple
is taken:"
If that's it, it's just not clear to me what's totally changed. Sorry,
more learning. :-)

---
Hope something in there is helpful.

Ryan Booz

>
> --
> Peter Geoghegan



And, of course, I forgot that I switch to text-mode after writing most
of this, so the carriage returns were unnecessary. (facepalm... sigh)

--
Ryan

On Fri, May 12, 2023 at 1:36 PM Ryan Booz <ryan@softwareandbooz.com> wrote:
>
> Thanks for the continued work, Peter. I hate to be the guy that starts this way,
> but this is my first ever response on pgsql-hackers. (insert awkward
> smile face).
> Hopefully I've followed etiquette well, but please forgive any
> missteps, and I'm
> happy for any help in making better contributions in the future.
>
> On Thu, May 11, 2023 at 9:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
> > > What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of
allocatableXIDs"? 
> >
> > I use the term "transaction ID exhaustion" in the attached revision,
> > v4. Overall, v4 builds on the work that went into v2 and v3, by
> > continuing to polish the overhaul of everything related to freezing,
> > relfrozenxid advancement, and anti-wraparound autovacuum.
>
> Just to say on the outset, as has been said earlier in the tread by others,
> that this is herculean work. Thank you for putting the effort you have thus far.
> There's a lot of good from where I sit in the modification efforts.
> It's a heavy,
> dense topic, so there's probably never going to be a perfect way to
> get it all in,
> but some of the context early on, especially, is helpful for framing.
>
> >
> > It would be nice if it was possible to add an animation/diagram a
> > little like this one: https://tuple-freezing-demo.angusd.com (this is
> > how I tend to think about the "transaction ID space".)
>
> Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But,
> this or something akin to the "clock" image I've seen elsewhere when
> describing the transaction ID space would probably be helpful if it were ever
> possible. In fact, there's just a lot about the MVCC stuff in general that
> would benefit from diagrams. But alas, I guess that's why we have some
> good go-to community talks/slide decks. :-)
>
> > v4 also limits use of the term "wraparound" to places that directly
> > discuss anti-wraparound autovacuums (plus one place in xact.sgml,
> > where discussion of "true unsigned integer wraparound" and related
> > implementation details has been moved). Otherwise we use the term
> > "transaction ID exhaustion", which is pretty much the user-facing name
> > for "xidStopLimit". I feel that this is a huge improvement, for the
> > reason given to Greg earlier. I'm flexible on the details, but I feel
> > strongly that we should minimize use of the term wraparound wherever
> > it might have the connotation of "the past becoming the future". This
> > is not a case of inventing a new terminology for its own sake. If
> > anybody is skeptical I ask that they take a look at what I came up
> > with before declaring it a bad idea. I have made that as easy as
> > possible, by once again attaching a prebuilt routine-vacuuming.html.
>
> Thanks again for doing this. Really helpful for doc newbies like me that
> want to help but are still working through the process. Really helpful
> and appreciated.
>
> >
> >
> > Other changes in v4, compared to v3:
> >
> > * Improved discussion of the differences between non-aggressive and
> > aggressive VACUUM.
>
> This was helpful for me and not something I've previously put much thought
> into. Helpful context that is missing from the current docs.
>
> > * Explains "catch-up freezing" performed by aggressive VACUUMs directly.
> >
> > "Catch-up" freezing is the really important "consequence" -- something
> > that emerges from how each type of VACUUM behaves over time. It is an
> > indirect consequence of the behaviors. I would like to counter the
> > perception that some users have about freezing only happening during
> > aggressive VACUUMs (or anti-wraparound autovacuums). But more than
> > that, talking about catch-up freezing seems essential because it is
> > the single most important difference.
> >
>
> Similarly, this was helpful overall context of various things
> happening with freezing.
>
> > * Much improved handling of the discussion of anti-wraparound
> > autovacuum, and how it relates to aggressive VACUUMs, following
> > feedback from Samay.
> >
> > There is now only fairly minimal overlap in the discussion of
> > aggressive VACUUM and anti-wraparound autovacuuming. We finish the
> > discussion of aggressive VACUUM just after we start discussing
> > anti-wraparound autovacuum. This transition works well, because it
> > enforces the idea that anti-wraparound autovacuum isn't really special
> > compared to any other aggressive autovacuum. This was something that
> > Samay expressed particularly concern about: making anti-wraparound
> > autovacuums sound less scary. Though it's also a concern I had from
> > the outset, based on practical experience and interactions with people
> > that have much less knowledge of Postgres than I do.
>
> Agree. This flows fairly well and helps the user understand that each
> "next step"
> in the vacuum/freezing process has a distinct job based on previous work.
>
> >
> > * Anti-wraparound autovacuum is now mostly discussed as something that
> > happens to static or mostly-static tables....
> > ...This moves discussion of anti-wraparound av in the direction of:
> > "Anti-wraparound autovacuum is a special type of autovacuum. Its
> > purpose is to ensure that relfrozenxid advances when no earlier VACUUM
> > could advance it in passing — often because no VACUUM has run against
> > the table for an extended period."
> >
>
> Again, learned something new here, at least in how I think about it and talk
> with others. In total, I do think these changes make wraparound/exhaustion
> seem less "the sky is falling".
>
> > * Added a couple of "Tips" about instrumentation that appears in the
> > server log whenever autovacuum reports on a VACUUM operation.
> >
> > * Much improved "Truncating Transaction Status Information" subsection.
> >
> > My explanation of the ways in which autovacuum_freeze_max_age can
> > affect the storage overhead of commit/abort status in pg_xact is much
> > clearer than it was in v3 -- pg_xact truncation is now treated as
> > something loosely related to the global config of anti-wraparound
> > autovacuum, which makes most sense.
> >
> This one isn't totally sinking in with me yet. Need another read.
>
> > It took a great deal of effort to find a structure that covered
> > everything, and that highlighted all of the important relationships
> > without going too far, while at the same time not being a huge mess.
> > That's what I feel I've arrived at with v4.
>
> In most respects I agree with the overall flow of changes w.r.t the current doc.
> Focusing on all of this as something that should normally just be happening
> as part of autovacuum is helpful. Working through it as an order of operations
> (and I'm just assuming this is the general order) feels like it ties
> things together
> a lot more. I honestly come away from this document with more of a "I understand
> the process" feel than I did previously.
>
> For now, I'd add the following few comments on the intro section,
> 2.5.1 and 2.5.2. I
> haven't gotten to the bottom sections yet for much feedback.
>
> Intro Comments:
> 1) "The autovacuum daemon automatically schedules maintenance tasks based on
> workload requirements." feels at tension with "Autovacuum scheduling
> is controlled
> via threshold settings."
>
> Owing to the lingering belief that many users have whereby hosting providers
> have magically enabled Postgres to do all of this for you, there is
> still a need to
> actively deal with these thresholds based on load. That is, as far as
> I understand,
> Postgres doesn't automatically adjust based on load. Someone/thing
> still has to modify
> the thresholds as load and data size changes.
>
> If the "workload requirements" is pointing towards aggressive
> freezing/wraparound
> tasks that happen regardless of thresholds, then for me at least that
> isn't clear
> in that sentence and it feels like there's an implication that
> Postgres/autovacuum
> is going to magically adjust overall vacuum work based on database workload.
>
> 2) "The intended audience is database administrators that wish to
> perform more advanced
>  autovacuum tuning, with any of the following goals in mind:"
>
> I love calling out the audience in some way. That's really helpful, as are the
> stated goals in the bullet list. However, as someone feeling pretty novice
> after reading all of this, I can't honestly connect how the content on this page
> helps me to more advanced tuning. I have a much better idea how freezing,
> in particular, works (yay!), but I'm feeling a bit dense how almost anything
> here helps me tune vacuum, at least as it relates to the bullets.
>
> I'm sure you have a connection in mind for each, and certainly understanding the
> inner workings of what's happening under the covers is tremendously beneficial,
> but when I search for "response" or "performance" in this document, it refers
> back to another page (not included in this patch) that talks about the
> thresholds.
>
> It might be as simple as adding something to the end of each bullet to draw
> that relationship, but as is, it's hard for me to do it mentally (although I can
> conjecture a few things on my own)
>
> That said, I definitely appreciate the callout that tuning is an
> iterative process
> and the minor switch from "creates a substantial amount of I/O
> traffic" to "may create...".
>
> ** Section 2.5.1 - Recovering Disk Space **
>
> 3). "The space dead tuples occupy must eventually be reclaimed for reuse
> by new rows, to avoid unbounded growth of disk space requirements. Reclaiming
> space from dead rows is VACUUM's main responsibility."
>
> It feels like one connection you could make to the bullet list above
> is in this area
> and not mentioned. By freeing up space and reducing the number of pages that
> need to be read for satisfying a query, vacuum and recovering disk space
> (theoretically) improves query performance. Not 100% how to add it in context
> of these first two paragraphs.
>
> 4) Caution: "It may be a good idea to add monitoring to alert you about this."
> I hate to be pedantic about it, but I think we should spell out
> "this". Do we have
> a pointer in documentation to what kinds of things to monitor for? Am monitoring
> long-running transactions or some metric that shows me that VACUUM is being
> "held back"? I know what you mean, but it's not clear to me how to do the right
> thing in my environment here.
>
> 5) The plethora of tips/notes/warnings.
> As you and others have mentioned, as presented these really have no context
> for me. Individually they are good/helpful information, but it's
> really hard to make
> a connection to what I should "do" about it.
>
> It seems to me that this would be a good place to put a subsection which is
> something like, "A note about reclaiming disk space" or something. In my
> experience, most people hear about and end up using VACUUM FULL because
> things got out of control and they want to get into a better spot (I have been
> in that boat). I think with a small section that says, in essence,
> "hey, now that
> you understand why/how vacuum reclaims disk resources normally, if you're
> in a position where things aren't in a good state, this is what you need to know
> if you want to reclaim space from a really inefficient table"
>
> For me, at least, I think it would be easier to read/grok what you're sharing in
> these callouts.
>
> 6) One last missing piece that very well might be in another page not referenced
> (I obviously need to get the PG16 docs pulled and built locally so
> that I can have
> better overall reference. My apologies).
>
> In my experience, one of the biggest issues with the thresholds and recovering
> space is the idea of tuning individual tables, not just the entire
> database. 5/10/20%
> might be fine for most tables, but it's usually the really active ones
> that need the
> tuning, specifically lowering the thresholds. That doesn't come across to me in
> this section at all. Again, maybe I've missed something on another page and
> it's all good, but it felt worth calling out.
>
> Plus, it may provide an opportunity to bring in the threshold formulas again if
> they aren't referenced elsewhere (although they probably are).
>
> Hope that makes sense.
>
> ** Section 2.5.2: Freezing to manage... **
> As stated above, the effort here overall is great IMO. I like the flow
> and reduction
> in alarmist tone for things like wraparound, etc. I understand more
> about freezing,
> aggressive and otherwise, than I did before.
>
> 7) That said, totally speaking as a non-contributor, this section is
> obviously very long
> for good reason. But, by the time I've gotten down to 25.2.2.3, my
> brain is a bit
> bewildered on where we've gotten to. That's more a comment on my capability
> to process it all, but I wonder if a slightly more explicit intro
> could help set the
> stage at least.
>
> "One side-effect of vacuum and transaction ID management at the row level is
> that PostgreSQL would normally need to inspect each row for every query to
> ensure it is visible to each requesting transaction. In order to
> reduce the need to
> read and inspect excessive amounts of data at query time or when normal vacuum
> maintenance kicks in, VACUUM has a second job called freezing, which
> accomplishes three goals: (attempting to tie in the three sections)
>  * speeding up queries and vacuum operations by...
>  * advancing the transaction ID space on generally static tables...
>  * ensure there are always free transaction IDs available for normal
> operation...
> "
>
> Maybe totally worthless and too much, but something like that might set a reader
> up for just a bit more context. Then you could take most of what comes before
> "2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below)
> with something like "2.5.2.2.1 Normal Freezing Activity"
>
> 8) Note "In PostgreSQL versions before 16..."
> Showing my naivety, somehow this isn't connecting with me totally. If
> it's important
> to call out, then maybe we need a connecting sentence. Based on the content
> above, I think you're pointing to "It's also why VACUUM will freeze all eligible
> tuples from a heap page once the decision to freeze at least one tuple
> is taken:"
> If that's it, it's just not clear to me what's totally changed. Sorry,
> more learning. :-)
>
> ---
> Hope something in there is helpful.
>
> Ryan Booz
>
> >
> > --
> > Peter Geoghegan



On Fri, May 12, 2023 at 10:36 AM Ryan Booz <ryan@softwareandbooz.com> wrote:
> Just to say on the outset, as has been said earlier in the tread by others,
> that this is herculean work. Thank you for putting the effort you have thus far.

Thanks!

> > It would be nice if it was possible to add an animation/diagram a
> > little like this one: https://tuple-freezing-demo.angusd.com (this is
> > how I tend to think about the "transaction ID space".)
>
> Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But,
> this or something akin to the "clock" image I've seen elsewhere when
> describing the transaction ID space would probably be helpful if it were ever
> possible. In fact, there's just a lot about the MVCC stuff in general that
> would benefit from diagrams. But alas, I guess that's why we have some
> good go-to community talks/slide decks. :-)

A picture is worth a thousand words. This particular image may be
worth even more, though.

It happens to be *exactly* what I'd have done if I was tasked with
coming up with an animation that conveys the central ideas. Obviously
I brought this image up because I think that it would be great if we
could find a way to do something like that directly (not impossible,
there are a few images already). However, there is a less obvious
reason why I brought it to your attention: it's a very intuitive way
of understanding what I actually intend to convey through words -- at
least as far as talk about the cluster-wide XID space is concerned. It
might better equip you to review the patch series.

Sure, the animation will make the general idea clearer to just about
anybody -- that's a big part of what I like about it. But it also
captures the nuance that might matter to experts (e.g., the oldest XID
moves forward in jerky discrete jumps, while the next/unallocated XID
moves forward in a smooth, continuous fashion). So it works on
multiple levels, for multiple audiences/experience levels, without any
conflicts -- which is no small thing.

Do my words make you think of something a little like the animation?
If so, good.

> Thanks again for doing this. Really helpful for doc newbies like me that
> want to help but are still working through the process. Really helpful
> and appreciated.

I think that this is the kind of thing that particularly benefits from
diversity in perspectives.

> Agree. This flows fairly well and helps the user understand that each
> "next step"
> in the vacuum/freezing process has a distinct job based on previous work.

I'm trying to make it possible to read in short bursts, and to skim.
The easiest wins in this area will come from simply having more
individual sections/headings, and a more consistent structure. The
really difficult part is coming up with prose that can sort of work
for all audiences at the same time -- without alienating anybody.

Here is an example of what I mean:

The general idea of freezing can reasonably be summarized as "a
process that VACUUM uses to make pages self-contained (no need to do
pg_xact lookups anymore), that also has a role in avoiding transaction
ID exhaustion". That is a totally reasonable beginner-level (well,
relative-beginner-level) understanding of freezing. It *isn't* dumbed
down. You, as a beginner, have a truly useful take-away. At the same
time, you have avoided learning anything that you'll need to unlearn
some day. If I can succeed in doing that, I'll feel a real sense of
accomplishment.

> > * Much improved "Truncating Transaction Status Information" subsection.
> >
> > My explanation of the ways in which autovacuum_freeze_max_age can
> > affect the storage overhead of commit/abort status in pg_xact is much
> > clearer than it was in v3 -- pg_xact truncation is now treated as
> > something loosely related to the global config of anti-wraparound
> > autovacuum, which makes most sense.
> >
> This one isn't totally sinking in with me yet. Need another read.

"Truncating Transaction Status Information" is explicitly supposed to
matter much less than the rest of the stuff on freezing. The main
benefit that the DBA can expect from understanding this content is how
to save a few GB of disk space for pg_xact, which isn't particularly
likely to be possible, and is very very unlikely to be of any real
consequence, compared to everything else. If you were reading the
revised "Routine Vacuuming" as the average DBA, what you'd probably
have ended up doing is just not reading this part at all. And that
would probably be the ideal outcome. It's roughly the opposite of what
you'll get right now, by the way (bizarrely, the current docs place a
great deal of emphasis on this).

(Of course I welcome your feedback here too. Just giving you the context.)

> > It took a great deal of effort to find a structure that covered
> > everything, and that highlighted all of the important relationships
> > without going too far, while at the same time not being a huge mess.
> > That's what I feel I've arrived at with v4.
>
> In most respects I agree with the overall flow of changes w.r.t the current doc.
> Focusing on all of this as something that should normally just be happening
> as part of autovacuum is helpful. Working through it as an order of operations
> (and I'm just assuming this is the general order) feels like it ties
> things together
> a lot more. I honestly come away from this document with more of a "I understand
> the process" feel than I did previously.

That's great news. It might be helpful to give you more context about
the particular approach I've taken here, and how it falls short of
what I'd ideally like to do, in my own mind.

There are some rather strange things that happen to be true about
VACUUM and freezing today, that definitely influenced the way I
structured the docs. I can imagine an improved version of VACUUM that
is not so different to the real VACUUM that we have today (one that
still has freezing as we know it), that still has a much simpler UI --
some policy-based process for deciding which pages to freeze that was
much smarter than a simple trigger. If we were living in a world where
VACUUM actually worked like that, then I'd have been able to come up
with a structure that is a lot closer to what you might have been
hoping for from this patch series. At the very least, I'd have been
able to add some "TL;DR" text at the start of each section, that just
gave the main practical takeaway.

Take vacuum_freeze_min_age. It's a *really* bad design, even on its
own terms, even if we assume that nothing can change about how
freezing works. Yet it's probably still the most important
freezing-related GUC, even in Postgres 16. History matters here. The
GUC was invented in a world before the visibility map existed. When
the visibility map was invented, aggressive VACUUM was also invented
(before then the name for what we now call "aggressive VACUUM" was
actually just "VACUUM"). This development utterly changed the way that
vacuum_freeze_min_age actually works, but we still talk about it as if
its idea of "age" can be considered in isolation, as a universal
catch-all that can be tuned iteratively. The reality is that it is
interpreted in a way that is *hopelessly* tied to other things.

This isn't a minor point. There are really bizarre implications, with
real practical consequences. For example, suppose you want to make
autovacuums run more often against a simple append-only table -- so
you lower autovacuum_vacuum_insert_scale_factor with that in mind.
It's entirely possible that you'll now do *less* useful work, even
though you specifically set out to vacuum more aggressively! This is
due to the way the GUCs interact with each other, of course: the more
often VACUUM runs, the less likely it is that it'll find XIDs before
vacuum_freeze_min_age to trigger freezing during any individual VACUUM
operation, the less useful work you'll do (you'll just accumulate
unfrozen all-visible pages until you finally have an aggressive
VACUUM).

This is exactly as illogical as it sounds. Postgres 16 will be the
first version that even shows instrumentation around freezing at all
in the log reports from autovacuum. This will be a real eye-opener, I
suspect -- I predict that people will be surprised at how freezing
works with their workload, when they finally have the opportunity to
see it for themselves.

> Owing to the lingering belief that many users have whereby hosting providers
> have magically enabled Postgres to do all of this for you, there is
> still a need to
> actively deal with these thresholds based on load. That is, as far as
> I understand,
> Postgres doesn't automatically adjust based on load. Someone/thing
> still has to modify
> the thresholds as load and data size changes.

Well, vacuum_freeze_min_age (anything based on XID age) runs into the
following problem: what is the relationship between XID age, and
freezing work? This is a question whose answer is much too
complicated, suggesting that it's just the wrong question. There is
absolutely no reason to expect a linear relationship (or anything like
it) between XIDs consumed and WAL required to freeze rows from those
XIDs. It's a totally chaotic thing.

The reason for this is: of course it is, why wouldn't it be? On Monday
you'll do a bulk load, and 1 XID will write 1TB to one table. On
Tuesday, there might be only one row per XID consumed, with millions
and millions of rows inserted. This is 100% common sense, and yet is
kinda at odds with the whole idea of basing the decision to freeze on
age (as if vacuum_freeze_min_age didn't have enough problems
already!).

For now, I think our best bet is to signal the importance of avoiding
disaster to intermediate users, and signal the importance of iterative
tuning to advanced users.

> If the "workload requirements" is pointing towards aggressive
> freezing/wraparound
> tasks that happen regardless of thresholds, then for me at least that
> isn't clear
> in that sentence and it feels like there's an implication that
> Postgres/autovacuum
> is going to magically adjust overall vacuum work based on database workload.

That's a good point.

> 2) "The intended audience is database administrators that wish to
> perform more advanced
>  autovacuum tuning, with any of the following goals in mind:"
>
> I love calling out the audience in some way. That's really helpful, as are the
> stated goals in the bullet list. However, as someone feeling pretty novice
> after reading all of this, I can't honestly connect how the content on this page
> helps me to more advanced tuning.

You're right to point that out; the actual content here was written
half-heartedly, in part because it depends on the dead-tuple-space
patch, which is not my focus at all right now.

Here is what I'd like the message to be, roughly:

1. This isn't something that you read once. You read it in small
bites. You come back to it from time to time (or you will if you need
to).

At one point Samay said: "I'll give my perspective as someone who has
not read the vacuum code but have learnt most of what I know about
autovacuum / vacuuming by reading the "Routine Vacuuming" page 10s of
times". I fully expect that a minority of users will want to do the
same with these revised docs. The content is very much not supposed to
be read through in one sitting (not if you expect to get any value out
of it). It is very calorie dense, and I don't think that that's really
a problem to be solved.

You have a much better chance of getting value out of it if you as a
user refer back to it as problems emerge. Some things may only click
after the second or third read, based on the experience of trying to
put something else into action in production.

2.  If you don't like that it's calorie dense, then that's probably
okay -- just don't read past the parts that seem useful.

3. There are one or two exceptions (e.g., the "Tip" about freezing for
append-only tables), but overall there isn't going to be a simple
formula to follow -- the closest thing might be "don't bother doing
anything until it proves necessary".

This is because too much depends on individual workload requirements.
It is also partly due to it just being really hard to tune things like
vacuum_freeze_min_age very well right now.

4. It's an applied process. The emphasis should be on solving
practical, observed problems that are directly observed -- this isn't
a cookbook (though there are a couple of straightforward recipes,
covering one or two specific things).

> ** Section 2.5.1 - Recovering Disk Space **

It should be noted that what I've done in this area is quite
incomplete. I have only really done structural things here, and some
of these may not be much good.

> It feels like one connection you could make to the bullet list above
> is in this area
> and not mentioned. By freeing up space and reducing the number of pages that
> need to be read for satisfying a query, vacuum and recovering disk space
> (theoretically) improves query performance. Not 100% how to add it in context
> of these first two paragraphs.

It's hard, because it's not so much that vacuuming improves query
performance. It's more like *not* vacuuming hurts it. The exact point
that it starts to hurt is rather hard to predict -- and there might be
little point in trying to predict it with precision.

I tend to think that I'd probably be better off saying nothing about
query response times. Or saying something negative (what to definitely
avoid), not something positive (what to do) -- I would expect it to
generalize a lot better that way.

> 4) Caution: "It may be a good idea to add monitoring to alert you about this."
> I hate to be pedantic about it, but I think we should spell out
> "this". Do we have
> a pointer in documentation to what kinds of things to monitor for? Am monitoring
> long-running transactions or some metric that shows me that VACUUM is being
> "held back"? I know what you mean, but it's not clear to me how to do the right
> thing in my environment here.

Will do.

> 5) The plethora of tips/notes/warnings.
> As you and others have mentioned, as presented these really have no context
> for me. Individually they are good/helpful information, but it's
> really hard to make
> a connection to what I should "do" about it.

Yeah, I call that out in the relevant commit message of the patch as
bad, as temporary.

> It seems to me that this would be a good place to put a subsection which is
> something like, "A note about reclaiming disk space" or something. In my
> experience, most people hear about and end up using VACUUM FULL because
> things got out of control and they want to get into a better spot (I have been
> in that boat). I think with a small section that says, in essence,
> "hey, now that
> you understand why/how vacuum reclaims disk resources normally, if you're
> in a position where things aren't in a good state, this is what you need to know
> if you want to reclaim space from a really inefficient table"
>
> For me, at least, I think it would be easier to read/grok what you're sharing in
> these callouts.

That's the kind of thing that I had planned on with VACUUM FULL,
actually. You know, once I'm done with freezing. There is passing
mention of this in the relevant commit message.

> In my experience, one of the biggest issues with the thresholds and recovering
> space is the idea of tuning individual tables, not just the entire
> database. 5/10/20%
> might be fine for most tables, but it's usually the really active ones
> that need the
> tuning, specifically lowering the thresholds. That doesn't come across to me in
> this section at all. Again, maybe I've missed something on another page and
> it's all good, but it felt worth calling out.

I think that that's true. The rules are kind of different for larger tables.

> read and inspect excessive amounts of data at query time or when normal vacuum
> maintenance kicks in, VACUUM has a second job called freezing, which
> accomplishes three goals: (attempting to tie in the three sections)
>  * speeding up queries and vacuum operations by...
>  * advancing the transaction ID space on generally static tables...
>  * ensure there are always free transaction IDs available for normal
> operation...
> "
>
> Maybe totally worthless and too much, but something like that might set a reader
> up for just a bit more context. Then you could take most of what comes before
> "2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below)
> with something like "2.5.2.2.1 Normal Freezing Activity"

I think I know what you mean. What I've tried to do here is start with
freezing, and describe it as something that has immediate benefits,
that can be understood as useful, independently of its role in
advancing relfrozenxid later on. So now you wonder: what specific
benefits do I get?

It's hard to be too concrete about those benefits, because you have
things like hint bits. I could say something like that, but I think
I'd have to hedge too much, because you also have hint bits, that help
query response times in roughly the same way (albeit less reliably,
albeit without being set on physical replication standbys when they're
set on the primary).

> 8) Note "In PostgreSQL versions before 16..."
> Showing my naivety, somehow this isn't connecting with me totally. If
> it's important
> to call out, then maybe we need a connecting sentence. Based on the content
> above, I think you're pointing to "It's also why VACUUM will freeze all eligible
> tuples from a heap page once the decision to freeze at least one tuple
> is taken:"
> If that's it, it's just not clear to me what's totally changed. Sorry,
> more learning. :-)

In Postgres 15, vacuum_freeze_min_age was applied in a way that only
froze whatever XIDs could be frozen from the page -- so if you had
half the tuples that were older, and half that were younger, you'd
only freeze the older half. Even when it might have cost you
practically nothing to freeze them all in one go. Now, as the text
you've quoted points out, vacuum_freeze_min_age triggers freezing at
the level of whole pages, including for new XIDs (though only if
they're eligible to be frozen, meaning that everybody agrees that
they're all visible now). So vacuum_freeze_min_age picks pages to
freeze, not individual tuples to freeze (this optimization is so
obvious that it's a little surprising that it took as long as it did
to get in).

Page-level freezing justifies the following statement from the patch,
for example:

"It doesn't matter if it was vacuum_freeze_table_age or
vacuum_multixact_freeze_table_age that made VACUUM use its aggressive
strategy. Every aggressive VACUUM will advance relfrozenxid and
relminmxid by applying the same generic policy that controls which
pages are frozen."

Now, since freezing works at the level of physical heap pages in 16,
the thing that triggers aggressive VACUUM matters less (just as the
thing that triggers freezing of individual pages matters much less --
freezing is freezing). There is minimal risk of freezing the same page
3 times during each of 3 different aggressive VACUUMs. To a much
greater extent, 3 aggressive VACUUMs isn't that different to only 1
aggressive VACUUM for those pages that were already "settled" from the
start. As a result, the addition of page-level freezing made
vacuum_freeze_min_age somewhat less bad -- in 16, its behavior was a
little less dependent on the phase of the moon (especially during
aggressive VACUUMs).

I really value stuff like that -- cases where you as a user can think
of something as independent to some other thing that you also need to
tune. There needs to be a lot more such improvements, but at least we
have this one now.

--
Peter Geoghegan



On Thu, 11 May 2023 at 16:41, Peter Geoghegan <pg@bowt.ie> wrote:
>
> > I say "exhaustion" or "overload" are meaningless because their meaning
> > is entirely dependent on context. It's not like memory exhaustion or
> > i/o overload where it's a finite resource and it's just the sheer
> > amount in use that matters.
>
> But transaction IDs are a finite resource, in the sense that you can
> never have more than about 2.1 billion distinct unfrozen XIDs at any
> one time. "Transaction ID exhaustion" is therefore a lot more
> descriptive of the underlying problem. It's a lot better than
> wraparound, which, as I've said, is inaccurate in two major ways:

I realize that's literally true that xids are a finite resource. But
that's not what people think of when you talk about exhausting a
finite resource.

It's not like there are 2 billion XIDs in a big pool being used and
returned and as long as you don't use too many XIDs leaving the pool
empty you're ok. When people talk about resource exhaustion they
imagine that they just need a faster machine or some other way of just
putting more XIDs in the pool so they can keep using them at a faster
rate.

I really think focusing on changing one term of art for another,
neither of which is at all meaningful without an extensive technical
explanation helps anyone. All it does is hide all the existing
explanations that are all over the internet from them.

> 1. Most cases involving xidStopLimit (or even single-user mode data
> corruption) won't involve any kind of physical integer wraparound.

Fwiw I've never actually bumped into anyone talking about integer
overflow (which isn't usually called "wraparound" anyways). And in any
case it's not a terrible misconception, it at least gives users a
reasonable model for how XID space consumption works. In fact it's not
even entirely wrong -- it's not the XID itself that's overflowing but
the difference between the XID and the frozen XID.

-- 
greg



On Sat, May 13, 2023 at 7:47 PM Greg Stark <stark@mit.edu> wrote:
> It's not like there are 2 billion XIDs in a big pool being used and
> returned and as long as you don't use too many XIDs leaving the pool
> empty you're ok. When people talk about resource exhaustion they
> imagine that they just need a faster machine or some other way of just
> putting more XIDs in the pool so they can keep using them at a faster
> rate.
>
> I really think focusing on changing one term of art for another,
> neither of which is at all meaningful without an extensive technical
> explanation helps anyone. All it does is hide all the existing
> explanations that are all over the internet from them.

Have you read the documentation in question recently? The first two
paragraphs, in particular:

https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

As I keep pointing out, we literally introduce the whole topic of
freezing/wraparound by telling users that VACUUM needs to avoid
wraparound to stop your database from becoming corrupt. Which is when
"the past becomes the future", or in simple terms data corruption
(without any qualification about single user mode being required to
corrupt the DB). Users think that that's what "wraparound" means
because we've taught them to think that. We've already been giving
users "an extensive technical explanation" for many years -- one that
happens to be both harmful and factually incorrect.

I agree that users basically don't care about unsigned vs signed vs
whatever. But there is a sense that that matters, because the docs
have made it matter. That's my starting point. That's the damage that
I'm trying to undo.

> > 1. Most cases involving xidStopLimit (or even single-user mode data
> > corruption) won't involve any kind of physical integer wraparound.
>
> Fwiw I've never actually bumped into anyone talking about integer
> overflow (which isn't usually called "wraparound" anyways). And in any
> case it's not a terrible misconception, it at least gives users a
> reasonable model for how XID space consumption works. In fact it's not
> even entirely wrong -- it's not the XID itself that's overflowing but
> the difference between the XID and the frozen XID.

Even your "not entirely wrong" version is entirely wrong. What you
describe literally cannot happen (outside of single user mode),
because xidStopLimit stops it from happening. To me your argument
seems similar to arguing that it's okay to call chemotherapy "cancer"
on the grounds that "cancer" refers to something that you really ought
to avoid in the first place in any case, which makes the whole
distinction irrelevant to non-oncologists.

That said, I concede that the term wraparound is too established to
just get rid of now. The only viable way forward now may be to
encourage users to think about it in the way that you suppose they
must already think about it. That is, to prominently point out that
"wraparound" actually refers to a protective mode of operation where
XID allocations are temporarily disallowed. And not pretty much
nothing to do with "wraparound" of the kind that the user may be
familiar with from other contexts. Including (and especially) all
earlier versions of the Postgres docs.

--
Peter Geoghegan



On Sun, May 14, 2023 at 1:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Have you read the documentation in question recently? The first two
> paragraphs, in particular:
>
> https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
>
> As I keep pointing out, we literally introduce the whole topic of
> freezing/wraparound by telling users that VACUUM needs to avoid
> wraparound to stop your database from becoming corrupt. Which is when
> "the past becomes the future"

I went through the history of maintenance.sgml. "Routine Vacuuming"
dates back to 2001. Sure enough, our current "25.1.5. Preventing
Transaction ID Wraparound Failures" introductory paragraphs (the ones
that I find so misleading and alarmist) appear in the original
version, too. But in 2001, they weren't alarmist -- they were
proportionate to the risk that existed at the time. This becomes
totally clear once you see the original. In particular, once you see
the current introductory paragraphs next to another paragraph in the
original 2001 version:


https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/maintenance.sgml;h=7629c5bd541e25b88241e030ca86a243e964e4b7;hb=c5b7f67fcc8c4a01c82660eb0996a3c697fac283#l245

The later paragraph follows up by saying: "In practice this [somewhat
regular vacuuming] isn't an onerous requirement, but since the
consequences of failing to meet it can be ___complete data loss___
(not just wasted disk space or slow performance), some special
provisions...". This means that my particular interpretation of the
25.1.5. introductory paragraphs are absolutely consistent with the
original intent from the time they were written. I'm now more
confident than ever that all of the stuff about "catastrophic data
loss" should have been removed in 2005 or 2006 at the latest. It
*almost* was removed around that time, but for whatever reason it
wasn't removed in full. And for whatever reason it didn't quite
register with anybody in a position to do much about it.

--
Peter Geoghegan