Thread: What is "wraparound failure", really?

What is "wraparound failure", really?

From: Peter Geoghegan

The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal
documentation -- just a basic description of how the GUCs work. I
think that it certainly merits some discussion under "25.1. Routine
Vacuuming" -- more specifically under "25.1.5. Preventing Transaction
ID Wraparound Failures". One reason why this didn't happen in the
original commit was that I just didn't know where to start with it.
The docs in question have said this since 2006's commit 48188e16 first
added autovacuum_freeze_max_age:

"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more
space..."

This sentence seems completely unreasonable to me. It seems to just
ignore the huge disadvantage of increasing autovacuum_freeze_max_age:
the *risk* that the system will stop being able to allocate new XIDs
because GetNewTransactionId() errors out with "database is not
accepting commands to avoid wraparound data loss...". Sure, it's
possible to take a lot of risk here without it ever blowing up in your
face. And if it doesn't blow up then the downside really is zero. This
is hardly a sensible way to talk about this important risk. Or any
risk at all.

At first I thought that the sentence was not just misguided -- it
seemed downright bizarre. I thought that it was directly at odds with
the title "Preventing Transaction ID Wraparound Failures". I thought
that the whole point of this section was how not to have a wraparound
failure (as I understand the term), and yet we seem to deliberately
ignore the single most important practical aspect of making sure that
that doesn't happen. But I now suspect that the basic definitions have
been mixed up in a subtle but important way.

What the documentation calls a "wraparound failure" seems to be rather
different to what I thought that that meant. As I said, I thought that
that meant the condition of being unable to get new transaction IDs
(at least until the DBA runs VACUUM in single user mode). But the
documentation in question seems to actually define it as "the
condition of an old MVCC snapshot failing to see a version from the
distant past, because somehow an XID wraparound suddenly makes it look
as if it's in the distant future rather than in the past". It's
actually talking about a subtly different thing, so the "sole
disadvantage" sentence is not actually bizarre. It does still seem
impractical and confusing, though.
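
To make the "past looks like the future" condition concrete, here is a
minimal Python sketch of circular 32-bit XID comparison. It follows the
signed-difference idea behind PostgreSQL's TransactionIdPrecedes(), but
it ignores the special permanent XIDs, and the name xid_precedes and
the sample values are assumptions made up for illustration.

```python
XID_SPACE = 2**32    # XIDs are 32-bit counters that wrap around
HALF_SPACE = 2**31   # roughly 2 billion XIDs lie "behind" any given XID


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a is logically older than XID b.

    Emulates a signed 32-bit subtraction: a precedes b when (a - b),
    taken modulo 2**32 and read as a signed value, is negative.
    Special permanent XIDs are deliberately ignored in this sketch.
    """
    diff = (a - b) % XID_SPACE
    return diff >= HALF_SPACE   # top bit set means negative as int32


old_xid = 100

# Less than half the XID space later, the old XID still looks like the past.
near_future = (old_xid + HALF_SPACE - 1) % XID_SPACE
print(xid_precedes(old_xid, near_future))   # True

# More than half the space later, the same unfrozen XID would compare as
# being in the future. Freezing, and ultimately the refusal to allocate
# new XIDs, exist so that this comparison is never actually made.
too_far = (old_xid + HALF_SPACE + 1) % XID_SPACE
print(xid_precedes(old_xid, too_far))       # False
```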

I strongly suspect that my interpretation of what "wraparound failure"
means is actually the common one. Of course the system is never under
any circumstances allowed to give totally wrong answers to queries, no
matter what -- users should be able to take that much for granted.
What users care about here is sensibly managing XIDs as a resource --
preventing "XID exhaustion" while being conservative, but not
ridiculously conservative. Could the documentation be completely
misleading users here?

I have two questions:

1. Do I have this right? Is there really confusion about what a
"wraparound failure" means, or is the confusion mine alone?

2. How do I go about integrating discussion of the failsafe here?
Anybody have thoughts on that?

-- 
Peter Geoghegan



Re: What is "wraparound failure", really?

From: Andrew Dunstan

On 6/27/21 4:36 PM, Peter Geoghegan wrote:
> The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal
> documentation -- just a basic description of how the GUCs work. I
> think that it certainly merits some discussion under "25.1. Routine
> Vacuuming" -- more specifically under "25.1.5. Preventing Transaction
> ID Wraparound Failures". One reason why this didn't happen in the
> original commit was that I just didn't know where to start with it.
> The docs in question have said this since 2006's commit 48188e16 first
> added autovacuum_freeze_max_age:
>
> "The sole disadvantage of increasing autovacuum_freeze_max_age (and
> vacuum_freeze_table_age along with it) is that the pg_xact and
> pg_commit_ts subdirectories of the database cluster will take more
> space..."
>
> This sentence seems completely unreasonable to me. It seems to just
> ignore the huge disadvantage of increasing autovacuum_freeze_max_age:
> the *risk* that the system will stop being able to allocate new XIDs
> because GetNewTransactionId() errors out with "database is not
> accepting commands to avoid wraparound data loss...". Sure, it's
> possible to take a lot of risk here without it ever blowing up in your
> face. And if it doesn't blow up then the downside really is zero. This
> is hardly a sensible way to talk about this important risk. Or any
> risk at all.
>
> At first I thought that the sentence was not just misguided -- it
> seemed downright bizarre. I thought that it was directly at odds with
> the title "Preventing Transaction ID Wraparound Failures". I thought
> that the whole point of this section was how not to have a wraparound
> failure (as I understand the term), and yet we seem to deliberately
> ignore the single most important practical aspect of making sure that
> that doesn't happen. But I now suspect that the basic definitions have
> been mixed up in a subtle but important way.
>
> What the documentation calls a "wraparound failure" seems to be rather
> different to what I thought that that meant. As I said, I thought that
> that meant the condition of being unable to get new transaction IDs
> (at least until the DBA runs VACUUM in single user mode). But the
> documentation in question seems to actually define it as "the
> condition of an old MVCC snapshot failing to see a version from the
> distant past, because somehow an XID wraparound suddenly makes it look
> as if it's in the distant future rather than in the past". It's
> actually talking about a subtly different thing, so the "sole
> disadvantage" sentence is not actually bizarre. It does still seem
> impractical and confusing, though.
>
> I strongly suspect that my interpretation of what "wraparound failure"
> means is actually the common one. Of course the system is never under
> any circumstances allowed to give totally wrong answers to queries, no
> matter what -- users should be able to take that much for granted.
> What users care about here is sensibly managing XIDs as a resource --
> preventing "XID exhaustion" while being conservative, but not
> ridiculously conservative. Could the documentation be completely
> misleading users here?
>
> I have two questions:
>
> 1. Do I have this right? Is there really confusion about what a
> "wraparound failure" means, or is the confusion mine alone?
>
> 2. How do I go about integrating discussion of the failsafe here?
> Anybody have thoughts on that?
>


AIUI, actual wraparound (i.e. an xid crossing the event horizon so it
appears to be in the future) is no longer possible. But it once was a
very real danger. Maybe the docs haven't quite caught up.


In practical terms, there is an awful lot of head room between the
default for autovacuum_freeze_max_age and any danger of major
anti-wraparound measures. Say you increase it to 1bn from the default
200m. That still leaves you ~1bn transactions of headroom.
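
A rough back-of-the-envelope version of that arithmetic, as a Python
sketch. It assumes the usable horizon is about 2^31 XIDs and ignores
the small safety margin the server reserves before wraparound; the
function name headroom is hypothetical.

```python
XID_HORIZON = 2**31  # about 2.147 billion XIDs can be "in the past" at once


def headroom(autovacuum_freeze_max_age: int) -> int:
    """XIDs left between the forced-vacuum trigger and the horizon."""
    return XID_HORIZON - autovacuum_freeze_max_age


for setting in (200_000_000, 1_000_000_000, 2_000_000_000):
    print(f"{setting:>13,} -> {headroom(setting):>13,} XIDs of headroom")
```

This only measures when forced vacuuming starts, not when it finishes.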


cheers


andrew




--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: What is "wraparound failure", really?

From: Thomas Munro

On Mon, Jun 28, 2021 at 8:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
> "The sole disadvantage of increasing autovacuum_freeze_max_age (and
> vacuum_freeze_table_age along with it) is that the pg_xact and
> pg_commit_ts subdirectories of the database cluster will take more
> space..."

Just by the way, if we're updating this sentence, it continues
"because it must store..." but it should surely be "because they must
store...".



Re: What is "wraparound failure", really?

From: Peter Geoghegan

On Sun, Jun 27, 2021 at 4:23 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> AIUI, actual wraparound (i.e. an xid crossing the event horizon so it
> appears to be in the future) is no longer possible. But it once was a
> very real danger. Maybe the docs haven't quite caught up.

This was added a few years after freezing was first invented, which
was arguably the last time that the design fundamentally changed. I
think we all agree that it's fundamentally not okay to give wrong
answers to queries -- it doesn't even need to be stated in the docs
IMV. So why does this section of the docs spend so much time talking
about something that fundamentally cannot happen? Why not have it
focus on the bad outcome that there is a real risk of instead? Namely
the risk of the system refusing to allow new XIDs (as a means of
avoiding the wrong answers when all else fails).

It's hard to talk about the new failsafe in this section of the docs
now, since it's unclear whether the section exists to advise the user
on ways of avoiding the "can't allocate XIDs" failure mode. It could be
interpreted that way, or it could just be explaining and/or justifying
the existence of the failure mode. That seems like a real problem.

> In practical terms, there is an awful lot of head room between the
> default for autovacuum_freeze_max_age and any danger of major
> anti-wraparound measures. Say you increase it to 1bn from the default
> 200m. That still leaves you ~1bn transactions of headroom.

I agree that in practice that's often fine. But my point is that there
is another very good reason to not increase autovacuum_freeze_max_age,
contrary to what the docs say (actually there is a far better reason
than truncating clog). Namely, increasing it will generally increase
the risk of VACUUM not finishing in time. If that happens the user
gets the "can't allocate XIDs" failure mode (which is what I have
called wraparound failure up until now), which is one of the worst
things that can happen. This makes the inability to truncate clog look
like a totally trivial issue in comparison.

Reasonable people can disagree about when and how increasing
autovacuum_freeze_max_age becomes truly reckless. However, I don't
think that anybody would be willing to argue that setting it to the
maximum of 2 billion could ever make sense in production, to go with
the obvious extreme case. The benefits that you get from such a high
setting over and above what you get with a moderately high setting
(perhaps 1 - 1.5 billion) are really quite small, while the risk
shoots up fast past a certain point.

Regardless of what the nuances of increasing autovacuum_freeze_max_age
are, stating that the sole disadvantage is that you cannot truncate
clog and other SLRUs is clearly wrong.

-- 
Peter Geoghegan



Re: What is "wraparound failure", really?

From: Masahiko Sawada

On Mon, Jun 28, 2021 at 5:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal
> documentation -- just a basic description of how the GUCs work. I
> think that it certainly merits some discussion under "25.1. Routine
> Vacuuming" -- more specifically under "25.1.5. Preventing Transaction
> ID Wraparound Failures". One reason why this didn't happen in the
> original commit was that I just didn't know where to start with it.
> The docs in question have said this since 2006's commit 48188e16 first
> added autovacuum_freeze_max_age:
>
> "The sole disadvantage of increasing autovacuum_freeze_max_age (and
> vacuum_freeze_table_age along with it) is that the pg_xact and
> pg_commit_ts subdirectories of the database cluster will take more
> space..."
>
> This sentence seems completely unreasonable to me. It seems to just
> ignore the huge disadvantage of increasing autovacuum_freeze_max_age:
> the *risk* that the system will stop being able to allocate new XIDs
> because GetNewTransactionId() errors out with "database is not
> accepting commands to avoid wraparound data loss...". Sure, it's
> possible to take a lot of risk here without it ever blowing up in your
> face. And if it doesn't blow up then the downside really is zero. This
> is hardly a sensible way to talk about this important risk. Or any
> risk at all.
>
> At first I thought that the sentence was not just misguided -- it
> seemed downright bizarre. I thought that it was directly at odds with
> the title "Preventing Transaction ID Wraparound Failures". I thought
> that the whole point of this section was how not to have a wraparound
> failure (as I understand the term), and yet we seem to deliberately
> ignore the single most important practical aspect of making sure that
> that doesn't happen. But I now suspect that the basic definitions have
> been mixed up in a subtle but important way.
>
> What the documentation calls a "wraparound failure" seems to be rather
> different to what I thought that that meant. As I said, I thought that
> that meant the condition of being unable to get new transaction IDs
> (at least until the DBA runs VACUUM in single user mode). But the
> documentation in question seems to actually define it as "the
> condition of an old MVCC snapshot failing to see a version from the
> distant past, because somehow an XID wraparound suddenly makes it look
> as if it's in the distant future rather than in the past". It's
> actually talking about a subtly different thing, so the "sole
> disadvantage" sentence is not actually bizarre. It does still seem
> impractical and confusing, though.
>
> I strongly suspect that my interpretation of what "wraparound failure"
> means is actually the common one. Of course the system is never under
> any circumstances allowed to give totally wrong answers to queries, no
> matter what -- users should be able to take that much for granted.
> What users care about here is sensibly managing XIDs as a resource --
> preventing "XID exhaustion" while being conservative, but not
> ridiculously conservative. Could the documentation be completely
> misleading users here?
>
> I have two questions:
>
> 1. Do I have this right? Is there really confusion about what a
> "wraparound failure" means, or is the confusion mine alone?
>
> 2. How do I go about integrating discussion of the failsafe here?
> Anybody have thoughts on that?

Looking through the doc again, it seems to me that there is no
explicit explanation of the worst-case situation. It might be true in
principle that “XID wraparound failure” means catastrophic data loss
due to XID wraparound. But that doesn’t actually happen, since we
refuse to allocate new XIDs once we get within three million XIDs of
the wraparound point. In other words, entering the read-only mode is
the worst situation in PostgreSQL in terms of XID consumption. There is
some description of refusing to start any new transactions at the end
of section 25.1.5, but it seems neither sufficient nor accurate. It
describes the read-only mode only as a safeguard, not as a situation
that we want to avoid. Explicitly describing the latter aspect as well
could give weight both to the description of the failsafe mode
(especially why we skip some operations in that mode to advance
relfrozenxid sooner) and to another disadvantage of increasing
autovacuum_freeze_max_age.
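
As a rough sketch of that cutoff in Python: the horizon is simplified
to 2^31 XIDs, the margin is the three million XIDs mentioned above, and
the function name and state labels are hypothetical, not taken from the
server code.

```python
XID_HORIZON = 2**31        # simplified wraparound horizon
STOP_MARGIN = 3_000_000    # the three million XID margin mentioned above


def xid_allocation_state(oldest_unfrozen_xid_age: int) -> str:
    """Classify the cluster by the age of its oldest unfrozen XID."""
    if oldest_unfrozen_xid_age >= XID_HORIZON - STOP_MARGIN:
        # Worst case that can actually occur: no new XIDs are handed
        # out until VACUUM advances the oldest relfrozenxid.
        return "refusing new XIDs"
    return "accepting new XIDs"


print(xid_allocation_state(200_000_000))         # accepting new XIDs
print(xid_allocation_state(2**31 - 3_000_000))   # refusing new XIDs
```

The point of the sketch is that the bad outcome is the "refusing" state,
not wrong query answers.
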
Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: What is "wraparound failure", really?

From: Andrew Dunstan

On 6/28/21 2:39 AM, Peter Geoghegan wrote:
> On Sun, Jun 27, 2021 at 4:23 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>

>> In practical terms, there is an awful lot of head room between the
>> default for autovacuum_freeze_max_age and any danger of major
>> anti-wraparound measures. Say you increase it to 1bn from the default
>> 200m. That still leaves you ~1bn transactions of headroom.
> I agree that in practice that's often fine. But my point is that there
> is another very good reason to not increase autovacuum_freeze_max_age,
> contrary to what the docs say (actually there is a far better reason
> than truncating clog). Namely, increasing it will generally increase
> the risk of VACUUM not finishing in time. If that happens the user
> gets the "can't allocate XIDs" failure mode (which is what I have
> called wraparound failure up until now), which is one of the worst
> things that can happen. This makes the inability to truncate clog look
> like a totally trivial issue in comparison.
>
> Reasonable people can disagree about when and how increasing
> autovacuum_freeze_max_age becomes truly reckless. However, I don't
> think that anybody would be willing to argue that setting it to the
> maximum of 2 billion could ever make sense in production, to go with
> the obvious extreme case. The benefits that you get from such a high
> setting over and above what you get with a moderately high setting
> (perhaps 1 - 1.5 billion) are really quite small, while the risk
> shoots up fast past a certain point.
>
> Regardless of what the nuances of increasing autovacuum_freeze_max_age
> are, stating that the sole disadvantage is that you cannot truncate
> clog and other SLRUs is clearly wrong.
>


Sure, I'm not suggesting the docs can't have some improvement.

This is one of those things that in my experience most people don't get.
Indeed, I didn't really get it either until I had to explain it with
some clarity to a very confused customer. And I find it's best explained
by showing what bad results are being avoided by it. Freezing is one of
those almost useless things you just have to do. It doesn't help that
it's tangled up with VACUUM, so when you explain that it's not about
reclaiming dead space, heads start to explode.

But if you're really worried about people setting
autovacuum_freeze_max_age too high, then maybe we should be talking
about capping it at a lower level rather than adjusting the docs that
most users don't read.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: What is "wraparound failure", really?

From: Noah Misch

On Mon, Jun 28, 2021 at 08:51:50AM -0400, Andrew Dunstan wrote:
> On 6/28/21 2:39 AM, Peter Geoghegan wrote:
> > I agree that in practice that's often fine. But my point is that there
> > is another very good reason to not increase autovacuum_freeze_max_age,
> > contrary to what the docs say (actually there is a far better reason
> > than truncating clog). Namely, increasing it will generally increase
> > the risk of VACUUM not finishing in time.

Yep, that doc section's priorities are out of date.

> But if you're really worried about people setting
> autovacuum_freeze_max_age too high, then maybe we should be talking
> about capping it at a lower level rather than adjusting the docs that
> most users don't read.

If a GUC minimum or maximum feels like a mainstream choice, it's probably too
strict.  Hence, I think the current maximum is fine.  At 93% of the XID space,
it's not risk-averse, but it's not absurd.
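
The 93% figure can be checked with quick arithmetic, assuming "the XID
space" here means the roughly 2^31 XIDs that can be in the past at any
one time and that the maximum setting is the 2 billion mentioned
earlier in the thread:

```python
MAX_SETTING = 2_000_000_000   # maximum autovacuum_freeze_max_age, per the thread
XID_HORIZON = 2**31           # XIDs that can be "in the past" at once

fraction = MAX_SETTING / XID_HORIZON
print(f"{fraction:.1%}")      # 93.1%
```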



Re: What is "wraparound failure", really?

From: Robert Haas

On Mon, Jun 28, 2021 at 8:52 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> But if you're really worried about people setting
> autovacuum_freeze_max_age too high, then maybe we should be talking
> about capping it at a lower level rather than adjusting the docs that
> most users don't read.

The problem is that the setting is measuring something that is a
pretty poor proxy for the thing we actually care about. It's measuring
the XID age at which we're going to start forcing vacuums on tables
that don't otherwise need to be vacuumed, but the thing we care about
is the XID age at which those vacuums are going to *finish*. Now maybe
you think that's a minor difference, and if your tables are small, it
is, but if they're really big, it's not. If you have only tables that
are say 1GB in size and your system is otherwise well-configured, you
could probably crank autovacuum_freeze_max_age up all the way to the
max without a problem. But if you have 1TB tables, you are going to
need a lot more headroom. The exact amount of headroom you need
depends especially on the size of your largest tables, but also on how
well-distributed the relfrozenxid values are, and on the total sizes
of all your tables, on your I/O subsystem, on your XID consumption
rate, on your vacuum delay settings, and on whether you want to make
any allowance for the rare but possible scenario where vacuum dies due
to an ERROR. This means that in practice nobody knows whether a
particular setting of autovacuum_freeze_max_age on a particular system
is safe or not, except in the absolutely most obvious cases. Capping
it at a lower level would prevent some people from doing things that
are perfectly safe and still not prevent other people from doing
things that are horribly dangerous.
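
A toy model of the start-versus-finish distinction, as a Python sketch.
It assumes a single table, a constant vacuum throughput, and a constant
XID consumption rate, and it reuses the three million XID margin
mentioned earlier in the thread. All names and workload numbers are
hypothetical, and most of the factors listed above are left out.

```python
XID_HORIZON = 2**31
STOP_MARGIN = 3_000_000   # margin before wraparound where XID allocation stops
GB = 1024**3
TB = 1024**4


def xid_age_at_finish(freeze_max_age, table_bytes, vacuum_bytes_per_sec, xids_per_sec):
    """Estimate the table's XID age when a forced vacuum finishes.

    The vacuum starts once the age reaches freeze_max_age, and XIDs keep
    being consumed at a constant rate for as long as the vacuum runs.
    """
    vacuum_seconds = table_bytes / vacuum_bytes_per_sec
    return freeze_max_age + xids_per_sec * vacuum_seconds


def finishes_in_time(age_at_finish):
    """True if the vacuum completes before XID allocation would stop."""
    return age_at_finish < XID_HORIZON - STOP_MARGIN


# Hypothetical workload: vacuum covers 20 MB/s while 5,000 XIDs/s are consumed.
RATE = 20 * 1024**2
XIDS_PER_SEC = 5_000

for label, size in (("1GB", GB), ("1TB", TB)):
    for max_age in (200_000_000, 2_000_000_000):
        age = xid_age_at_finish(max_age, size, RATE, XIDS_PER_SEC)
        verdict = "ok" if finishes_in_time(age) else "too late"
        print(f"{label} table, max_age={max_age:,}: finishes at age {age:,.0f} ({verdict})")
```

Under these made-up numbers the 1GB table is fine even at the maximum
setting, while the 1TB table only finishes in time with the lower one.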

I think what we really need here is some kind of deadline-based
scheduler. As Peter says, the problem is that we might run out of
XIDs. The system should be constantly thinking about that and taking
appropriate emergency actions to make sure it doesn't happen. Right
now it's really pretty chill about the possibility of looming
disaster. Imagine that you hire a babysitter and tell them to get the
kids out of the house if there's a fire. While you're out, a volcano
erupts down the block. A giant cloud of ash forms and there's lava
everywhere, even touching the house, which begins to smolder, but the
babysitter just sits there and watches TV. As soon as the first flames
appear, the babysitter stops watching TV, gets the kids, and tries to
leave the premises. That's our autovacuum scheduler! It has no
inclination or ability to see the future; it makes decisions entirely
based on the present state of things. In a lot of cases that's OK, but
sometimes it leads to a completely ridiculous outcome.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: What is "wraparound failure", really?

From: Peter Geoghegan

On Wed, Jun 30, 2021 at 6:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> The problem is that the setting is measuring something that is a
> pretty poor proxy for the thing we actually care about. It's measuring
> the XID age at which we're going to start forcing vacuums on tables
> that don't otherwise need to be vacuumed, but the thing we care about
> is the XID age at which those vacuums are going to *finish*. Now maybe
> you think that's a minor difference, and if your tables are small, it
> is, but if they're really big, it's not. If you have only tables that
> are say 1GB in size and your system is otherwise well-configured, you
> could probably crank autovacuum_freeze_max_age up all the way to the
> max without a problem. But if you have 1TB tables, you are going to
> need a lot more headroom.

I 100% agree with all of that. However, I can't help but notice that
your argument seems to work best as an argument against how freezing
works in general. The scheduling is way too complex because we're
fundamentally trying to model something that is way too complex and
nonlinear by its very nature. It's true that we can do a better job by
continually updating our understanding of the state of the system
dynamically, during each VACUUM. But maybe we should get rid of
freezing instead. Is it really so hard to do that, in the grand scheme
of things?

We have tuple freezing because we need it to solve a problem with the
"physical database" (not the "logical database"). Namely the problem
of having 32-bit XIDs in tuple headers when 64-bit XIDs are
theoretically what we need. I'm not actually in favor of 64-bit XIDs
in tuple headers (or anything like it), but I am in favor of at least
solving the problem with a true "physical database" level solution.
The definition of freezing unnecessarily couples how we handle the XID
issue with GC by VACUUM, which makes everything much more fragile. A
frozen tuple must necessarily be visible to any possible MVCC
snapshot. That's really fragile, in many different ways. It's also
unnecessary.

Why should XID wraparound be a problem for the entire system? Why not
just make it a problem for any very old MVCC snapshots that are
*actually* about to be affected? Some kind of "snapshot too old"
approach seems quite possible. I think that we can do a lot better
than freezing within the confines of the current heapam design (or the
design prior to the introduction of freezing ~20 years ago). Once
aborted XIDs are removed eagerly, a strict "logical vs physical"
separation of concerns can be imposed.

I'm sorry to go on about this again and again, but it really does seem
related to what you're saying. The current freezing design is hard to
model because it's inherently fragile.

> I think what we really need here is some kind of deadline-based
> scheduler. As Peter says, the problem is that we might run out of
> XIDs. The system should be constantly thinking about that and taking
> appropriate emergency actions to make sure it doesn't happen. Right
> now it's really pretty chill about the possibility of looming
> disaster. Imagine that you hire a babysitter and tell them to get the
> kids out of the house if there's a fire. While you're out, a volcano
> erupts down the block. A giant cloud of ash forms and there's lava
> everywhere, even touching the house, which begins to smolder, but the
> babysitter just sits there and watches TV. As soon as the first flames
> appear, the babysitter stops watching TV, gets the kids, and tries to
> leave the premises. That's our autovacuum scheduler! It has no
> inclination or ability to see the future; it makes decisions entirely
> based on the present state of things. In a lot of cases that's OK, but
> sometimes it leads to a completely ridiculous outcome.

Yeah, it's still pretty absurd, even with the failsafe.

To extend your analogy, in the real world the babysitter can afford to
make very conservative assumptions about whether or not the house is
about to catch fire. In practice the chances of that happening on any
given day are certainly very low -- it'll probably never come close to
happening even once. And there is an inherent asymmetry, since of
course the cost of a false positive is that the Friends reunion
episode is unnecessarily cut short, which is totally inconsequential
compared to the cost of a false negative. If there wasn't such a big
asymmetry then what we'd probably do is not even think about what the
babysitter does -- we just wouldn't care at all.

Anyway, I'll try to come up with a way of rewording this section of
the docs that mostly preserves its existing structure, but makes it
possible to talk about the failsafe. The current structure of this
section of the docs is needlessly ambiguous, but I think that that can
be fixed without changing too much. FWIW I have heard things that
suggest that some users believe that modern PostgreSQL can actually
allow "the past to look like the future" in some cases -- probably
because of the wording here. This area of the system certainly is
scary, but it's not quite that scary.

-- 
Peter Geoghegan