Thread: improving wraparound behavior

improving wraparound behavior

From
Robert Haas
Date:
I spent a significant chunk of today burning through roughly 2^31 XIDs
just to see what would happen.  My test setup consisted of
autovacuum=off plus a trivial prepared transaction plus a lot of this:

+        BeginInternalSubTransaction("txid_burn");
+        (void) GetCurrentTransactionId();
+        ReleaseCurrentSubTransaction();
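
For reference, the sort of quick hack I mean looks roughly like this when
packaged as a SQL-callable function (the txid_burn() wrapper and its wiring
are hypothetical, shown only to illustrate the test harness):

    /*
     * Hypothetical quick-hack SQL-callable wrapper around that loop; it
     * would also need PG_FUNCTION_INFO_V1(txid_burn) and a CREATE FUNCTION
     * declaration, and has no error handling or throttling.
     */
    Datum
    txid_burn(PG_FUNCTION_ARGS)
    {
        int64       n = PG_GETARG_INT64(0);
        int64       i;

        for (i = 0; i < n; i++)
        {
            BeginInternalSubTransaction("txid_burn");
            (void) GetCurrentTransactionId();   /* forces assignment of an XID */
            ReleaseCurrentSubTransaction();

            CHECK_FOR_INTERRUPTS();
        }
        PG_RETURN_VOID();
    }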

Observations:

1. As soon as the XID of the prepared transaction gets old enough to
trigger autovacuum, autovacuum goes nuts.  It vacuums everything in
the database over and over again, but that does no good, because the
prepared transaction holds back the XID horizon.  There are previous
reports of this and related problems, such as this one from 2014:

http://postgr.es/m/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com

That thread got hung up on the question of prioritization: if there's
a lot of stuff that needs to be autovacuumed, which stuff should we do
first?  But I think that we overlooked a related issue, which is that
there's no point in autovacuuming a table in the first place if doing
so won't help advance relfrozenxid and/or relminmxid.  The
autovacuum launcher will happily compute a force limit that is newer
than OldestXmin and decide on that basis to route a worker to a
particular database, and that worker will then compute a force limit
that is newer than OldestXmin, examine relations in that database,
and decide to vacuum them, and then the vacuum operation itself will
decide on a similar basis that it's going to be an aggressive vacuum.
But we can't actually remove any tuples that are newer than
OldestXmin, so we have no actual hope of accomplishing anything by
that aggressive vacuum.  I am not sure exactly how to fix this,
because the calculation we use to determine the XID that can be used
to vacuum a specific table is pretty complex; how can the postmaster
know whether it's going to be able to make any progress in *any* table
in some database to which it's not even connected?  But it's surely
crazy to just keep doing something over and over that can't possibly
work.

2. Once you get to the point where you start to emit errors when
attempting to assign an XID, you can still run plain old VACUUM
because it doesn't consume an XID ... except that if it tries to
truncate the relation, then it will take AccessExclusiveLock, which
has to be logged, which forces an XID assignment, which makes VACUUM
fail.  So if you burn through XIDs until the system gets to this
point, and then you roll back the prepared transaction that caused the
problem in the first place, autovacuum sits there trying to vacuum
tables in a tight loop and fails over and over again as soon as it hits a
table that it thinks needs to be truncated.  This seems really lame,
and also easily fixed.

Attached is a patch that disables vacuum truncation if xidWarnLimit
has been reached.  With this patch, in my testing, autovacuum is able
to recover the system once the prepared transaction has been rolled
back.  Without this patch, not only does that not happen, but if you
had a database with enough relations that need truncation, you could
conceivably cause XID wraparound just from running a database-wide
VACUUM, the one tool you have available to avoid XID wraparound.  I
think that this amounts to a back-patchable bug fix.

(One could argue that truncation should be disabled sooner than this,
like when we've exceeded autovacuum_freeze_max_age, or later than this,
like when we hit xidStopLimit, but I think xidWarnLimit is probably
the best compromise.)
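
To make the shape of the change concrete, the check I have in mind looks
roughly like this (a sketch of the idea only, not the attached patch; the
function names are as in vacuumlazy.c but their signatures are simplified,
and reading xidWarnLimit without XidGenLock is a shortcut):

    /* In lazy_vacuum_rel(), where the truncation phase is decided. */
    if (should_attempt_truncation(vacrelstats))
    {
        /*
         * Skip truncation once xidWarnLimit has been crossed: WAL-logging
         * the AccessExclusiveLock would assign an XID and make VACUUM fail.
         */
        if (TransactionIdFollowsOrEquals(ReadNewTransactionId(),
                                         ShmemVariableCache->xidWarnLimit))
            ereport(WARNING,
                    (errmsg("skipping truncation of \"%s\" to avoid consuming an XID near wraparound",
                            RelationGetRelationName(onerel))));
        else
            lazy_truncate_heap(onerel, vacrelstats);
    }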

3. The message you get when you hit xidStopLimit seems like bad advice to me:

ERROR:  database is not accepting commands to avoid wraparound data
loss in database "%s"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions,
or drop stale replication slots.

Why do we want people to stop the postmaster and vacuum that database
in single user mode?  Why not just run VACUUM in multi-user mode, or
let autovacuum take care of the problem?  Granted, if VACUUM is going
to fail in multi-user mode, and if switching to single-user mode is
going to make it succeed, then it's a good suggestion.  But it seems
that it doesn't fail in multi-user mode, unless it tries to truncate
something, which is a bug we should fix.  Telling people to go to
single-user mode where they can continue to assign XIDs even though
they have almost no XIDs left seems extremely dangerous, actually.

Also, I think that old prepared transactions and stale replication
slots should be emphasized more prominently.  Maybe something like:

HINT:  Commit or roll back old prepared transactions, drop stale
replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: improving wraparound behavior

From
Andres Freund
Date:
Hi,

On 2019-05-03 16:26:46 -0400, Robert Haas wrote:
> 2. Once you get to the point where you start to emit errors when
> attempting to assign an XID, you can still run plain old VACUUM
> because it doesn't consume an XID ... except that if it tries to
> truncate the relation, then it will take AccessExclusiveLock, which
> has to be logged, which forces an XID assignment, which makes VACUUM
> fail.  So if you burn through XIDs until the system gets to this
> point, and then you roll back the prepared transaction that caused the
> problem in the first place, autovacuum sits there trying to vacuum
> tables in a tight loop and fails over and over again as soon as it hits a
> table that it thinks needs to be truncated.  This seems really lame,
> and also easily fixed.
> 
> Attached is a patch that disables vacuum truncation if xidWarnLimit
> has been reached.  With this patch, in my testing, autovacuum is able
> to recover the system once the prepared transaction has been rolled
> back.  Without this patch, not only does that not happen, but if you
> had a database with enough relations that need truncation, you could
> conceivably cause XID wraparound just from running a database-wide
> VACUUM, the one tool you have available to avoid XID wraparound.  I
> think that this amounts to a back-patchable bug fix.
> 
> (One could argue that truncation should be disabled sooner than this,
> like when we've exceeded autovacuum_freeze_max_age, or later than this,
> like when we hit xidStopLimit, but I think xidWarnLimit is probably
> the best compromise.)

I'd actually say the proper fix would be to instead move the truncation
to *after* finishing updating relfrozenxid etc.  If we truncate, the
additional cost of another in-place pg_class update, to update relpages,
is basically insignificant.  And the risk of errors, or being cancelled,
during truncation is much higher than before (due to the AEL).
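
In code terms I'm thinking of an ordering in lazy_vacuum_rel() roughly like
the following (just a sketch; argument lists are abbreviated and the
variable names are assumed):

    /* ... heap and index vacuuming finished ... */

    /*
     * Record the freeze results first, so an ERROR or a cancellation during
     * truncation can no longer prevent relfrozenxid/relminmxid advancing.
     */
    vac_update_relstats(onerel, new_rel_pages, new_live_tuples,
                        new_rel_allvisible, vacrelstats->hasindex,
                        new_frozen_xid, new_min_multi, false);

    /* Only now try to give space back to the OS. */
    if (should_attempt_truncation(vacrelstats))
    {
        lazy_truncate_heap(onerel, vacrelstats);

        /* Second in-place pg_class update, just to refresh relpages. */
        vac_update_relstats(onerel, vacrelstats->rel_pages, new_live_tuples,
                            new_rel_allvisible, vacrelstats->hasindex,
                            InvalidTransactionId, InvalidMultiXactId, false);
    }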



> Also, I think that old prepared transactions and stale replication
> slots should be emphasized more prominently.  Maybe something like:
> 
> HINT:  Commit or roll back old prepared transactions, drop stale
> replication slots, or kill long-running sessions.
> Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.

I think it'd be good to instead compute what the actual problem is. It'd
not be particularly hard to show some of these in the errdetail:

1) the xid horizon (xid + age) of the problematic database; potentially,
   if connected to that database, additionally compute what the oldest
   xid is (although that's computationally potentially too expensive)
2) the xid horizon (xid + age) due to prepared transactions, and the
   oldest transaction's name
3) the xid horizon (xid + age) due to replication slot, and the "oldest"
   slot's name
4) the xid horizon (xid + age) and pid for the connection with the
   oldest snapshot.

I think that'd make it much easier for users to pinpoint what's going on.

In fact, I think we probably should additionally add a function that can
display the above. That'd make it much easier to write monitoring
queries.
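
As a sketch of what the guts of such a function could look like (a
hypothetical pg_xid_horizons() returning one row per horizon; standard
materialized-SRF boilerplate with error checks and most rows omitted):

    Datum
    pg_xid_horizons(PG_FUNCTION_ARGS)
    {
        ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        TupleDesc   tupdesc;
        Tuplestorestate *tupstore;
        MemoryContext oldcontext;
        Datum       values[3];
        bool        nulls[3];

        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            elog(ERROR, "return type must be a row type");

        oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
        tupstore = tuplestore_begin_heap(true, false, work_mem);
        rsinfo->returnMode = SFRM_Materialize;
        rsinfo->setResult = tupstore;
        rsinfo->setDesc = tupdesc;
        MemoryContextSwitchTo(oldcontext);

        /* one row per horizon; only the database horizon shown here */
        MemSet(nulls, false, sizeof(nulls));
        values[0] = CStringGetTextDatum("database horizon");
        values[1] = TransactionIdGetDatum(ShmemVariableCache->oldestXid);
        values[2] = ObjectIdGetDatum(ShmemVariableCache->oldestXidDB);
        tuplestore_putvalues(tupstore, tupdesc, values, nulls);

        /* ... slot, walsender, prepared-xact and backend horizons likewise ... */

        return (Datum) 0;
    }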


IMO we also ought to compute the *actual* relfrozenxid/relminmxid for a
table. I.e. the oldest xid actually present. It's pretty common for most
tables to have effective horizons that are much newer than what
GetOldestXmin()/vacuum_set_xid_limits() can return. Obviously we can do
so only when scanning all non-frozen pages. But being able to record
"more aggressive" horizons would often save unnecessary work.  And it
ought to not be hard.  I think especially for regular non-freeze,
non-wraparound vacuums that'll often result in a much newer relfrozenxid
(as we'll otherwise just have GetOldestXmin() - vacuum_freeze_min_age).
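
A sketch of the bookkeeping this would take during the heap scan (variable
names are assumed for illustration; xmax and multixacts would need the same
treatment):

    /* Track the oldest unfrozen XID actually observed while scanning. */
    TransactionId observed_oldest_xid = ReadNewTransactionId();
    TransactionId xmin;

    /* ... for each tuple examined during the heap scan ... */
    xmin = HeapTupleHeaderGetXmin(tuple->t_data);
    if (TransactionIdIsNormal(xmin) &&
        TransactionIdPrecedes(xmin, observed_oldest_xid))
        observed_oldest_xid = xmin;

    /* ... when updating pg_class afterwards ... */
    if (scanned_all_unfrozen &&
        TransactionIdPrecedes(FreezeLimit, observed_oldest_xid))
        new_frozen_xid = observed_oldest_xid;   /* the "more aggressive" horizon */
    else
        new_frozen_xid = FreezeLimit;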


Greetings,

Andres Freund



Re: improving wraparound behavior

From
Robert Haas
Date:
On Fri, May 3, 2019 at 4:47 PM Andres Freund <andres@anarazel.de> wrote:
> I'd actually say the proper fix would be to instead move the truncation
> to *after* finishing updating relfrozenxid etc.  If we truncate, the
> additional cost of another in-place pg_class update, to update relpages,
> is basically insignificant.  And the risk of errors, or being cancelled,
> during truncation is much higher than before (due to the AEL).

That would prevent the ERROR from impeding relfrozenxid advancement,
but it does not prevent the error itself, nor the XID consumption.  If
autovacuum is hitting that ERROR, it will spew errors in the log but
succeed in advancing relfrozenxid anyway.  I don't think that's as
nice as the behavior I proposed, but it's certainly better than the
status quo.  If you are hitting that error due to a manual VACUUM,
under your proposal, you'll stop the manual VACUUM as soon as you hit
the first table where this happens, which is not what you want.
You'll also keep consuming XIDs, which is not what you want either,
especially if you are in single-user mode because the number of
remaining XIDs is less than a million.

> > Also, I think that old prepared transactions and stale replication
> > slots should be emphasized more prominently.  Maybe something like:
> >
> > HINT:  Commit or roll back old prepared transactions, drop stale
> > replication slots, or kill long-running sessions.
> > Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
>
> I think it'd be good to instead compute what the actual problem is. It'd
> not be particularly hard to show some of these in the errdetail:
>
> 1) the xid horizon (xid + age) of the problematic database; potentially,
>    if connected to that database, additionally compute what the oldest
>    xid is (although that's computationally potentially too expensive)
> 2) the xid horizon (xid + age) due to prepared transactions, and the
>    oldest transaction's name
> 3) the xid horizon (xid + age) due to replication slot, and the "oldest"
>    slot's name
> 4) the xid horizon (xid + age) and pid for the connection with the
>    oldest snapshot.
>
> I think that'd make it much easier for users to pinpoint what's going on.
>
> In fact, I think we probably should additionally add a function that can
> display the above. That'd make it much easier to write monitoring
> queries.

I think that the error and hint that you get from
GetNewTransactionId() has to be something that we can generate very
quickly, without doing anything that might hang on a cluster with lots
of databases or lots of relations; but if there's useful detail we can
display there, that's good.  With a view, it's more OK if it takes a
long time on a big cluster.

> IMO we also ought to compute the *actual* relfrozenxid/relminmxid for a
> table. I.e. the oldest xid actually present. It's pretty common for most
> tables to have effective horizons that are much newer than what
> GetOldestXmin()/vacuum_set_xid_limits() can return. Obviously we can do
> so only when scanning all non-frozen pages. But being able to record
> "more aggressive" horizons would often save unnecessary work.  And it
> ought to not be hard.  I think especially for regular non-freeze,
> non-wraparound vacuums that'll often result in a much newer relfrozenxid
> (as we'll otherwise just have GetOldestXmin() - vacuum_freeze_min_age).

Sure, that would make sense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: improving wraparound behavior

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I spent a significant chunk of today burning through roughly 2^31 XIDs
> just to see what would happen. ...

> 2. Once you get to the point where you start to emit errors when
> attempting to assign an XID, you can still run plain old VACUUM
> because it doesn't consume an XID ... except that if it tries to
> truncate the relation, then it will take AccessExclusiveLock, which
> has to be logged, which forces an XID assignment, which makes VACUUM
> fail.

Yeah.  I tripped over that earlier this week in connection with the
REINDEX business: taking an AEL only forces XID assignment when
wal_level is above "minimal", so it's easy to come to the wrong
conclusions depending on your test environment.  I suspect that
previous testing of wraparound behavior (yes there has been some)
didn't see this effect because the default wal_level didn't use to
cause it to happen.  But anyway, it's there now and I agree we'd
better do something about it.
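
For the archives, the relevant logic is roughly this test in
LockAcquireExtended() (paraphrased from memory, so treat the details as
approximate):

    /*
     * AELs on relations are WAL-logged for hot standby only when wal_level
     * provides standby information; preparing that record is what assigns
     * the XID, via GetCurrentTransactionId().
     */
    if (lockmode >= AccessExclusiveLock &&
        locktag->locktag_type == LOCKTAG_RELATION &&
        !RecoveryInProgress() &&
        XLogStandbyInfoActive())
    {
        LogAccessExclusiveLockPrepare();
        log_lock = true;
    }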

My brain is too fried from release-note-writing to have any trustworthy
opinion right now about whether your patch is the best way.

            regards, tom lane



Re: improving wraparound behavior

From
Andres Freund
Date:
Hi,

On 2019-05-03 18:42:35 -0400, Robert Haas wrote:
> On Fri, May 3, 2019 at 4:47 PM Andres Freund <andres@anarazel.de> wrote:
> > I'd actually say the proper fix would be to instead move the truncation
> > to *after* finishing updating relfrozenxid etc.  If we truncate, the
> > additional cost of another in-place pg_class update, to update relpages,
> > is basically insignificant.  And the risk of errors, or being cancelled,
> > during truncation is much higher than before (due to the AEL).
> 
> That would prevent the ERROR from impeding relfrozenxid advancement,
> but it does not prevent the error itself, nor the XID consumption.  If
> autovacuum is hitting that ERROR, it will spew errors in the log but
> succeed in advancing relfrozenxid anyway.  I don't think that's as
> nice as the behavior I proposed, but it's certainly better than the
> status quo.  If you are hitting that error due to a manual VACUUM,
> under your proposal, you'll stop the manual VACUUM as soon as you hit
> the first table where this happens, which is not what you want.
> You'll also keep consuming XIDs, which is not what you want either,
> especially if you are in single-user mode because the number of
> remaining XIDs is less than a million.

Part of my opposition to just disabling it when close to wraparound is
that it still allows getting close to wraparound because of truncation
issues. IMO preventing getting closer to wraparound is more important
than making it more "comfortable" to be in a wraparound situation.

The second problem I see is that even somebody close to a wraparound
might have an urgent need to free up some space. So I'm a bit wary of
just disabling it.

Wonder if there's a reasonable way that'd allow to do the WAL logging
for the truncation without using an xid. One way would be to just get
rid of the lock on the primary as previously discussed. But we could
also drive the locking through the WAL records that do the actual
truncation - then there'd not be a need for an xid.  It's probably not an
entirely trivial change, but I don't think it'd be too bad?


> > > Also, I think that old prepared transactions and stale replication
> > > slots should be emphasized more prominently.  Maybe something like:
> > >
> > > HINT:  Commit or roll back old prepared transactions, drop stale
> > > replication slots, or kill long-running sessions.
> > > Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
> >
> > I think it'd be good to instead compute what the actual problem is. It'd
> > not be particularly hard to show some of these in the errdetail:
> >
> > 1) the xid horizon (xid + age) of the problematic database; potentially,
> >    if connected to that database, additionally compute what the oldest
> >    xid is (although that's computationally potentially too expensive)

s/oldest xid/oldest relfrozenxid/


> > 2) the xid horizon (xid + age) due to prepared transactions, and the
> >    oldest transaction's name
> > 3) the xid horizon (xid + age) due to replication slot, and the "oldest"
> >    slot's name
> > 4) the xid horizon (xid + age) and pid for the connection with the
> >    oldest snapshot.
> >
> > I think that'd make it much easier for users to pinpoint what's going on.
> >
> > In fact, I think we probably should additionally add a function that can
> > display the above. That'd make it much easier to write monitoring
> > queries.
> 
> I think that the error and hint that you get from
> GetNewTransactionId() has to be something that we can generate very
> quickly, without doing anything that might hang on a cluster with lots
> of databases or lots of relations; but if there's useful detail we can
> display there, that's good.  With a view, it's more OK if it takes a
> long time on a big cluster.

Yea, I agree it has to be reasonably fast. But all of the above, with
the exception of the optional "oldest table", should be cheap enough to
compute. Sure, a scan through PGXACT isn't cheap, but in comparison to
an ereport() and an impending shutdown it's peanuts, in contrast to
scanning a pg_class that could be many, many gigabytes.

Greetings,

Andres Freund



Re: improving wraparound behavior

From
Robert Haas
Date:
On Fri, May 3, 2019 at 8:45 PM Andres Freund <andres@anarazel.de> wrote:
> Part of my opposition to just disabling it when close to wraparound is
> that it still allows getting close to wraparound because of truncation
> issues.

Sure ... it would definitely be better if vacuum didn't consume XIDs
when it truncates.  On the other hand, only a minority of VACUUM
operations will truncate, so I don't think there's really a big
problem in practice here.

> IMO preventing getting closer to wraparound is more important
> than making it more "comfortable" to be in a wraparound situation.

I think that's a false dichotomy.  It's impossible to create a
situation where no user ever gets into a wraparound situation, unless
we're prepared to do things like automatically drop replication slots
and automatically roll back (or commit?) prepared transactions.  So,
while it is good to prevent a user from getting into a wraparound
situation where we can, it is ALSO good to make it easy to recover
from those situations as painlessly as possible when they do happen.

> The second problem I see is that even somebody close to a wraparound
> might have an urgent need to free up some space. So I'm a bit wary of
> just disabling it.

I would find that ... really surprising.  If you have < 11 million
XIDs left before your data gets eaten by a grue, and you file a bug
report complaining that vacuum won't truncate your tables until you
catch up on vacuuming a bit, I am prepared to offer you no sympathy at
all.  I mean, I'm not going to say that we couldn't invent more
complicated behavior, at least on master, like making the new VACUUM
(TRUNCATE) object ternary-valued: default is on when you have more
than 11 million XIDs remaining and off otherwise, but you can override
either value by saying VACUUM (TRUNCATE { ON | OFF }).  But is that
really a thing?  People have less than 11 million XIDs left and
they're like "forget anti-wraparound, I want to truncate away some
empty pages"?

> Wonder if there's a reasonable way that'd allow to do the WAL logging
> for the truncation without using an xid. One way would be to just get
> rid of the lock on the primary as previously discussed. But we could
> also drive the locking through the WAL records that do the actual
> truncation - then there'd not be a need for an xid.  It's probably not an
> entirely trivial change, but I don't think it'd be too bad?

Beats me.  For me, this is just a bug, not an excuse to redesign
vacuum truncation.  Before Hot Standby, when you got into severe
wraparound trouble, you could vacuum all your tables without consuming
any XIDs.  Now you can't.  That's bad, and I think we should come up
with some kind of back-patchable solution to that problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: improving wraparound behavior

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> I am not sure exactly how to fix this,
> because the calculation we use to determine the XID that can be used
> to vacuum a specific table is pretty complex; how can the postmaster
> know whether it's going to be able to make any progress in *any* table
> in some database to which it's not even connected?  But it's surely
> crazy to just keep doing something over and over that can't possibly
> work.

I definitely agree that it's foolish to keep doing something that isn't
going to work, and it seems like a pretty large part of the issue here
is that we don't have enough information to be more intelligent because
we aren't connected to the database that needs the work to be done.

Now, presuming we're talking about 'new feature work' here to try and
address this, and not something that we think we can back-patch, I had
another thought.

I certainly get that having lots of extra processes around can be a
waste of resources... but I don't recall a lot of people complaining
about the autovacuum launcher process using up lots of resources
unnecessarily.

Perhaps we should consider having the 'global' autovacuum launcher, when
it's decided that a database needs work to be done, launch a 'per-DB'
launcher which manages launching autovacuum processes for that database?
If the launcher is still running then it's because there's still work to
be done on that database and the 'global' autovacuum launcher can skip
it.  If the 'per-DB' launcher runs out of things to do, and the database
it's working on is no longer in a danger zone, then it exits.

There are certainly some other variations on this idea and I don't know
that it's really better than keeping more information in shared
memory or something else, but it seems like part of the issue is that
the thing firing off the processes hasn't got enough info to do so
intelligently and maybe we could fix that by having per-DB launchers
that are actually connected to a DB.

> 2. Once you get to the point where you start to emit errors when
> attempting to assign an XID, you can still run plain old VACUUM
> because it doesn't consume an XID ... except that if it tries to
> truncate the relation, then it will take AccessExclusiveLock, which
> has to be logged, which forces an XID assignment, which makes VACUUM
> fail.  So if you burn through XIDs until the system gets to this
> point, and then you roll back the prepared transaction that caused the
> problem in the first place, autovacuum sits there trying to vacuum
> tables in a tight loop and fails over and over again as soon as it hits a
> table that it thinks needs to be truncated.  This seems really lame,
> and also easily fixed.
>
> Attached is a patch that disables vacuum truncation if xidWarnLimit
> has been reached.  With this patch, in my testing, autovacuum is able
> to recover the system once the prepared transaction has been rolled
> back.  Without this patch, not only does that not happen, but if you
> had a database with enough relations that need truncation, you could
> conceivably cause XID wraparound just from running a database-wide
> VACUUM, the one tool you have available to avoid XID wraparound.  I
> think that this amounts to a back-patchable bug fix.

Sounds reasonable to me but I've not looked at the patch at all.

> 3. The message you get when you hit xidStopLimit seems like bad advice to me:
>
> ERROR:  database is not accepting commands to avoid wraparound data
> loss in database "%s"
> HINT:  Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions,
> or drop stale replication slots.
>
> Why do we want people to stop the postmaster and vacuum that database
> in single user mode?  Why not just run VACUUM in multi-user mode, or
> let autovacuum take care of the problem?  Granted, if VACUUM is going
> to fail in multi-user mode, and if switching to single-user mode is
> going to make it succeed, then it's a good suggestion.  But it seems
> that it doesn't fail in multi-user mode, unless it tries to truncate
> something, which is a bug we should fix.  Telling people to go to
> single-user mode where they can continue to assign XIDs even though
> they have almost no XIDs left seems extremely dangerous, actually.
>
> Also, I think that old prepared transactions and stale replication
> slots should be emphasized more prominently.  Maybe something like:
>
> HINT:  Commit or roll back old prepared transactions, drop stale
> replication slots, or kill long-running sessions.
> Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.

I agree that a better message would definitely be good and that
recommending single-user isn't a terribly useful thing to do.  I might
have misunderstood it, but it sounded like Andres was proposing a new
function which would basically tell you what's holding back the xid
horizon and that sounds fantastic and would be great to include in this
message, if possible.

As in:

HINT:  Run the function pg_what_is_holding_xmin_back() to identify what
is preventing autovacuum from progressing and address it.

Or some such.

Thanks,

Stephen

Attachment

Re: improving wraparound behavior

From
Andres Freund
Date:
Hi,

On 2019-05-03 21:36:24 -0400, Robert Haas wrote:
> On Fri, May 3, 2019 at 8:45 PM Andres Freund <andres@anarazel.de> wrote:
> > Part of my opposition to just disabling it when close to wraparound is
> > that it still allows getting close to wraparound because of truncation
> > issues.
> 
> Sure ... it would definitely be better if vacuum didn't consume XIDs
> when it truncates.  On the other hand, only a minority of VACUUM
> operations will truncate, so I don't think there's really a big
> problem in practice here.

I've seen a number of out-of-xid shutdowns precisely because of
truncations. Initially because autovacuum commits suicide because
somebody else wants a conflicting lock, later because there's so much
dead space that people kill (auto)vacuum to get rid of the exclusive
locks.


> > IMO preventing getting closer to wraparound is more important
> > than making it more "comfortable" to be in a wraparound situation.
> 
> I think that's a false dichotomy.  It's impossible to create a
> situation where no user ever gets into a wraparound situation, unless
> we're prepared to do things like automatically drop replication slots
> and automatically roll back (or commit?) prepared transactions.  So,
> while it is good to prevent a user from getting into a wraparound
> situation where we can, it is ALSO good to make it easy to recover
> from those situations as painlessly as possible when they do happen.

Sure, but I've seen a number of real-world cases of xid wraparound
shutdowns related to truncations, and no real world problem due to
truncations assigning an xid.


> > The second problem I see is that even somebody close to a wraparound
> > might have an urgent need to free up some space. So I'm a bit wary of
> > just disabling it.
> 
> I would find that ... really surprising.  If you have < 11 million
> XIDs left before your data gets eaten by a grue, and you file a bug
> report complaining that vacuum won't truncate your tables until you
> catch up on vacuuming a bit, I am prepared to offer you no sympathy at
> all.

I've seen wraparound issues triggered by auto-vacuum generating so much
WAL that the system ran out of space, crash-restart, repeat. And being
unable to reclaim space could make that even harder to tackle.


> > Wonder if there's a reasonable way that'd allow to do the WAL logging
> > for the truncation without using an xid. One way would be to just get
> > rid of the lock on the primary as previously discussed. But we could
> > also drive the locking through the WAL records that do the actual
> > truncation - then there'd not be a need for an xid.  It's probably not an
> > entirely trivial change, but I don't think it'd be too bad?
> 
> Beats me.  For me, this is just a bug, not an excuse to redesign
> vacuum truncation.  Before Hot Standby, when you got into severe
> wraparound trouble, you could vacuum all your tables without consuming
> any XIDs.  Now you can't.  That's bad, and I think we should come up
> with some kind of back-patchable solution to that problem.

I agree we need to do at least a minimal version that can be
backpatched.

I don't think we necessarily need a new WAL record for what I'm
describing above (as XLOG_SMGR_TRUNCATE already carries information
about which forks are truncated, we could just have it acquire the
exclusive lock), and I don't think we'd need a ton of code for eliding
the WAL-logged lock either.  I think the issue with backpatching would be
that we can't remove the logged lock without creating hazards for
standbys running older versions of postgres.
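
As a very rough sketch of what I mean on the redo side (the
Standby*LockForRedo helpers are hypothetical, just to mark where the lock
acquisition would move; this isn't a worked-out design):

    /* In smgr_redo(), sketch only. */
    else if (info == XLOG_SMGR_TRUNCATE)
    {
        xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);

        /*
         * Hypothetical: the startup process takes the AccessExclusiveLock
         * itself while replaying the truncation, so the primary no longer
         * needs a separately WAL-logged standby lock (and hence no XID).
         */
        StandbyAcquireLockForRedo(xlrec->rnode.dbNode, xlrec->rnode.relNode);

        /* ... existing truncation of the requested forks ... */

        StandbyReleaseLockForRedo(xlrec->rnode.dbNode, xlrec->rnode.relNode);
    }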

Greetings,

Andres Freund



Re: improving wraparound behavior

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> I don't think we necessarily need a new WAL record for what I'm
> describing above (as XLOG_SMGR_TRUNCATE already carries information
> about which forks are truncated, we could just have it acquire the
> exclusive lock), and I don't think we'd need a ton of code for eliding
> the WAL-logged lock either.  I think the issue with backpatching would be
> that we can't remove the logged lock without creating hazards for
> standbys running older versions of postgres.

While it's pretty rare, I don't believe this would be the only case of
"you need to upgrade your replicas before your primary" due to changes
in WAL.  Of course, we need to make sure that we actually figure out
that the WAL being sent is something that the replica doesn't know how
to properly handle because it's from a newer primary; we can't simply do
the wrong thing in that case.

Thanks,

Stephen

Attachment

Re: improving wraparound behavior

From
Andres Freund
Date:
Hi,

On 2019-05-03 22:11:08 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > I don't think we necessarily need a new WAL record for what I'm
> > describing above (as XLOG_SMGR_TRUNCATE already carries information
> > about which forks are truncated, we could just have it acquire the
> > exclusive lock), and I don't think we'd need a ton of code for eliding
> > the WAL-logged lock either.  I think the issue with backpatching would be
> > that we can't remove the logged lock without creating hazards for
> > standbys running older versions of postgres.
> 
> While it's pretty rare, I don't believe this would be the only case of
> "you need to upgrade your replicas before your primary" due to changes
> in WAL.  Of course, we need to make sure that we actually figure out
> that the WAL being sent is something that the replica doesn't know how
> to properly handle because it's from a newer primary; we can't simply do
> the wrong thing in that case.

I don't think this is severe enough to warrant a step like this.  I think
for the back-patch case we could live with
a) move truncation to after the rest of vacuum
b) don't truncate if it'd error out anyway, but log an error

Greetings,

Andres Freund



Re: improving wraparound behavior

From
Andres Freund
Date:
Hi,

On 2019-05-03 22:03:18 -0400, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > I am not sure exactly how to fix this,
> > because the calculation we use to determine the XID that can be used
> > to vacuum a specific table is pretty complex; how can the postmaster
> > know whether it's going to be able to make any progress in *any* table
> > in some database to which it's not even connected?  But it's surely
> > crazy to just keep doing something over and over that can't possibly
> > work.
>
> I definitely agree that it's foolish to keep doing something that isn't
> going to work, and it seems like a pretty large part of the issue here
> is that we don't have enough information to be more intelligent because
> we aren't connected to the database that needs the work to be done.
>
> Now, presuming we're talking about 'new feature work' here to try and
> address this, and not something that we think we can back-patch, I had
> another thought.
>
> I certainly get that having lots of extra processes around can be a
> waste of resources... but I don't recall a lot of people complaining
> about the autovacuum launcher process using up lots of resources
> unnecessarily.
>
> Perhaps we should consider having the 'global' autovacuum launcher, when
> it's decided that a database needs work to be done, launch a 'per-DB'
> launcher which manages launching autovacuum processes for that database?
> If the launcher is still running then it's because there's still work to
> be done on that database and the 'global' autovacuum launcher can skip
> it.  If the 'per-DB' launcher runs out of things to do, and the database
> it's working on is no longer in a danger zone, then it exits.
>
> There are certainly some other variations on this idea and I don't know
> that it's really better than keeping more information in shared
> memory or something else, but it seems like part of the issue is that
> the thing firing off the processes hasn't got enough info to do so
> intelligently and maybe we could fix that by having per-DB launchers
> that are actually connected to a DB.

This sounds a lot more like a wholesale redesign than some small
incremental work.  Which I think we should do, but we probably ought to
do something more minimal before the resources for something like this
are there. Perfect being the enemy of the good, and all that.


> I might have misunderstood it, but it sounded like Andres was
> proposing a new function which would basically tell you what's holding
> back the xid horizon and that sounds fantastic and would be great to
> include in this message, if possible.
>
> As in:
>
> HINT:  Run the function pg_what_is_holding_xmin_back() to identify what
> is preventing autovacuum from progressing and address it.

I was basically thinking of doing *both*, amending the message, *and*
having a new UDF.

Basically, instead of the current:

            char       *oldest_datname = get_database_name(oldest_datoid);

            /* complain even if that DB has disappeared */
            if (oldest_datname)
                ereport(WARNING,
                        (errmsg("database \"%s\" must be vacuumed within %u transactions",
                                oldest_datname,
                                xidWrapLimit - xid),
                         errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
                                 "You might also need to commit or roll back old prepared transactions, or drop stale
replicationslots.")));
 
            else
                ereport(WARNING,
                        (errmsg("database with OID %u must be vacuumed within %u transactions",
                                oldest_datoid,
                                xidWrapLimit - xid),
                         errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
                                 "You might also need to commit or roll back old prepared transactions, or drop stale
replicationslots.")));
 
        }

which is dramatically unhelpful, because often vacuuming won't do squat
(because of old [prepared] transaction, replication connection, or slot).

I'm thinking that we'd do something roughly like (in actual code) for
GetNewTransactionId():

    TransactionId dat_limit = ShmemVariableCache->oldestXid;
    TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
    TransactionId walsender_limit;
    TransactionId prepared_xact_limit;
    TransactionId backend_limit;

    ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);

    if (IsOldest(dat_limit))
       ereport(elevel,
               errmsg("close to xid wraparound, held back by database %s"),
               errdetail("current xid %u, horizon for database %u, shutting down at %u"),
               errhint("..."));
    else if (IsOldest(slot_limit))
      ereport(elevel, errmsg("close to xid wraparound, held back by replication slot %s"),
              ...);

where IsOldest wouldn't actually compare plainly numerically, but would
actually prefer showing the slot, backend, walsender, prepared_xact, as
long as they are pretty close to the dat_limit - as in those cases
vacuuming wouldn't actually solve the issue, unless the other problems
are addressed first (as autovacuum won't compute a cutoff horizon that's
newer than any of those).
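
To make the "IsOldest" idea a bit more concrete, a sketch of the comparison
(the name, the slack constant and the age-based arithmetic are all just
illustrative):

    #define WRAPAROUND_BLAME_SLACK    1000000   /* arbitrary, for illustration */

    /*
     * A candidate horizon (slot, walsender, prepared xact, backend) gets
     * blamed if it is nearly as old as the database horizon itself, since
     * vacuuming can't help until that candidate is dealt with.
     */
    static bool
    HorizonIsToBlame(TransactionId candidate, TransactionId dat_limit,
                     TransactionId next_xid)
    {
        int32       candidate_age;
        int32       dat_age;

        if (!TransactionIdIsValid(candidate))
            return false;

        candidate_age = (int32) (next_xid - candidate);
        dat_age = (int32) (next_xid - dat_limit);

        return candidate_age >= dat_age - WRAPAROUND_BLAME_SLACK;
    }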

and for the function I was thinking of an SRF that'd return roughly rows
like:
* "database horizon", xid, age(xid), database oid, database name
* "slot horizon", xid, age(xid), NULL, slot name
* "backend horizon", xid, age(xid), backend pid, query string?
* "walsender horizon", xid, age(xid), backend pid, connection string?
* "anti-wraparound-vacuums-start", xid, NULL, NULL
* "current xid", xid, NULL, NULL
* "xid warn limit", xid, NULL, NULL
* "xid shutdown limit", xid, NULL, NULL
* "xid wraparound limit", xid, NULL, NULL

Not sure if an SRF is really the best approach, but it seems like it'd
be the easiest way to show a good overview of the problems.  I don't know
how many times this'd have prevented support escalations, but it'd be
many.

Greetings,

Andres Freund



Re: improving wraparound behavior

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-05-03 22:03:18 -0400, Stephen Frost wrote:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> > > I am not sure exactly how to fix this,
> > > because the calculation we use to determine the XID that can be used
> > > to vacuum a specific table is pretty complex; how can the postmaster
> > > know whether it's going to be able to make any progress in *any* table
> > > in some database to which it's not even connected?  But it's surely
> > > crazy to just keep doing something over and over that can't possibly
> > > work.
> >
> > I definitely agree that it's foolish to keep doing something that isn't
> > going to work, and it seems like a pretty large part of the issue here
> > is that we don't have enough information to be more intelligent because
> > we aren't connected to the database that needs the work to be done.
> >
> > Now, presuming we're talking about 'new feature work' here to try and
> > address this, and not something that we think we can back-patch, I had
> > another thought.
> >
> > I certainly get that having lots of extra processes around can be a
> > waste of resources... but I don't recall a lot of people complaining
> > about the autovacuum launcher process using up lots of resources
> > unnecessarily.
> >
> > Perhaps we should consider having the 'global' autovacuum launcher, when
> > it's decided that a database needs work to be done, launch a 'per-DB'
> > launcher which manages launching autovacuum processes for that database?
> > If the launcher is still running then it's because there's still work to
> > be done on that database and the 'global' autovacuum launcher can skip
> > it.  If the 'per-DB' launcher runs out of things to do, and the database
> > it's working on is no longer in a danger zone, then it exits.
> >
> > There are certainly some other variations on this idea and I don't know
> > that it's really better than keeping more information in shared
> > memory or something else, but it seems like part of the issue is that
> > the thing firing off the processes hasn't got enough info to do so
> > intelligently and maybe we could fix that by having per-DB launchers
> > that are actually connected to a DB.
>
> This sounds a lot more like a wholesale redesign than some small
> incremental work.  Which I think we should do, but we probably ought to
> do something more minimal before the resources for something like this
> are there. Perfect being the enemy of the good, and all that.

I suppose it is a pretty big change in the base autovacuum launcher to
be something that's run per database instead and then deal with the
coordination between the two...  but I can't help but feel like it
wouldn't be that much *work*.  I'm not against doing something smaller
but was something smaller actually proposed for this specific issue..?

> > I might have misunderstood it, but it sounded like Andres was
> > proposing a new function which would basically tell you what's holding
> > back the xid horizon and that sounds fantastic and would be great to
> > include in this message, if possible.
> >
> > As in:
> >
> > HINT:  Run the function pg_what_is_holding_xmin_back() to identify what
> > is preventing autovacuum from progressing and address it.
>
> I was basically thinking of doing *both*, amending the message, *and*
> having a new UDF.
>
> Basically, instead of the current:
>
>             char       *oldest_datname = get_database_name(oldest_datoid);
>
>             /* complain even if that DB has disappeared */
>             if (oldest_datname)
>                 ereport(WARNING,
>                         (errmsg("database \"%s\" must be vacuumed within %u transactions",
>                                 oldest_datname,
>                                 xidWrapLimit - xid),
>                          errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
>                                  "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
>             else
>                 ereport(WARNING,
>                         (errmsg("database with OID %u must be vacuumed within %u transactions",
>                                 oldest_datoid,
>                                 xidWrapLimit - xid),
>                          errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
>                                  "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
>         }
>
> which is dramatically unhelpful, because often vacuuming won't do squat
> (because of old [prepared] transaction, replication connection, or slot).
>
> I'm thinking that we'd do something roughly like (in actual code) for
> GetNewTransactionId():
>
>     TransactionId dat_limit = ShmemVariableCache->oldestXid;
>     TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
>     Transactionid walsender_limit;
>     Transactionid prepared_xact_limit;
>     Transactionid backend_limit;
>
>     ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);
>
>     if (IsOldest(dat_limit))
>        ereport(elevel,
>                errmsg("close to xid wraparound, held back by database %s"),
>                errdetail("current xid %u, horizon for database %u, shutting down at %u"),
>                errhint("..."));
>     else if (IsOldest(slot_limit))
>       ereport(elevel, errmsg("close to xid wraparound, held back by replication slot %s"),
>               ...);
>
> where IsOldest wouldn't actually compare plainly numerically, but would
> actually prefer showing the slot, backend, walsender, prepared_xact, as
> long as they are pretty close to the dat_limit - as in those cases
> vacuuming wouldn't actually solve the issue, unless the other problems
> are addressed first (as autovacuum won't compute a cutoff horizon that's
> newer than any of those).

Where the errhint() above includes a recommendation to run the SRF
described below, I take it?

Also, should this really be an 'else if', or should it be just a set of
'if()'s, thereby giving users more info right up-front?  If there's one
thing I quite dislike, it's cases where you're playing whack-a-mole over
and over because the system says "X is bad!" and when you fix X, it just
turns around and says "Y is bad!", even though it had all the info it
needed to say "X and Y are bad!" right up-front.

> and for the function I was thinking of an SRF that'd return roughly rows
> like:
> * "database horizon", xid, age(xid), database oid, database name
> * "slot horizon", xid, age(xid), NULL, slot name
> * "backend horizon", xid, age(xid), backend pid, query string?
> * "walsender horizon", xid, age(xid), backend pid, connection string?
> * "anti-wraparound-vacuums-start", xid, NULL, NULL
> * "current xid", xid, NULL, NULL
> * "xid warn limit", xid, NULL, NULL
> * "xid shutdown limit", xid, NULL, NULL
> * "xid wraparound limit", xid, NULL, NULL
>
> Not sure if an SRF is really the best approach, but it seems like it'd
> be the easiest way to show a good overview of the problems.  I don't know
> how many times this'd have prevented support escalations, but it'd be
> many.

Yeah, I'm not sure if an SRF makes the most sense here...  but I'm also
not sure that it's a bad idea either.  Once it's written, it'll be a lot
easier to critique the specific interface. ;)

Thanks!

Stephen

Attachment

Re: improving wraparound behavior

From
Andres Freund
Date:
Hi,

On 2019-05-03 22:41:11 -0400, Stephen Frost wrote:
> I suppose it is a pretty big change in the base autovacuum launcher to
> be something that's run per database instead and then deal with the
> coordination between the two...  but I can't help but feel like it
> wouldn't be that much *work*.  I'm not against doing something smaller
> but was something smaller actually proposed for this specific issue..?

I think it'd be fairly significant. And that we should redo it from
scratch if we go there - because what we have isn't worth using as a
basis.


> > I'm thinking that we'd do something roughly like (in actual code) for
> > GetNewTransactionId():
> > 
> >     TransactionId dat_limit = ShmemVariableCache->oldestXid;
> >     TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
> >     Transactionid walsender_limit;
> >     Transactionid prepared_xact_limit;
> >     Transactionid backend_limit;
> > 
> >     ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);
> > 
> >     if (IsOldest(dat_limit))
> >        ereport(elevel,
> >                errmsg("close to xid wraparound, held back by database %s"),
> >                errdetail("current xid %u, horizon for database %u, shutting down at %u"),
> >                errhint("..."));
> >     else if (IsOldest(slot_limit))
> >       ereport(elevel, errmsg("close to xid wraparound, held back by replication slot %s"),
> >               ...);
> > 
> > where IsOldest wouldn't actually compare plainly numerically, but would
> > actually prefer showing the slot, backend, walsender, prepared_xact, as
> > long as they are pretty close to the dat_limit - as in those cases
> > vacuuming wouldn't actually solve the issue, unless the other problems
> > are addressed first (as autovacuum won't compute a cutoff horizon that's
> > newer than any of those).
> 
> Where the errhint() above includes a recommendation to run the SRF
> described below, I take it?

Not necessarily. I feel conciseness is important too, and this would be
the most important thing to tackle.


> Also, should this really be an 'else if', or should it be just a set of
> 'if()'s, thereby giving users more info right up-front?

Possibly? But it'd also make it even harder to read the log, and harder
for the system to keep up with the logging, because we already log *so*
much when close to wraparound.

If we didn't order it, it'd be hard for users to figure out which to
address first. If we ordered it, people have to look further up in the log to
figure out which is the most urgent one (unless we reverse the order,
which is odd too).


Greetings,

Andres Freund



Re: improving wraparound behavior

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-05-03 22:41:11 -0400, Stephen Frost wrote:
> > I suppose it is a pretty big change in the base autovacuum launcher to
> > be something that's run per database instead and then deal with the
> > coordination between the two...  but I can't help but feel like it
> > wouldn't be that much *work*.  I'm not against doing something smaller
> > but was something smaller actually proposed for this specific issue..?
>
> I think it'd be fairly significant. And that we should redo it from
> scratch if we go there - because what we have isn't worth using as a
> basis.

Alright, what I'm hearing here is that we should probably have a
dedicated thread for this discussion, if someone has the cycles to spend
on it.  I'm not against that.

> > > I'm thinking that we'd do something roughly like (in actual code) for
> > > GetNewTransactionId():
> > >
> > >     TransactionId dat_limit = ShmemVariableCache->oldestXid;
> > >     TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
> > >     Transactionid walsender_limit;
> > >     Transactionid prepared_xact_limit;
> > >     Transactionid backend_limit;
> > >
> > >     ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);
> > >
> > >     if (IsOldest(dat_limit))
> > >        ereport(elevel,
> > >                errmsg("close to xid wraparound, held back by database %s"),
> > >                errdetail("current xid %u, horizon for database %u, shutting down at %u"),
> > >                errhint("..."));
> > >     else if (IsOldest(slot_limit))
> > >       ereport(elevel, errmsg("close to xid wraparound, held back by replication slot %s"),
> > >               ...);
> > >
> > > where IsOldest wouldn't actually compare plainly numerically, but would
> > > actually prefer showing the slot, backend, walsender, prepared_xact, as
> > > long as they are pretty close to the dat_limit - as in those cases
> > > vacuuming wouldn't actually solve the issue, unless the other problems
> > > are addressed first (as autovacuum won't compute a cutoff horizon that's
> > > newer than any of those).
> >
> > Where the errhint() above includes a recommendation to run the SRF
> > described below, I take it?
>
> Not necessarily. I feel conciseness is important too, and this would be
> the most important thing to tackle.

I'm imagining a relatively rare scenario, just to be clear, where
"pretty close to the dat_limit" would apply to more than just one thing.

> > Also, should this really be an 'else if', or should it be just a set of
> > 'if()'s, thereby giving users more info right up-front?
>
> Possibly? But it'd also make it even harder to read the log, and harder
> for the system to keep up with the logging, because we already log *so*
> much when close to wraparound.

Yes, we definitely log a *lot*, and probably too much since other
critical messages might get lost in the noise.

> If we didn't order it, it'd be hard for users to figure out which to
> address first. If we ordered it, people have to look further up in the log to
> figure out which is the most urgent one (unless we reverse the order,
> which is odd too).

This makes me think we should both order it and combine it into one
message... but that'd then be pretty difficult to deal with,
potentially, from a translation standpoint and just from a "wow, that's
a huge log message" standpoint, which is kind of the idea behind the
SRF: to give you all that info in a more easily digestible manner.

Not sure I've got any great ideas on how to improve on this.  I do think
that if we know that there's multiple different things that are within a
small number of xids of the oldest xmin then we should notify the user
about all of them, either directly in the error messages or by referring
them to the SRF, so they have the opportunity to address them all, or
at least know about them all.  As mentioned though, it's likely to be a
quite rare thing to run into, so you'd have to be extra unlucky to even
hit this case and perhaps the extra code complication just isn't worth
it.

Thanks,

Stephen

Attachment

Re: improving wraparound behavior

From
Euler Taveira
Date:
On Fri, May 3, 2019 at 17:27, Robert Haas <robertmhaas@gmail.com> wrote:
>
> HINT:  Commit or roll back old prepared transactions, drop stale
> replication slots, or kill long-running sessions.
> Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
>
First of all, +1 for this patch. But after reading this email, I
couldn't resist sharing an idea about stale XID horizons. [A cup of
tea...] I realized that there is no automatic way to recover from old
prepared transactions, stale replication slots, or even long-running
sessions when we have a wraparound situation. Wouldn't it make sense to
add a parameter that recovers from a stale XID horizon? I mean, if we
reach a critical situation (xidStopLimit), free the resource that is
preventing the XID from advancing (and hope autovacuum has enough time
to prevent a stop-assigning-XIDs situation). The situation is analogous
to the OOM killer, which kills processes that are hogging resources.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training



Re: improving wraparound behavior

From
Andres Freund
Date:
Hi,

On 2019-05-03 23:08:44 -0400, Stephen Frost wrote:
> This makes me think we should both order it and combine it into one
> message... but that'd then be pretty difficult to deal with,
> potentially, from a translation standpoint and just from a "wow, that's
> a huge log message", which is kind of the idea behind the SRF- to give
> you all that info in a more easily digestible manner.
> 
> Not sure I've got any great ideas on how to improve on this.  I do think
> that if we know that there's multiple different things that are within a
> small number of xids of the oldest xmin then we should notify the user
> about all of them, either directly in the error messages or by referring
> them to the SRF, so they have the opportunity to address them all, or
> at least know about them all.  As mentioned though, it's likely to be a
> quite rare thing to run into, so you'd have to be extra unlucky to even
> hit this case and perhaps the extra code complication just isn't worth
> it.

I think just having an actual reason for the problem would be so much
better than the current status that I'd tackle the "oops, just about
everything is screwed, here's the reasons in order" case separately (or
just plain never).

Greetings,

Andres Freund



Re: improving wraparound behavior

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-05-03 23:08:44 -0400, Stephen Frost wrote:
> > As mentioned though, it's likely to be a
> > quite rare thing to run into, so you'd have to be extra unlucky to even
> > hit this case and perhaps the extra code complication just isn't worth
> > it.
>
> I think just having an actual reason for the problem would be so much
> better than the current status that I'd tackle the "oops, just about
> everything is screwed, here's the reasons in order" case separately (or
> just plain never).

I certainly agree with that, but if we really think it's so rare that we
don't feel it's worth worrying about, then I'd say we should remove the
'else' in the 'else if', as I initially suggested, since the chances of
users hitting more than one of these are quite low and more than two
would be... impressive.  That negates the concern you raised earlier
about there being a bunch of these messages that make it hard for users
to reason about which is the most critical.  Removing the 'else' would
make it strictly less code (albeit not much, but it certainly isn't
adding to the code) and would be more informative for the user who does
end up having two cases happen at (or nearly) the same time.

Thanks,

Stephen

Attachment