Thread: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
maciek@heroku.com
Date:
The following bug has been logged on the website:

Bug reference:      13002
Logged by:          Maciek Sakrejda
Email address:      maciek@heroku.com
PostgreSQL version: 9.3.6
Operating system:   Ubuntu 14.04
Description:

If autovacuum is running a VACUUM to prevent wraparound on a certain table,
it looks like it blocks TRUNCATE of that table, which would obviate the need
for the VACUUM in the first place (and could happen much more quickly). This
seems like a usability wart at best--one has to instead kill the autovacuum
(e.g., via pg_cancel_backend) and then TRUNCATE. This is especially
inconvenient when the user trying to truncate does not have permission to
kill (or even see!) the wraparound VACUUM. Would it be possible to have
autovacuum yield to a TRUNCATE in this situation?

Re: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
Bruce Momjian
Date:
On Wed, Apr  8, 2015 at 06:41:55PM +0000, maciek@heroku.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13002
> Logged by:          Maciek Sakrejda
> Email address:      maciek@heroku.com
> PostgreSQL version: 9.3.6
> Operating system:   Ubuntu 14.04
> Description:
>
> If autovacuum is running a VACUUM to prevent wraparound on a certain table,
> it looks like it blocks TRUNCATE of that table, which would obviate the need
> for the VACUUM in the first place (and could happen much more quickly). This
> seems like a usability wart at best--one has to instead kill the autovacuum
> (e.g., via pg_cancel_backend) and then TRUNCATE. This is especially
> inconvenient when the user trying to truncate does not have permission to
> kill (or even see!) the wraparound VACUUM. Would it be possible to have
> autovacuum yield to a TRUNCATE in this situation?

My guess is this would be hard to do.  VACUUM and TRUNCATE request
conflicting locks, so you would have to embed in the locking code that
the _outcome_ of one should allow it to override the lock of the other.
I can't think of anywhere that we allow that.  Frankly, you can't really
just ignore the lock --- you would have to cancel the vacuum by sending
a signal or something.  I know we have cases where vacuum aborts if it
can't get a lock or something, but I can't remembrer any case where we
externally cancel vacuum like that.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
Alvaro Herrera
Date:
maciek@heroku.com wrote:

> If autovacuum is running a VACUUM to prevent wraparound on a certain table,
> it looks like it blocks TRUNCATE of that table, which would obviate the need
> for the VACUUM in the first place (and could happen much more quickly). This
> seems like a usability wart at best--one has to instead kill the autovacuum
> (e.g., via pg_cancel_backend) and then TRUNCATE.

If you had truncated the table before the issue became an emergency,
vacuum would have gave way to truncate.  But when autovacuum sees that
the problem is too serious to ignore, it doesn't give way to anything;
if it did, it would fall prey of the same problem it's trying to avoid.

> This is especially inconvenient when the user trying to truncate does
> not have permission to kill (or even see!) the wraparound VACUUM.
> Would it be possible to have autovacuum yield to a TRUNCATE in this
> situation?

Sounds like we would have to special-case truncate in the deadlock
checker or something like that.  Doesn't seem particularly palatable.

The way this works is that deadlock checker returns "you're blocked but
the culprit is autovacuum" when not in emergency mode; then the other
process sends a signal to autovac which commits suicide.  To fix this
case we would have to add another special code "you're blocked but the
culprit is an emergency vacuum", then the other process sees itself as
truncate then sends signal anyway.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
"David G. Johnston"
Date:
On Wed, Apr 8, 2015 at 12:21 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> maciek@heroku.com wrote:
>
> > If autovacuum is running a VACUUM to prevent wraparound on a certain
> table,
> > it looks like it blocks TRUNCATE of that table, which would obviate the
> need
> > for the VACUUM in the first place (and could happen much more quickly).
> This
> > seems like a usability wart at best--one has to instead kill the
> autovacuum
> > (e.g., via pg_cancel_backend) and then TRUNCATE.
>
> If you had truncated the table before the issue became an emergency,
> vacuum would have gave way to truncate.  But when autovacuum sees that
> the problem is too serious to ignore, it doesn't give way to anything;
> if it did, it would fall prey of the same problem it's trying to avoid.
>
> > This is especially inconvenient when the user trying to truncate does
> > not have permission to kill (or even see!) the wraparound VACUUM.
> > Would it be possible to have autovacuum yield to a TRUNCATE in this
> > situation?
>
> Sounds like we would have to special-case truncate in the deadlock
> checker or something like that.  Doesn't seem particularly palatable.
>
> The way this works is that deadlock checker returns "you're blocked but
> the culprit is autovacuum" when not in emergency mode; then the other
> process sends a signal to autovac which commits suicide.  To fix this
> case we would have to add another special code "you're blocked but the
> culprit is an emergency vacuum", then the other process sees itself as
> truncate then sends signal anyway.
>

=E2=80=8BRelated question: does a truncated table effectively reset its wra=
paround
counter back to zero?

Put another way: what is the use case we are trying=E2=80=8B

=E2=80=8Bto support here?

If the table in question is being truncated regularly it doesn't seem like
this situation should arise.  If it is not a regularly truncated table the
possibility of a random truncation being run simultaneously with a
wraparound VACUUM seems quite small.  If the truncate is being run
administratively then whomever is running that should at least be able to
see the wraparound vacuum: whether they can get superuser rights and do
something about it is another matter.  Even then, supposing that TRUNCATE
is a viable means to speed up a wraparound vacuum seems a bit of a stretch
in the general case.  Since pg_cancel_backend is usable in this situation
putting code into core to handle this scenario does not seem like a good
idea.

David J.

Re: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
Alvaro Herrera
Date:
David G. Johnston wrote:

> ​Related question: does a truncated table effectively reset its wraparound
> counter back to zero?

Yes, the relfrozenxid is set to RecentXmin (see ExecuteTruncate).  Not
quite zero age but unless you have really old transactions around,
should be close enough.

> If the table in question is being truncated regularly it doesn't seem like
> this situation should arise.

Correct.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
Jeff Janes
Date:
On Wed, Apr 8, 2015 at 12:21 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> maciek@heroku.com wrote:
>
> > If autovacuum is running a VACUUM to prevent wraparound on a certain
> table,
> > it looks like it blocks TRUNCATE of that table, which would obviate the
> need
> > for the VACUUM in the first place (and could happen much more quickly).
> This
> > seems like a usability wart at best--one has to instead kill the
> autovacuum
> > (e.g., via pg_cancel_backend) and then TRUNCATE.
>
> If you had truncated the table before the issue became an emergency,
> vacuum would have gave way to truncate.



It is pretty hard to argue that the passage of 200 million transactions
constitutes some kind of emergency.



> But when autovacuum sees that
> the problem is too serious to ignore, it doesn't give way to anything;
> if it did, it would fall prey of the same problem it's trying to avoid.
>

It doesn't give way to anything, except "autovacuum_vacuum_cost_delay".

I think we need to get off of the idea that is somehow an emergency for a
database system to have survived 200 million transactions.  Or at least, we
need to provide a non-emergency way to do necessary routine maintenance.
 (In this case since a truncation is desired in the first place, you could
consider that we do provide a way even if it is only available in
hindsight, but in general we do not.)

Right now we declare it an emergency that some guy needs to change his oil,
shutdown half the traffic in the city during the "emergency", declare that
all the emergency vehicles responding to this emergency are only allowed to
move at 1/3 the speed limit, and make it illegal (or at least ineffective)
for anyone to change their oil calmly and preemptively.



> > This is especially inconvenient when the user trying to truncate does
> > not have permission to kill (or even see!) the wraparound VACUUM.
> > Would it be possible to have autovacuum yield to a TRUNCATE in this
> > situation?
>
> Sounds like we would have to special-case truncate in the deadlock
> checker or something like that.  Doesn't seem particularly palatable.
>

Yeah, it would be a messy case to implement, but if there were a general
way to know the locker's intentions there are other places this would come
in handy as well.

Cheers,

Jeff

Re: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
Alvaro Herrera
Date:
Jeff Janes wrote:
> On Wed, Apr 8, 2015 at 12:21 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>
> > maciek@heroku.com wrote:
> >
> > > If autovacuum is running a VACUUM to prevent wraparound on a certain
> > table,
> > > it looks like it blocks TRUNCATE of that table, which would obviate the
> > need
> > > for the VACUUM in the first place (and could happen much more quickly).
> > This
> > > seems like a usability wart at best--one has to instead kill the
> > autovacuum
> > > (e.g., via pg_cancel_backend) and then TRUNCATE.
> >
> > If you had truncated the table before the issue became an emergency,
> > vacuum would have gave way to truncate.
>
> It is pretty hard to argue that the passage of 200 million transactions
> constitutes some kind of emergency.

Well, admittedly 200 million is not a lot of transactions, but the fact
remains that it's the configured max freeze age which is what causes the
whole problem to start with.  If you raise the limit to 2 billion, the
same thing will happen, only it will be ten times less frequent.  We
already have a lower limit to freeze tuples; if tables are scanned in
whole every 150 million transactions, as autovacuum would normally do,
you wouldn't see an effective shutdown at 200 million transactions.
Those 50 million xacts should be enough breathing room, shouldn't they.

The issue only arises if you're constantly causing autovac workers to
terminate because of DDL or some other processing that locks the table
away from vacuum.

> > But when autovacuum sees that
> > the problem is too serious to ignore, it doesn't give way to anything;
> > if it did, it would fall prey of the same problem it's trying to avoid.
>
> It doesn't give way to anything, except "autovacuum_vacuum_cost_delay".

That in itself might be a mistake.  Perhaps it makes sense to have
emergency vacuums ignore the cost delay, but for that we need to ensure
that emergency vacuums are rare events.

> I think we need to get off of the idea that is somehow an emergency for a
> database system to have survived 200 million transactions.  Or at least, we
> need to provide a non-emergency way to do necessary routine maintenance.
>  (In this case since a truncation is desired in the first place, you could
> consider that we do provide a way even if it is only available in
> hindsight, but in general we do not.)
>
> Right now we declare it an emergency that some guy needs to change his oil,
> shutdown half the traffic in the city during the "emergency", declare that
> all the emergency vehicles responding to this emergency are only allowed to
> move at 1/3 the speed limit, and make it illegal (or at least ineffective)
> for anyone to change their oil calmly and preemptively.

Hilarious.

But the analogy needs a bit more elaboration in order to explain the
current situation.  The guy wants to change oil at 150 million
transactions, and this would cause no trouble, but every time he pulls
over to do so, there's a traffic cop who prevents him from doing so and
tells him to get back on the street and move on.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> Right now we declare it an emergency that some guy needs to change his oil,
> shutdown half the traffic in the city during the "emergency", declare that
> all the emergency vehicles responding to this emergency are only allowed to
> move at 1/3 the speed limit, and make it illegal (or at least ineffective)
> for anyone to change their oil calmly and preemptively.

We could do without hyperbolic and wholly misleading analogies.  (In this
one, for example, you're omitting that the guy causing the problem is a
couple years overdue to have changed his oil, and will blow a gasket and
block the entire highway if he drives further.)

Getting back to the case at hand, I don't think that making an exception
for TRUNCATE would be a good idea even if there were no technical
obstacles.  It's perfectly possible for a TRUNCATE to be rolled back; so
if the TRUNCATE were to kick an emergency autovac off the table lock and
then fail, you would end up with no improvement in the situation at all.

            regards, tom lane

Re: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE

From
Jeff Janes
Date:
On Wed, Apr 8, 2015 at 1:09 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> Jeff Janes wrote:
> > On Wed, Apr 8, 2015 at 12:21 PM, Alvaro Herrera <
> alvherre@2ndquadrant.com>
> > wrote:
> >
> > > maciek@heroku.com wrote:
> > >
> > > > If autovacuum is running a VACUUM to prevent wraparound on a certain
> > > table,
> > > > it looks like it blocks TRUNCATE of that table, which would obviate
> the
> > > need
> > > > for the VACUUM in the first place (and could happen much more
> quickly).
> > > This
> > > > seems like a usability wart at best--one has to instead kill the
> > > autovacuum
> > > > (e.g., via pg_cancel_backend) and then TRUNCATE.
> > >
> > > If you had truncated the table before the issue became an emergency,
> > > vacuum would have gave way to truncate.
> >
> > It is pretty hard to argue that the passage of 200 million transactions
> > constitutes some kind of emergency.
>
> Well, admittedly 200 million is not a lot of transactions, but the fact
> remains that it's the configured max freeze age which is what causes the
> whole problem to start with.  If you raise the limit to 2 billion, the
> same thing will happen, only it will be ten times less frequent.


Right, and this is my gripe.  If it were increased to 2 billion, then it
expiring actually would be an emergency.  At its default value, it is not
an emergency, but we act as if it were, and we provide no way to do
otherwise.  The choices are between suffering true emergencies at 2 billion
transactions, or fake emergencies at 200 million (or, as you point out, 150
million).  Of course I can pick some value other than 2 billion and 200
million, but there is no good time to have a fake emergency.  Or a real one.


> We
> already have a lower limit to freeze tuples; if tables are scanned in
> whole every 150 million transactions, as autovacuum would normally do,
> you wouldn't see an effective shutdown at 200 million transactions.
> Those 50 million xacts should be enough breathing room, shouldn't they.
>

But that, too, is an "emergency".  As far as I can tell, there is no way to
increase the frozen ID without first declaring an emergency.  If an
ordinary autovac happens to touch every page, it still doesn't increase
frozen ID.

Cheers,

Jeff