Thread: How about a option to disable autovacuum cancellation on lock conflict?

How about a option to disable autovacuum cancellation on lock conflict?

From
Andres Freund
Date:
Hi,

I've more than once seen that autovacuums on certain tables never
succeed because regular exclusive (or similar) lockers cause them to give
way/up before finishing. This usually happens when some part of the
application uses explicit exclusive locks.

In general I think it's quite important that autovacuum behaves that
way. But I think it might be worthwhile to offer an option to disable
that behaviour. If some piece of application logic requires exclusive
locks and that leads to complete starvation of autovacuum, right now
there's really nothing that can be done except to schedule vacuums
manually.

I can see two possible solutions:

1) Add a reloption that allows configuring whether autovacuum gives way
   to locks acquired by user backends.
2) Add a second set of autovacuum_*_scale_factor variables that govern
   a threshold after which autovacuum doesn't get cancelled anymore.
 

Opinions?

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Magnus Hagander
Date:
On Nov 29, 2014 9:23 AM, "Andres Freund" <andres@2ndquadrant.com> wrote:
>
> Hi,
>
> I've more than once seen that autovacuums on certain tables never
> succeed because regular exclusive (or similar) lockers cause them to give
> way/up before finishing. This usually happens when some part of the
> application uses explicit exclusive locks.
>
> In general I think it's quite important that autovacuum behaves that
> way. But I think it might be worthwhile to offer an option to disable
> that behaviour. If some piece of application logic requires exclusive
> locks and that leads to complete starvation of autovacuum, right now
> there's really nothing that can be done except to schedule vacuums
> manually.
>
> I can see two possible solutions:
>
> 1) Add a reloption that allows configuring whether autovacuum gives way
>    to locks acquired by user backends.
> 2) Add a second set of autovacuum_*_scale_factor variables that govern
>    a threshold after which autovacuum doesn't get cancelled anymore.
>
> Opinions?

I definitely think having such a tunable would be very useful in edge cases (so, as you say, the default should definitely be what it is today).

Another "trigger option" could be to say "you may terminate autovacuum this many times before forcing one through", rather than triggering on tuple count. But tuples is probably a better choice, as it is more dynamic, unless we want to do both.

/Magnus
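A minimal sketch of the count-based "trigger option" described above, assuming a hypothetical per-table cancellation counter (PostgreSQL keeps nothing like this; the names `TableVacuumState`, `on_lock_conflict` and `max_cancellations` are invented for illustration). A value of 0 keeps today's always-cancel behaviour:

```python
from dataclasses import dataclass


@dataclass
class TableVacuumState:
    """Hypothetical per-table bookkeeping; PostgreSQL keeps no such counter."""
    cancellations: int = 0


def on_lock_conflict(state: TableVacuumState, max_cancellations: int) -> str:
    """Decide what a running autovacuum does when a user backend requests a
    conflicting lock.

    max_cancellations <= 0 keeps today's behaviour: always give way.
    Otherwise autovacuum gives way until it has been cancelled
    max_cancellations times, after which the locker has to wait.
    """
    if max_cancellations <= 0 or state.cancellations < max_cancellations:
        state.cancellations += 1
        return "cancel"  # autovacuum gives way, as it does today
    return "wait"  # this run is forced through; the locker waits


def on_vacuum_completed(state: TableVacuumState) -> None:
    """A run that finishes resets the counter for the next round."""
    state.cancellations = 0


state = TableVacuumState()
print([on_lock_conflict(state, 3) for _ in range(4)])
# ['cancel', 'cancel', 'cancel', 'wait']
```

A tuple-count trigger would replace the counter comparison with a dead-tuple comparison; the two could also be combined with an `or`.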

Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Jim Nasby
Date:
On 11/29/14, 2:22 AM, Andres Freund wrote:
> Hi,
>
> I've more than once seen that autovacuums on certain tables never
> succeed because regular exclusive (or similar) lockers cause them to give
> way/up before finishing. This usually happens when some part of the
> application uses explicit exclusive locks.
>
> In general I think it's quite important that autovacuum behaves that
> way. But I think it might be worthwhile to offer an option to disable
> that behaviour. If some piece of application logic requires exclusive
> locks and that leads to complete starvation of autovacuum, right now
> there's really nothing that can be done except to schedule vacuums
> manually.
>
> I can see two possible solutions:
>
> 1) Add a reloption that allows configuring whether autovacuum gives way
>     to locks acquired by user backends.
> 2) Add a second set of autovacuum_*_scale_factor variables that govern
>     a threshold after which autovacuum doesn't get cancelled anymore.
>
> Opinions?

What do you mean by "never succeed"? Is it skipping a large number of pages? Might re-trying the locks within the same
vacuum help, or are the user locks too persistent?
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Robert Haas
Date:
On Sat, Nov 29, 2014 at 11:46 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> What do you mean by "never succeed"? Is it skipping a large number of pages?
> Might re-trying the locks within the same vacuum help, or are the user locks
> too persistent?

You are confused.  He's talking about the relation-level lock that
vacuum attempts to take before doing any work at all on a given table,
not the per-page cleanup locks that it takes while processing each
page.  If the relation-level lock can't be acquired, the whole table
is skipped.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Sat, Nov 29, 2014 at 11:46 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > What do you mean by "never succeed"? Is it skipping a large number of pages?
> > Might re-trying the locks within the same vacuum help, or are the user locks
> > too persistent?
> 
> You are confused.  He's talking about the relation-level lock that
> vacuum attempts to take before doing any work at all on a given table,
> not the per-page cleanup locks that it takes while processing each
> page.  If the relation-level lock can't be acquired, the whole table
> is skipped.

Almost there.  Autovacuum takes the relation-level lock, starts
processing.  Some time later, another process wants a lock that
conflicts with the one autovacuum has.  This is flagged by the deadlock
detector, and a signal is sent to autovacuum, which commits suicide.

If the table is large, the time window for this to happen is large also;
there might never be a time window large enough between two lock
acquisitions for one autovacuum run to complete in a table.  This
starves the table of vacuuming completely, until things are bad enough
that an emergency vacuum is forced.  By then, the bloat is disastrous.
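The starvation described above can be illustrated with a toy timeline model. This is a simplification, not how the autovacuum launcher actually schedules work: every conflicting lock request is assumed to cancel a run that is still in progress, and a fresh run is assumed to start at that same moment, so deadlock_timeout, lock hold times and autovacuum_naptime are all ignored. The function name `first_completion` is invented for illustration:

```python
from typing import Optional, Sequence


def first_completion(conflicts: Sequence[float], duration: float,
                     horizon: float) -> Optional[float]:
    """Time at which an autovacuum run first completes, or None if it does
    not complete before `horizon`.

    `conflicts` lists every conflicting lock request up to `horizon`. A run
    still in progress when such a request arrives is cancelled and all of its
    work is lost; the next run is assumed to start at the request time.
    """
    start = 0.0
    for c in sorted(conflicts):
        if c < start:
            continue
        if c >= start + duration:
            break  # the current run finished before this request arrived
        start = c  # cancelled; restart from scratch
    finish = start + duration
    return finish if finish <= horizon else None


day = 24 * 3600
every_minute = [60 * k for k in range(1, day // 60)]
print(first_completion(every_minute, 50, day))   # 50.0
print(first_completion(every_minute, 600, day))  # None
```

With a conflicting lock once a minute, a table that needs 50 seconds to vacuum is fine, while one that needs 10 minutes never gets a window long enough, no matter how long the day runs.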

I think it's that suicide that Andres wants to disable.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Josh Berkus
Date:
On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
> If the table is large, the time window for this to happen is large also;
> there might never be a time window large enough between two lock
> acquisitions for one autovacuum run to complete in a table.  This
> starves the table of vacuuming completely, until things are bad enough
> that an emergency vacuum is forced.  By then, the bloat is disastrous.
> 
> I think it's that suicide that Andres wants to disable.

A much better solution for this ... and one which would solve a *lot* of
other issues with vacuum and autovacuum ... would be to give vacuum a
way to track which blocks an incomplete vacuum had already visited.
This would be even more valuable for freeze.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Andres Freund
Date:
On 2014-12-02 11:02:07 -0800, Josh Berkus wrote:
> On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
> > If the table is large, the time window for this to happen is large also;
> > there might never be a time window large enough between two lock
> > acquisitions for one autovacuum run to complete in a table.  This
> > starves the table of vacuuming completely, until things are bad enough
> > that an emergency vacuum is forced.  By then, the bloat is disastrous.
> > 
> > I think it's that suicide that Andres wants to disable.

Correct.

> A much better solution for this ... and one which would solve a *lot* of
> other issues with vacuum and autovacuum ... would be to give vacuum a
> way to track which blocks an incomplete vacuum had already visited.
> This would be even more valuable for freeze.

That's pretty much a different problem. Yes, some more persistent state would
be helpful - although it'd need to be *much* more than which pages it
has visited - but you'd still be vulnerable to the same issue.

Greetings,

Andres Freund




Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Josh Berkus
Date:
On 12/02/2014 11:08 AM, Andres Freund wrote:
> On 2014-12-02 11:02:07 -0800, Josh Berkus wrote:
>> On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
>>> If the table is large, the time window for this to happen is large also;
>>> there might never be a time window large enough between two lock
>>> acquisitions for one autovacuum run to complete in a table.  This
>>> starves the table of vacuuming completely, until things are bad enough
>>> that an emergency vacuum is forced.  By then, the bloat is disastrous.
>>>
>>> I think it's that suicide that Andres wants to disable.
> 
> Correct.
> 
>> A much better solution for this ... and one which would solve a *lot* of
>> other issues with vacuum and autovacuum ... would be to give vacuum a
>> way to track which blocks an incomplete vacuum had already visited.
>> This would be even more valuable for freeze.
> 
> That's pretty much a different problem. Yes, some more persistent state would
> be helpful - although it'd need to be *much* more than which pages it
> has visited - but you'd still be vulnerable to the same issue.

If we're trying to solve the problem that vacuums of large, high-update
tables never complete, it's solving the same problem.  And in a much
better way.

And yeah, doing a vacuum placeholder wouldn't be simple, but it's the
only solution I can think of that's worthwhile.  Just disabling the
behavior where vacuum releases its share lock puts the user in the
situation of deciding between maintenance and uptime.




Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Andres Freund
Date:
On 2014-12-02 11:12:40 -0800, Josh Berkus wrote:
> On 12/02/2014 11:08 AM, Andres Freund wrote:
> > On 2014-12-02 11:02:07 -0800, Josh Berkus wrote:
> >> On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
> >>> If the table is large, the time window for this to happen is large also;
> >>> there might never be a time window large enough between two lock
> >>> acquisitions for one autovacuum run to complete in a table.  This
> >>> starves the table of vacuuming completely, until things are bad enough
> >>> that an emergency vacuum is forced.  By then, the bloat is disastrous.
> >>>
> >>> I think it's that suicide that Andres wants to disable.
> > 
> > Correct.
> > 
> >> A much better solution for this ... and one which would solve a *lot* of
> >> other issues with vacuum and autovacuum ... would be to give vacuum a
> >> way to track which blocks an incomplete vacuum had already visited.
> >> This would be even more valuable for freeze.
> > 
> > That's pretty much a different problem. Yes, some more persistent state would
> > be helpful - although it'd need to be *much* more than which pages it
> > has visited - but you'd still be vulnerable to the same issue.
> 
> If we're trying to solve the problem that vacuums of large, high-update
> tables never complete, it's solving the same problem.

Which isn't what I'm talking about.

The problem is that vacuum is cancelled if a conflicting lock is
requested. Plain updates don't do that. But there are workloads where you
need more heavyweight updates, and then it can easily happen.

Greetings,

Andres Freund




Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Jeff Janes
Date:
On Tue, Dec 2, 2014 at 11:12 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 12/02/2014 11:08 AM, Andres Freund wrote:
> > On 2014-12-02 11:02:07 -0800, Josh Berkus wrote:
> >> On 12/02/2014 10:35 AM, Alvaro Herrera wrote:
> >>> If the table is large, the time window for this to happen is large also;
> >>> there might never be a time window large enough between two lock
> >>> acquisitions for one autovacuum run to complete in a table.  This
> >>> starves the table of vacuuming completely, until things are bad enough
> >>> that an emergency vacuum is forced.  By then, the bloat is disastrous.
> >>>
> >>> I think it's that suicide that Andres wants to disable.
> >
> > Correct.
> >
> >> A much better solution for this ... and one which would solve a *lot* of
> >> other issues with vacuum and autovacuum ... would be to give vacuum a
> >> way to track which blocks an incomplete vacuum had already visited.
> >> This would be even more valuable for freeze.
> >
> > That's pretty much a different problem. Yes, some more persistent state would
> > be helpful - although it'd need to be *much* more than which pages it
> > has visited - but you'd still be vulnerable to the same issue.
>
> If we're trying to solve the problem that vacuums of large, high-update
> tables never complete, it's solving the same problem.  And in a much
> better way.
>
> And yeah, doing a vacuum placeholder wouldn't be simple, but it's the
> only solution I can think of that's worthwhile.  Just disabling the
> behavior where vacuum releases its share lock puts the user in the
> situation of deciding between maintenance and uptime.

I think it would be more promising to work on downgrading lock strengths so that fewer things conflict, and it would not be much more work than what you propose.

What operations are people doing on a regular basis that take locks which cancel vacuum?  create index?

Cheers,

Jeff

Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Andres Freund
Date:
On 2014-12-02 11:23:31 -0800, Jeff Janes wrote:
> I think it would be more promising to work on downgrading lock strengths so
> that fewer things conflict, and it would not be much more work than what
> you propose.

I think you *massively* underestimate the effort required to lower
lock levels. There are some very ugly corners you have to think about to
do so. Just look at how long it took to implement the lock level
reductions for ALTER TABLE - and those were the simpler cases.

> What operations are people doing on a regular basis that take locks
> which cancel vacuum?  create index?

Locking tables against modifications in this case.

Greetings,

Andres Freund




Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Jeff Janes
Date:
On Tue, Dec 2, 2014 at 11:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-12-02 11:23:31 -0800, Jeff Janes wrote:
> > I think it would be more promising to work on downgrading lock strengths so
> > that fewer things conflict, and it would not be much more work than what
> > you propose.
>
> I think you *massively* underestimate the effort required to lower
> lock levels. There are some very ugly corners you have to think about to
> do so. Just look at how long it took to implement the lock level
> reductions for ALTER TABLE - and those were the simpler cases.

Or maybe I overestimate how hard it would be to make vacuum restartable.  You would have to save a massive amount of state (up to a maintenance_work_mem-sized TID list, plus the block you left off at in both the table and all of the indexes on that table), and you would somehow have to validate that saved state against any changes that might have occurred to the table or the indexes while it was saved and you were not holding the lock, which seems like it would be almost as full of corner cases as weakening the lock in the first place.  Aren't they logically the same thing?  If we could drop the lock and take it up again later, maybe the answer is not to save the state, but just to pause the vacuum until the lock becomes free again, in effect saving the state in situ.  That would allow an autovac worker to be held hostage by anyone taking a lock, though.

The only easy way to do it that I see is to have it only stop at the end of an index-cleaning cycle, which probably takes too long to block for.  Or record a restart point at the end of each index-cleaning cycle, and then when it yields the lock it abandons all work since the last cycle end, rather than since the beginning.  That would be better than what we have, but seems like a far cry from actually restarting from any point.
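A rough sketch of how much heap work the restart-point idea would save. It assumes, unrealistically, that every index-cleaning cycle covers the same number of heap blocks; in practice a cycle ends whenever the dead-TID list fills maintenance_work_mem, so the spacing depends on dead-tuple density. The function names are invented for illustration:

```python
def restart_block(cancelled_at: int, blocks_per_cycle: int,
                  use_restart_points: bool) -> int:
    """Heap block a new vacuum would start from after a cancellation.

    Assumes each index-cleaning cycle covers a fixed number of heap blocks.
    """
    if not use_restart_points:
        return 0  # current behaviour: everything is thrown away
    completed_cycles = cancelled_at // blocks_per_cycle
    return completed_cycles * blocks_per_cycle


def blocks_lost(cancelled_at: int, blocks_per_cycle: int,
                use_restart_points: bool) -> int:
    """Heap blocks that have to be scanned again."""
    return cancelled_at - restart_block(cancelled_at, blocks_per_cycle,
                                        use_restart_points)


# A cancellation at block 730,000 with a cycle every 200,000 blocks:
print(blocks_lost(730_000, 200_000, use_restart_points=False))  # 730000
print(blocks_lost(730_000, 200_000, use_restart_points=True))   # 130000
```

The loss is bounded by one cycle rather than by the whole table, which is why this helps large tables most, while still falling short of restarting from an arbitrary point.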
 

> > What operations are people doing on a regular basis that take locks
> > which cancel vacuum?  create index?
>
> Locking tables against modifications in this case.

So in "share mode", then?  I don't think there is any reason that there can't be a lock mode that conflicts with "ROW EXCLUSIVE" but not "SHARE UPDATE EXCLUSIVE".  Basically something that conflicts with logical changes, but not with physical changes.
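To make the suggestion above concrete, here is a sketch of PostgreSQL's table-level lock conflict table (as listed in the documentation on explicit locking) plus one hypothetical extra mode. Plain VACUUM and autovacuum hold SHARE UPDATE EXCLUSIVE, and INSERT/UPDATE/DELETE take ROW EXCLUSIVE. The extra mode's name and conflict set are assumptions made for illustration; no such mode exists:

```python
# Table-level lock modes and their conflicts, as listed in the PostgreSQL
# documentation on explicit locking.
MODES = ("ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
         "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE")

CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                      "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE ROW EXCLUSIVE",
              "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE": set(MODES[2:]),  # ROW EXCLUSIVE and everything stronger
    "EXCLUSIVE": set(MODES[1:]),            # everything except ACCESS SHARE
    "ACCESS EXCLUSIVE": set(MODES),         # everything
}


def conflicts(a: str, b: str) -> bool:
    return b in CONFLICTS[a]


# Hypothetical new mode: blocks logical changes (ROW EXCLUSIVE and stronger)
# but not the SHARE UPDATE EXCLUSIVE lock held by (auto)vacuum.
HYPOTHETICAL = "SHARE (vacuum-compatible, hypothetical)"
CONFLICTS[HYPOTHETICAL] = CONFLICTS["SHARE"] - {"SHARE UPDATE EXCLUSIVE"}
for other in CONFLICTS[HYPOTHETICAL]:
    CONFLICTS[other].add(HYPOTHETICAL)  # keep the matrix symmetric

VACUUM = "SHARE UPDATE EXCLUSIVE"
print(conflicts("ROW EXCLUSIVE", VACUUM))        # False: plain UPDATEs don't cancel autovacuum
print(conflicts("SHARE", VACUUM))                # True: LOCK TABLE ... IN SHARE MODE does
print(conflicts(HYPOTHETICAL, VACUUM))           # False
print(conflicts(HYPOTHETICAL, "ROW EXCLUSIVE"))  # True
```

The sketch only shows that such a mode is expressible in the conflict table; whether it is safe for everything else that takes SHARE mode today is the hard part Andres points at.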

Cheers,

Jeff

Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Andres Freund
Date:
On 2014-12-02 12:22:42 -0800, Jeff Janes wrote:
> Or maybe I overestimate how hard it would be to make vacuum
> restartable.

That's a massive project. Which is why I'm explicitly *not* suggesting
that. What I suggest instead is a separate threshold after which vacuum
isn't going to abort automatically after a lock conflict. Past that
threshold it would just behave like anti-wraparound vacuum already does.

Maybe autovacuum_vacuum/analyze_force_threshold or similar. If set to
zero, the default, that behaviour is disabled. If set to a positive
value it's an absolute one, if negative it's a factor of the normal
autovacuum_vacuum/analyze_threshold.
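A sketch of the decision this proposal implies, for the vacuum side only. The existing trigger (autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, defaults 50 and 0.2) is as documented; `autovacuum_vacuum_force_threshold` is only the proposed name and was never a real setting. Reading "negative means a factor" as "a multiple of the computed normal threshold" is an assumption:

```python
from typing import Optional


def vacuum_threshold(reltuples: float, base_threshold: int = 50,
                     scale_factor: float = 0.2) -> float:
    """Existing autovacuum trigger: autovacuum_vacuum_threshold +
    autovacuum_vacuum_scale_factor * reltuples (defaults 50 and 0.2)."""
    return base_threshold + scale_factor * reltuples


def force_limit(reltuples: float, force_threshold: float) -> Optional[float]:
    """Dead-tuple count above which autovacuum would stop giving way.

    force_threshold models the proposed (hypothetical)
    autovacuum_vacuum_force_threshold:
      0  -> disabled, today's behaviour (returns None)
      >0 -> absolute number of dead tuples
      <0 -> multiple of the normal vacuum threshold (assumed reading)
    """
    if force_threshold == 0:
        return None
    if force_threshold > 0:
        return float(force_threshold)
    return -force_threshold * vacuum_threshold(reltuples)


def gives_way_on_conflict(dead_tuples: float, reltuples: float,
                          force_threshold: float,
                          anti_wraparound: bool = False) -> bool:
    """True if a conflicting lock request should still cancel this run."""
    if anti_wraparound:
        return False  # anti-wraparound vacuums are already not cancelled
    limit = force_limit(reltuples, force_threshold)
    return limit is None or dead_tuples <= limit


# 1M-row table: normal threshold 200,050 dead tuples; -2 doubles it to 400,100.
print(gives_way_on_conflict(300_000, 1_000_000, -2))  # True
print(gives_way_on_conflict(500_000, 1_000_000, -2))  # False
```

Tying the limit to the normal threshold lets one setting scale with table size, while the absolute form suits a single problematic table.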


Greetings,

Andres Freund



Re: How about a option to disable autovacuum cancellation on lock conflict?

From
Jim Nasby
Date:
On 12/2/14, 2:22 PM, Jeff Janes wrote:
> Or maybe I overestimate how hard it would be to make vacuum restartable.  You would have to save a massive amount of
> state (up to a maintenance_work_mem-sized TID list, plus the block you left off at in both the table and all of the
> indexes on that table), and you would somehow have to validate that saved state against any changes that might have
> occurred to the table or the indexes while it was saved and you were not holding the lock, which seems like it would
> be almost as full of corner cases as weakening the lock in the first place.  Aren't they logically the same thing?  If
> we could drop the lock and take it up again later, maybe the answer is not to save the state, but just to pause the
> vacuum until the lock becomes free again, in effect saving the state in situ.  That would allow an autovac worker to be
> held hostage by anyone taking a lock, though.
 

Yeah, rather than messing with any of that, I think it would make a lot more sense to split vacuum into smaller
operations that don't require such a huge chunk of time.
 

> The only easy way to do it that I see is to have it only stop at the end of an index-cleaning cycle, which probably
> takes too long to block for.  Or record a restart point at the end of each index-cleaning cycle, and then when it yields
> the lock it abandons all work since the last cycle end, rather than since the beginning.  That would be better than what
> we have, but seems like a far cry from actually restarting from any point.
 

Now that's not a bad idea. This would basically mean just saving a block number in pg_class after every intermediate
index clean and then setting that back to zero when we're done with that relation, right?
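A toy model of the persisted restart block described above. The dict stands in for a hypothetical per-relation column (pg_class has no such field), each cycle is assumed to cover a fixed number of heap blocks, and Jeff's concern about validating the saved position against changes made while no lock was held is not modelled:

```python
from typing import Dict, Optional

# Stand-in for a hypothetical per-relation restart block; illustrative only.
saved_restart_block: Dict[str, int] = {}


def vacuum_attempt(rel: str, nblocks: int, blocks_per_cycle: int,
                   budget: Optional[int]) -> bool:
    """One (auto)vacuum attempt on `rel`; returns True if it finished.

    `budget` is how many heap blocks this attempt scans before a conflicting
    lock cancels it (None: never cancelled).
    """
    block = saved_restart_block.get(rel, 0)
    scanned = 0
    while block < nblocks:
        cycle_end = min(block + blocks_per_cycle, nblocks)
        if budget is not None and scanned + (cycle_end - block) > budget:
            return False  # cancelled mid-cycle: only this cycle's work is lost
        scanned += cycle_end - block
        block = cycle_end
        # ... index cleaning for this batch of dead TIDs would happen here ...
        saved_restart_block[rel] = block  # persist restart point after the cycle
    saved_restart_block[rel] = 0  # relation done: reset to zero
    return True


# 1,000 blocks, a cycle every 200 blocks, 450 blocks scanned per attempt:
print([vacuum_attempt("t", 1000, 200, 450) for _ in range(3)])
# [False, False, True]
```

Without the saved block, every attempt would restart at block 0 and, with only 450 blocks per attempt, never finish the 1,000-block table.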
 