Re: canceling autovacuum task woes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: canceling autovacuum task woes
Date
Msg-id 12540.1343184774@sss.pgh.pa.us
Whole thread Raw
In response to Re: canceling autovacuum task woes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Jul 24, 2012, at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... This means that with respect to (a), the connection
>> from the process doing the kill to the AV proc may be inadequately
>> documented by this patch, and with respect to (b), there might well be
>> cases where we found an AV proc somewhere in the graph traversal but
>> it's not actually guilty of blocking the current process ... especially
>> not after the queue reorderings that we may have done.  I think I'd be
>> happier with that code if it restricted its AV targets to procs that
>> *directly* block the current process, which not incidentally would make
>> this amount of log detail sufficient.

> Uggh.  Well, that certainly sounds like something that could cause spurious cancels - or excessively fast ones, since
presumablyif we limit it to things that directly block the current process, you'll always allow the full
deadlock_timeoutbefore nuking the autovac worker.  So +1 for changing that. 

I think something as simple as the attached would do the trick.  I've
verified that this still allows the expected cancel cases, though of
course it's harder to prove that it gives any benefit.

> Does an edge in this context mean any lock, or just an ungranted one?  I assume the latter, which still leaves the
questionof where the edges are coming from in the first place. 

The deadlock code follows "hard" edges, which mean "A wants a lock that
B has already got", as well as "soft" edges, which mean "A wants a lock
that B also wants, and B is ahead of A in the queue for it".  We don't
kill autovacs that are the end of soft edges, which I think is good,
but the fact that we follow them at all makes it a little unclear what
we might reach recursively.  Your point about determinacy of the timeout
is probably even a better argument for only allowing the direct blockee
to wield the knife.

            regards, tom lane

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index 288186a6ceebf40d5a43638c7be5be35c12dd3b3..6620a3d571d769f75e6ae3fbbe4a0ed2cf271720 100644
*** a/src/backend/storage/lmgr/deadlock.c
--- b/src/backend/storage/lmgr/deadlock.c
*************** FindLockCycleRecurse(PGPROC *checkProc,
*** 527,551 ****
                  if ((proclock->holdMask & LOCKBIT_ON(lm)) &&
                      (conflictMask & LOCKBIT_ON(lm)))
                  {
-                     /*
-                      * Look for a blocking autovacuum. There can be more than
-                      * one in the deadlock cycle, in which case we just pick a
-                      * random one.    We stash the autovacuum worker's PGPROC so
-                      * that the caller can send a cancel signal to it, if
-                      * appropriate.
-                      *
-                      * Note we read vacuumFlags without any locking.  This is
-                      * OK only for checking the PROC_IS_AUTOVACUUM flag,
-                      * because that flag is set at process start and never
-                      * reset; there is logic elsewhere to avoid canceling an
-                      * autovacuum that is working for preventing Xid
-                      * wraparound problems (which needs to read a different
-                      * vacuumFlag bit), but we don't do that here to avoid
-                      * grabbing ProcArrayLock.
-                      */
-                     if (pgxact->vacuumFlags & PROC_IS_AUTOVACUUM)
-                         blocking_autovacuum_proc = proc;
-
                      /* This proc hard-blocks checkProc */
                      if (FindLockCycleRecurse(proc, depth + 1,
                                               softEdges, nSoftEdges))
--- 527,532 ----
*************** FindLockCycleRecurse(PGPROC *checkProc,
*** 559,565 ****

                          return true;
                      }
!                     /* If no deadlock, we're done looking at this proclock */
                      break;
                  }
              }
--- 540,575 ----

                          return true;
                      }
!
!                     /*
!                      * No deadlock here, but see if this proc is an autovacuum
!                      * that is directly hard-blocking our own proc.  If so,
!                      * report it so that the caller can send a cancel signal
!                      * to it, if appropriate.  If there's more than one such
!                      * proc (probably not possible given that autovacuums all
!                      * take similar lock types), it's indeterminate which one
!                      * will be reported.
!                      *
!                      * We don't touch autovacuums that are indirectly blocking
!                      * us; it's up to the direct blockee to take action.  This
!                      * rule simplifies understanding the behavior and ensures
!                      * that an autovacuum won't be canceled with less than
!                      * deadlock_timeout grace period.
!                      *
!                      * Note we read vacuumFlags without any locking.  This is
!                      * OK only for checking the PROC_IS_AUTOVACUUM flag,
!                      * because that flag is set at process start and never
!                      * reset.  There is logic elsewhere to avoid canceling an
!                      * autovacuum that is working to prevent XID wraparound
!                      * problems (which needs to read a different vacuumFlag
!                      * bit), but we don't do that here to avoid grabbing
!                      * ProcArrayLock.
!                      */
!                     if (checkProc == MyProc &&
!                         pgxact->vacuumFlags & PROC_IS_AUTOVACUUM)
!                         blocking_autovacuum_proc = proc;
!
!                     /* We're done looking at this proclock */
                      break;
                  }
              }

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [BUGS] BUG #6748: sequence value may be conflict in some cases
Next
From: Alexander Law
Date:
Subject: Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding