Thread: Lock freeze ? in MVCC
Hello all, The following example causes the hang of concurrent transactions(99/04/26 snapshot). session-1 => create table tt (id int4); session-1 => begin; session-1 => insert into tt values (1); session-2 => begin; session-2 => insert into tt values (2); session-3 => begin; session-3 => lock table tt; (blocked) session-1 => update tt set id=1 where id=1; (blocked) session-2 => end; session-2 returns immediately,but session-3 and session-1 are still blocked This phenomenon seems to be caused by LockResolveCon flicts() or DeadLockCheck(). Both session-1 and session-2 acquire RowExclusive locks by insert operations(InitPlan() in execMain.c). The AccessExclusive lock of session-3 is queued waiting for the release of above locks. When the update operation of session-1 is executed,the second RowExclusive lock is rejected by LockResolve Conflicts() and queued after the AccessExclusive lock of session-3. The state is like deadlock but DeadLockCheck() doesn't regard the state as deadlock. Thanks. Hiroshi Inoue Inoue@tpf.co.jp
OK, let me comment on this. It does not to see this as a deadlock because session 3 really doesn't have a lock at the point it is hanging. A deadlock would be if 1 has a lock that 3 is waiting for, and 3 has a lock 1 is waiting for. Hold on, I think I see what you are saying now. It seems the locking code assume table-level locking, while the new code now has MVCC. I better look at this. This could be ugly to fix. I look for matching lock structure pointers in different backends(lock.c), but now I see that 1 and 2 both are waiting for table tt, but they have different locks structures, because they are different types of locks. Yikes. Maybe I can hack something in there, but I can't imagine how yet. Maybe Vadim will have a hint. --------------------------------------------------------------------------- session-1 => create table tt (id int4); session-1 => begin; session-1 => insert into tt values (1); session-2 => begin; session-2 => insert into tt values (2); session-3 => begin; session-3 => lock table tt; (blocked) session-1 => update tt set id=1 where id=1; (blocked) session-2 => end; session-2 returns immediately,but session-3 and session-1 are still blocked This phenomenon seems to be caused by LockResolveCon flicts() or DeadLockCheck(). Both session-1 and session-2 acquire RowExclusive locks by insert operations(InitPlan() in execMain.c). The AccessExclusive lock of session-3 is queued waiting for the release of above locks. When the update operation of session-1 is executed,the second RowExclusive lock is rejected by LockResolve Conflicts() and queued after the AccessExclusive lock of session-3. The state is like deadlock but DeadLockCheck() doesn't regard the state as deadlock. Thanks. Hiroshi Inoue Inoue@tpf.co.jp -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> -----Original Message----- > From: Bruce Momjian [mailto:maillist@candle.pha.pa.us] > Sent: Tuesday, April 27, 1999 1:33 PM > To: Hiroshi Inoue > Cc: pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] Lock freeze ? in MVCC > > > OK, let me comment on this. It does not to see this as a deadlock > because session 3 really doesn't have a lock at the point it is hanging. > A deadlock would be if 1 has a lock that 3 is waiting for, and 3 has a > lock 1 is waiting for. > > Hold on, I think I see what you are saying now. It seems the locking > code assume table-level locking, while the new code now has MVCC. I > better look at this. This could be ugly to fix. I look for matching I think it's a problem of table-level locking(MVCC has 8 levels of table- locking and even select operations acquire AccessShareLock's.) Moreover I found it's not the problem of MVCC only. In fact I found the following case in 6.4.2. session-1 => create table tt (id int4); session-1 => begin; session-1 => select * from tt; session-2 => begin; session-2 => select * from tt; session-3 => begin; session-3 => lock table tt; (blocked) session-1 => select * from tt; (blocked) session-2 => end; session-2 returns immediately,but session-3 and session-1 are still blocked Now I'm suspicious about the following code in LockResolveConflicts(). /* * We can control runtime this option. Default is lockReadPriority=0 */ if (!lockReadPriority) { /* ------------------------ * If someone with a greater priorityis waiting for the lock, * do not continue and share the lock, even if we can. bjm * ------------------------ */ int myprio = LockMethodTable[lockmethod]->ct l->prio[lockmode]; PROC_QUEUE *waitQueue = &(lock->waitProcs); PROC *topproc = (PROC *) MAKE_PTR(waitQueue->links.prev); if (waitQueue->size && topproc->prio > myprio) { XID_PRINT("LockResolveConflicts:higher priority proc wa iting", result); return STATUS_FOUND; } } After I removed above code on trial,select operations in my example case are not blocked. Comments ? Thanks. Hiroshi Inoue Inoue@tpf.co.jp
Hiroshi Inoue wrote: > > Now I'm suspicious about the following code in LockResolveConflicts(). > > /* > * We can control runtime this option. Default is lockReadPriority=0 > */ > if (!lockReadPriority) > { > /* ------------------------ > * If someone with a greater priority is waiting for the > lock, > * do not continue and share the lock, even if we can. bjm > * ------------------------ You're right Hiroshi - this must be changed: if we already have some lock with priority X and new requested lock has priority Y, Y <= X, then lock must be granted. Also, I would get rid of lockReadPriority stuff... Bruce, what do you think? Vadim
> Hiroshi Inoue wrote: > > > > Now I'm suspicious about the following code in LockResolveConflicts(). > > > > /* > > * We can control runtime this option. Default is lockReadPriority=0 > > */ > > if (!lockReadPriority) > > { > > /* ------------------------ > > * If someone with a greater priority is waiting for the > > lock, > > * do not continue and share the lock, even if we can. bjm > > * ------------------------ > > You're right Hiroshi - this must be changed: > > if we already have some lock with priority X and new requested > lock has priority Y, Y <= X, then lock must be granted. > > Also, I would get rid of lockReadPriority stuff... > > Bruce, what do you think? This sounds correct. I thought I needed to have the queue ordering changed so that row-level locks are queued before table-level locks, because there could be cases of lock escalation from row-level to table-level. However, it seems the problem is that readers don't share locks if writers are waiting. With table-level locks, you never escalated a read lock because you had already locked the entire table, while now you do. Perhaps we can tell the system not to share read locks unless you are sharing your own lock due to a lock escalation. lockReadPriority() was probably added by Massimo to disable this "don't share a readlock if another writer is waiting" behavior. While disabling this behavior my be useful in some cases, but in the general case may be a problem of starving writers if there are too many readers. However, it is my understanding that we don't have readers sleeping on locks anymore, but I may be wrong. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > > > > if we already have some lock with priority X and new requested > > lock has priority Y, Y <= X, then lock must be granted. > > > > Also, I would get rid of lockReadPriority stuff... > > > > Bruce, what do you think? > > This sounds correct. I thought I needed to have the queue ordering > changed so that row-level locks are queued before table-level locks, > because there could be cases of lock escalation from row-level to > table-level. > > However, it seems the problem is that readers don't share locks if > writers are waiting. With table-level locks, you never escalated a read > lock because you had already locked the entire table, while now you do. > Perhaps we can tell the system not to share read locks unless you are > sharing your own lock due to a lock escalation. There is no row-level locks: all locks over tables are table-level ones, btree & hash use page-level locks, but never do page->table level lock escalation. However, I'm not sure that proposed changes will help in the next case: session-1 => begin; session-1 => insert into tt values (1); --RowExclusiveLock session-2 => begin; session-2 => insert into tt values (2); --RowExclusiveLock session-3 => begin; session-3 => lock table tt; --AccessExclusiveLock (conflicts with 1 & 2) ^ session-1 => lock table tt in share mode; --ShareLock (conflicts with 2 & 3) ^ This is deadlock situation and must be handled by DeadLockCheck(). Vadim
[Charset koi8-r unsupported, filtering to ASCII...] > Bruce Momjian wrote: > > > > > > > > if we already have some lock with priority X and new requested > > > lock has priority Y, Y <= X, then lock must be granted. > > > > > > Also, I would get rid of lockReadPriority stuff... > > > > > > Bruce, what do you think? > > > > This sounds correct. I thought I needed to have the queue ordering > > changed so that row-level locks are queued before table-level locks, > > because there could be cases of lock escalation from row-level to > > table-level. > > > > However, it seems the problem is that readers don't share locks if > > writers are waiting. With table-level locks, you never escalated a read > > lock because you had already locked the entire table, while now you do. > > Perhaps we can tell the system not to share read locks unless you are > > sharing your own lock due to a lock escalation. > > There is no row-level locks: all locks over tables are > table-level ones, btree & hash use page-level locks, but > never do page->table level lock escalation. > > However, I'm not sure that proposed changes will help in the next case: > > session-1 => begin; > session-1 => insert into tt values (1); --RowExclusiveLock > > session-2 => begin; > session-2 => insert into tt values (2); --RowExclusiveLock > > session-3 => begin; > session-3 => lock table tt; --AccessExclusiveLock > (conflicts with 1 & 2) > ^ > session-1 => lock table tt in share mode; --ShareLock > (conflicts with 2 & 3) > ^ > This is deadlock situation and must be handled by > DeadLockCheck(). OK, I think the problem with the code is that I am preventing a process from getting a lock if there is a process of higher priority waiting(a writer). However, I never check to see if the current process already holds some kind of lock on the table. If I change the code so this behaviour will be prevented if the process already holds a lock on the table, would that fix it? In fact, maybe I should always allow it to get the lock if it holds any other locks. This should prevent some deadlocks. It would put processes at the end of the queue only if they already have no locks, which I think makes sense, because putting him at the end of the queue means all his locks are kept while he sits in the queue. Comments? The fix would be easy, and I think it would make sense. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> -----Original Message----- > From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim > Mikheev > Sent: Wednesday, April 28, 1999 12:37 PM > To: Bruce Momjian > Cc: Inoue@tpf.co.jp; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] Lock freeze ? in MVCC > > > Bruce Momjian wrote: > > > > > > > > if we already have some lock with priority X and new requested > > > lock has priority Y, Y <= X, then lock must be granted. > > > > > > Also, I would get rid of lockReadPriority stuff... > > > > > > Bruce, what do you think? > > > > This sounds correct. I thought I needed to have the queue ordering > > changed so that row-level locks are queued before table-level locks, > > because there could be cases of lock escalation from row-level to > > table-level. > > > > However, it seems the problem is that readers don't share locks if > > writers are waiting. With table-level locks, you never escalated a read > > lock because you had already locked the entire table, while now you do. > > Perhaps we can tell the system not to share read locks unless you are > > sharing your own lock due to a lock escalation. > > There is no row-level locks: all locks over tables are > table-level ones, btree & hash use page-level locks, but > never do page->table level lock escalation. > > However, I'm not sure that proposed changes will help in the next case: > > session-1 => begin; > session-1 => insert into tt values (1); --RowExclusiveLock > > session-2 => begin; > session-2 => insert into tt values (2); --RowExclusiveLock > > session-3 => begin; > session-3 => lock table tt; --AccessExclusiveLock > (conflicts with 1 & 2) > ^ > session-1 => lock table tt in share mode; --ShareLock > (conflicts with 2 & 3) > ^ > This is deadlock situation and must be handled by > DeadLockCheck(). > It's really a deadlock ? Certainly end/abort of session-2 doesn't wakeup session-1/session3. I think it's due to the following code in ProcLockWakeup(). while ((queue_size--) && (proc)) { /* * This proc will conflict as the previous one did, don't even * try. */ if (proc->token == last_locktype) continue; /* * This proc conflicts with locks held by others, ignored. */ if (LockResolveConflicts(lockmethod, lock, proc->token, proc->xid, (XIDLookupEnt * ) NULL) != STATUS_OK) { last_locktype = proc->token; continue; } Once LockResolveConflicts() doesn't return STATUS_OK,proc is not changed and only queue_size-- is executed(never try to wakeup other procs). After inserting the code such asproc = (PROC *) MAKE_PTR(proc->links.prev); before continue statements,ProcLockWakeup() triggerd by end/abort of session-2 could try to wakeup session-1. Comments ? Thanks. Hiroshi Inoue Inoue@tpf.co.jp
> -----Original Message----- > From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim > Mikheev > Sent: Tuesday, April 27, 1999 7:49 PM > To: Hiroshi Inoue > Cc: Bruce Momjian; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] Lock freeze ? in MVCC > > > Hiroshi Inoue wrote: > > > > Now I'm suspicious about the following code in LockResolveConflicts(). > > > > /* > > * We can control runtime this option. Default is > lockReadPriority=0 > > */ > > if (!lockReadPriority) > > { > > /* ------------------------ > > * If someone with a greater priority is waiting for the > > lock, > > * do not continue and share the lock, even if > we can. bjm > > * ------------------------ > > You're right Hiroshi - this must be changed: > > if we already have some lock with priority X and new requested > lock has priority Y, Y <= X, then lock must be granted. > > Also, I would get rid of lockReadPriority stuff... > I found a problem to get rid of lockReadPriority stuff completely. If there's a table which is insert/update/deleted very frequenly by several processes,processes which request the high priority lock (such as vacuum) could hardly acquire the lock for the table. How about the following patch ? Thanks. Hiroshi Inoue Inoue@tpf.co.jp *** storage/lmgr/lock.c.orig Wed Apr 28 10:44:52 1999 --- storage/lmgr/lock.c Wed Apr 28 12:00:14 1999 *************** *** 815,821 **** /* * We can control runtime this option. Default is lockReadPriority=0 */ ! if (!lockReadPriority) { /* ------------------------ * If someone with a greater priority is waitingfor the lock, --- 815,821 ---- /* * We can control runtime this option. Default is lockReadPriority=0 */ ! if ((!result->nHolding) && (!lockReadPriority)) { /* ------------------------ * If someone witha greater priority is waiting for the lock,
Hiroshi Inoue wrote: > > > > > if we already have some lock with priority X and new requested > > lock has priority Y, Y <= X, then lock must be granted. > > > > Also, I would get rid of lockReadPriority stuff... > > > > I found a problem to get rid of lockReadPriority stuff completely. > If there's a table which is insert/update/deleted very frequenly by > several processes,processes which request the high priority lock > (such as vacuum) could hardly acquire the lock for the table. I didn't mean to get rid of code checking waiter locks completely. I just said that condition below if (!lockReadPriority) is unuseful any more. Read my prev letter when, imo, we have to take waiters into account. Vadim
Hiroshi Inoue wrote: > > > However, I'm not sure that proposed changes will help in the next case: > > > > session-1 => begin; > > session-1 => insert into tt values (1); --RowExclusiveLock > > > > session-2 => begin; > > session-2 => insert into tt values (2); --RowExclusiveLock > > > > session-3 => begin; > > session-3 => lock table tt; --AccessExclusiveLock > > (conflicts with 1 & 2) > > ^ > > session-1 => lock table tt in share mode; --ShareLock > > (conflicts with 2 & 3) > > ^ > > This is deadlock situation and must be handled by > > DeadLockCheck(). > > > > It's really a deadlock ? > Certainly end/abort of session-2 doesn't wakeup session-1/session3. You're right again. First, I propose the next changes in LockResolveConflicts(): if someone is waiting for lock then we must not take them into account (and skip to check for conflicts with lock holders) if 1. we already has lock with >= priority (currently, more restrictive locks have greater priority); or 2. lock requested doesn't conflict with lock of any waiters; or 3. conflicting waiter is waiting for us: its lock conflicts with locks we already hold, if any. I foresee that we also will have to change lock queue ordering code but I have to think about it more. Vadim
> -----Original Message----- > From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim > Mikheev > Sent: Wednesday, April 28, 1999 7:56 PM > To: Hiroshi Inoue > Cc: Bruce Momjian; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] Lock freeze ? in MVCC > > > Hiroshi Inoue wrote: > > [snip] > > > This is deadlock situation and must be handled by > > > DeadLockCheck(). > > > > > > > It's really a deadlock ? > > Certainly end/abort of session-2 doesn't wakeup session-1/session3. > > You're right again. > First, I propose the next changes in LockResolveConflicts(): > > if someone is waiting for lock then we must not take them > into account (and skip to check for conflicts with lock > holders) if > > 1. we already has lock with >= priority (currently, more > restrictive locks have greater priority); > or > 2. lock requested doesn't conflict with lock of any waiters; Does this mean that the lock has a low priority ? If so,this state(2.) is hardly changed. When this waiter is wakeupd ? > or > 3. conflicting waiter is waiting for us: its lock conflicts > with locks we already hold, if any. > > I foresee that we also will have to change lock queue ordering > code but I have to think about it more. > Do you say about the following stuff in ProcSleep() ? proc = (PROC *) MAKE_PTR(waitQueue->links.prev); /* If we are a reader, and they are writers, skip past them */for (i = 0; i < waitQueue->size && proc->prio > prio; i++) proc = (PROC *) MAKE_PTR(proc->links.prev); /* The rest of the queue is FIFO, with readers first, writers last */for (; i < waitQueue->size && proc->prio <= prio; i++) proc = (PROC *) MAKE_PTR(proc->links.prev); Seems above logic is only for 2 levels of priority(READ/WRITE). But it's difficult for me to propose a different design for this. Thanks. Hiroshi Inoue Inoue@tpf.co.jp
> Do you say about the following stuff in ProcSleep() ? > > proc = (PROC *) MAKE_PTR(waitQueue->links.prev); > > /* If we are a reader, and they are writers, skip past them */ > for (i = 0; i < waitQueue->size && proc->prio > prio; i++) > proc = (PROC *) MAKE_PTR(proc->links.prev); > > /* The rest of the queue is FIFO, with readers first, writers last */ > for (; i < waitQueue->size && proc->prio <= prio; i++) > proc = (PROC *) MAKE_PTR(proc->links.prev); > > Seems above logic is only for 2 levels of priority(READ/WRITE). > But it's difficult for me to propose a different design for this. I think this is a classic priority inversion problem. If the process holds a lock and is going for another, but their is a higher priority process waiting for the lock, we have to consider that if we go to sleep, all people waiting on the lock will have to wait for me to complete in that queue, so we can either never have a process that already holds any lock from being superceeded by a higher-priority sleeping process, or we need to check the priority of all processes waiting on _our_ locks and check when pulling stuff out of the lock queue because someone of high priority could come while I am in the queue waiting. My recommendation is to not have this go-to-end of queue if there is someone higher if I already hold _any_ kind of lock. I can easily make that change if others agree. It makes the code your are questioning active ONLY if I already don't have some kind of lock. This may be the most efficient way to do things, and may not lock things up like you have seen. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Hiroshi Inoue wrote: > > > > > if someone is waiting for lock then we must not take them > > into account (and skip to check for conflicts with lock > > holders) if > > > > 1. we already has lock with >= priority (currently, more > > restrictive locks have greater priority); > > or > > 2. lock requested doesn't conflict with lock of any waiters; > > Does this mean that the lock has a low priority ? If so,this Yes and no -:) If we acquire ShareLock (prio 4) and someone with RowShareLock (2) is waiting then this means that table is locked in ExclusiveLock (or AccessExclusiveLock) mode and we'll going to sleep after lock-holder conflict test (so, we could go to sleep just after seeing that our prio is higher, without lock-holder conflict test). If we acquire RowShareLock and someone with ShareLock is waiting due to table is locked in RowExclusiveLock mode then we are allowed to continue: ShareLock waiter will be wakeupd after releasing of RowExclusiveLock and we don't conflict with any of these lock mode. > state(2.) is hardly changed. When this waiter is wakeupd ? > > > or > > 3. conflicting waiter is waiting for us: its lock conflicts > > with locks we already hold, if any. > > > > I foresee that we also will have to change lock queue ordering > > code but I have to think about it more. > > > > Do you say about the following stuff in ProcSleep() ? > > proc = (PROC *) MAKE_PTR(waitQueue->links.prev); > > /* If we are a reader, and they are writers, skip past them */ > for (i = 0; i < waitQueue->size && proc->prio > prio; i++) > proc = (PROC *) MAKE_PTR(proc->links.prev); > > /* The rest of the queue is FIFO, with readers first, writers last */ > for (; i < waitQueue->size && proc->prio <= prio; i++) > proc = (PROC *) MAKE_PTR(proc->links.prev); > > Seems above logic is only for 2 levels of priority(READ/WRITE). > But it's difficult for me to propose a different design for this. Yes. I'm not sure how useful is priority logic now. Keep thinking... Vadim