Thread: Re: [HACKERS] 'Waiting on lock'
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> Yeah, I wouldn't want one per second. Do we already track how long >> we've been waiting? > > No, because we're *asleep*. You'd have to add an additional > timeout-interrupt reason. Plus there's a ton of interesting questions > about what's safe to do from an interrupt service routine. > > In fact, I am scandalized to see that someone has inserted a boatload > of elog calls into CheckDeadLock since 8.2 --- that seems entirely > unsafe. [ checks revision history... ] Attached is a patch which moves the messages to ProcSleep(). To do this I had to move the semaphore signal to unconditionally be signalled whenever CheckDeadLock() is called regardless of whether it finds a hard deadlock. I'm not 100% sure that's correct but afaik we only use semaphores to signal state changes and deal with spurious semaphore firings everywhere. Incidentally in looking at this I found that the "early deadlock detection" never seems to fire. Reading the source it seems it ought to be firing whenever we have a simple two-process deadlock. But instead I only get the timeout-based detection. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Attachment
"Gregory Stark" <stark@enterprisedb.com> writes: > Incidentally in looking at this I found that the "early deadlock detection" > never seems to fire. Reading the source it seems it ought to be firing > whenever we have a simple two-process deadlock. But instead I only get the > timeout-based detection. Ok, I understand now that early deadlock detection only kicks in when doing something like LOCK TABLE and even then only if you're deadlocking because you're upgrading a lock. So this works as intended though it's much less useful than I thought. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Ok, I understand now that early deadlock detection only kicks in when doing > something like LOCK TABLE and even then only if you're deadlocking because > you're upgrading a lock. So this works as intended though it's much less > useful than I thought. Yeah, that path is not really intended as an optimization, it's just the simplest way to clean up correctly at that point, per the comment. The part of the whole business that actually *is* useful is where we decide we should grant ourselves the lock immediately --- if we eliminated that whole loop and just always queued at the end of the list, we'd still get the lock, but not till after the deadlock check delay had elapsed. regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes: > Attached is a patch which moves the messages to ProcSleep(). To do this I had > to move the semaphore signal to unconditionally be signalled whenever > CheckDeadLock() is called regardless of whether it finds a hard deadlock. I'm > not 100% sure that's correct but afaik we only use semaphores to signal state > changes and deal with spurious semaphore firings everywhere. It's a bit itchy-seeming, but I think what you are actually doing is going over to a regime where every wait on the semaphore checks and re-waits if it didn't get the condition it wants. Before, I think we did that for every use of the semaphore except this one (which is an outgrowth of the fact that originally this was indeed the only use of the sema, and the others got shoehorned in over time). I'll check it over. regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes: > Attached is a patch which moves the messages to ProcSleep(). BTW, with the messages moved out of the ISR it would be safe to make it identify the specific lock being waited on (in the same terms used in the existing deadlock messages). Is there a reason not to do that? I suppose it would eat a few more cycles ... regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> In fact, I am scandalized to see that someone has inserted a boatload >> of elog calls into CheckDeadLock since 8.2 --- that seems entirely >> unsafe. [ checks revision history... ] > Attached is a patch which moves the messages to ProcSleep(). Applied with some further revisions to improve the usefulness of the log messages. There's now one issued when the deadlock timeout elapses, and another when the lock is finally obtained: LOG: process 14945 still waiting for AccessExclusiveLock on relation 86076 of database 86042 after 1003.354 ms ... LOG: process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms although I just realized that the wording of the second one is misleading; it actually comes out when the lock wait ends, whether we acquired the lock or not. (The other possibility is that our statement was aborted, eg by SIGINT or statement_timeout.) Is it worth having two messages for the two cases? I'm tempted to just not log anything if the statement is aborted --- the cause of the abort should be reflected in some later error message, and reporting how long we waited before giving up seems not within the charter of log_lock_waits. regards, tom lane
On Tue, 2007-06-19 at 16:24 -0400, Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> In fact, I am scandalized to see that someone has inserted a boatload > >> of elog calls into CheckDeadLock since 8.2 --- that seems entirely > >> unsafe. [ checks revision history... ] > > > Attached is a patch which moves the messages to ProcSleep(). Thanks Greg for taking this on; it would still be in my queue now if you hadn't, so much appreciated. > Applied with some further revisions to improve the usefulness of the log > messages. There's now one issued when the deadlock timeout elapses, and > another when the lock is finally obtained: > > LOG: process 14945 still waiting for AccessExclusiveLock on relation 86076 of database 86042 after 1003.354 ms > ... > LOG: process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms > > although I just realized that the wording of the second one is > misleading; it actually comes out when the lock wait ends, whether we > acquired the lock or not. (The other possibility is that our statement > was aborted, eg by SIGINT or statement_timeout.) > > Is it worth having two messages for the two cases? I'm tempted to just > not log anything if the statement is aborted --- the cause of the abort > should be reflected in some later error message, and reporting how long > we waited before giving up seems not within the charter of > log_lock_waits. Sounds good; thanks Tom. related TODO items: - add a WAIT n clause in same SQL locations as NOWAIT - add a lock_wait_timeout (USERSET), default = 0 (unlimited waiting) to provide better control over lock waits. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
* Simon Riggs (simon@2ndquadrant.com) wrote: > related TODO items: > - add a WAIT n clause in same SQL locations as NOWAIT > - add a lock_wait_timeout (USERSET), default = 0 (unlimited waiting) > > to provide better control over lock waits. I havn't had a chance to look over the commit yet, sorry, but it sounds like it's only logging to the log file? Was hoping to get a NOTICE back out to the user for at least the 'still waiting on lock x' log, after the deadlock detection. I'm not particular about gucs/settings/etc, whatever people feel is best, provided there's a way to get the messages in psql. :) Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > I havn't had a chance to look over the commit yet, sorry, but it sounds > like it's only logging to the log file? Was hoping to get a NOTICE back > out to the user for at least the 'still waiting on lock x' log, after > the deadlock detection. It's elog(LOG), which you can get on the client if you want. The chance of making it NOTICE is epsilon... regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Applied with some further revisions to improve the usefulness of the log > messages. There's now one issued when the deadlock timeout elapses, and > another when the lock is finally obtained: > > LOG: process 14945 still waiting for AccessExclusiveLock on relation 86076 of database 86042 after 1003.354 ms > ... > LOG: process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms Is it possible for unlocking the semaphore to wake another process other than our own? In which case checking log_lock_waits before signalling the semaphore arguably locks us into having log_lock_waits be PGC_POSTMASTER. Currently it's PGC_SIGHUP which is odd since it could have been USERSET in the old regime. Also, I did just think of a reason why maybe having the times in the messages could be annoying: it makes it hard to write regression tests. I suppose having the pids already interferes with regression tests though. Maybe we should do something like optionally postprocess .out files with some sed script like s/[0-9]+/###/ before running diff. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Is it possible for unlocking the semaphore to wake another process other than > our own? In which case checking log_lock_waits before signalling the semaphore > arguably locks us into having log_lock_waits be PGC_POSTMASTER. How you figure that? > Currently it's PGC_SIGHUP which is odd since it could have been > USERSET in the old regime. Actually I changed it to SUSET yesterday. I don't see any strong reason why we should disallow different processes having different settings; however, if the DBA is trying to gather this info, random users shouldn't be able to turn it off. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> Is it possible for unlocking the semaphore to wake another process other than >> our own? In which case checking log_lock_waits before signalling the semaphore >> arguably locks us into having log_lock_waits be PGC_POSTMASTER. > > How you figure that? Well I'm not clear exactly what's going on with the semaphores here. If it's possible for to be printing the messages only as a result of another backend unlocking the semaphore then making the PGSemaphoreUnlock conditional on log_lock_waits means you can't enable log_lock_waits after startup and get deterministic behaviour because whether you get messages will depend on which other backend happens to wake you up. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> How you figure that? > Well I'm not clear exactly what's going on with the semaphores here. If it's > possible for to be printing the messages only as a result of another backend > unlocking the semaphore then making the PGSemaphoreUnlock conditional on > log_lock_waits means you can't enable log_lock_waits after startup and get > deterministic behaviour because whether you get messages will depend on which > other backend happens to wake you up. I don't see how you arrive at that conclusion. The message is printed by the backend that is waiting for (or just obtained) a lock, dependent on its own local setting of log_lock_waits, and not dependent on who woke it up. BTW, I just noticed that GUC allows deadlock_timeout to be set all the way down to zero. This seems bad --- surely the minimum value should at least be positive? As CVS HEAD stands, you're likely to get a lot of spurious/useless log messages if you have log_lock_waits = true and deadlock_timeout = 0. Do we care? regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> "Tom Lane" <tgl@sss.pgh.pa.us> writes: >>> How you figure that? > >> Well I'm not clear exactly what's going on with the semaphores here. If it's >> possible for to be printing the messages only as a result of another backend >> unlocking the semaphore then making the PGSemaphoreUnlock conditional on >> log_lock_waits means you can't enable log_lock_waits after startup and get >> deterministic behaviour because whether you get messages will depend on which >> other backend happens to wake you up. > > I don't see how you arrive at that conclusion. The message is printed > by the backend that is waiting for (or just obtained) a lock, dependent > on its own local setting of log_lock_waits, and not dependent on who > woke it up. But in your version of the patch you're not calling PGSemaphoreUnlock() unless log_lock_waits is set in the process doing the waking. Hm, I suppose it'll wake up itself when its own deadlock timer runs out anyways. So I guess the worst case is that it doesn't say anything after a soft deadlock fixup. > BTW, I just noticed that GUC allows deadlock_timeout to be set all the > way down to zero. This seems bad --- surely the minimum value should at > least be positive? As CVS HEAD stands, you're likely to get a lot of > spurious/useless log messages if you have log_lock_waits = true and > deadlock_timeout = 0. Do we care? Does that actually work? I would expect setitimer to turn off the alarm in that case. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> I don't see how you arrive at that conclusion. The message is printed >> by the backend that is waiting for (or just obtained) a lock, dependent >> on its own local setting of log_lock_waits, and not dependent on who >> woke it up. > But in your version of the patch you're not calling PGSemaphoreUnlock() unless > log_lock_waits is set in the process doing the waking. Which is always the same process: PGSemaphoreUnlock(&MyProc->sem); >> BTW, I just noticed that GUC allows deadlock_timeout to be set all the >> way down to zero. This seems bad --- surely the minimum value should at >> least be positive? As CVS HEAD stands, you're likely to get a lot of >> spurious/useless log messages if you have log_lock_waits = true and >> deadlock_timeout = 0. Do we care? > Does that actually work? I would expect setitimer to turn off the alarm in > that case. Good point, which renders it definitely broken. I propose we just tweak GUC to set a minimum deadlock_timeout of 1 msec. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Which is always the same process: > PGSemaphoreUnlock(&MyProc->sem); Aaah! I just grokked what's going on with the semaphores here. It all makes a lot more sense now. Nevermind. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 6/19/07, Simon Riggs <simon@2ndquadrant.com> wrote: > > related TODO items: > - add a WAIT n clause in same SQL locations as NOWAIT > - add a lock_wait_timeout (USERSET), default = 0 (unlimited waiting) > > to provide better control over lock waits. > are these actual TODO items? i can't find them on the TODO list and i don't remember any discussion nor patch about this -- regards, Jaime Casanova "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs and the universe trying to produce bigger and better idiots. So far, the universe is winning." Richard Cook
On Sat, 2007-09-22 at 23:49 -0500, Jaime Casanova wrote: > On 6/19/07, Simon Riggs <simon@2ndquadrant.com> wrote: > > > > related TODO items: > > - add a WAIT n clause in same SQL locations as NOWAIT > > - add a lock_wait_timeout (USERSET), default = 0 (unlimited waiting) > > > > to provide better control over lock waits. > > > > are these actual TODO items? i can't find them on the TODO list and i > don't remember any discussion nor patch about this They are my proposals for TODO items to assist with application development. Waiting around DDL is just one reason to want those, though there are other similar issues with locking. I had one client with a program that waited for 57 hours before they noticed. statement_timeout was not an appropriate way to manage that and some form of timeout makes better sense. NOWAIT means extra programming to handle that case, rather than raising a normal error, so isn't always possible to change the app source. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On 9/24/07, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2007-09-22 at 23:49 -0500, Jaime Casanova wrote: > > On 6/19/07, Simon Riggs <simon@2ndquadrant.com> wrote: > > > > > > related TODO items: > > > - add a WAIT n clause in same SQL locations as NOWAIT > > > - add a lock_wait_timeout (USERSET), default = 0 (unlimited waiting) > > > > > > to provide better control over lock waits. > > > > > > > are these actual TODO items? i can't find them on the TODO list and i > > don't remember any discussion nor patch about this > > They are my proposals for TODO items to assist with application > development. > while i'm not at all comfortable with the idea of a GUC for this, the WAIT clause seems to be useful. just out of curiosity, why the NOWAIT patch wasn't do it that way in first place, i mean like a WAIT clause and when receiving NOWAIT transform it in WAIT 0? maybe dicussion? there's concensus in adding a WAIT clause? -- regards, Jaime Casanova "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs and the universe trying to produce bigger and better idiots. So far, the universe is winning." Richard Cook
On Mon, 2007-09-24 at 21:26 -0500, Jaime Casanova wrote: > On 9/24/07, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Sat, 2007-09-22 at 23:49 -0500, Jaime Casanova wrote: > > > On 6/19/07, Simon Riggs <simon@2ndquadrant.com> wrote: > > > > > > > > related TODO items: > > > > - add a WAIT n clause in same SQL locations as NOWAIT > > > > - add a lock_wait_timeout (USERSET), default = 0 (unlimited waiting) > > > > > > > > to provide better control over lock waits. > > > > > > > > > > are these actual TODO items? i can't find them on the TODO list and i > > > don't remember any discussion nor patch about this > > > > They are my proposals for TODO items to assist with application > > development. > > > > while i'm not at all comfortable with the idea of a GUC for this, the > WAIT clause seems to be useful. > just out of curiosity, why the NOWAIT patch wasn't do it that way in > first place, i mean like a WAIT clause and when receiving NOWAIT > transform it in WAIT 0? > maybe dicussion? NOWAIT is used by Oracle. DB2 supports a lock wait timeout. What I didn't know before googling this was that SQLServer uses NOWAIT also. SQLServer also implement WAIT [n seconds] *and* a parameter called "lock wait period", which is pretty spooky. Another reason to implement this is to help avoid global deadlocks in distributed transactions (e.g. two phase). SQLServer and DB2 have more need of this than PostgreSQL, but we do still need it. > there's concensus in adding a WAIT clause? Just do it, but take careful note of any comments against things. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes: > SQLServer and DB2 have more need of this than PostgreSQL, but we do > still need it. Why? What does it do that statement_timeout doesn't do better? regards, tom lane
On Tue, 2007-09-25 at 09:16 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > SQLServer and DB2 have more need of this than PostgreSQL, but we do > > still need it. > > Why? What does it do that statement_timeout doesn't do better? If the execution time is negligible, then setting statement_timeout is the same thing as setting a lock timeout. If execution time is not negligible, then you may want to tell the difference between waiting for completion against waiting forever without doing anything useful at all. It's also easier to set an all encompassing lock timeout at User level than it is to set statement_timeout on individual transactions issued by that user. Plus, if applications are written using these concepts it is easier to port them to PostgreSQL. Not planning to work on this myself, but I think it is a valid TODO. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > On Tue, 2007-09-25 at 09:16 -0400, Tom Lane wrote: >> Simon Riggs <simon@2ndquadrant.com> writes: >> > SQLServer and DB2 have more need of this than PostgreSQL, but we do >> > still need it. >> >> Why? What does it do that statement_timeout doesn't do better? > > If the execution time is negligible, then setting statement_timeout is > the same thing as setting a lock timeout. To make this explicit, I think the typical scenario where it would make a difference is where you're running some large job in a plpgsql function. You might be processing millions of records but want for a single step of that process to not wait for a lock. You still want to process all the records you can though. So for example if you're updating all the user profiles on your system but don't want to block on any user-profiles which are locked by active users -- especially if you use database locks for user-visible operations which users can drag out for long periods of time. (Not saying I agree with that design but there are arguments for it and people do do it) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 9/25/07, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, 2007-09-25 at 09:16 -0400, Tom Lane wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > > SQLServer and DB2 have more need of this than PostgreSQL, but we do > > > still need it. > > > > Why? What does it do that statement_timeout doesn't do better? > > If the execution time is negligible, then setting statement_timeout is > the same thing as setting a lock timeout. > > If execution time is not negligible, then you may want to tell the > difference between waiting for completion against waiting forever > without doing anything useful at all. > [...thinking on this a bit...] mmm... i think we can emulate WAIT number_of_seconds using the NOWAIT and a bit of logic... point for tom > > Plus, if applications are written using these concepts it is easier to > port them to PostgreSQL. > no words... point for simon... > Not planning to work on this myself, but I think it is a valid TODO. > i will make a try for 8.4 -- regards, Jaime Casanova "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs and the universe trying to produce bigger and better idiots. So far, the universe is winning." Richard Cook