Re: Strange Windows problem, lock_timeout test request - Mailing list pgsql-hackers
From | Boszormenyi Zoltan |
---|---|
Subject | Re: Strange Windows problem, lock_timeout test request |
Date | |
Msg-id | 514496DF.6070706@cybertec.at Whole thread Raw |
In response to | Re: Strange Windows problem, lock_timeout test request (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Strange Windows problem, lock_timeout test request
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-hackers |
2013-03-15 18:53 keltezéssel, Tom Lane írta: > Boszormenyi Zoltan<zb@cybertec.at> writes: >> [ 2-lock_timeout-v33.patch ] > I looked at this patch a bit. I don't understand why you've chosen to > alter the API of the enable_timeout variants to have a bool result that > says "I didn't bother to process the timeout because it would have fired > immediately". I don't know either... > That is not an advantage for any call site that I can > see: it just means that the caller needs an extra, very difficult to > test code path to handle what would normally be handled by a timeout > interrupt. Even if it were a good API choice, you've broken a number of > existing call sites that the patch fails to touch (eg in postmaster.c > and postgres.c). It's not acceptable to define a failure return from > enable_timeout_after and then let callers assume that the failure can't > happen. > > Please undo that. Done. > Also, I'm not really enamored of the choice to use List* infrastructure > for enable_timeouts(). That doesn't appear to be especially convenient > for any caller, and what it does do is create memory-leak risks for most > of them (if they forget to free the list cells, perhaps as a result of > an error exit). I think a local-variable array of TimeoutParams[] > structs would serve better for most use-cases. Changed. However, the first member of the structure is "TimeoutId id" and a sensible end-of-array value can be -1. Some versions of GCC (maybe other compilers, too) complain if a constant is assigned to an enum which is outside of the declared values of the enum. So I added a "NO_TIMEOUT = -1" to enum TimeoutId. Comments? > Another thought is that the PGSemaphoreTimedLock implementations all > share the same bug, which is that if the "condition" callback returns > true immediately after we acquire the semaphore, they will all wrongly > return false indicating that the semaphore wasn't acquired. You are right. Fixed. > (BTW, > I do not like either the variable name "condition" or the typedef name > "TimeoutCondition" for something that's actually a callback function > pointer.) How about "TimeoutCallback" and "callback_fn"? > In the same vein, this comment change: > > * NOTE: Think not to put any shared-state cleanup after the call to > * ProcSleep, in either the normal or failure path. The lock state must > ! * be fully set by the lock grantor, or by CheckDeadLock if we give up > ! * waiting for the lock. This is necessary because of the possibility > ! * that a cancel/die interrupt will interrupt ProcSleep after someone else > ! * grants us the lock, but before we've noticed it. Hence, after granting, > ! * the locktable state must fully reflect the fact that we own the lock; > ! * we can't do additional work on return. > * > * We can and do use a PG_TRY block to try to clean up after failure, but > * this still has a major limitation: elog(FATAL) can occur while waiting > --- 1594,1606 ---- > /* > * NOTE: Think not to put any shared-state cleanup after the call to > * ProcSleep, in either the normal or failure path. The lock state must > ! * be fully set by the lock grantor, or by CheckDeadLock or by ProcSleep > ! * itself in case a timeout is detected or if we give up waiting for the lock. > ! * This is necessary because of the possibility that a cancel/die interrupt > ! * will interrupt ProcSleep after someone else grants us the lock, but > ! * before we've noticed it. Hence, after granting, the locktable state must > ! * fully reflect the fact that we own the lock; we can't do additional work > ! * on return. > * > > suggests that you've failed to grasp the fundamental race-condition > problem here. ProcSleep can't do cleanup after the fact any more than > its callers can, because otherwise it has a race condition with some > other process deciding to grant it the lock at about the same time its > timeout goes off. I think the changes in ProcSleep that alter the > state-cleanup behavior are just plain wrong, and what you need to do > is make that look more like the existing mechanisms that clean up when > statement timeout occurs. This was the most enlightening comment up to now. It seems no one else understood the timer code but you... Thanks very much. I hope the way I did it is right. I factored out the core of StatementCancelHandler() into a common function that can be used by the lock_timeout interrupt as its timeout callback function. Now the code doesn't need PGSemaphoreTimedLock(). While I was thinking about how this thing works, I recognized that I also need to set the timeout indicator to false after checking it in ProcessInterrupts(). The reason is that it's needed is this scenario: 1. lock_timeout is set and the transaction throws its error 2. lock_timeout is unset before the next transaction 3. the user presses Ctrl-C during the next transaction In this case, the second transaction would report the lock timeout error, since the indicator was still set. The other way would be to call disable_all_timeouts(false) unconditionally in ProcessInterrupts() but setting only the indicator is faster and it doesn't interfere with unknown registered timeout. Also, the reason is that it's disable_timeout_indicator() instead of a generic set_timeout_indicator() is that the generic one may be abused to produce false positives. Best regards, Zoltán Böszörményi > regards, tom lane > > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web:http://www.postgresql-support.de http://www.postgresql.at/
Attachment
pgsql-hackers by date: