Re: [PATCH] lock_timeout and common SIGALRM framework - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PATCH] lock_timeout and common SIGALRM framework |
Date | |
Msg-id | 16363.1348339778@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [PATCH] lock_timeout and common SIGALRM framework (Boszormenyi Zoltan <zb@cybertec.at>) |
Responses |
Re: [PATCH] lock_timeout and common SIGALRM framework
Re: [PATCH] lock_timeout and common SIGALRM framework |
List | pgsql-hackers |
Boszormenyi Zoltan <zb@cybertec.at> writes: > new version with a lot more cleanup is attached. I looked at this patch, and frankly I'm rather dismayed. It's a mess. To start at the bottom level, the changes to PGSemaphoreLock broke it, and seem probably unnecessary anyway. As coded, calling the "condition checker" breaks handling of error returns from semop(), unless the checker is careful to preserve errno, which LmgrTimeoutCondition isn't (and really shouldn't need to be anyway). More, if the checker does return true, it causes PGSemaphoreLock to utterly violate its contract: it returns to the caller without having acquired the semaphore, and without even telling the caller so. Worse, if we *did* acquire the semaphore, we might still exit via this path, since the placement of the condition check call ignores the comment a few lines up: * Once we acquire the lock, we do NOT check for an interrupt before * returning. The caller needs to be able to recordownership of the lock * before any interrupt can be accepted. We could possibly fix all this with a redesigned API contract for PGSemaphoreLock, but frankly I do not see a good reason to be tinkering with it at all. We never needed to get it involved with deadlock check handling, and I don't see why that needs to change for lock timeouts. One very good reason why monkeying with PGSemaphoreLock is wrong is that on some platforms a SIGALRM interrupt won't interrupt the semop() call, and thus control would never reach the "checker" anyway. If we're going to throw an error, it must be thrown from the interrupt handler. The whole lmgrtimeout module seems to me to be far more mechanism than is warranted, for too little added functionality. In the first place, there is nothing on the horizon suggesting that we need to let any plug-in code get control here, and if anything the delicacy of what's going on leads me to not wish to expose such a possibility. In the second place, it isn't adding any actually useful functionality, it's just agglomerating some checks. The minimum thing I would want it to do is avoid calling timeout.c multiple times, which is what would happen right now (leading to four extra syscalls per lock acquisition, which is enough new overhead to constitute a strong objection to committing this patch at all). On the whole I think we could forget lmgrtimeout and just hardwire the lock timeout and deadlock check cases. But in any case we're going to need support in timeout.c for enabling/disabling multiple timeouts at once without extra setitimer calls. I'm also not thrilled about the way in which the existing deadlock checking code has been hacked up. As an example, you added this to DeadLockReport(): + if (!DeadLockTimeoutCondition()) + return; which again causes it to violate its contract, namely to report a deadlock, in the most fundamental way -- existing callers aren't expecting it to return *at all*. Surely we can decouple the deadlock and lock timeout cases better than that; or at least if we can't it's a delusion to propose anything like lmgrtimeout in the first place. There's considerable lack of attention to updating comments, too. For instance in WaitOnLock you only bothered to update the comment immediately adjacent to the changed code, and not the two comment blocks above that, which both have specific references to deadlocks being the reason for failure. Also, the "per statement" mode for lock timeout doesn't seem to be any such thing, because it's implemented like this: + case LOCK_TIMEOUT_PER_STMT: + enable_timeout_at(LOCK_TIMEOUT, + TimestampTzPlusMilliseconds( + GetCurrentStatementStartTimestamp(), + LockTimeout)); + break; That doesn't provide anything like "you can spend at most N milliseconds waiting for locks during a statement". What it is is "if you happen to be waiting for a lock N milliseconds after the statement starts, or if you attempt to acquire any lock more than N milliseconds after the statement starts, you lose instantly". I don't think that definition actually adds any useful functionality compared to setting statement_timeout to N milliseconds, and it's certainly wrongly documented. To do what the documentation implies would require tracking and adding up the time spent waiting for locks during a statement. Which might be a good thing to do, especially if the required gettimeofday() calls could be shared with what timeout.c probably has to do anyway at start and stop of a lock wait. But this code doesn't do it. Lastly, I'm not sure where is the best place to be adding the control logic for this, but I'm pretty sure postinit.c is not it. It oughta be somewhere under storage/lmgr/, no? regards, tom lane
pgsql-hackers by date: