Thread: Problem with locks
We're seeing a problem where occasionally a process appears to be granted a lock but miss its semaphore signal. It eventually gets the 60s deadlock signal and notices that it has been granted the lock and gets back to work. I haven't gotten any further than this in the forensic analysis and it's a bit tricky as it only happens in fairly long benchmark runs. It seems to happen every few hours of running at 3000+ transactions per minute... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > We're seeing a problem where occasionally a process appears to be granted a > lock but miss its semaphore signal. Kernel bug maybe? What's the platform? regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> We're seeing a problem where occasionally a process appears to be granted a >> lock but miss its semaphore signal. > > Kernel bug maybe? What's the platform? It does sound like it given the way my description went. I was worried it may be some code path not setting waitStatus properly or the compiler caching it incorrectly somehow. But now that I check I see it's a pretty old kernel version (Linux 2.6.5) on the machine. The schedule was completely rewritten by Nick Piggin since then. I've suggested upgrading it but that may leave us in a bit of a pickle. If we update it then that throws away all the benchmark history since we can't really compare new benchmarks against old ones any more. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Gregory Stark" <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > >> Gregory Stark <stark@enterprisedb.com> writes: >>> We're seeing a problem where occasionally a process appears to be granted a >>> lock but miss its semaphore signal. >> >> Kernel bug maybe? What's the platform? > > It does sound like it given the way my description went. I was worried it may > be some code path not setting waitStatus properly or the compiler caching it > incorrectly somehow. > > But now that I check I see it's a pretty old kernel version (Linux 2.6.5) For what it's worth we've reproduced the problem with 2.6.16.21 which is "only" about a year old. I want to rerun this with a shiny new 2.6.22 kernel but really 2.6.16 is recent enough that I don't know of any major bugs fixed in IPC handling since then (with the exception of hugetlb interaction which we're not using on this machine) . So now this is probably either an ongoing kernel bug affecting Postgres or it's elsewhere -- either in Postgres or GCC. I'm really concerned about this because while the behaviour with deadlock_timeout set quite high (we have it set to 60s on this machine) is bad enough -- the behaviour with it set to the default 1s is far more scary. On the default 1s timeout on a machine undergoing lock waits which are mostly under 1s you will probably never notice anything recognizably similar to this. You'll occasionally have some lock waits which last a second for no good reason but you'll never notice that. *But* if you should have a lock wait which lasts more than 1s before it's granted, then when it's granted the semaphore gets lost you're in serious doo doo. The deadlock timeout only fires once and then nothing's going to wake up that process ever again. IIRC we've actually gotten a couple reports of people claiming they've got a "deadlock" when there was no evidence of a deadlock in pg_locks. We always chalked it down to a single long-lived process holding the lock and blocking, but never did much analysis on those reports to see if that was really the case. It's quite possible we had users already observing this problem. If it's a real problem then we're in a bit of a bind. Even if we find and fix a Linux kernel problem we'll still have users on versions of the kernel prior to 2.6.23 or whatever has the bug fixed. We may be best off including an option to have the deadlock timer refire every deadlock_timeout interval instead of just firing once. Then we could print a message any time it occurs and include a HINT about upgrading to a kernel with the bug fixed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Gregory Stark" <stark@enterprisedb.com> writes: > "Gregory Stark" <stark@enterprisedb.com> writes: > >> "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> >>> Gregory Stark <stark@enterprisedb.com> writes: >>>> We're seeing a problem where occasionally a process appears to be granted a >>>> lock but miss its semaphore signal. >>> >>> Kernel bug maybe? What's the platform? I've written a synthetic test program to check for lost semaphore wakeups. I can't seem to produce any on my machine but haven't had a chance to run it yet on the benchmark machine that's been showing the problem. If I can't produce any lost wakeups with a program like this it looks more like it might be a Postgres or GCC bug than a Linux bug. It would be helpful if people could run this on various architectures and various versions of Linux (or other OSes). I've been running it with 40 processes for an hour, but even shorter runs would be useful. It will drive the load on your machine through the roof but doesn't cause any i/o. $ gcc -Wall ipctest.c -lrt $ ./a.out 40 3600 -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Attachment
Gregory Stark wrote: > "Gregory Stark" <stark@enterprisedb.com> writes: > >> "Gregory Stark" <stark@enterprisedb.com> writes: >> >>> "Tom Lane" <tgl@sss.pgh.pa.us> writes: >>> >>>> Gregory Stark <stark@enterprisedb.com> writes: >>>>> We're seeing a problem where occasionally a process appears to be granted a >>>>> lock but miss its semaphore signal. >>>> Kernel bug maybe? What's the platform? > > I've written a synthetic test program to check for lost semaphore wakeups. I > can't seem to produce any on my machine but haven't had a chance to run it yet > on the benchmark machine that's been showing the problem. > > If I can't produce any lost wakeups with a program like this it looks more > like it might be a Postgres or GCC bug than a Linux bug. > > It would be helpful if people could run this on various architectures and > various versions of Linux (or other OSes). I've been running it with 40 > processes for an hour, but even shorter runs would be useful. It will drive > the load on your machine through the roof but doesn't cause any i/o. doesn't work on OpenBSD: $ gcc -o ipctest ipctest.c -lpthread $ ./ipctest 40 3600 running with 40 processes for 3600s sem_init: Operation not permitted "This implementation does not support shared semaphores, and reports this fact by setting errno to EPERM. This is perhaps a stretch of the intention of POSIX, but is compliant, with the caveat that sem_init() always reports a permissions error when an attempt to create a shared semaphore is made." Stefan
2007/8/12, Gregory Stark <stark@enterprisedb.com>: > > "Gregory Stark" <stark@enterprisedb.com> writes: > > > >> "Tom Lane" <tgl@sss.pgh.pa.us> writes: > >> > >>> Gregory Stark <stark@enterprisedb.com> writes: > >>>> We're seeing a problem where occasionally a process appears to be granted a > >>>> lock but miss its semaphore signal. > >>> > >>> Kernel bug maybe? What's the platform? > > I've written a synthetic test program to check for lost semaphore wakeups. I > can't seem to produce any on my machine but haven't had a chance to run it yet > on the benchmark machine that's been showing the problem. > > If I can't produce any lost wakeups with a program like this it looks more > like it might be a Postgres or GCC bug than a Linux bug. > > It would be helpful if people could run this on various architectures and > various versions of Linux (or other OSes). I've been running it with 40 > processes for an hour, but even shorter runs would be useful. It will drive > the load on your machine through the roof but doesn't cause any i/o. > > $ gcc -Wall ipctest.c -lrt > $ ./a.out 40 3600 > > > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > > > Hello [pavel@localhost ~]$ cat /proc/sys/kernel/osrelease 2.6.22.1-41.fc7 [pavel@localhost tmp]$ ./a.out 40 3600 running with 40 processes for 3600s [pavel@localhost tmp]$ without any problem Regards Pavel Stehule
Gregory Stark <stark@enterprisedb.com> writes: > I've written a synthetic test program to check for lost semaphore wakeups. Seems to me this proves nothing much, since it doesn't use the same SysV semaphore API PG does. Please adjust so that it looks more like our code --- in particular there should be multiple processes having semaphores in the same semid group. Also, I think you have race conditions at shutdown --- the appearance of the "thread %d lost a wakeup" message would not convince me there was a bug in the least. You need to make sure the workers exit at a known point in their loop. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> I've written a synthetic test program to check for lost semaphore wakeups. > > Seems to me this proves nothing much, since it doesn't use the same SysV > semaphore API PG does. Please adjust so that it looks more like our > code --- in particular there should be multiple processes having > semaphores in the same semid group. I was trying to copy the semaphore API exactly assuming USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we prefer not to use named semaphores if possible. > Also, I think you have race conditions at shutdown --- the appearance of > the "thread %d lost a wakeup" message would not convince me there was > a bug in the least. You need to make sure the workers exit at a known > point in their loop. I intended to try to recreate the dynamics of the deadlock timeout timer signal. This was just a first cut. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> Seems to me this proves nothing much, since it doesn't use the same SysV >> semaphore API PG does. > I was trying to copy the semaphore API exactly assuming > USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we > prefer not to use named semaphores if possible. What you seem to have copied is the posix_sema.c code, which AFAIK is only used on Darwin. sysv_sema.c is what to look at ... unless your benchmark machine is a Mac. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> "Tom Lane" <tgl@sss.pgh.pa.us> writes: >>> Seems to me this proves nothing much, since it doesn't use the same SysV >>> semaphore API PG does. > >> I was trying to copy the semaphore API exactly assuming >> USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we >> prefer not to use named semaphores if possible. > > What you seem to have copied is the posix_sema.c code, which AFAIK is > only used on Darwin. sysv_sema.c is what to look at ... unless your > benchmark machine is a Mac. I switched the code over to the sysv_sema style api. It's gotten a bit grotty and I would clean it up if it weren't a temporary test program. If we find a real problem perhaps I should add a test case like this to the "smoke test" in ipc_test.c so people can check their OS. I did add something like the setitimer deadlock timeout to detect a process stuck waiting. There is a race condition there if a process is woken up just as the timer fires but if the timeout is large enough the chances of that are pretty remote. Judging by the first thread the whole loop excluding the usleep takes about 3ms. I've been using a timeout of 10 seconds. As such: $ ./a.out 40 900 10 running with 40 processes for 900s with timeout of 10000ms telling threads to exit run done cleaning up semaphores and shared memory -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Attachment
Gregory Stark wrote: > I switched the code over to the sysv_sema style api. It's gotten a bit grotty > and I would clean it up if it weren't a temporary test program. If we find a > real problem perhaps I should add a test case like this to the "smoke test" in > ipc_test.c so people can check their OS. So did you discover anything? I ran your test program and it worked successfully for several different configurations. Not enough times maybe, though. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
"Alvaro Herrera" <alvherre@commandprompt.com> writes: > Gregory Stark wrote: > >> I switched the code over to the sysv_sema style api. It's gotten a bit grotty >> and I would clean it up if it weren't a temporary test program. If we find a >> real problem perhaps I should add a test case like this to the "smoke test" in >> ipc_test.c so people can check their OS. > > So did you discover anything? I ran your test program and it worked > successfully for several different configurations. Not enough times > maybe, though. I haven't been able to find any kernel problem which would explain the timeouts. The test program seems to work fine on all the machines I've tested it on except one where it turned up seemingly unrelated (and far worse) problems. But looking over the old test results from other machines I can see occasional transaction response times which exactly match the deadlock_timeout even though there should be no deadlocks. Apparently this happens with older releases of Postgres too. So I am fairly stumped here. There's really no way I can see where we would have the deadlock signal handler firing, not doing anything, but causing a semaphore wait to return. I've updated the kernel and will be running more benchmarks with the updated kernel next week. But I don't expect the results to change. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com