Thread: Problem with locks

Problem with locks

From
Gregory Stark
Date:
We're seeing a problem where occasionally a process appears to be granted a
lock but miss its semaphore signal. It eventually gets the 60s deadlock signal
and notices that it has been granted the lock and gets back to work.

I haven't gotten any further than this in the forensic analysis and it's a bit
tricky as it only happens in fairly long benchmark runs. It seems to happen
every few hours of running at 3000+ transactions per minute...

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Problem with locks

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> We're seeing a problem where occasionally a process appears to be granted a
> lock but miss its semaphore signal.

Kernel bug maybe?  What's the platform?
        regards, tom lane


Re: Problem with locks

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> We're seeing a problem where occasionally a process appears to be granted a
>> lock but miss its semaphore signal.
>
> Kernel bug maybe?  What's the platform?

It does sound like it given the way my description went. I was worried it may
be some code path not setting waitStatus properly or the compiler caching it
incorrectly somehow.

But now that I check I see it's a pretty old kernel version (Linux 2.6.5) on
the machine. The schedule was completely rewritten by Nick Piggin since then.
I've suggested upgrading it but that may leave us in a bit of a pickle. If we
update it then that throws away all the benchmark history since we can't
really compare new benchmarks against old ones any more.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Problem with locks

From
Gregory Stark
Date:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
>> Gregory Stark <stark@enterprisedb.com> writes:
>>> We're seeing a problem where occasionally a process appears to be granted a
>>> lock but miss its semaphore signal.
>>
>> Kernel bug maybe?  What's the platform?
>
> It does sound like it given the way my description went. I was worried it may
> be some code path not setting waitStatus properly or the compiler caching it
> incorrectly somehow.
>
> But now that I check I see it's a pretty old kernel version (Linux 2.6.5) 

For what it's worth we've reproduced the problem with 2.6.16.21 which is
"only" about a year old. I want to rerun this with a shiny new 2.6.22 kernel
but really 2.6.16 is recent enough that I don't know of any major bugs fixed
in IPC handling since then (with the exception of hugetlb interaction which
we're not using on this machine) .

So now this is probably either an ongoing kernel bug affecting Postgres or
it's elsewhere -- either in Postgres or GCC.

I'm really concerned about this because while the behaviour with
deadlock_timeout set quite high (we have it set to 60s on this machine) is bad
enough -- the behaviour with it set to the default 1s is far more scary.

On the default 1s timeout on a machine undergoing lock waits which are mostly
under 1s you will probably never notice anything recognizably similar to this.
You'll occasionally have some lock waits which last a second for no good
reason but you'll never notice that. 

*But* if you should have a lock wait which lasts more than 1s before it's
granted, then when it's granted the semaphore gets lost you're in serious doo
doo. The deadlock timeout only fires once and then nothing's going to wake up
that process ever again.

IIRC we've actually gotten a couple reports of people claiming they've got a
"deadlock" when there was no evidence of a deadlock in pg_locks. We always
chalked it down to a single long-lived process holding the lock and blocking,
but never did much analysis on those reports to see if that was really the
case. It's quite possible we had users already observing this problem.

If it's a real problem then we're in a bit of a bind. Even if we find and fix
a Linux kernel problem we'll still have users on versions of the kernel prior
to 2.6.23 or whatever has the bug fixed. We may be best off including an
option to have the deadlock timer refire every deadlock_timeout interval
instead of just firing once. Then we could print a message any time it occurs
and include a HINT about upgrading to a kernel with the bug fixed.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Problem with locks

From
Gregory Stark
Date:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Gregory Stark" <stark@enterprisedb.com> writes:
>
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>
>>> Gregory Stark <stark@enterprisedb.com> writes:
>>>> We're seeing a problem where occasionally a process appears to be granted a
>>>> lock but miss its semaphore signal.
>>>
>>> Kernel bug maybe?  What's the platform?

I've written a synthetic test program to check for lost semaphore wakeups. I
can't seem to produce any on my machine but haven't had a chance to run it yet
on the benchmark machine that's been showing the problem.

If I can't produce any lost wakeups with a program like this it looks more
like it might be a Postgres or GCC bug than a Linux bug.

It would be helpful if people could run this on various architectures and
various versions of Linux (or other OSes). I've been running it with 40
processes for an hour, but even shorter runs would be useful. It will drive
the load on your machine through the roof but doesn't cause any i/o.

$ gcc -Wall ipctest.c -lrt
$ ./a.out 40 3600


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Attachment

Re: Problem with locks

From
Stefan Kaltenbrunner
Date:
Gregory Stark wrote:
> "Gregory Stark" <stark@enterprisedb.com> writes:
> 
>> "Gregory Stark" <stark@enterprisedb.com> writes:
>>
>>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>>
>>>> Gregory Stark <stark@enterprisedb.com> writes:
>>>>> We're seeing a problem where occasionally a process appears to be granted a
>>>>> lock but miss its semaphore signal.
>>>> Kernel bug maybe?  What's the platform?
> 
> I've written a synthetic test program to check for lost semaphore wakeups. I
> can't seem to produce any on my machine but haven't had a chance to run it yet
> on the benchmark machine that's been showing the problem.
> 
> If I can't produce any lost wakeups with a program like this it looks more
> like it might be a Postgres or GCC bug than a Linux bug.
> 
> It would be helpful if people could run this on various architectures and
> various versions of Linux (or other OSes). I've been running it with 40
> processes for an hour, but even shorter runs would be useful. It will drive
> the load on your machine through the roof but doesn't cause any i/o.

doesn't work on OpenBSD:

$ gcc -o ipctest ipctest.c -lpthread
$ ./ipctest 40 3600
running with 40 processes for 3600s
sem_init: Operation not permitted


"This implementation does not support shared semaphores, and reports
this fact by setting errno to EPERM.  This is perhaps a stretch of the
intention of POSIX, but is compliant, with the caveat that sem_init()
always reports a permissions error when an attempt to create a shared
semaphore is made."


Stefan


Re: Problem with locks

From
"Pavel Stehule"
Date:
2007/8/12, Gregory Stark <stark@enterprisedb.com>:
> > "Gregory Stark" <stark@enterprisedb.com> writes:
> >
> >> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> >>
> >>> Gregory Stark <stark@enterprisedb.com> writes:
> >>>> We're seeing a problem where occasionally a process appears to be granted a
> >>>> lock but miss its semaphore signal.
> >>>
> >>> Kernel bug maybe?  What's the platform?
>
> I've written a synthetic test program to check for lost semaphore wakeups. I
> can't seem to produce any on my machine but haven't had a chance to run it yet
> on the benchmark machine that's been showing the problem.
>
> If I can't produce any lost wakeups with a program like this it looks more
> like it might be a Postgres or GCC bug than a Linux bug.
>
> It would be helpful if people could run this on various architectures and
> various versions of Linux (or other OSes). I've been running it with 40
> processes for an hour, but even shorter runs would be useful. It will drive
> the load on your machine through the roof but doesn't cause any i/o.
>
> $ gcc -Wall ipctest.c -lrt
> $ ./a.out 40 3600
>
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>
>
>
Hello
[pavel@localhost ~]$ cat /proc/sys/kernel/osrelease
2.6.22.1-41.fc7

[pavel@localhost tmp]$ ./a.out 40 3600
running with 40 processes for 3600s
[pavel@localhost tmp]$

without any problem

Regards
Pavel Stehule


Re: Problem with locks

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I've written a synthetic test program to check for lost semaphore wakeups.

Seems to me this proves nothing much, since it doesn't use the same SysV
semaphore API PG does.  Please adjust so that it looks more like our
code --- in particular there should be multiple processes having
semaphores in the same semid group.

Also, I think you have race conditions at shutdown --- the appearance of
the "thread %d lost a wakeup" message would not convince me there was
a bug in the least.  You need to make sure the workers exit at a known
point in their loop.
        regards, tom lane


Re: Problem with locks

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> I've written a synthetic test program to check for lost semaphore wakeups.
>
> Seems to me this proves nothing much, since it doesn't use the same SysV
> semaphore API PG does.  Please adjust so that it looks more like our
> code --- in particular there should be multiple processes having
> semaphores in the same semid group.

I was trying to copy the semaphore API exactly assuming
USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we
prefer not to use named semaphores if possible.

> Also, I think you have race conditions at shutdown --- the appearance of
> the "thread %d lost a wakeup" message would not convince me there was
> a bug in the least.  You need to make sure the workers exit at a known
> point in their loop.

I intended to try to recreate the dynamics of the deadlock timeout timer
signal. This was just a first cut.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Problem with locks

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Seems to me this proves nothing much, since it doesn't use the same SysV
>> semaphore API PG does.

> I was trying to copy the semaphore API exactly assuming
> USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we
> prefer not to use named semaphores if possible.

What you seem to have copied is the posix_sema.c code, which AFAIK is
only used on Darwin.  sysv_sema.c is what to look at ... unless your
benchmark machine is a Mac.
        regards, tom lane


Re: Problem with locks

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> Seems to me this proves nothing much, since it doesn't use the same SysV
>>> semaphore API PG does.
>
>> I was trying to copy the semaphore API exactly assuming
>> USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we
>> prefer not to use named semaphores if possible.
>
> What you seem to have copied is the posix_sema.c code, which AFAIK is
> only used on Darwin.  sysv_sema.c is what to look at ... unless your
> benchmark machine is a Mac.

I switched the code over to the sysv_sema style api. It's gotten a bit grotty
and I would clean it up if it weren't a temporary test program. If we find a
real problem perhaps I should add a test case like this to the "smoke test" in
ipc_test.c so people can check their OS.

I did add something like the setitimer deadlock timeout to detect a process
stuck waiting. There is a race condition there if a process is woken up just
as the timer fires but if the timeout is large enough the chances of that are
pretty remote. Judging by the first thread the whole loop excluding the usleep
takes about 3ms. I've been using a timeout of 10 seconds. As such:

$ ./a.out 40 900 10
running with 40 processes for 900s with timeout of 10000ms
telling threads to exit
run done
cleaning up semaphores and shared memory




--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Attachment

Re: Problem with locks

From
Alvaro Herrera
Date:
Gregory Stark wrote:

> I switched the code over to the sysv_sema style api. It's gotten a bit grotty
> and I would clean it up if it weren't a temporary test program. If we find a
> real problem perhaps I should add a test case like this to the "smoke test" in
> ipc_test.c so people can check their OS. 

So did you discover anything?  I ran your test program and it worked
successfully for several different configurations.  Not enough times
maybe, though.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Problem with locks

From
Gregory Stark
Date:
"Alvaro Herrera" <alvherre@commandprompt.com> writes:

> Gregory Stark wrote:
>
>> I switched the code over to the sysv_sema style api. It's gotten a bit grotty
>> and I would clean it up if it weren't a temporary test program. If we find a
>> real problem perhaps I should add a test case like this to the "smoke test" in
>> ipc_test.c so people can check their OS. 
>
> So did you discover anything?  I ran your test program and it worked
> successfully for several different configurations.  Not enough times
> maybe, though.

I haven't been able to find any kernel problem which would explain the
timeouts. The test program seems to work fine on all the machines I've tested
it on except one where it turned up seemingly unrelated (and far worse)
problems.

But looking over the old test results from other machines I can see occasional
transaction response times which exactly match the deadlock_timeout even
though there should be no deadlocks. Apparently this happens with older
releases of Postgres too.

So I am fairly stumped here. There's really no way I can see where we would
have the deadlock signal handler firing, not doing anything, but causing a
semaphore wait to return.

I've updated the kernel and will be running more benchmarks with the updated
kernel next week. But I don't expect the results to change.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com