Thread: stuck spin lock with many concurrent users

stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
I have seen problems with extremely many concurrent users.
I run pgbench:

pgbench -c 1000 -t 1 test

And I get stuck spin lock errors. This is 100% reproducible (i.e., I
have never succeeded with pgbench -c 1000).

This is Linux kernel 2.2.18. Following are some resource settings
that seem critical to me.

kernel.shmall = 134217728
kernel.shmmax = 134217728
fs.file-max = 65536

There is 1GB of physical memory and 2GB of swap space. Note that the
same thing happens on Sparc/Solaris with far fewer users (500
concurrent users).
--
Tatsuo Ishii


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I have seen problems with extremely many concurrent users.
> I run pgbench:

> pgbench -c 1000 -t 1 test

> And I get stuck spin lock errors. This is 100% reproducible (i.e., I
> have never succeeded with pgbench -c 1000).

Is it actually stuck, or just timing out due to huge contention?
You could try increasing the timeout intervals in s_lock.c to
make sure.  If it is stuck, on which lock(s)?
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I have seen problems with extremely many concurrent users.
> > I run pgbench:
> 
> > pgbench -c 1000 -t 1 test
> 
> > And I get stuck spin lock errors. This is 100% reproducible (i.e., I
> > have never succeeded with pgbench -c 1000).
> 
> Is it actually stuck, or just timing out due to huge contention?
> You could try increasing the timeout intervals in s_lock.c to
> make sure.  

I believe it's actually stuck. From s_lock.c:

#define DEFAULT_TIMEOUT (100*1000000)    /* default timeout: 100 sec */

So even with 1000 contending backends, 100 sec should be enough (100
msec for each backend).

> If it is stuck, on which lock(s)?

How can I check that? In that situation it's very hard to attach a
debugger to a backend process; 1000 backends consume all the CPU time.
--
Tatsuo Ishii



Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> If it is stuck, on which lock(s)?

> How can I check it?

The 'stuck' message should at least give you a code location...
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> If it is stuck, on which lock(s)?
> 
> > How can I check it?
> 
> The 'stuck' message should at least give you a code location...

Here is the actual message:

FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.

The last several queries before the stuck spinlock were:

DEBUG:  query: update branches set bbalance = bbalance + 436 where bid = 1
DEBUG:  query: update tellers set tbalance = tbalance + 230 where tid = 17
DEBUG:  query: update tellers set tbalance = tbalance + 740 where tid = 7
DEBUG:  query: update tellers set tbalance = tbalance + 243 where tid = 13
DEBUG:  query: select abalance from accounts where aid = 177962
DEBUG:  query: update tellers set tbalance = tbalance + 595 where tid = 18
DEBUG:  query: update branches set bbalance = bbalance + 595 where bid = 1
DEBUG:  query: update tellers set tbalance = tbalance + 252 where tid = 15

What I'm trying now is increasing the timeout to 10 times its current
value. Will report in the next email...
--
Tatsuo Ishii


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>> How can I check it?
>> 
>> The 'stuck' message should at least give you a code location...

> FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.

Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
(and not, say, a buffer spinlock).  You could try adding some
debug logging here, although the output would be voluminous.
But what would really be useful is a stack trace for the stuck
process.  Consider changing the s_lock code to abort() when it
gets a stuck spinlock --- then you could gdb the coredump.
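
Something along these lines in s_lock.c, perhaps (an untested sketch;
the message text mirrors the current one):

    /* Untested sketch: dump core instead of exiting cleanly, so the
     * stuck process leaves a core file that gdb can load. */
    static void
    s_lock_stuck(volatile slock_t *lock, const char *file, const int line)
    {
        fprintf(stderr,
                "\nFATAL: s_lock(%p) at %s:%d, stuck spinlock. Aborting.\n",
                lock, file, line);
        abort();            /* was a plain exit; abort() produces a core */
    }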
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >>> How can I check it?
> >> 
> >> The 'stuck' message should at least give you a code location...
> 
> > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> 
> Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> (and not, say, a buffer spinlock).  You could try adding some
> debug logging here, although the output would be voluminous.
> But what would really be useful is a stack trace for the stuck
> process.  Consider changing the s_lock code to abort() when it
> gets a stuck spinlock --- then you could gdb the coredump.

Nice idea. I will try that.
--
Tatsuo Ishii


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> > 
> > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> > (and not, say, a buffer spinlock).  You could try adding some
> > debug logging here, although the output would be voluminous.
> > But what would really be useful is a stack trace for the stuck
> > process.  Consider changing the s_lock code to abort() when it
> > gets a stuck spinlock --- then you could gdb the coredump.
> 
> Nice idea. I will try that.

I got an interesting result. If I compile the backend with -g (and
without -O2), I get no stuck spinlock errors. However, if s_lock.c is
compiled with -O2 enabled, I get the error again. It seems only
s_lock.c is involved in this phenomenon.
--
Tatsuo Ishii


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I got an interesting result. If I compile the backend with -g (and
> without -O2), I get no stuck spinlock errors. However, if s_lock.c is
> compiled with -O2 enabled, I get the error again. It seems only
> s_lock.c is involved in this phenomenon.

That's very interesting.  Could optimization be breaking the TAS
sequence on your platform?  What is your platform, anyway?
Might need to burrow into the assembly code to see just what's
happening.
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I got an interesting result. If I compile the backend with -g (and
> > without -O2), I get no stuck spinlock errors. However, if s_lock.c is
> > compiled with -O2 enabled, I get the error again. It seems only
> > s_lock.c is involved in this phenomenon.
> 
> That's very interesting.  Could optimization be breaking the TAS
> sequence on your platform?  What is your platform, anyway?
> Might need to burrow into the assembly code to see just what's
> happening.

As I said, it's x86 Linux (more precisely, kernel 2.2.18 with 2
processors, egcs 2.91). I suspect the inlined TAS code might be
incompatible with its caller, s_lock.
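
For reference, the i386 TAS is an inlined xchg; roughly this (from
memory, so it may not match the tree exactly):

    /* Rough sketch of the i386 tas() in s_lock.h (may differ in detail
     * from the actual source): atomically exchange 1 into the lock byte
     * and return its previous value; nonzero means the lock was held. */
    static __inline__ int
    tas(volatile slock_t *lock)
    {
        register slock_t _res = 1;

        __asm__ __volatile__(
            "lock; xchgb %0,%1"
            : "=q" (_res), "=m" (*lock)
            : "0" (_res));
        return (int) _res;
    }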
--
Tatsuo Ishii


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> > Tatsuo Ishii <t-ishii@sra.co.jp> writes
> > >>> How can I check it?
> > >> 
> > >> The 'stuck' message should at least give you a code location...
> > 
> > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> > 
> > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> > (and not, say, a buffer spinlock).  You could try adding some
> > debug logging here, although the output would be voluminous.
> > But what would really be useful is a stack trace for the stuck
> > process.  Consider changing the s_lock code to abort() when it
> > gets a stuck spinlock --- then you could gdb the coredump.
> 
> Nice idea. I will try that.

It appears that the deadlock checking timer is the source of the
problem. With the default settings, it checks for deadlocks every 1
second PER backend. So if there are 1000 backends, there is on average
a signal and a shared-memory lock acquisition every 1 msec. That would
be too much. If I increase deadlock_timeout to, say, 100000, the
problem seems to go away. Performance also increased SIGNIFICANTLY:
before I got only 1-2 TPS, but now I get ~20 TPS with pgbench -c 1000.
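
For reference, this is the setting I used (deadlock_timeout is in
milliseconds):

    # postgresql.conf: delay before a blocked backend runs the
    # deadlock check; the default is 1000 (1 second)
    deadlock_timeout = 100000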

Here is the backtrace:

#0  0x2ab56d21 in __kill () from /lib/libc.so.6
#1  0x2ab56996 in raise (sig=6) at ../sysdeps/posix/raise.c:27
#2  0x2ab580b8 in abort () at ../sysdeps/generic/abort.c:88
#3  0x80ece1a in s_lock_stuck (lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", line=158) at s_lock.c:70
#4  0x80ecf3e in s_lock_sleep (spins=20001, timeout=100000000, microsec=5000, lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", line=158) at s_lock.c:109
#5  0x80ecfa3 in s_lock (lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", line=158) at s_lock.c:136
#6  0x80efb4d in SpinAcquire (lockid=6) at spin.c:158
#7  0x80f2305 in HandleDeadLock (postgres_signal_arg=14) at proc.c:819
#8  <signal handler called>
#9  0x2abeb134 in semop (semid=32786, sops=0x7fffeebc, nsops=1) at ../sysdeps/unix/sysv/linux/semop.c:34
#10 0x80ee460 in IpcSemaphoreLock (semId=32786, sem=13, interruptOK=1 '\001') at ipc.c:426
#11 0x80f217f in ProcSleep (lockMethodTable=0x81c1708, lockmode=6, lock=0x2ce0ab18, holder=0x2ce339b0) at proc.c:666
#12 0x80f14ff in WaitOnLock (lockmethod=1, lockmode=6, lock=0x2ce0ab18, holder=0x2ce339b0) at lock.c:955
#13 0x80f1298 in LockAcquire (lockmethod=1, locktag=0x7fffeffc, xid=130139, lockmode=6) at lock.c:739
#14 0x80f0a23 in LockPage (relation=0x2dbeb9d0, blkno=0, lockmode=6)
#15 0x8071ceb in RelationGetBufferForTuple (relation=0x2dbeb9d0, len=132) at hio.c:97
#16 0x8070293 in heap_update (relation=0x2dbeb9d0, otid=0x7ffff114, newtup=0x82388c8, ctid=0x7ffff0b0) at heapam.c:1737
#17 0x80b6825 in ExecReplace (slot=0x823af60, tupleid=0x7ffff114, estate=0x8238a58) at execMain.c:1450
#18 0x80b651e in ExecutePlan (estate=0x8238a58, plan=0x8238d00, operation=CMD_UPDATE, numberTuples=0, direction=ForwardScanDirection, destfunc=0x823b680) at execMain.c:1125
#19 0x80b5af3 in ExecutorRun (queryDesc=0x8239080, estate=0x8238a58, feature=3, count=0) at execMain.c:233
#20 0x80f6d93 in ProcessQuery (parsetree=0x822bc18, plan=0x8238d00, dest=Remote) at pquery.c:295
#21 0x80f599b in pg_exec_query_string (query_string=0x822b8c0 "update accounts set abalance = abalance + 277 where aid = 41148\n", dest=Remote, parse_context=0x81fc850) at postgres.c:810
#22 0x80f68c6 in PostgresMain (argc=4, argv=0x7ffff380, real_argc=3, real_argv=0x7ffffc94, username=0x81cd981 "t-ishii") at postgres.c:1908
#23 0x80e1ee3 in DoBackend (port=0x81cd718) at postmaster.c:2120
#24 0x80e1acc in BackendStartup (port=0x81cd718) at postmaster.c:1903
#25 0x80e0e26 in ServerLoop () at postmaster.c:995
#26 0x80e0853 in PostmasterMain (argc=3, argv=0x7ffffc94) at postmaster.c:685
#27 0x80c4865 in main (argc=3, argv=0x7ffffc94) at main.c:175
#28 0x2ab509cb in __libc_start_main (main=0x80c4750 <main>, argc=3, argv=0x7ffffc94, init=0x80656c4 <_init>, fini=0x81395ac <_fini>, rtld_fini=0x2aab5ea0 <_dl_fini>, stack_end=0x7ffffc8c) at ../sysdeps/generic/libc-start.c:92


Re: stuck spin lock with many concurrent users

From
Hiroshi Inoue
Date:
Tatsuo Ishii wrote:
> 
> > > Tatsuo Ishii <t-ishii@sra.co.jp> writes
> > > >>> How can I check it?
> > > >>
> > > >> The 'stuck' message should at least give you a code location...
> > >
> > > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> > >
> > > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> > > (and not, say, a buffer spinlock).  You could try adding some
> > > debug logging here, although the output would be voluminous.
> > > But what would really be useful is a stack trace for the stuck
> > > process.  Consider changing the s_lock code to abort() when it
> > > gets a stuck spinlock --- then you could gdb the coredump.
> >
> > Nice idea. I will try that.
> 
> It appears that the deadlock checking timer is the source of the
> problem. With the default settings, it checks for deadlocks every 1
> second PER backend.

IIRC the deadlock check was called only once per backend.
It seems to have been changed between 7.0 and 7.1.
Would it help to disable the timer at the beginning of
HandleDeadLock()?
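
Something like this, maybe (a hypothetical, untested sketch):

    /* Hypothetical, untested sketch: disarm the one-shot timer first
     * thing in the handler, so a stray SIGALRM cannot fire again while
     * the handler holds the lockmgr spinlock. */
    static void
    HandleDeadLock(SIGNAL_ARGS)
    {
        struct itimerval disarm;

        MemSet(&disarm, 0, sizeof(struct itimerval)); /* zeroed = off */
        setitimer(ITIMER_REAL, &disarm, NULL);

        /* ... existing deadlock-check body ... */
    }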

regards,
Hiroshi Inoue


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> Tatsuo Ishii wrote:
> > 
> > > > Tatsuo Ishii <t-ishii@sra.co.jp> writes
> > > > >>> How can I check it?
> > > > >>
> > > > >> The 'stuck' message should at least give you a code location...
> > > >
> > > > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> > > >
> > > > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> > > > (and not, say, a buffer spinlock).  You could try adding some
> > > > debug logging here, although the output would be voluminous.
> > > > But what would really be useful is a stack trace for the stuck
> > > > process.  Consider changing the s_lock code to abort() when it
> > > > gets a stuck spinlock --- then you could gdb the coredump.
> > >
> > > Nice idea. I will try that.
> > 
> > It appears that the deadlock checking timer is the source of the
> > problem. With the default settings, it checks for deadlocks every 1
> > second PER backend.
> 
> IIRC the deadlock check was called only once per backend.

In my understanding, the deadlock check is scheduled every time the
backend waits for a lock. Once it acquires the lock, it kills the
timer. However, under a heavy transaction load such as pgbench
generates, chances are that the check fires and tries to acquire a
spinlock. That seems to be the situation.
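
Roughly, the sequence in ProcSleep() looks like this (a simplified
sketch; details elided):

    /* Simplified sketch of the arm/disarm logic in ProcSleep():
     * set a one-shot timer, block on the semaphore, disarm on wakeup. */
    struct itimerval timeval, dummy;

    MemSet(&timeval, 0, sizeof(struct itimerval)); /* it_interval = 0 */
    timeval.it_value.tv_sec = DeadlockTimeout / 1000;
    timeval.it_value.tv_usec = (DeadlockTimeout % 1000) * 1000;
    if (setitimer(ITIMER_REAL, &timeval, &dummy))
        elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");

    /* block on our semaphore (IpcSemaphoreLock); if the timer fires
     * first, HandleDeadLock() runs and grabs the lockmgr spinlock */

    MemSet(&timeval, 0, sizeof(struct itimerval)); /* lock granted: */
    if (setitimer(ITIMER_REAL, &timeval, &dummy))  /* kill the timer */
        elog(FATAL, "ProcSleep: Unable to disable timer for process wakeup");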
--
Tatsuo Ishii


Re: stuck spin lock with many concurrent users

From
Hiroshi Inoue
Date:
Tatsuo Ishii wrote:
> 
> > Tatsuo Ishii wrote:
> > >
> > > > > Tatsuo Ishii <t-ishii@sra.co.jp> writes
> > > > > >>> How can I check it?
> > > > > >>
> > > > > >> The 'stuck' message should at least give you a code location...
> > > > >
> > > > > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> > > > >
> > > > > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> > > > > (and not, say, a buffer spinlock).  You could try adding some
> > > > > debug logging here, although the output would be voluminous.
> > > > > But what would really be useful is a stack trace for the stuck
> > > > > process.  Consider changing the s_lock code to abort() when it
> > > > > gets a stuck spinlock --- then you could gdb the coredump.
> > > >
> > > > Nice idea. I will try that.
> > >
> > > It appears that the deadlock checking timer is the source of the
> > > problem. With the default settings, it checks for deadlocks every 1
> > > second PER backend.
> >
> > IIRC the deadlock check was called only once per backend.
> 
> In my understanding, the deadlock check is scheduled every time the
> backend waits for a lock. Once it acquires the lock, it kills the timer.

Yes, but the deadlock check is needed only once, and the timer should
be disabled at that point as well.

regards,
Hiroshi Inoue


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> It appears that the deadlock checking timer is the source of the
> problem. With the default settings, it checks for deadlocks every 1
> second PER backend.

I don't believe it.  setitimer with it_interval = 0 should produce one
interrupt, no more.
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> In my understanding, the deadlock check is scheduled every time the
> backend waits for a lock. Once it acquires the lock, it kills the
> timer. However, under a heavy transaction load such as pgbench
> generates, chances are that the check fires and tries to acquire a
> spinlock. That seems to be the situation.

It could be that with ~1000 backends all waiting for the same lock, the
deadlock-checking code just plain takes too long to run.  It might have
an O(N^2) or worse behavior in the length of the queue; I don't think
the code was ever analyzed for such problems.

Do you want to try adding some instrumentation to HandleDeadlock to see
how long it runs on each call?
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > In my understanding, the deadlock check is scheduled every time the
> > backend waits for a lock. Once it acquires the lock, it kills the
> > timer. However, under a heavy transaction load such as pgbench
> > generates, chances are that the check fires and tries to acquire a
> > spinlock. That seems to be the situation.
> 
> It could be that with ~1000 backends all waiting for the same lock, the
> deadlock-checking code just plain takes too long to run.  It might have
> an O(N^2) or worse behavior in the length of the queue; I don't think
> the code was ever analyzed for such problems.
> 
> Do you want to try adding some instrumentation to HandleDeadlock to see
> how long it runs on each call?

I added some code to HandleDeadLock to measure how long the
LockLockTable and DeadLockCheck calls take. Following are the results
from running pgbench -c 1000 (it failed with a stuck spin lock
error). "real time" shows how long they actually run (measured with
gettimeofday); "user time" and "system time" are measured by calling
getrusage. The time unit is milliseconds.

LockLockTable: real time

 min |  max   |        avg
-----+--------+-------------------
   0 | 867873 | 152874.9015151515

LockLockTable: user time

 min | max |     avg
-----+-----+--------------
   0 |  30 | 1.2121212121

LockLockTable: system time

 min | max  |      avg
-----+------+----------------
   0 | 2140 | 366.5909090909

DeadLockCheck: real time

 min |  max  |       avg
-----+-------+-----------------
   0 | 87671 | 3463.6996197719

DeadLockCheck: user time

 min | max |      avg
-----+-----+---------------
   0 | 330 | 14.2205323194

DeadLockCheck: system time

 min | max |     avg
-----+-----+--------------
   0 | 100 | 2.5095057034
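
The instrumentation was along these lines (a rough sketch, not the
exact patch):

    /* Rough sketch of the timing instrumentation in HandleDeadLock()
     * (not the exact patch): real time via gettimeofday(), user and
     * system time via getrusage(), all reported in milliseconds. */
    #include <sys/time.h>
    #include <sys/resource.h>

    static long
    ms_diff(struct timeval *t0, struct timeval *t1)
    {
        return (t1->tv_sec - t0->tv_sec) * 1000 +
               (t1->tv_usec - t0->tv_usec) / 1000;
    }

        struct timeval r0, r1;
        struct rusage u0, u1;

        gettimeofday(&r0, NULL);
        getrusage(RUSAGE_SELF, &u0);

        LockLockTable();        /* same pattern wraps DeadLockCheck() */

        gettimeofday(&r1, NULL);
        getrusage(RUSAGE_SELF, &u1);

        elog(DEBUG, "LockLockTable: real %ld user %ld sys %ld (msec)",
             ms_diff(&r0, &r1),
             ms_diff(&u0.ru_utime, &u1.ru_utime),
             ms_diff(&u0.ru_stime, &u1.ru_stime));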


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I added some code to HandleDeadLock to measure how long the
> LockLockTable and DeadLockCheck calls take. Following are the results
> from running pgbench -c 1000 (it failed with a stuck spin lock
> error). "real time" shows how long they actually run (measured with
> gettimeofday); "user time" and "system time" are measured by calling
> getrusage. The time unit is milliseconds.

>  LockLockTable: real time

>  min |  max   |        avg        
> -----+--------+-------------------
>    0 | 867873 | 152874.9015151515

>  LockLockTable: user time

>  min | max |     avg      
> -----+-----+--------------
>    0 |  30 | 1.2121212121

>  LockLockTable: system time

>  min | max  |      avg       
> -----+------+----------------
>    0 | 2140 | 366.5909090909


>  DeadLockCheck: real time

>  min |  max  |       avg       
> -----+-------+-----------------
>    0 | 87671 | 3463.6996197719

>  DeadLockCheck: user time

>  min | max |      avg      
> -----+-----+---------------
>    0 | 330 | 14.2205323194

>  DeadLockCheck: system time

>  min | max |     avg      
> -----+-----+--------------
>    0 | 100 | 2.5095057034

Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.
I have to suppose that the problem is (once again) our inefficient
spinlock code.

If you think about it, on a typical platform where processes waiting for
a time delay are released at a clock tick, what's going to be happening
is that a whole lot of spinblocked processes will all be awoken in the
same clock tick interrupt.  The first one of these that gets to run will
acquire the spinlock, if it's free, and the rest will go back to sleep
and try again at the next tick.  This could be highly unfair depending
on just how the kernel's scheduler works --- for example, one could
easily believe that the waiters might be awoken in process-number order,
in which case backends with high process numbers might never get to
acquire the spinlock, or at least would have such low probability of
winning that they are prone to "stuck spinlock" timeout.

We really need to look at replacing the spinlock mechanism with
something more efficient.
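
For reference, the wait loop in s_lock.c has essentially this shape
(a simplified sketch; constants are approximate, though note that the
backtrace above shows spins=20001 at the timeout):

    /* Simplified sketch of the s_lock() wait loop: each failed TAS()
     * is followed by a short select() sleep, which the kernel rounds
     * up to the next clock tick; so every waiter wakes on the same
     * tick and races for the lock again. */
    void
    s_lock(volatile slock_t *lock, const char *file, const int line)
    {
        unsigned    spins = 0;

        while (TAS(lock))           /* nonzero: someone else holds it */
        {
            struct timeval delay;

            delay.tv_sec = 0;
            delay.tv_usec = 5000;   /* ~5 msec, rounded up to a tick */
            (void) select(0, NULL, NULL, NULL, &delay);

            if (++spins * 5000 > DEFAULT_TIMEOUT)   /* ~100 sec total */
                s_lock_stuck(lock, file, line);
        }
    }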
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:

>>> DeadLockCheck: real time

>>> min |  max  |       avg
>>> -----+-------+-----------------
>>>    0 | 87671 | 3463.6996197719

>>> DeadLockCheck: user time

>>> min | max |      avg
>>> -----+-----+---------------
>>>    0 | 330 | 14.2205323194

>>> DeadLockCheck: system time

>>> min | max |     avg
>>> -----+-----+--------------
>>>    0 | 100 | 2.5095057034

>> Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.

> Isn't the real time big ?

Yes, it sure is, but remember that the guy getting useful work done
(DeadLockCheck) is having to share the CPU with 999 other processes
that are waking up on every clock tick for just long enough to fail
to get the spinlock.  I think it's those useless process wakeups that
are causing the problem.

If you estimate that a process dispatch cycle is ~ 10 microseconds,
then waking 999 useless processes every 10 msec is just about enough
to consume 100% of the CPU doing nothing useful... so what should be
a few-millisecond check takes a long time, which makes things worse
because the 999 wannabees are spinning for that much more time.
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Hiroshi Inoue
Date:
Tom Lane wrote:
> 
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I added some code to HandleDeadLock to measure how long the
> > LockLockTable and DeadLockCheck calls take. Following are the results
> > from running pgbench -c 1000 (it failed with a stuck spin lock
> > error). "real time" shows how long they actually run (measured with
> > gettimeofday); "user time" and "system time" are measured by calling
> > getrusage. The time unit is milliseconds.
> 
> >  LockLockTable: real time
> 
> >  min |  max   |        avg
> > -----+--------+-------------------
> >    0 | 867873 | 152874.9015151515
> 

[snip]

> 
> >  DeadLockCheck: real time
> 
> >  min |  max  |       avg
> > -----+-------+-----------------
> >    0 | 87671 | 3463.6996197719
> 
> >  DeadLockCheck: user time
> 
> >  min | max |      avg
> > -----+-----+---------------
> >    0 | 330 | 14.2205323194
> 
> >  DeadLockCheck: system time
> 
> >  min | max |     avg
> > -----+-----+--------------
> >    0 | 100 | 2.5095057034
> 
> Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.

Isn't the real time big ?
Isn't 14.22 msec big enough for the spinlocking process to
pass its time slice to other processes ?

regards,
Hiroshi Inoue


Re: stuck spin lock with many concurrent users

From
Bruce Momjian
Date:
> Yes, it sure is, but remember that the guy getting useful work done
> (DeadLockCheck) is having to share the CPU with 999 other processes
> that are waking up on every clock tick for just long enough to fail
> to get the spinlock.  I think it's those useless process wakeups that
> are causing the problem.
> 
> If you estimate that a process dispatch cycle is ~ 10 microseconds,
> then waking 999 useless processes every 10 msec is just about enough
> to consume 100% of the CPU doing nothing useful... so what should be
> a few-millisecond check takes a long time, which makes things worse
> because the 999 wannabees are spinning for that much more time.

Don't we back off the sleeps or was that code removed?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> If you estimate that a process dispatch cycle is ~ 10 microseconds,
>> then waking 999 useless processes every 10 msec is just about enough
>> to consume 100% of the CPU doing nothing useful...

> Don't we back off the sleeps or was that code removed?

Not enough to affect this calculation.
        regards, tom lane


Re: stuck spin lock with many concurrent users

From
Tatsuo Ishii
Date:
> >> Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.
> 
> > Isn't the real time big ?
> 
> Yes, it sure is, but remember that the guy getting useful work done
> (DeadLockCheck) is having to share the CPU with 999 other processes
> that are waking up on every clock tick for just long enough to fail
> to get the spinlock.  I think it's those useless process wakeups that
> are causing the problem.
> 
> If you estimate that a process dispatch cycle is ~ 10 microseconds,
> then waking 999 useless processes every 10 msec is just about enough
> to consume 100% of the CPU doing nothing useful... so what should be
> a few-millisecond check takes a long time, which makes things worse
> because the 999 wannabees are spinning for that much more time.

If so, what about increasing the deadlock timer in proportion to the
length of the waiting holder queue?

Here is a patch against current to increase the deadlock timer by
queue_length seconds.

*** proc.c.orig    Thu Jul  5 16:12:32 2001
--- proc.c    Thu Jul  5 16:20:22 2001
***************
*** 506,511 ****
--- 506,512 ----
      int            myHeldLocks = MyProc->heldLocks;
      PROC       *proc;
      int            i;
+     int            msec;
  
  #ifndef __BEOS__
      struct itimerval timeval,
***************
*** 625,638 ****
       * Need to zero out struct to set the interval and the microseconds
       * fields to 0.
       */
  #ifndef __BEOS__
      MemSet(&timeval, 0, sizeof(struct itimerval));
!     timeval.it_value.tv_sec = DeadlockTimeout / 1000;
!     timeval.it_value.tv_usec = (DeadlockTimeout % 1000) * 1000;
      if (setitimer(ITIMER_REAL, &timeval, &dummy))
          elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #else
!     time_interval = DeadlockTimeout * 1000000;    /* usecs */
      if (set_alarm(time_interval, B_ONE_SHOT_RELATIVE_ALARM) < 0)
          elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #endif
--- 626,642 ----
       * Need to zero out struct to set the interval and the microseconds
       * fields to 0.
       */
+ 
+     msec = DeadlockTimeout + waitQueue->size * 1000;
+ 
  #ifndef __BEOS__
      MemSet(&timeval, 0, sizeof(struct itimerval));
!     timeval.it_value.tv_sec = msec / 1000;
!     timeval.it_value.tv_usec = (msec % 1000) * 1000;
      if (setitimer(ITIMER_REAL, &timeval, &dummy))
          elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #else
!     time_interval = msec * 1000000;    /* usecs */
      if (set_alarm(time_interval, B_ONE_SHOT_RELATIVE_ALARM) < 0)
          elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #endif


Re: stuck spin lock with many concurrent users

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> If so, what about increasing the deadlock timer in proportion to the
> length of the waiting holder queue?

I don't think that's a good idea; it's not solving the problem, only
reducing performance, and in a fairly arbitrary way at that.  (The
length of the particular wait queue you happen to be on is no measure
of the total number of processes waiting for locks.)

The real problem is in the spinlock implementation --- deadlock checking
is only one place where lots of processes might gang up on the same
spinlock.  The bufmgr lock is another one.
        regards, tom lane