Re: stuck spin lock with many concurrent users - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: stuck spin lock with many concurrent users
Date
Msg-id 20010626175304R.t-ishii@sra.co.jp
Whole thread Raw
In response to Re: stuck spin lock with many concurrent users  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Responses Re: stuck spin lock with many concurrent users  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> > Tatsuo Ishii <t-ishii@sra.co.jp> writes
> > >>> How can I check it?
> > >> 
> > >> The 'stuck' message should at least give you a code location...
> > 
> > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> > 
> > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> > (and not, say, a buffer spinlock).  You could try adding some
> > debug logging here, although the output would be voluminous.
> > But what would really be useful is a stack trace for the stuck
> > process.  Consider changing the s_lock code to abort() when it
> > gets a stuck spinlock --- then you could gdb the coredump.
> 
> Nice idea. I will try that.

It appeared that the deadlock checking timer seems to be the source of
the problem. With the default settings, it checks deadlocks every 1
second PER backend. So if there are 1000 backends, every 1 msec
there's a signal and a shared memory locking in average. That would be
too much. If increase the dealock_timeout to , say 100000, the problem
seems gone. Also the performance increased SIGNIFICANTLY. Before that
I got only 1-2 TPS, but now I get ~20 TPS using pgbench -c 1000.

Here is the backtrace:

#0  0x2ab56d21 in __kill () from /lib/libc.so.6
#1  0x2ab56996 in raise (sig=6) at ../sysdeps/posix/raise.c:27
#2  0x2ab580b8 in abort () at ../sysdeps/generic/abort.c:88
#3  0x80ece1a in s_lock_stuck (lock=0x2ac2d016 "\001",    file=0x816e7bc "spin.c", line=158) at s_lock.c:70
#4  0x80ecf3e in s_lock_sleep (spins=20001, timeout=100000000, microsec=5000,    lock=0x2ac2d016 "\001", file=0x816e7bc
"spin.c",line=158) at s_lock.c:109
 
#5  0x80ecfa3 in s_lock (lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c",    line=158) at s_lock.c:136
#6  0x80efb4d in SpinAcquire (lockid=6) at spin.c:158
#7  0x80f2305 in HandleDeadLock (postgres_signal_arg=14) at proc.c:819
#8  <signal handler called>
#9  0x2abeb134 in semop (semid=32786, sops=0x7fffeebc, nsops=1)   at ../sysdeps/unix/sysv/linux/semop.c:34
#10 0x80ee460 in IpcSemaphoreLock (semId=32786, sem=13, interruptOK=1 '\001')   at ipc.c:426
#11 0x80f217f in ProcSleep (lockMethodTable=0x81c1708, lockmode=6,    lock=0x2ce0ab18, holder=0x2ce339b0) at
proc.c:666
#12 0x80f14ff in WaitOnLock (lockmethod=1, lockmode=6, lock=0x2ce0ab18,    holder=0x2ce339b0) at lock.c:955
#13 0x80f1298 in LockAcquire (lockmethod=1, locktag=0x7fffeffc, xid=130139,    lockmode=6) at lock.c:739
#14 0x80f0a23 in LockPage (relation=0x2dbeb9d0, blkno=0, lockmode=6)
#15 0x8071ceb in RelationGetBufferForTuple (relation=0x2dbeb9d0, len=132)   at hio.c:97
#16 0x8070293 in heap_update (relation=0x2dbeb9d0, otid=0x7ffff114,    newtup=0x82388c8, ctid=0x7ffff0b0) at
heapam.c:1737
#17 0x80b6825 in ExecReplace (slot=0x823af60, tupleid=0x7ffff114,    estate=0x8238a58) at execMain.c:1450
#18 0x80b651e in ExecutePlan (estate=0x8238a58, plan=0x8238d00,    operation=CMD_UPDATE, numberTuples=0,
direction=ForwardScanDirection,   destfunc=0x823b680) at execMain.c:1125
 
#19 0x80b5af3 in ExecutorRun (queryDesc=0x8239080, estate=0x8238a58,    feature=3, count=0) at execMain.c:233
#20 0x80f6d93 in ProcessQuery (parsetree=0x822bc18, plan=0x8238d00,    dest=Remote) at pquery.c:295
#21 0x80f599b in pg_exec_query_string (   query_string=0x822b8c0 "update accounts set abalance = abalance + 277 where
aid= 41148\n", dest=Remote, parse_context=0x81fc850) at postgres.c:810
 
#22 0x80f68c6 in PostgresMain (argc=4, argv=0x7ffff380, real_argc=3,    real_argv=0x7ffffc94, username=0x81cd981
"t-ishii")at postgres.c:1908
 
#23 0x80e1ee3 in DoBackend (port=0x81cd718) at postmaster.c:2120
#24 0x80e1acc in BackendStartup (port=0x81cd718) at postmaster.c:1903
#25 0x80e0e26 in ServerLoop () at postmaster.c:995
#26 0x80e0853 in PostmasterMain (argc=3, argv=0x7ffffc94) at postmaster.c:685
#27 0x80c4865 in main (argc=3, argv=0x7ffffc94) at main.c:175
#28 0x2ab509cb in __libc_start_main (main=0x80c4750 <main>, argc=3,    argv=0x7ffffc94, init=0x80656c4 <_init>,
fini=0x81395ac<_fini>,    rtld_fini=0x2aab5ea0 <_dl_fini>, stack_end=0x7ffffc8c)   at
../sysdeps/generic/libc-start.c:92


pgsql-hackers by date:

Previous
From: Lincoln Yeoh
Date:
Subject: Re: Encrypting pg_shadow passwords
Next
From: Hiroshi Inoue
Date:
Subject: Re: stuck spin lock with many concurrent users