Re: stuck spin lock with many concurrent users - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: stuck spin lock with many concurrent users |
Date | |
Msg-id | 20010626175304R.t-ishii@sra.co.jp Whole thread Raw |
In response to | Re: stuck spin lock with many concurrent users (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses |
Re: stuck spin lock with many concurrent users
|
List | pgsql-hackers |
> > Tatsuo Ishii <t-ishii@sra.co.jp> writes > > >>> How can I check it? > > >> > > >> The 'stuck' message should at least give you a code location... > > > > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting. > > > > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks > > (and not, say, a buffer spinlock). You could try adding some > > debug logging here, although the output would be voluminous. > > But what would really be useful is a stack trace for the stuck > > process. Consider changing the s_lock code to abort() when it > > gets a stuck spinlock --- then you could gdb the coredump. > > Nice idea. I will try that. It appeared that the deadlock checking timer seems to be the source of the problem. With the default settings, it checks deadlocks every 1 second PER backend. So if there are 1000 backends, every 1 msec there's a signal and a shared memory locking in average. That would be too much. If increase the dealock_timeout to , say 100000, the problem seems gone. Also the performance increased SIGNIFICANTLY. Before that I got only 1-2 TPS, but now I get ~20 TPS using pgbench -c 1000. Here is the backtrace: #0 0x2ab56d21 in __kill () from /lib/libc.so.6 #1 0x2ab56996 in raise (sig=6) at ../sysdeps/posix/raise.c:27 #2 0x2ab580b8 in abort () at ../sysdeps/generic/abort.c:88 #3 0x80ece1a in s_lock_stuck (lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", line=158) at s_lock.c:70 #4 0x80ecf3e in s_lock_sleep (spins=20001, timeout=100000000, microsec=5000, lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c",line=158) at s_lock.c:109 #5 0x80ecfa3 in s_lock (lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", line=158) at s_lock.c:136 #6 0x80efb4d in SpinAcquire (lockid=6) at spin.c:158 #7 0x80f2305 in HandleDeadLock (postgres_signal_arg=14) at proc.c:819 #8 <signal handler called> #9 0x2abeb134 in semop (semid=32786, sops=0x7fffeebc, nsops=1) at ../sysdeps/unix/sysv/linux/semop.c:34 #10 0x80ee460 in IpcSemaphoreLock (semId=32786, sem=13, interruptOK=1 '\001') at ipc.c:426 #11 0x80f217f in ProcSleep (lockMethodTable=0x81c1708, lockmode=6, lock=0x2ce0ab18, holder=0x2ce339b0) at proc.c:666 #12 0x80f14ff in WaitOnLock (lockmethod=1, lockmode=6, lock=0x2ce0ab18, holder=0x2ce339b0) at lock.c:955 #13 0x80f1298 in LockAcquire (lockmethod=1, locktag=0x7fffeffc, xid=130139, lockmode=6) at lock.c:739 #14 0x80f0a23 in LockPage (relation=0x2dbeb9d0, blkno=0, lockmode=6) #15 0x8071ceb in RelationGetBufferForTuple (relation=0x2dbeb9d0, len=132) at hio.c:97 #16 0x8070293 in heap_update (relation=0x2dbeb9d0, otid=0x7ffff114, newtup=0x82388c8, ctid=0x7ffff0b0) at heapam.c:1737 #17 0x80b6825 in ExecReplace (slot=0x823af60, tupleid=0x7ffff114, estate=0x8238a58) at execMain.c:1450 #18 0x80b651e in ExecutePlan (estate=0x8238a58, plan=0x8238d00, operation=CMD_UPDATE, numberTuples=0, direction=ForwardScanDirection, destfunc=0x823b680) at execMain.c:1125 #19 0x80b5af3 in ExecutorRun (queryDesc=0x8239080, estate=0x8238a58, feature=3, count=0) at execMain.c:233 #20 0x80f6d93 in ProcessQuery (parsetree=0x822bc18, plan=0x8238d00, dest=Remote) at pquery.c:295 #21 0x80f599b in pg_exec_query_string ( query_string=0x822b8c0 "update accounts set abalance = abalance + 277 where aid= 41148\n", dest=Remote, parse_context=0x81fc850) at postgres.c:810 #22 0x80f68c6 in PostgresMain (argc=4, argv=0x7ffff380, real_argc=3, real_argv=0x7ffffc94, username=0x81cd981 "t-ishii")at postgres.c:1908 #23 0x80e1ee3 in DoBackend (port=0x81cd718) at postmaster.c:2120 #24 0x80e1acc in BackendStartup (port=0x81cd718) at postmaster.c:1903 #25 0x80e0e26 in ServerLoop () at postmaster.c:995 #26 0x80e0853 in PostmasterMain (argc=3, argv=0x7ffffc94) at postmaster.c:685 #27 0x80c4865 in main (argc=3, argv=0x7ffffc94) at main.c:175 #28 0x2ab509cb in __libc_start_main (main=0x80c4750 <main>, argc=3, argv=0x7ffffc94, init=0x80656c4 <_init>, fini=0x81395ac<_fini>, rtld_fini=0x2aab5ea0 <_dl_fini>, stack_end=0x7ffffc8c) at ../sysdeps/generic/libc-start.c:92
pgsql-hackers by date: