Re: autovacuum locks - Mailing list pgsql-hackers

From Gregg Jaskiewicz
Subject Re: autovacuum locks
Date
Msg-id CAJY59_gizufaLQRUXh8z4SUYzBrDVs5Cn3O51spF-GXcyK1wnA@mail.gmail.com
Whole thread Raw
In response to Re: autovacuum locks  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: autovacuum locks  (Gregg Jaskiewicz <gryzman@gmail.com>)
List pgsql-hackers
On 2 March 2012 11:03, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
> Excerpts from Gregg Jaskiewicz's message of vie mar 02 07:44:07 -0300 2012:
>> Folks,
>>
>> I got a system here (8.3.7), that is locked up. Few queries waiting
>> for autovacuum aquired locks on a table or two.
>> But it looks like autovacuum is also waiting for some semaphore:
>>
>>
>> #0  0x00f07410 in __kernel_vsyscall ()
>> #1  0x00252d2b in semop () from /lib/libc.so.6
>> #2  0x081ca1ce in PGSemaphoreLock ()
>> #3  0x081f590e in ProcWaitForSignal ()
>> #4  0x081e8dea in LockBufferForCleanup ()
>> #5  0x08160e68 in ?? ()
>> #6  0x08161482 in lazy_vacuum_rel ()
>> #7  0x081603b5 in ?? ()
>> #8  0x081605c4 in vacuum ()
>> #9  0x081cd154 in ?? ()
>> #10 0x081cd7f1 in ?? ()
>> #11 0x081cd911 in StartAutoVacWorker ()
>> #12 0x081d6d47 in ?? ()
>> #13 <signal handler called>
>> #14 0x00f07410 in __kernel_vsyscall ()
>> #15 0x00249fcd in ___newselect_nocancel () from /lib/libc.so.6
>> #16 0x081d3964 in ?? ()
>> #17 0x081d5105 in PostmasterMain ()
>> #18 0x0818be60 in main ()
>>
>> There's still other transcations going on , and working fine - but two
>> tables seem to be cut out by autovacuum, and other queries stuck
>> waiting for autovacuum to finish.
>
> This vacuum is waiting for some other process to release the buffer
> lock.  These locks are usually very short-lived, but maybe you have some
> other backend doing something weird that won't release it.  Do you have
> idle processes with transactions open, and if you have, what are they
> doing?  Do you have prepared transactions open? (see the
> pg_prepared_xacts view)

That view is empty, but I know that the queries that are waiting for
locks to be released - are in the middle of two phase commit
transaction.

It could be that the system is doing something, but it is just so slow
- that those locks are starved - and no longer so short lived :)

Is there any particular info I could gather to make it easier for you
guys to look at ?

Here's a backtrace from one of the autovacuums (this time with line # info):
#0  0x00f07410 in __kernel_vsyscall ()
#1  0x00252d2b in semop () from /lib/libc.so.6
#2  0x081ca1ce in PGSemaphoreLock (sema=0xb792bdf8, interruptOK=1
'\001') at pg_sema.c:420
#3  0x081f590e in ProcWaitForSignal () at proc.c:1269
#4  0x081e8dea in LockBufferForCleanup (buffer=24516) at bufmgr.c:2304
#5  0x08160e68 in lazy_vacuum_heap (onerel=0x87573c0,
vacrelstats=0x86a4768) at vacuumlazy.c:655
#6  0x08161482 in lazy_scan_heap (onerel=0x87573c0,
vacstmt=0xbfbcfb50, bstrategy=0x86fbe98) at vacuumlazy.c:591
#7  lazy_vacuum_rel (onerel=0x87573c0, vacstmt=0xbfbcfb50,
bstrategy=0x86fbe98) at vacuumlazy.c:191
#8  0x081603b5 in vacuum_rel (relid=<value optimized out>,
vacstmt=0xbfbcfb50, expected_relkind=114 'r', for_wraparound=0 '\000')
at vacuum.c:1126
#9  0x081605c4 in vacuum (vacstmt=0xbfbcfb50, relids=0x86fd310,
bstrategy=0x86fbe98, for_wraparound=0 '\000', isTopLevel=1 '\001') at
vacuum.c:427
#10 0x081cd154 in do_autovacuum () at autovacuum.c:2638
#11 0x081cd7f1 in AutoVacWorkerMain (argc=<value optimized out>,
argv=<value optimized out>) at autovacuum.c:1613
#12 0x081cd911 in StartAutoVacWorker () at autovacuum.c:1416
#13 0x081d6d47 in StartAutovacuumWorker (postgres_signal_arg=10) at
postmaster.c:4109
#14 sigusr1_handler (postgres_signal_arg=10) at postmaster.c:3845
#15 <signal handler called>
#16 0x00f07410 in __kernel_vsyscall ()
#17 0x00249fcd in ___newselect_nocancel () from /lib/libc.so.6
#18 0x081d3964 in ServerLoop () at postmaster.c:1234
#19 0x081d5105 in PostmasterMain (argc=5, argv=0x8641828) at postmaster.c:1029
#20 0x0818be60 in main (argc=5, argv=0x0) at main.c:188


And the connection that is waiting for locks to be released (the
locked up connection). That query is hanging there since ~18:00 BST:

#0  0x00f07410 in __kernel_vsyscall ()
#1  0x00252d2b in semop () from /lib/libc.so.6
#2  0x081ca1ce in PGSemaphoreLock (sema=0xb7929618, interruptOK=1
'\001') at pg_sema.c:420
#3  0x081f5c53 in ProcSleep (locallock=0x86578f0,
lockMethodTable=0x834e0f4) at proc.c:880
#4  0x081f4a5d in WaitOnLock (locallock=0x86578f0, owner=0x88676d8) at
lock.c:1147
#5  0x081f50af in LockAcquire (locktag=0xbfbcf2b8, lockmode=7,
sessionLock=0 '\000', dontWait=0 '\000') at lock.c:792
#6  0x081f27f9 in LockTuple (relation=0xa72f53a8, tid=0xbfbcf334,
lockmode=7) at lmgr.c:378
#7  0x0809b19c in heap_update (relation=0xa72f53a8, otid=0xbfbcf3da,
newtup=0x86d3be0, ctid=0xbfbcf3d4, update_xmax=0xbfbcf3e0, cid=2,
crosscheck=0x0, wait=1 '\001') at heapam.c:2376
#8  0x08167f8b in ExecutePlan (queryDesc=0x870d010,
direction=ForwardScanDirection, count=0) at execMain.c:1880
#9  ExecutorRun (queryDesc=0x870d010, direction=ForwardScanDirection,
count=0) at execMain.c:270
#10 0x0818051e in _SPI_execute_plan (plan=0x87b8948, Values=0x8e960f8,
Nulls=0x8e960e8 "  ", snapshot=0x0, crosscheck_snapshot=0x0,
read_only=0 '\000', fire_triggers=1 '\001', tcount=0) at spi.c:1845
#11 0x08180796 in SPI_execute_plan (plan=0x87b8948, Values=0x8e960f8,
Nulls=0x8e960e8 "  ", read_only=0 '\000', tcount=0) at spi.c:374
#12 0x0012d488 in exec_stmt_execsql (estate=0xbfbcf7bc,
stmt=0x87a2630) at pl_exec.c:2546
#13 0x0012f0cc in exec_stmt (estate=0xbfbcf7bc, stmts=<value optimized
out>) at pl_exec.c:1260
#14 exec_stmts (estate=0xbfbcf7bc, stmts=<value optimized out>) at
pl_exec.c:1171
#15 0x00130d19 in exec_stmt_block (estate=0xbfbcf7bc, block=0x87a2778)
at pl_exec.c:1114
#16 0x001310e7 in plpgsql_exec_trigger (func=0x8844ae0,
trigdata=0xbfbcfb78) at pl_exec.c:636
#17 0x00127082 in plpgsql_call_handler (fcinfo=0xbfbcf934) at pl_handler.c:92
#18 0x0814a63a in ExecCallTriggerFunc (trigdata=0xbfbcfb78, tgindx=8,
finfo=0x12800b, instr=0x0, per_tuple_context=0x87f5fd8) at
trigger.c:1577
#19 0x0814b69b in ExecBRUpdateTriggers (estate=0x885bd28,
relinfo=0x885bdb0, tupleid=0xbfbcfc2a, newtuple=0x8899310) at
trigger.c:2000
#20 0x0816820e in ExecutePlan (queryDesc=0x8ab81e8,
direction=ForwardScanDirection, count=0) at execMain.c:1835
#21 ExecutorRun (queryDesc=0x8ab81e8, direction=ForwardScanDirection,
count=0) at execMain.c:270
#22 0x08203054 in ProcessQuery (plan=0x8ab7f18, params=<value
optimized out>, dest=0x8ab81a0, completionTag=0xbfbcfcc8 "") at
pquery.c:179
#23 0x08203251 in PortalRunMulti (portal=0x86eb0b0, isTopLevel=<value
optimized out>, dest=0x8ab81a0, altdest=0x83bf95c,
completionTag=0xbfbcfcc8 "") at pquery.c:1242
#24 0x0820345d in FillPortalStore (portal=0x86eb0b0, isTopLevel=1
'\001') at pquery.c:1045
#25 0x08203a31 in PortalRun (portal=0x86eb0b0, count=2147483647,
isTopLevel=1 '\001', dest=0x86a16d0, altdest=0x86a16d0,
completionTag=0xbfbcffd8 "") at pquery.c:792
#26 0x08200154 in exec_execute_message (argc=4, argv=<value optimized
out>, username=0x8644890 "foo") at postgres.c:1942
#27 PostgresMain (argc=4, argv=<value optimized out>,
username=0x8644890 "foo") at postgres.c:3688
#28 0x081d4228 in BackendStartup () at postmaster.c:3207
#29 ServerLoop () at postmaster.c:1274
#30 0x081d5105 in PostmasterMain (argc=5, argv=0x8641828) at postmaster.c:1029
#31 0x0818be60 in main (argc=5, argv=0x4489ffff) at main.c:188



--
GJ


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: autovacuum locks
Next
From: Gregg Jaskiewicz
Date:
Subject: Re: autovacuum locks