Re: hung backends stuck in spinlock heavy endless loop - Mailing list pgsql-hackers
From | Merlin Moncure |
---|---|
Subject | Re: hung backends stuck in spinlock heavy endless loop |
Date | |
Msg-id | CAHyXU0zbR0n9cC4Qhq=MDd4p_EHjbZPPXsrtqoyLMbPiqcr6uA@mail.gmail.com Whole thread Raw |
In response to | Re: hung backends stuck in spinlock heavy endless loop (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: hung backends stuck in spinlock heavy endless loop
|
List | pgsql-hackers |
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: >> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> > If you gdb in, and type 'fin' a couple times, to wait till the function >> > finishes, is there actually any progress? I'm wondering whether it's >> > just many catalog accesses + contention, or some other >> > problem. Alternatively set a breakpoint on ScanPgRelation() or so and >> > see how often it's hit. >> >> well, i restarted the database, forgetting my looper was running which >> immediately spun up and it got stuck again with a similar profile >> (lots of cpu in spinlock): >> >> Samples: 3K of event 'cycles', Event count (approx.): 2695723228 >> 31.16% postgres [.] s_lock >> 22.32% postgres [.] tas >> 12.13% postgres [.] tas >> 5.93% postgres [.] spin_delay >> 5.69% postgres [.] LWLockRelease >> 3.75% postgres [.] LWLockAcquireCommon >> 3.61% perf [.] 0x00000000000526c4 >> 2.51% postgres [.] FunctionCall2Coll >> 1.48% libc-2.17.so [.] 0x000000000016a132 >> >> > If you gdb in, and type 'fin' a couple times, >> (gdb) fin >> Run till exit from #0 0x00007ff4c63f7a97 in semop () from >> /lib/x86_64-linux-gnu/libc.so.6 >> 0x00000000006de073 in PGSemaphoreLock () >> (gdb) fin >> Run till exit from #0 0x00000000006de073 in PGSemaphoreLock () >> >> It returned once. Second time, it didn't at least so far (minute or so). > > Hm, that's autovac though, not the normal user backends that actually do > stuff, right? If you could additionally check those, it'd be great. got it to reproduce. had to take a break to get some work done. (gdb) fin Run till exit from #0 0x00007f7b07099dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6 0x00000000008d3ac4 in pg_usleep () (gdb) fin Run till exit from #0 0x00000000008d3ac4 in pg_usleep () 0x0000000000750b69 in s_lock () (gdb) fin Run till exit from #0 0x0000000000750b69 in s_lock () 0x0000000000750844 in LWLockRelease () (gdb) fin Run till exit from #0 0x0000000000750844 in LWLockRelease () 0x0000000000730000 in LockBuffer () (gdb) fin Run till exit from #0 0x0000000000730000 in LockBuffer () 0x00000000004b2db4 in _bt_relandgetbuf () (gdb) fin Run till exit from #0 0x00000000004b2db4 in _bt_relandgetbuf () 0x00000000004b7116 in _bt_moveright () (gdb) fin Run till exit from #0 0x00000000004b7116 in _bt_moveright () .... so it looks like nobody ever exits from _bt_moveright. any last requests before I start bisecting down? merlin
pgsql-hackers by date: