Re: hung backends stuck in spinlock heavy endless loop - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: hung backends stuck in spinlock heavy endless loop
Date
Msg-id CAHyXU0zbR0n9cC4Qhq=MDd4p_EHjbZPPXsrtqoyLMbPiqcr6uA@mail.gmail.com
Whole thread Raw
In response to Re: hung backends stuck in spinlock heavy endless loop  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: hung backends stuck in spinlock heavy endless loop
List pgsql-hackers
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote:
>> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > If you gdb in, and type 'fin' a couple times, to wait till the function
>> > finishes, is there actually any progress? I'm wondering whether it's
>> > just many catalog accesses + contention, or some other
>> > problem. Alternatively set a breakpoint on ScanPgRelation() or so and
>> > see how often it's hit.
>>
>> well, i restarted the database, forgetting my looper was running which
>> immediately spun up and it got stuck again with a similar profile
>> (lots of cpu in spinlock):
>>
>> Samples: 3K of event 'cycles', Event count (approx.): 2695723228
>>  31.16%  postgres            [.] s_lock
>>  22.32%  postgres            [.] tas
>>  12.13%  postgres            [.] tas
>>   5.93%  postgres            [.] spin_delay
>>   5.69%  postgres            [.] LWLockRelease
>>   3.75%  postgres            [.] LWLockAcquireCommon
>>   3.61%  perf                [.] 0x00000000000526c4
>>   2.51%  postgres            [.] FunctionCall2Coll
>>   1.48%  libc-2.17.so        [.] 0x000000000016a132
>>
>> > If you gdb in, and type 'fin' a couple times,
>> (gdb) fin
>> Run till exit from #0  0x00007ff4c63f7a97 in semop () from
>> /lib/x86_64-linux-gnu/libc.so.6
>> 0x00000000006de073 in PGSemaphoreLock ()
>> (gdb) fin
>> Run till exit from #0  0x00000000006de073 in PGSemaphoreLock ()
>>
>> It returned once.  Second time, it didn't at least so far (minute or so).
>
> Hm, that's autovac though, not the normal user backends that actually do
> stuff, right? If you could additionally check those, it'd be great.


got it to reproduce.  had to take a break to get some work done.


(gdb) fin
Run till exit from #0  0x00007f7b07099dc3 in select ()  from /lib/x86_64-linux-gnu/libc.so.6
0x00000000008d3ac4 in pg_usleep ()
(gdb) fin
Run till exit from #0  0x00000000008d3ac4 in pg_usleep ()
0x0000000000750b69 in s_lock ()
(gdb) fin
Run till exit from #0  0x0000000000750b69 in s_lock ()
0x0000000000750844 in LWLockRelease ()
(gdb) fin
Run till exit from #0  0x0000000000750844 in LWLockRelease ()
0x0000000000730000 in LockBuffer ()
(gdb) fin
Run till exit from #0  0x0000000000730000 in LockBuffer ()
0x00000000004b2db4 in _bt_relandgetbuf ()
(gdb) fin
Run till exit from #0  0x00000000004b2db4 in _bt_relandgetbuf ()
0x00000000004b7116 in _bt_moveright ()
(gdb) fin
Run till exit from #0  0x00000000004b7116 in _bt_moveright ()
....


so it looks like nobody ever exits from _bt_moveright.   any last
requests before I start bisecting down?

merlin



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pg_basebackup fails with long tablespace paths
Next
From: "Timmer, Marius"
Date:
Subject: Re: [PATCH] explain sortorder