On Fri, Nov 16, 2012 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Nov 16, 2012 at 8:21 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Fri, Nov 16, 2012 at 9:52 AM, Vlad <marchenko@gmail.com> wrote:
>>>
>>>> *) failing that, the LWLOCK_STATS macro can be compiled in to give
>>>> us some information about the particular lock(s) we're contending
>>>> on. Hopefully it's an lwlock -- that will make diagnosing the
>>>> problem easier.
>>>
>>>
>>> I've enabled that macro and now see lwlock messages flying by in the
>>> log (see below), even when no high-sys-cpu stall is being observed at
>>> the moment. Should I be looking for something in particular?
>>
>> We're looking for spikes in 'blk', which counts how many times
>> backends had to block waiting on the lwlock.
>
> Unfortunately LWLock contention is only a weak indication of spinlock
> contention. For example, if backends are furiously acquiring and
> releasing the same LWLock in shared mode and no one ever takes it in
> exclusive mode, then there can be high spinlock contention but 'blk'
> will never be incremented.
yup. this problem doesn't smell like an lwlock issue. with lwlock
contention the problem typically manifests as low cpu utilization --
everybody waits. classic spinlock contention (at least from what i've
seen) is very high *userspace* cpu utilization with low work output.
this time it's different -- the OP is bogging down in the kernel, so
it's not impossible we're troubleshooting the symptom rather than the
cause.
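
to illustrate jeff's point about shared acquires: here's a standalone
toy (plain pthreads, nothing from the postgres tree -- all the names
here are made up) where nobody ever blocks, so LWLOCK_STATS' 'blk'
would sit at zero the whole time, yet every shared acquire and release
still has to go through the lock's internal spinlock and all the
threads fight over that one cache line:

/*
 * toy_shared_spin.c -- compile with: cc -O2 -pthread toy_shared_spin.c
 *
 * Simplified stand-in for an LWLock: a spinlock-protected count of
 * shared holders.  No thread ever waits for the "lock" itself (there
 * are no exclusive holders), but the spinlock underneath is hammered.
 */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS   8
#define ITERATIONS 1000000

typedef struct
{
    pthread_spinlock_t mutex;    /* stand-in for the LWLock's mutex */
    int                shared;   /* count of shared holders */
} ToyLWLock;

static ToyLWLock lock;

static void *
worker(void *arg)
{
    (void) arg;

    for (int i = 0; i < ITERATIONS; i++)
    {
        /* "LWLockAcquire(lock, LW_SHARED)" -- always succeeds */
        pthread_spin_lock(&lock.mutex);
        lock.shared++;
        pthread_spin_unlock(&lock.mutex);

        /* "LWLockRelease(lock)" */
        pthread_spin_lock(&lock.mutex);
        lock.shared--;
        pthread_spin_unlock(&lock.mutex);
    }
    return NULL;
}

int
main(void)
{
    pthread_t   threads[NTHREADS];

    pthread_spin_init(&lock.mutex, PTHREAD_PROCESS_PRIVATE);

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    printf("shared holders at exit: %d\n", lock.shared);
    return 0;
}

run that on a multicore box and the cores peg even though the "lock"
is never held exclusively -- that's the shape of contention that never
shows up as 'blk'.
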
> In 9.3 there is a new field that tells how many spin delays there were
> on the mutex behind each lock. That was commit b79ab00144e64217d41;
> maybe he can back-port it to his version.
>
> But that only tells you about LWLock mutexes, not about all the other
> ones in PG.
>
> The attached patch logs every spin delay with where in the source it comes from.
yeah, OP should fire this off. good stuff. I'll bet lunch (if we ever
happen to meet) it's on buffer pin.
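
and for anyone following along, "logs every spin delay with where in
the source it comes from" looks roughly like the sketch below. to be
clear, this is a standalone illustration, not jeff's patch (i haven't
opened the attachment): the acquire macro passes __FILE__/__LINE__ down
the same way postgres' S_LOCK()/s_lock() already does, and the delay
path reports the call site.

/*
 * toy_spin_report.c -- compile with: cc -O2 -pthread toy_spin_report.c
 *
 * Homemade test-and-set lock (not postgres' slock_t).  When a caller
 * has spun for a while without getting the lock, it reports the source
 * location it was called from and yields.  Expect noisy output under
 * contention.
 */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS           4
#define ITERATIONS         100000
#define SPINS_BEFORE_DELAY 100

static atomic_flag the_lock = ATOMIC_FLAG_INIT;

static void
slow_lock(atomic_flag *lock, const char *file, int line)
{
    int         spins = 0;

    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
    {
        if (++spins >= SPINS_BEFORE_DELAY)
        {
            /* the "spin delay": report the call site, then back off */
            fprintf(stderr, "spin delay at %s:%d\n", file, line);
            sched_yield();
            spins = 0;
        }
    }
}

/* the acquire macro captures the caller's source location */
#define ACQUIRE(lock)   slow_lock((lock), __FILE__, __LINE__)
#define RELEASE(lock)   atomic_flag_clear_explicit((lock), memory_order_release)

static void *
worker(void *arg)
{
    (void) arg;

    for (int i = 0; i < ITERATIONS; i++)
    {
        ACQUIRE(&the_lock);     /* this call site shows up in the output */
        RELEASE(&the_lock);
    }
    return NULL;
}

int
main(void)
{
    pthread_t   tid[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

the interesting part for us is just the per-call-site reporting: once
the OP runs with the patch, whichever locations dominate the log tell
us which spinlock we're actually piling up on.
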
merlin