Re: Intermittent hangs with 9.2 - Mailing list pgsql-performance

From David Whittaker
Subject Re: Intermittent hangs with 9.2
Date
Msg-id CABXnLXQU9S1Jd-dwpysTkMdtwETkEUGn4a8XiHUnc98AKXDbAw@mail.gmail.com
In response to Re: Intermittent hangs with 9.2  (David Whittaker <dave@iradix.com>)
List pgsql-performance
We haven't seen any issues since we decreased shared_buffers.  We also tuned some of the longer-running and more frequently executed queries, so that may have had an effect as well, but my money is on the shared_buffers change.  If the issue reappears, I'll try to capture another perf profile and post back; if you don't hear from me again, you can assume the problem is solved.

Thank you all again for the help.

-Dave

On Fri, Sep 13, 2013 at 11:05 AM, David Whittaker <dave@iradix.com> wrote:



On Fri, Sep 13, 2013 at 10:52 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Thu, Sep 12, 2013 at 3:06 PM, David Whittaker <dave@iradix.com> wrote:
> Hi All,
>
> We lowered shared_buffers to 8G and increased effective_cache_size
> accordingly.  So far, we haven't seen any issues since the adjustment.  The
> issues have come and gone in the past, so I'm not convinced it won't crop up
> again, but I think the best course is to wait a week or so and see how
> things work out before we make any other changes.
>
> Thank you all for your help, and if the problem does recur, we'll look
> into the other options suggested, like using a patched postmaster and
> compiling for perf -g.
>
> Thanks again, I really appreciate the feedback from everyone.

Interesting -- please respond with a follow up if/when you feel
satisfied the problem has gone away.  Andres was right; I initially
mis-diagnosed the problem (there is another issue I'm chasing that has
a similar performance presentation but originates from a different
area of the code).

That said, if reducing shared_buffers made *your* problem go away as
well, then this is more evidence that we have an underlying contention
mechanism that is somehow influenced by the setting.  Speaking frankly,
under certain workloads we seem to have contention issues in the
general area of the buffer system.  I'm thinking (guessing) that the
problem is that usage_count is getting incremented faster than the
buffers are getting cleared out, which then causes the sweeper to spend
more and more time examining hotly contended buffers.  This may make
no sense in the context of your issue; I haven't looked at the code
yet.  Also, I've been unable to reproduce this in simulated
testing.  But I'm suspicious (and dollars to doughnuts, '0x347ba9' is
spinlock-related).
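To make the suspected mechanism a little more concrete, here is a minimal, single-threaded Python sketch of a clock-sweep victim search of the kind PostgreSQL uses for shared buffers. It is a simplified model under stated assumptions, not PostgreSQL's code: the names `Buffer`, `ClockSweep`, `access` and `find_victim` are hypothetical, all locking is omitted, and the cap of 5 on usage_count mirrors PostgreSQL's BM_MAX_USAGE_COUNT. It shows why hot buffers make the sweep more expensive: every unpinned buffer sitting at the cap has to be passed over and decremented several times before any of them becomes evictable.

```python
from dataclasses import dataclass

# Assumption: mirrors PostgreSQL's BM_MAX_USAGE_COUNT cap of 5.
MAX_USAGE_COUNT = 5


@dataclass
class Buffer:
    usage_count: int = 0
    pinned: bool = False


class ClockSweep:
    """Hypothetical, simplified single-threaded model of clock-sweep eviction.

    No spinlocks or concurrency are modelled; `examined` just counts how many
    buffer headers the sweep had to look at.
    """

    def __init__(self, n_buffers: int) -> None:
        self.buffers = [Buffer() for _ in range(n_buffers)]
        self.hand = 0       # current position of the clock hand
        self.examined = 0   # total buffer headers inspected by the sweep

    def access(self, i: int) -> None:
        # Each access bumps usage_count, saturating at the cap.
        b = self.buffers[i]
        b.usage_count = min(b.usage_count + 1, MAX_USAGE_COUNT)

    def find_victim(self) -> int:
        n = len(self.buffers)
        # Any unpinned buffer reaches usage_count 0 within MAX_USAGE_COUNT + 1
        # full passes, so this bound only trips when every buffer is pinned.
        for _ in range(n * (MAX_USAGE_COUNT + 1)):
            idx = self.hand
            b = self.buffers[idx]
            self.hand = (self.hand + 1) % n
            self.examined += 1
            if b.pinned:
                continue          # pinned buffers are skipped, not decremented
            if b.usage_count == 0:
                return idx        # evictable: hand it out
            b.usage_count -= 1    # hot buffer: decay it and keep sweeping
        raise RuntimeError("no unpinned buffers available")
```

With four buffers that have all been accessed enough to reach the cap, `find_victim()` returns buffer 0 only after inspecting 21 headers (five full decrementing passes plus one more look); on a cold pool it inspects a single header. In a real server each of those inspections also involves touching a shared buffer header under concurrency, which this sketch deliberately does not model.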

Anyways, thanks for the report and (hopefully) the follow up.

merlin

You guys have taken the time to help me through this; following up is the least I can do.  So far we're still looking good.
