Re: max_connections reached in postgres 9.3.3 - Mailing list pgsql-general

From Borislav Ivanov
Subject Re: max_connections reached in postgres 9.3.3
Date
Msg-id CAEqRsYUkrCUO6rMHCJuDPM-S5tAz4EEsqyHeXv3Xm3c9mWJ7Qg@mail.gmail.com
Whole thread Raw
In response to Re: max_connections reached in postgres 9.3.3  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general
We do record perf data. For each incident we've had the data looks about the same. Unfortunately, I can't read much into it. Besides it getting stuck on a spinlock. But why and with what? 


### from perf report

 53.28%       postmaster  postgres               [.] s_lock                                                                        

  6.22%       postmaster  postgres               [.] 0x00000000001b4306                                                            

  2.30%       postmaster  [kernel.kallsyms]      [k] _spin_lock                                                                    

  2.23%       postmaster  postgres               [.] LWLockAcquire                                                                 

  1.79%       postmaster  postgres               [.] LWLockRelease                                                                 

  1.39%       postmaster  postgres               [.] hash_search_with_hash_value                                                   

  1.20%       postmaster  postgres               [.] SearchCatCache                                                                

  0.71%       postmaster  [kernel.kallsyms]      [k] hrtimer_interrupt                                                             

  0.56%       postmaster  [kernel.kallsyms]      [k] tick_program_event                                                            

  0.54%       postmaster  [kernel.kallsyms]      [k] schedule                                                                      

  0.44%       postmaster  [kernel.kallsyms]      [k] _spin_lock_irq 


### Then zooming on s_lock

# Annotate s_lock

...

 99.04       test   %al,%al 

...


# Zoom into postmaster(81487) thread

 55.84%  postmaster  postgres               [.] s_lock                                                                           

  6.52%  postmaster  postgres               [.] 0x00000000001b4306                                                                ▒

  2.42%  postmaster  [kernel.kallsyms]      [k] _spin_lock                                                                        ▒

  2.34%  postmaster  postgres               [.] LWLockAcquire                                                                     ▒

  1.87%  postmaster  postgres               [.] LWLockRelease                                                                     ▒

  1.46%  postmaster  postgres               [.] hash_search_with_hash_value                                                       ▒

  1.26%  postmaster  postgres               [.] SearchCatCache                                                                    ▒

  0.75%  postmaster  [kernel.kallsyms]      [k] hrtimer_interrupt                                                                 ▒

  0.59%  postmaster  [kernel.kallsyms]      [k] tick_program_event                                                                ▒

  0.57%  postmaster  [kernel.kallsyms]      [k] schedule 


# Zoom into postgres DSO

 65.75%  postmaster  [.] s_lock                                                                                                  

  7.68%  postmaster  [.] 0x00000000001b4306                                                                                       ▒

  2.75%  postmaster  [.] LWLockAcquire                                                                                            ▒

  2.20%  postmaster  [.] LWLockRelease                                                                                            ▒

  1.72%  postmaster  [.] hash_search_with_hash_value                                                                              ▒

  1.49%  postmaster  [.] SearchCatCache                                                                                           ▒

  0.54%  postmaster  [.] _bt_compare                                                                                              ▒

  0.51%  postmaster  [.] _bt_checkkeys 



On 19 June 2014 15:57, Merlin Moncure <mmoncure@gmail.com> wrote:
On Thu, Jun 19, 2014 at 5:12 PM, Borislav Ivanov <bivanov@atlassian.com> wrote:
> However, most people on our team think that the number of connections is
> purely a symptom of the actual problem. We would love to be wrong about
> this. But for now we feel the high number of connections contributes for
> preserving the problem but it's not actually triggering the problem.

This is entirely correct.  pgbouncer does not preventing database load
but about limiting damage when it occurs.  This generally necessary in
environments where application servers keep piling on connections when
the database is not clearing queries fast enough.

In your case user% is dominating system load.  Along with the high cs
this is really suggesting spinlock contention.  A 'perf top' is
essential for identifying the culprit.  It's very possible that 9.4
will fix your problem...see:
http://postgresql.1045698.n5.nabble.com/Cpu-usage-100-on-slave-s-lock-problem-td5768655.html.
There was some poorly optimized code in the wal replay.

merlin

pgsql-general by date:

Previous
From: "Vasudevan, Ramya"
Date:
Subject: Re: max_connections reached in postgres 9.3.3
Next
From: Erik van Zijst
Date:
Subject: Re: max_connections reached in postgres 9.3.3