Re: Server stalls, all CPU 100% system time - Mailing list pgsql-performance

From Bèrto ëd Sèra
Subject Re: Server stalls, all CPU 100% system time
Msg-id CAKwGa_9rN8dWgr=O+isr9_5Qc4ughXbFrA1jX-EoM3EjXbBKfw@mail.gmail.com
In response to Server stalls, all CPU 100% system time  (Andre <pg@darix.de>)
Responses Re: Server stalls, all CPU 100% system time  (Andre <pg@darix.de>)
List pgsql-performance
And what does your /etc/sysctl.conf contain?
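
To make the question above easier to answer, here is a minimal sketch that prints only the active settings from /etc/sysctl.conf, leaving out comments and blank lines. The helper name parse_sysctl is hypothetical, and the sketch assumes the usual "key = value" layout with "#" or ";" comment lines. It reads only that one file, so values set elsewhere (for example under /etc/sysctl.d or changed at runtime) will not appear; `sysctl -a` shows the live values.

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' and ';' both start a comment).
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line, ignore it
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    # Assumption: the file lives at the conventional path.
    path = Path("/etc/sysctl.conf")
    if path.exists():
        for key, value in sorted(parse_sysctl(path.read_text()).items()):
            print(f"{key} = {value}")
```

The output is a sorted list of "key = value" lines that can be pasted straight into a reply.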

Cheers
Bèrto

On 24 February 2013 14:08, Andre <pg@darix.de> wrote:
> Hi,
> Since upgrading our hardware, OS and Postgres, we experience server stalls
> under certain conditions. During a stall (up to 2 minutes) all CPUs show
> 100% system time, and all Postgres processes show BIND in top.
> Usually the server has a load below 0.5 (12 cores) with up to 30
> connections and 200-400 tps.
>
> Here is top -H during the stall:
> Threads: 279 total,  25 running, 254 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us, 99.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>
> This is under normal circumstances:
> Threads: 274 total,   1 running, 273 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.2 us,  0.2 sy,  0.0 ni, 99.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>
> iostat shows under 0.3% load on the drives.
>
> The stalls are mostly reproducible when the server is under its normal
> load and 20-40 new processes then start executing SQL statements.
> Deactivating HT (Hyper-Threading) seems to have reduced the frequency and
> length of the stalls.
>
> The log shows entries for slow BINDs (8 seconds):
> ... LOG:  duration: 8452.654 ms  bind pdo_stmt_00000001: SELECT [20 columns
> selected] FROM users WHERE users.USERID=$1 LIMIT 1
>
> I have tried to create a test case, but even starting 200 client processes
> that execute prepared statements does not reproduce this behaviour on a
> nearly idle server; it only stalls under normal workload.
>
> Hardware details:
> 2x Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
> 64 GB RAM
>
> Postgres version: 9.2.2 and 9.2.3
>
> Linux: OpenSUSE 12.2 with Kernel 3.4.6
>
> Postgres config:
> max_connections = 200
> effective_io_concurrency = 3
> max_wal_senders = 2
> wal_keep_segments = 2048
> max_locks_per_transaction = 500
> default_statistics_target = 100
> checkpoint_completion_target = 0.9
> maintenance_work_mem = 1GB
> effective_cache_size = 60GB
> work_mem = 384MB
> wal_buffers = 8MB
> checkpoint_segments = 64
> shared_buffers = 15GB
>
>
> This might be related to this topic:
> http://www.postgresql.org/message-id/CANQNgOquOGH7AkqW6ObPafrgxv=J3WsiZg-NgVvbki-qYpoY7Q@mail.gmail.com
> (Poor performance after update from SLES11 SP1 to SP2)
> I believe the old server was OpenSUSE 11.x.
>
>
> Thanks for any hint on how to fix this or diagnose the problem.



--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a
darkened room munching pills and listening to repetitive music.

