Thread: Multi processor server overloads occationally with system process while running postgresql-9.4
Multi processor server overloads occationally with system process while running postgresql-9.4
From
ajaykbs
Date:
I work for a public company that uses only open source applications and databases. I have a problem with our critical database, which is both write- and read-intensive.

Version: PostgreSQL 9.4
Hardware: HP DL980 (8 processors, 80 cores without hyper-threading, 512GB RAM)
Operating system: Red Hat Enterprise Linux Server release 6.4 (Santiago)
uname -a: Linux host1 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

Single database with separate tablespaces for main data, pg_xlog and indexes.

The database is 770GB in size and is expected to grow to 2TB within the next year. It was running on a 2-processor HP DL560 (16 cores); as the transaction volume kept increasing, we moved to the DL980 with 8 processors and 512GB RAM.

Problem

At times, under moderate load, CPU usage goes up to 400% and users are not able to complete their queries in the expected time. The load, however, is contributed by system processes only. Connections normally average 50, but when this happens they shoot up to max_connections.

sar command output

07:20:01 IST  CPU   %user  %nice  %system  %iowait  %steal  %idle
07:30:01 IST  all    0.73   0.00     0.37     0.58    0.00  98.33
07:40:01 IST  all    0.66   0.00     0.38     0.65    0.00  98.31
07:50:01 IST  all    0.27   0.00     0.27     0.01    0.00  99.45
08:00:01 IST  all    0.52   0.00     0.37     0.01    0.00  99.10
08:10:01 IST  all    1.54   0.00     0.70     0.02    0.00  97.74
08:20:01 IST  all    1.20   0.00     0.67     0.02    0.00  98.10
08:30:01 IST  all    1.48   0.00     0.77     0.03    0.00  97.72
08:40:01 IST  all    1.69   0.00     0.89     0.04    0.00  97.39
08:50:01 IST  all    1.71   0.00     0.94     0.04    0.00  97.31
09:00:01 IST  all    1.74   0.00     0.92     0.03    0.00  97.31
09:10:01 IST  all    2.32   0.00     1.06     0.04    0.00  96.58
09:20:01 IST  all    2.22   0.00     1.17     0.04    0.00  96.57
09:30:02 IST  all    2.20   0.00     6.68     0.06    0.00  91.06
09:40:01 IST  all    2.43   0.00     1.37     0.06    0.00  96.14
09:50:01 IST  all    3.23   0.00     2.06     0.08    0.00  94.63
10:00:02 IST  all    3.15   0.00     6.10     0.07    0.00  90.67
10:10:01 IST  all    4.94   0.00     5.20     0.29    0.00  89.57
10:20:01 IST  all    5.10   0.00     2.13     0.34    0.00  92.43
10:30:01 IST  all    5.60   0.00     2.42     0.18    0.00  91.80
10:40:01 IST  all    5.28   0.00    14.37     0.19    0.00  80.16
10:50:01 IST  all    4.52   0.00    28.48     0.23    0.00  66.77
11:00:01 IST  all    5.25   0.00     9.02     0.18    0.00  85.55
11:10:01 IST  all    5.77   0.00     4.96     0.27    0.00  89.00
11:20:01 IST  all    5.70   0.00     2.74     0.19    0.00  91.37
11:30:01 IST  all    5.72   0.00     5.91     0.20    0.00  88.17
11:40:01 IST  all    5.66   0.00     2.81     0.37    0.00  91.15
11:50:01 IST  all    5.90   0.00     8.80     0.10    0.00  85.19
12:00:01 IST  all    6.44   0.00     3.40     0.13    0.00  90.03
12:10:01 IST  all    7.18   0.00     4.52     0.11    0.00  88.18
12:20:02 IST  all    4.40   0.00    37.84     0.07    0.00  57.70
12:30:01 IST  all    5.66   0.00     2.98     0.10    0.00  91.26
12:40:01 IST  all    5.74   0.00     3.05     0.11    0.00  91.10
Average:      all    1.92   0.00     2.28     0.11    0.00  95.69

postgresql.conf

max_connections = 500 (can be reduced)
shared_buffers = 8500MB
work_mem = 50MB
maintenance_work_mem = 8064MB
checkpoint_segments = 132
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9

This overload happens 5-6 times a day. How can I trace the cause of this problem?

My thoughts:
1. Something related to the NUMA system's memory management.
2. Something related to the size of shared buffers.

Please help.

Ajayakumar.BS
View this message in context: Multi processor server overloads occationally with system process while running postgresql-9.4
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: Multi processor server overloads occationally with system process while running postgresql-9.4
From
Gavin Flower
Date:
On 03/10/15 21:39, ajaykbs wrote:
> I am working in a public company who uses only open source
> applications and databases. I have a problem with our critical
> database which is write and read intensive.
> [...]

A little bit of formatting might make the above a bit more readable... One paragraph is hard to parse.

-Gavin
Re: Multi processor server overloads occationally with system process while running postgresql-9.4
From
Wei Shan
Date:
Are you using any connection pooler in front of the database?
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: Multi processor server overloads occationally with system process while running postgresql-9.4
From
Andres Freund
Date:
On 2015-10-03 01:39:33 -0700, ajaykbs wrote:
> It is observed that at some times during moderate load
> the CPU usage goes up to 400% and the users are not able to complete the
> queries in expected time. But the load is contributed by some system process
> only. The average connections are normally 50.

This email is nearly impossible to read. But it sounds a bit like you need to disable transparent hugepages and/or zone_reclaim mode.

Greetings,

Andres Freund
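For readers who want to check the two kernel settings mentioned above, here is a minimal sketch in Python. It is not from the thread. It assumes the usual Linux locations: the mainline THP path, the `redhat_`-prefixed path used by RHEL 6 kernels, and `/proc/sys/vm/zone_reclaim_mode`. The helper names `active_thp_mode` and `read_first` are made up for illustration. The THP file lists every choice and marks the active one in square brackets; a `zone_reclaim_mode` of 0 means zone reclaim is off.

```python
from pathlib import Path

# Assumed locations; RHEL 6 kernels expose THP under a redhat_-prefixed directory.
THP_PATHS = [
    "/sys/kernel/mm/transparent_hugepage/enabled",
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",
]
ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"


def active_thp_mode(text):
    """Return the bracketed choice from a line such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_first(paths):
    """Return the stripped contents of the first existing file, or None."""
    for candidate in paths:
        path = Path(candidate)
        if path.is_file():
            return path.read_text().strip()
    return None


if __name__ == "__main__":
    thp_raw = read_first(THP_PATHS)
    print("THP mode:", active_thp_mode(thp_raw) if thp_raw else "not found")

    zone_reclaim = read_first([ZONE_RECLAIM_PATH])
    print("zone_reclaim_mode:", zone_reclaim if zone_reclaim is not None else "not found")
```

The script only reads the current values; changing them is a separate, root-level step and is outside this sketch.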
Re: Multi processor server overloads occationally with system process while running postgresql-9.4
From
ajaykbs
Date:
Sorry about the formatting. I am posting the same lines again.

I work for a public company that uses only open source applications and databases. I have a problem with our critical database, which is both write- and read-intensive.

Version: PostgreSQL 9.4
Hardware: HP DL980 (8 processors, 80 cores without hyper-threading, 512GB RAM)
Operating system: Red Hat Enterprise Linux Server release 6.4 (Santiago)
uname -a: Linux host1 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

Single database with separate tablespaces for main data, pg_xlog and indexes.

The database is 770GB in size and is expected to grow to 2TB within the next year. It was running on a 2-processor HP DL560 (16 cores); as the transaction volume kept increasing, we moved to the DL980 with 8 processors and 512GB RAM.

Problem

At times, under moderate load, CPU usage goes up to 400% and users are not able to complete their queries in the expected time. The load, however, is contributed by system processes only. Connections normally average 50, but when this happens they shoot up to max_connections.

sar command output

07:20:01 IST  CPU   %user  %nice  %system  %iowait  %steal  %idle
07:30:01 IST  all    0.73   0.00     0.37     0.58    0.00  98.33
07:40:01 IST  all    0.66   0.00     0.38     0.65    0.00  98.31
07:50:01 IST  all    0.27   0.00     0.27     0.01    0.00  99.45
08:00:01 IST  all    0.52   0.00     0.37     0.01    0.00  99.10
08:10:01 IST  all    1.54   0.00     0.70     0.02    0.00  97.74
08:20:01 IST  all    1.20   0.00     0.67     0.02    0.00  98.10
08:30:01 IST  all    1.48   0.00     0.77     0.03    0.00  97.72
08:40:01 IST  all    1.69   0.00     0.89     0.04    0.00  97.39
08:50:01 IST  all    1.71   0.00     0.94     0.04    0.00  97.31
09:00:01 IST  all    1.74   0.00     0.92     0.03    0.00  97.31
09:10:01 IST  all    2.32   0.00     1.06     0.04    0.00  96.58
09:20:01 IST  all    2.22   0.00     1.17     0.04    0.00  96.57
09:30:02 IST  all    2.20   0.00     6.68     0.06    0.00  91.06
09:40:01 IST  all    2.43   0.00     1.37     0.06    0.00  96.14
09:50:01 IST  all    3.23   0.00     2.06     0.08    0.00  94.63
10:00:02 IST  all    3.15   0.00     6.10     0.07    0.00  90.67
10:10:01 IST  all    4.94   0.00     5.20     0.29    0.00  89.57
10:20:01 IST  all    5.10   0.00     2.13     0.34    0.00  92.43
10:30:01 IST  all    5.60   0.00     2.42     0.18    0.00  91.80
10:40:01 IST  all    5.28   0.00    14.37     0.19    0.00  80.16
10:50:01 IST  all    4.52   0.00    28.48     0.23    0.00  66.77
11:00:01 IST  all    5.25   0.00     9.02     0.18    0.00  85.55
11:10:01 IST  all    5.77   0.00     4.96     0.27    0.00  89.00
11:20:01 IST  all    5.70   0.00     2.74     0.19    0.00  91.37
11:30:01 IST  all    5.72   0.00     5.91     0.20    0.00  88.17
11:40:01 IST  all    5.66   0.00     2.81     0.37    0.00  91.15
11:50:01 IST  all    5.90   0.00     8.80     0.10    0.00  85.19
12:00:01 IST  all    6.44   0.00     3.40     0.13    0.00  90.03
12:10:01 IST  all    7.18   0.00     4.52     0.11    0.00  88.18
12:20:02 IST  all    4.40   0.00    37.84     0.07    0.00  57.70
12:30:01 IST  all    5.66   0.00     2.98     0.10    0.00  91.26
12:40:01 IST  all    5.74   0.00     3.05     0.11    0.00  91.10
Average:      all    1.92   0.00     2.28     0.11    0.00  95.69

postgresql.conf

max_connections = 500 (can be reduced)
shared_buffers = 8500MB
work_mem = 50MB
maintenance_work_mem = 8064MB
checkpoint_segments = 132
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9

I am not using a connection pooler.

This overload happens 5-6 times a day. How can I trace the cause of this problem?

My thoughts:
1. Something related to the NUMA system's memory management.
2. Something related to the size of shared buffers.

Please help.

Ajayakumar.BS

--
View this message in context: http://postgresql.nabble.com/Multi-processor-server-overloads-occationally-with-system-process-while-running-postgresql-9-4-tp5868474p5868480.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
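As a first step in tracing the problem, the spikes in the sar output can be picked out mechanically. The following is a minimal Python sketch, not something posted in the thread. It assumes the nine-column layout shown in the post (time, time zone, CPU, then six percentages); the function name `high_system_intervals` and the 10% threshold are arbitrary choices for illustration. The sample rows are copied from the sar output in this message.

```python
# A few rows copied from the sar output in the post.
SAR_SAMPLE = """\
07:20:01 IST CPU %user %nice %system %iowait %steal %idle
10:30:01 IST all 5.60 0.00 2.42 0.18 0.00 91.80
10:40:01 IST all 5.28 0.00 14.37 0.19 0.00 80.16
10:50:01 IST all 4.52 0.00 28.48 0.23 0.00 66.77
11:00:01 IST all 5.25 0.00 9.02 0.18 0.00 85.55
12:20:02 IST all 4.40 0.00 37.84 0.07 0.00 57.70
12:30:01 IST all 5.66 0.00 2.98 0.10 0.00 91.26
Average: all 1.92 0.00 2.28 0.11 0.00 95.69
"""


def high_system_intervals(sar_text, threshold=10.0):
    """Return (timestamp, %system) pairs whose %system exceeds the threshold."""
    flagged = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Assumed layout: time, zone, CPU, %user, %nice, %system, %iowait, %steal, %idle.
        # The header row (CPU column is "CPU") and the "Average:" row (8 fields) are skipped.
        if len(fields) != 9 or fields[2] != "all":
            continue
        try:
            system_pct = float(fields[5])
        except ValueError:
            continue  # malformed row
        if system_pct > threshold:
            flagged.append((fields[0], system_pct))
    return flagged


if __name__ == "__main__":
    for timestamp, pct in high_system_intervals(SAR_SAMPLE):
        print(f"{timestamp}  %system={pct:.2f}")
```

The flagged timestamps give windows in which to collect finer-grained data. Note that these are 10-minute averages across all 80 cores, so short bursts are flattened considerably.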
Re: Multi processor server overloads occationally with system process while running postgresql-9.4
From
ajaykbs
Date:
I have checked transparent huge pages and zone reclaim mode, and both are already disabled.

As a trial-and-error measure, I reduced shared_buffers from 8500MB to 3000MB. CPU I/O wait has increased a little, but the periodic overload has not occurred since (3 days have passed without such a situation). I shall report further developments.

Thank you all for the great help.

--
View this message in context: http://postgresql.nabble.com/Multi-processor-server-overloads-occationally-with-system-process-while-running-postgresql-9-4-tp5868474p5869047.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: Re: Multi processor server overloads occationally with system process while running postgresql-9.4
From
Scott Marlowe
Date:
On Tue, Oct 6, 2015 at 11:08 PM, ajaykbs <ajayakumarbs@gmail.com> wrote:
> I have checked the transparent huge pages and zone reclaim mode and those are
> already disabled.
>
> As a trial and error method, I have reduced the shared buffer size from
> 8500MB to 3000MB.
> The CPU i/o wait is increased a little. But the periodical over load has not
> occurred afterwards. (3 days passed without such situation). I shall report
> further developments.

Reduce max_connections to something more reasonable, like < 100, and get a connection pooler in place (pgbouncer is simple to set up and use).
Re: Re: Multi processor server overloads occationally with system process while running postgresql-9.4
From
Kevin Grittner
Date:
On Saturday, October 3, 2015 4:36 AM, ajaykbs <ajayakumarbs@gmail.com> wrote:

> version: Postgresql-9.4
> Hardware: HP DL980 (8-processor, 80 cores w/o hyper threading, 512GB RAM)
> Operating system: Red Hat Enterprise Linux Server release 6.4 (Santiago)
> uname -a : Linux host1 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST
> 2013 x86_64 x86_64 x86_64 GNU/Linux Single database with separate tablespace
> for main-data, pg_xlog and indexes
>
> I have a database having 770GB size and expected to grow to 2TB within the
> next year. The database was running in a 2processor HP DL560 (16 cores) and
> as the transactions of the database were found increasing, we have changed
> the hardware to DL980 with 8 processors and 512GB RAM.
>
> Problem It is observed that at some times during moderate load the CPU
> usage goes up to 400% and the users are not able to complete the queries in
> expected time. But the load is contributed by some system process only. The
> average connections are normally 50. But when this happens the connections
> will shoot up to max-connections.

You might find this thread interesting:

http://www.postgresql.org/message-id/flat/55783940.8080302@wi3ck.info#55783940.8080302@wi3ck.info

The short version is that in existing production versions you can easily run into such symptoms when you get to 8 or more CPU packages. The problem seems to be solved in the development versions of 9.5 (with changes not suitable for back-patching to a stable branch).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company