Thread: High CPU usage / load average after upgrading to Ubuntu 12.04
Hello,
We upgraded from Ubuntu 11.04 to Ubuntu 12.04 and almost immediately observed increased CPU usage and a significantly higher load average on our database server.
At the time we were on Postgres 9.0.5. We decided to upgrade to Postgres 9.2 to see if that would resolve the issue, but unfortunately it did not.
Just for illustration purposes, below are a few links to CPU and load graphs pre- and post-upgrade.
https://s3.amazonaws.com/iqtell.ops/Load+Average+Post+Upgrade.png
https://s3.amazonaws.com/iqtell.ops/Load+Average+Pre+Upgrade.png
https://s3.amazonaws.com/iqtell.ops/Server+CPU+Post+Upgrade.png
https://s3.amazonaws.com/iqtell.ops/Server+CPU+Pre+Upgrade.png
We also tried tweaking kernel parameters as mentioned here - http://www.postgresql.org/message-id/50E4AAB1.9040902@optionshouse.com, but have not seen any improvement.
Any advice on how to trace what could be causing the change in CPU usage and load average is appreciated.
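One way to start tracing this is to see whether the extra load is user time (Postgres itself) or system time (the kernel: scheduler, page cache, writeback). A minimal sketch, assuming a Linux box; the `cpu_pct` helper name is made up for illustration:

```shell
# Compute user vs. system CPU percentages from two "cpu ..." lines
# sampled from /proc/stat a few seconds apart.
# /proc/stat layout: cpu user nice system idle iowait irq softirq ...
cpu_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR==1 { u1=$2; s1=$4; t1=$2+$3+$4+$5+$6+$7+$8 }
    NR==2 { u2=$2; s2=$4; t2=$2+$3+$4+$5+$6+$7+$8
            printf "user=%.0f%% system=%.0f%%\n",
                   100*(u2-u1)/(t2-t1), 100*(s2-s1)/(t2-t1) }'
}

# On the live server:
#   s1=$(grep '^cpu ' /proc/stat); sleep 5; s2=$(grep '^cpu ' /proc/stat)
#   cpu_pct "$s1" "$s2"
# A high system share relative to user time points at the kernel
# rather than at Postgres itself.
```

The sysstat tools (`sar -u`, `pidstat -u`) report the same breakdown without scripting, if they are installed.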
Our postgres version is:
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
OS:
Linux ip-10-189-175-25 3.2.0-37-virtual #58-Ubuntu SMP Thu Jan 24 15:48:03 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Hardware (this is an Amazon EC2 High-Memory Quadruple Extra Large instance):
8 core Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
68 GB RAM
RAID10 with 8 drives using XFS
Drives are EBS with provisioned IOPS, 1000 IOPS each
Postgres Configuration:
archive_command = rsync -a %p slave:/var/lib/postgresql/replication_load/%f
archive_mode = on
checkpoint_completion_target = 0.9
checkpoint_segments = 64
checkpoint_timeout = 30min
default_text_search_config = pg_catalog.english
external_pid_file = /var/run/postgresql/9.2-main.pid
lc_messages = en_US.UTF-8
lc_monetary = en_US.UTF-8
lc_numeric = en_US.UTF-8
lc_time = en_US.UTF-8
listen_addresses = *
log_checkpoints = on
log_destination = stderr
log_line_prefix = %t [%p]: [%l-1]
log_min_duration_statement = 500
max_connections = 300
max_stack_depth = 2MB
max_wal_senders = 5
shared_buffers = 4GB
synchronous_commit = off
unix_socket_directory = /var/run/postgresql
wal_keep_segments = 128
wal_level = hot_standby
work_mem = 8MB
Thanks,
Dan
Thanks for the reply. We are still using postgresql-9.0-801.jdbc4.jar. It seemed to us that this is more related to the OS than the JDBC driver version, as we had the issue before we upgraded to 9.2.
It might still be worth a try.
Just out of curiosity, has anyone else experienced performance issues (or even tried) with the 9.0 JDBC driver against a 9.2 server?
Dan
From: Eric Haertel [mailto:eric.haertel@groupon.com]
Sent: Tuesday, February 12, 2013 12:52 PM
To: Dan Kogan
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
I don't know if it helps, but after updating from 8.4 to 9.1 I had extreme problems with my local tests until I changed the JDBC driver to the proper version. I'm not sure whether the load occurred on the client or the server side, as the local integration tests ran on my machine.
--
Eric Härtel
Senior Software Developer
Tel.: +49 (0) 30 240 20 40 35
Mobil: +49 (0) 174 43 38 614
Email: eric.haertel@groupon.com
Groupon GmbH & Co. Service KG | Oberwallstraße 6 | 10117 Berlin
General partner: Groupon Verwaltungs GmbH, HRB 131594 B
Managing directors: Mark S. Hoyt | Bradley Downes | Daniel Köllner
Registered with the district court of Charlottenburg, Berlin, HRA 45265 B | VAT ID No. DE 279 803 459
Hi Will,
Yes, I think we’ve seen some discussions on that. Our servers are hosted on Amazon EC2 and upgrading the kernel does not seem so straightforward.
We did a benchmark using pgbench on 3.5 vs 3.2 and saw an improvement. Unfortunately our production server would not boot off 3.5, so we had to revert to 3.2.
At this point we are contemplating whether it’s better to go back to 11.04 or upgrade to 12.10 (which comes with kernel version 3.5).
Any thoughts on that would be appreciated.
Dan
From: Will Ferguson [mailto:WFerguson@northplains.com]
Sent: Tuesday, February 12, 2013 5:20 PM
To: Dan Kogan; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
Hey Dan,
If I recall correctly there were some discussions on here related to performance issues with the 3.2 kernel. I'm away at the moment so can't dig them out, but there has been much discussion lately about kernel performance problems in 3.2 which don't seem to be present in 3.4. I'll see if I can find them when I'm next at my desk.
Will
Sent from Samsung Mobile
On 02/12/2013 05:28 PM, Dan Kogan wrote:
> At this point we are contemplating whether it's better to go back to 11.04 or upgrade to 12.10 (which comes with kernel version 3.5).
> Any thoughts on that would be appreciated.

I have a machine running the same version of Ubuntu. I'll run some tests and tell you what I find.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 02/13/2013 11:24 AM, Josh Berkus wrote:
> I have a machine running the same version of Ubuntu. I'll run some tests and tell you what I find.

So I'm running a pgbench. However, I don't really have anything to compare the stats I'm seeing against. CPU usage and load average were high (load 7.9), but that was on -j 8 -c 32, with a TPS of 8500. What numbers are you seeing, exactly?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Just to be clear - I was describing the current situation in our production.

We were running pgbench on different Ubuntu versions today. I don’t have a 12.04 setup at the moment, but I do have 12.10, which seems to be performing about the same as 12.04 in our tests with pgbench.
Running pgbench with 8 jobs and 32 clients resulted in a load average of about 15 and TPS was 51350.

Question - how many cores does your server have? Ours has 8 cores.

Thanks,
Dan

> What numbers are you seeing, exactly?
On Tue, Feb 12, 2013 at 11:25 AM, Dan Kogan <dan@iqtell.com> wrote:
> Any advice on how to trace what could be causing the change in CPU usage and load average is appreciated.

Does your application have a lot of concurrency? History has shown that Postgres is highly sensitive to changes in the OS scheduler (which changes a lot from release to release).

Also check this: zone reclaim (http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html)

merlin
Thanks for the info. Our application does have a lot of concurrency.
We checked the zone reclaim parameter and it is turned off (that was the default, we did not have to change it).

Dan

> does your application have a lot of concurrency? ... also check this: zone reclaim
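For anyone following along, the zone reclaim setting discussed above can be checked and pinned off with a sysctl fragment like this (a sketch; 0 means off):

```
# Check the current value (0 = off):
#   cat /proc/sys/vm/zone_reclaim_mode
# Pin it off across reboots in /etc/sysctl.conf:
vm.zone_reclaim_mode = 0
```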
On 02/13/2013 05:30 PM, Dan Kogan wrote:
> Running pgbench with 8 jobs and 32 clients resulted in a load average of about 15 and TPS was 51350.

What size database?

> Question - how many cores does your server have? Ours has 8 cores.

32

I suppose I could throw multiple pgbenches at it. I just don't see the load numbers as unusual, but I don't have a similar pre-12.04 server to compare with.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
We used a scale factor of 3600.

Yeah, maybe other people see similar load averages, we were not sure. However, we saw a clear difference right after the upgrade. We are trying to determine whether it makes sense for us to go back to 11.04, or whether there is something here we are missing.
On 02/14/2013 12:41 PM, Dan Kogan wrote:
> However, we saw a clear difference right after the upgrade.

Well, I'm seeing a higher system % on CPU than I expect (around 15% on each core), and a MUCH higher context-switch rate than I expect (up to 500K). Is that anything like you're seeing?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Yes, we are seeing a higher system % on the CPU; not sure how to quantify it in terms of % right now - will check into that tomorrow.
We were not checking the context switch numbers during our benchmark; will check that tomorrow as well.
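For checking the context-switch numbers, the kernel keeps a cumulative counter on the `ctxt` line of /proc/stat; a small sketch (the `cs_rate` helper is made up for illustration) turns two samples into a per-second rate:

```shell
# Context switches per second from two readings of the cumulative
# "ctxt" counter in /proc/stat, taken dt seconds apart.
cs_rate() {
  awk -v c1="$1" -v c2="$2" -v dt="$3" 'BEGIN { printf "%d\n", (c2 - c1) / dt }'
}

# On the live server:
#   c1=$(awk '/^ctxt/ {print $2}' /proc/stat); sleep 5
#   c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
#   cs_rate "$c1" "$c2" 5
```

The "cs" column of `vmstat 5` or `sar -w` reports the same figure directly.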
If you run your benchmarks for more than a few minutes I highly recommend enabling sysstat service data collection; then you can look at it after the fact with sar. VERY useful stuff, both for benchmarking and post mortem on live servers.

--
To understand recursion, one must first understand recursion.
On 02/14/2013 08:47 PM, Scott Marlowe wrote:
> If you run your benchmarks for more than a few minutes I highly
> recommend enabling sysstat service data collection, then you can look
> at it after the fact with sar.

Well, background sar, by default on Linux, only collects every 30min. For a benchmark run, you want to generate your own sar file, for example:

sar -o hddrun2.sar -A 10 90 &

which says "collect all stats every 10 seconds and write them to the file hddrun2.sar for 15 minutes"

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Fri, Feb 15, 2013 at 11:26 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Well, background sar, by default on Linux, only collects every 30min.

On all my machines (debian and ubuntu) it collects every 5.

> sar -o hddrun2.sar -A 10 90 &

Not a bad idea, esp. when benchmarking.
So, our drop in performance is now clearly due to pathological OS behavior during checkpoints. Still trying to pin down what's going on, but it's not system load; it's clearly related to the IO system.

Anyone else see this? I'm getting it both on 3.2 and 3.4. We're using LSI Megaraid.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Scott,

> So do you have generally slow IO, or is it fsync behavior etc?

All tests except pgbench show this system as superfast. Bonnie++ and dd tests are good (200 to 300 MB/s), and test_fsync shows 14K/second. Basically it has no issues until checkpoint kicks in, at which time the entire system basically halts for the duration of the checkpoint.

For that matter, if I run a pgbench and halt it just before checkpoint kicks in, I get around 12000 TPS, which is what I'd expect on this system.

At this point, we've tried 3.2.0.26, 3.2.0.27, 3.4.0, and tried updating the RAID driver, and changing the IO scheduler. Nothing seems to affect the behavior. Testing using Ext4 (instead of XFS) next.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Mon, Feb 18, 2013 at 6:39 PM, Josh Berkus <josh@agliodbs.com> wrote:
> At this point, we've tried 3.2.0.26, 3.2.0.27, 3.4.0, and tried updating
> the RAID driver, and changing the IO scheduler. Nothing seems to affect
> the behavior. Testing using Ext4 (instead of XFS) next.

Did you try turning barriers on or off *manually* (explicitly)? With LSI and barriers *on* and ext4 I had less-optimal performance. With Linux MD or (some) 3Ware configurations I had no performance hit.

--
Jon
> Did you try turning barriers on or off *manually* (explicitly)?

They're off in fstab:

/dev/sdd1 on /data type xfs (rw,noatime,nodiratime,nobarrier)

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Mon, Feb 18, 2013 at 5:39 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Basically it has no issues until checkpoint kicks in, at which time the
> entire system basically halts for the duration of the checkpoint.

I assume you've made attempts at write levelling to reduce the impact of checkpoints etc.
On 19/02/13 13:39, Josh Berkus wrote:
> At this point, we've tried 3.2.0.26, 3.2.0.27, 3.4.0, and tried updating
> the RAID driver, and changing the IO scheduler. Nothing seems to affect
> the behavior. Testing using Ext4 (instead of XFS) next.

Might be worth looking at your vm.dirty_ratio, vm.dirty_background_ratio and friends settings. We managed to choke up a system with 16x SSD by leaving them at their defaults...

Cheers

Mark
On 02/18/2013 08:28 PM, Mark Kirkwood wrote:
> Might be worth looking at your vm.dirty_ratio, vm.dirty_background_ratio
> and friends settings. We managed to choke up a system with 16x SSD by
> leaving them at their defaults...

Yeah? Any settings you'd recommend specifically? What did you use on the SSD system?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 20/02/13 06:51, Josh Berkus wrote:
> Yeah? Any settings you'd recommend specifically? What did you use on
> the SSD system?

We set:

vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 1073741824
vm.dirty_ratio = 0
vm.dirty_bytes = 2147483648

i.e. 1G for dirty_background and 2G for dirty. We didn't spend much time afterwards fiddling with the sizes. I'm guessing we could have made them bigger - however the SSDs were happier constantly writing a few G than being handed (say) 50G of buffers to write at once. The system has 512G of RAM and 32 cores (no hyperthreading).

regards

Mark
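Mark's values map onto an /etc/sysctl.conf fragment like the following (a sketch; load it with `sysctl -p` after editing):

```
# Cap the dirty page cache so writeback happens in small, steady batches
# instead of one huge flush at checkpoint time.
vm.dirty_background_ratio = 0            # disable the ratio-based knob
vm.dirty_background_bytes = 1073741824   # start background writeback at 1G
vm.dirty_ratio = 0
vm.dirty_bytes = 2147483648              # block writers once 2G is dirty
```

Note the *_bytes and *_ratio knobs are mutually exclusive: setting one zeroes the other, which is why the ratios are explicitly set to 0 here.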
On 02/19/2013 09:51 AM, Josh Berkus wrote:
> On 02/18/2013 08:28 PM, Mark Kirkwood wrote:
>> Might be worth looking at your vm.dirty_ratio, vm.dirty_background_ratio
>> and friends settings. We managed to choke up a system with 16x SSD by
>> leaving them at their defaults...
>
> Yeah? Any settings you'd recommend specifically? What did you use on
> the SSD system?

NM, I tested lowering dirty_background_ratio, and it didn't help,
because checkpoints are kicking in before pdflush ever gets there.

So the issue seems to be that if you have this combination of factors:

1. large RAM
2. many/fast CPUs
3. a database which fits in RAM but is larger than the RAID controller's
   WB cache
4. pg_xlog on the same volume as pgdata

... then you'll see checkpoint "stalls", and spread checkpoints will
actually make them worse by making the stalls longer.

Moving pg_xlog to a separate partition makes this better. Making
bgwriter more aggressive helps a bit more on top of that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
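[Josh's "move pg_xlog to a separate partition" advice usually boils down to a stop-move-symlink sequence. The sketch below demonstrates the pattern on throwaway directories so it can be run safely anywhere; the commented lines show the real-world equivalent, where the paths are assumptions (a 9.2 install under /var/lib/postgresql) and the server must be stopped first:]

```shell
# Demonstrate the move-and-symlink pattern on scratch directories.
demo=$(mktemp -d)
mkdir -p "$demo/pgdata/pg_xlog" "$demo/fastdisk"

# Move the WAL directory to the (stand-in for a) separate volume,
# then leave a symlink behind where PostgreSQL expects it.
mv "$demo/pgdata/pg_xlog" "$demo/fastdisk/pg_xlog"
ln -s "$demo/fastdisk/pg_xlog" "$demo/pgdata/pg_xlog"

# Real-world equivalent (hypothetical paths, server stopped):
#   sudo service postgresql stop
#   sudo mv /var/lib/postgresql/9.2/main/pg_xlog /wal_disk/pg_xlog
#   sudo ln -s /wal_disk/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
#   sudo service postgresql start
```

The "more aggressive bgwriter" half of the advice lives in postgresql.conf: the relevant 9.2 knobs are bgwriter_delay, bgwriter_lru_maxpages, and bgwriter_lru_multiplier.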
On 20/02/13 12:24, Josh Berkus wrote:
> NM, I tested lowering dirty_background_ratio, and it didn't help,
> because checkpoints are kicking in before pdflush ever gets there.
>
> So the issue seems to be that if you have this combination of factors:
>
> 1. large RAM
> 2. many/fast CPUs
> 3. a database which fits in RAM but is larger than the RAID controller's
>    WB cache
> 4. pg_xlog on the same volume as pgdata
>
> ... then you'll see checkpoint "stalls" and spread checkpoint will
> actually make them worse by making the stalls longer.
>
> Moving pg_xlog to a separate partition makes this better. Making
> bgwriter more aggressive helps a bit more on top of that.

We have pg_xlog on a pair of PCIe SSDs. Also, we are running the
deadline IO scheduler.

Regards

Mark
On Tue, Feb 19, 2013 at 4:24 PM, Josh Berkus <josh@agliodbs.com> wrote:
> ... then you'll see checkpoint "stalls" and spread checkpoint will
> actually make them worse by making the stalls longer.

Wait, if they're spread enough then there won't be a checkpoint, so to
speak. Are you saying that spreading them out means that they still
kind of pile up, even with, say, a completion target of 1.0 etc.?
On 02/19/2013 07:15 PM, Scott Marlowe wrote:
> On Tue, Feb 19, 2013 at 4:24 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> ... then you'll see checkpoint "stalls" and spread checkpoint will
>> actually make them worse by making the stalls longer.
>
> Wait, if they're spread enough then there won't be a checkpoint, so to
> speak. Are you saying that spreading them out means that they still
> kind of pile up, even with say a completion target of 1.0 etc?

I'm saying that spreading them makes things worse, because they get
intermixed with the fsyncs for the WAL and cause commits to stall. I
tried setting checkpoint_completion_target = 0.0 and throughput got
about 10% better.

I'm beginning to think that checkpoint_completion_target should be 0.0
by default.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
> Sounds to me like your IO system is stalling on fsyncs or something
> like that. On machines with plenty of IO, cranking up completion
> target usually smooths things out.

It certainly seems like it does. However, I can't demonstrate the issue
using any simpler tool than pgbench ... even running four test_fsyncs in
parallel didn't show any issues, nor do standard FS testing tools.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 02/20/13 19:14, Josh Berkus wrote:
>> Sounds to me like your IO system is stalling on fsyncs or something
>> like that. On machines with plenty of IO, cranking up completion
>> target usually smooths things out.
>
> It certainly seems like it does. However, I can't demonstrate the issue
> using any simpler tool than pgbench ... even running four test_fsyncs in
> parallel didn't show any issues, nor do standard FS testing tools.

We were really starting to think that the system had an IO problem that
we couldn't tickle with any synthetic tools. Then one of our other
customers who upgraded to Ubuntu 12.04 LTS and is also experiencing
issues came across the following LKML thread regarding pdflush on 3.0+
kernels:

https://lkml.org/lkml/2012/10/9/210

So, I went and built a couple of custom kernels with this patch removed:

https://patchwork.kernel.org/patch/825212/

and the bad behavior stopped. Best performance was with a 3.5 kernel
with the patch removed.

--
Jeff Frost <jeff@pgexperts.com>
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/
On Fri, Feb 15, 2013 at 10:52 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Feb 15, 2013 at 11:26 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 02/14/2013 08:47 PM, Scott Marlowe wrote:
>>> If you run your benchmarks for more than a few minutes I highly
>>> recommend enabling sysstat service data collection, then you can look
>>> at it after the fact with sar. VERY useful stuff, both for
>>> benchmarking and post mortem on live servers.
>>
>> Well, background sar, by default on Linux, only collects every 30min.
>> For a benchmark run, you want to generate your own sar file, for example:
>
> On all my machines (debian and ubuntu) it collects every 5.

All of mine were 10, but once I figured out to edit /etc/cron.d/sysstat
they are now every 1 minute.

sar has some remarkably opaque documentation, but I'm glad I tracked
that down.

Cheers,

Jeff
On Tue, Feb 26, 2013 at 2:30 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Feb 15, 2013 at 10:52 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Fri, Feb 15, 2013 at 11:26 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 02/14/2013 08:47 PM, Scott Marlowe wrote:
>>>> If you run your benchmarks for more than a few minutes I highly
>>>> recommend enabling sysstat service data collection, then you can look
>>>> at it after the fact with sar. VERY useful stuff, both for
>>>> benchmarking and post mortem on live servers.
>>>
>>> Well, background sar, by default on Linux, only collects every 30min.
>>> For a benchmark run, you want to generate your own sar file, for example:
>>
>> On all my machines (debian and ubuntu) it collects every 5.
>
> All of mine were 10, but once I figured out to edit /etc/cron.d/sysstat
> they are now every 1 minute.

Oh yeah, it's every 10 on the 5s. I too need to go to 1-minute intervals.

> sar has some remarkably opaque documentation, but I'm glad I tracked
> that down.

It's so incredibly useful. When a machine is acting up, often getting it
back online is more important than fixing it right then, and most of the
system state stuff is lost on reboot / fix.
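[The interval being discussed lives in the cron schedule on Debian/Ubuntu. A sketch of the 1-minute change Jeff describes, plus how to read the collected data back with sar afterwards; the exact cron line and log path vary by release, so take these as illustrative:]

```shell
# /etc/cron.d/sysstat on Debian/Ubuntu runs the collector via cron.
# The stock entry fires every 10 minutes (at :05, :15, :25, ...):
#   5-55/10 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1
#
# For 1-minute sampling, change the schedule field to every minute:
#   * * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1
#
# After a benchmark run, replay the day's samples from the saved file
# (on Debian/Ubuntu the files are /var/log/sysstat/saDD, DD = day of month):
#   sar -u -f /var/log/sysstat/sa15    # CPU utilization
#   sar -q -f /var/log/sysstat/sa15    # load average / run queue
#   sar -b -f /var/log/sysstat/sa15    # IO transfer rates
```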
On Wed, Feb 20, 2013 at 3:44 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 02/19/2013 07:15 PM, Scott Marlowe wrote:
>> On Tue, Feb 19, 2013 at 4:24 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> ... then you'll see checkpoint "stalls" and spread checkpoint will
>>> actually make them worse by making the stalls longer.
>>
>> Wait, if they're spread enough then there won't be a checkpoint, so to
>> speak. Are you saying that spreading them out means that they still
>> kind of pile up, even with say a completion target of 1.0 etc?
>
> I'm saying that spreading them makes things worse, because they get
> intermixed with the fsyncs for the WAL and causes commits to stall. I
> tried setting checkpoint_completion_target = 0.0 and throughput got
> about 10% better.

Sounds to me like your IO system is stalling on fsyncs or something
like that. On machines with plenty of IO, cranking up completion
target usually smooths things out.

I've got some new big servers coming in at work over the next few
months, so I'm gonna test and compare Ubuntu 10.04 and 12.04 and see if
I can see this behaviour. We have a 12.04 machine in production, but
honestly it's not working very hard right now. And since it's in
production, I can't benchmark it without causing problems.
> From: Josh Berkus <josh@agliodbs.com>
> To: Scott Marlowe <scott.marlowe@gmail.com>
> Cc: pgsql-performance@postgresql.org
> Sent: Thursday, 21 February 2013, 3:14
> Subject: Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
>
>> Sounds to me like your IO system is stalling on fsyncs or something
>> like that. On machines with plenty of IO, cranking up completion
>> target usually smooths things out.
>
> It certainly seems like it does. However, I can't demonstrate the issue
> using any simpler tool than pgbench ... even running four test_fsyncs in
> parallel didn't show any issues, nor do standard FS testing tools.

I've missed a load of this thread and just scanned through what I can
see, so apologies if I'm repeating anything.

If the suspicion is the IO system and you've tuned everything you can
think of: is there anything interesting in meminfo/iostat/vmstat before
and during the stalls? If so, can you cause anything similar via
bonnie++ with the "-b" option?
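[A quick way to capture the meminfo/iostat/vmstat numbers being asked about while reproducing a stall. This is a generic sketch: the duration, 1-second interval, and /tmp output paths are arbitrary choices, and iostat is only run if the sysstat package provides it:]

```shell
# Capture system state at 1-second resolution while reproducing the
# stall (run pgbench against the server in another terminal meanwhile).
DURATION=10

# vmstat shows run queue, blocked processes, and block IO per second.
command -v vmstat >/dev/null && vmstat 1 "$DURATION" > /tmp/vmstat.log 2>&1 &

# iostat -x shows per-device utilization and await times.
command -v iostat >/dev/null && iostat -x 1 "$DURATION" > /tmp/iostat.log 2>&1 &

# The Dirty/Writeback counters in /proc/meminfo show how much data the
# kernel is holding back and how fast it is flushing it.
for i in $(seq 1 "$DURATION"); do
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sleep 1
done > /tmp/meminfo.log
wait
```

Comparing a quiet window against a checkpoint window in these logs should make it obvious whether the stall coincides with a spike in Writeback and device await, which is what the writeback-tuning discussion above would predict.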