Random performance hit, unknown cause. - Mailing list pgsql-performance

From Brian Fehrle
Subject Random performance hit, unknown cause.
Msg-id 4F8721C4.8090300@consistentstate.com
List pgsql-performance
Hi all,

OS: Linux 2.6.32, 64-bit
PostgreSQL 9.0.5 installed from Ubuntu packages.
8 CPU cores
64 GB system memory
The database cluster is on RAID 10 direct-attached storage, using an HP P800 controller card.


I have a system that has been having occasional performance hits: the load on the system skyrockets, all queries take longer to execute, and a hot standby slave I have set up via streaming replication starts to fall behind. I'm having trouble pinpointing the exact issue.

This morning, during our nightly backup process (where we grab a copy of the data directory), we started having this same issue. The main thing I see in all of these episodes is high disk wait on the system. When we are performing 'well', the %wa from top is usually around 30% and our load is around 12 - 15. This morning we saw a load of 21 - 23 and %wa jumping between 60% and 75%.

The top process pretty much at all times is the WAL sender process; is this normal?
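(For reference, on 9.0 the standby lag can be roughly gauged by comparing the primary's current WAL location against what the standby has received and replayed; the following is just a sketch using the 9.0 function names:)

-- On the primary: current WAL write location
SELECT pg_current_xlog_location();

-- On the standby: last WAL location received from the primary,
-- and the last location actually replayed
SELECT pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location();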

From what I can tell, my access patterns on the database have not changed: the same average number of inserts, updates, and deletes, and nothing on the system has changed in any way. There are no abnormal autovacuum processes beyond the ones that normally run.
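(To be clear about how that can be checked, a sketch using the 9.0 pg_stat_activity column names; autovacuum workers show up there with an 'autovacuum:' query text:)

-- List any autovacuum workers currently running
SELECT procpid, datname, query_start, current_query
FROM pg_stat_activity
WHERE current_query LIKE 'autovacuum:%';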

So what can I do to track down what the issue is? Currently the system has returned to a 'good' state and performance looks great, but I would like to know how to prevent this, as well as be able to grab good stats if it does happen again in the future.
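(As an example of the kind of stats that could be grabbed next time: the checkpoint and background-writer counters in pg_stat_bgwriter, sampled over time, would show whether checkpoint activity lines up with the I/O spikes. A sketch, using columns that exist in 9.0:)

-- Snapshot of checkpoint and background writer counters;
-- sample this every few minutes and diff the values over time
SELECT now() AS sample_time,
       checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       buffers_alloc
FROM pg_stat_bgwriter;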

Has anyone had any issues with the HP P800 controller card in a Postgres environment? Is there anything that can help us maximise disk performance in this case, as it seems to be one of our major bottlenecks? I do plan on moving pg_xlog to a separate drive down the road; the cluster is extremely active, so that will help out a ton.

Some I/O stats:

$ iostat -d -x 5 3
Device:        rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
dev1            1.99    75.24  651.06  438.04 41668.57  8848.18    46.38     0.60    3.68   0.70  76.36
dev2            0.00     0.00  653.05  513.43 41668.57  8848.18    43.31     2.18    4.78   0.65  76.35

Device:        rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
dev1            0.00    35.20  676.20  292.00 35105.60  5688.00    42.13    67.76   70.73   1.03 100.00
dev2            0.00     0.00  671.80  295.40 35273.60  4843.20    41.48    73.41   76.62   1.03 100.00

Device:        rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
dev1            1.20    40.80  865.40  424.80 51355.20  8231.00    46.18    37.87   29.22   0.77  99.80
dev2            0.00     0.00  867.40  465.60 51041.60  8231.00    44.47    38.28   28.58   0.75  99.80


Thanks in advance,
Brian F
