Thread: High load,

High load,

From
Michael Kohl
Date:
Hi all,

we are running a fairly big Ruby on Rails application on Postgres 8.4.
Our traffic grew quite a bit lately, and since then we are facing DB
performance issues. System load occasionally explodes (around 170
yesterday on a 16 core system), which seems to be caused by disk I/O
(iowait in our Munin graphs goes up significantly during these
periods). At other times the load stays rather low under pretty much
the same circumstances.

There are 6 application servers with 18 unicorns each, as well as 12
beanstalk workers talking to the DB. I know the problem description is
very vague, but so far we haven't consistently managed to reproduce
the problem. Turning off the beanstalk workers usually leads to a great
decrease in writes and system load, but during yesterday's debugging
session they obviously ran fine (thanks, Murphy).

Below you'll find our system information and Postgres config, maybe
someone could be so kind as to point out any obvious flaws in our
current configuration while I'm trying to get a better description of
the underlying problem.

Postgres version: 8.4.6

Number of logical CPUs: 16 (4x Quadcore Xeon E5520  @ 2.27GHz)

RAM: 16GB

             total       used       free     shared    buffers     cached
Mem:      16461012   16399520      61492          0      72392   12546112
-/+ buffers/cache:    3781016   12679996
Swap:       999992     195336     804656

HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1

Concurrent connections (according to our monitoring tool): 7 (min), 74
(avg), 197 (max)

Our config (all other settings at default value):

max_connections = 200
ssl = true
shared_buffers = 4096MB
work_mem = 256MB
maintenance_work_mem = 512MB
synchronous_commit = off
wal_buffers = 8MB
checkpoint_segments = 30
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 8192MB
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_duration_statement = 1000
log_connections = on
log_disconnections = on
log_line_prefix = '%t '
datestyle = 'iso, mdy'
gin_fuzzy_search_limit = 10000

The config options are a mix of the article "Configuring PostgreSQL
for Pretty Good Performance" [1] and the talk "PostgreSQL as a secret
weapon for high-performance Ruby on Rails applications" [2].

Thanks,
Michael

[1] http://www.linux.com/learn/tutorials/394523-configuring-postgresql-for-pretty-good-performance
[2] http://www.pgcon.org/2010/schedule/events/210.en.html

Re: High load,

From
Cédric Villemain
Date:
2011/1/27 Michael Kohl <michael.kohl@tupalo.com>:
> Hi all,
>
> we are running a fairly big Ruby on Rails application on Postgres 8.4.
> Our traffic grew quite a bit lately, and since then we are facing DB
> performance issues. System load occasionally explodes (around 170
> yesterday on a 16 core system), which seems to be caused by disk I/O
> (iowait in our Munin graphs goes up significantly during these
> periods). At other times the load stays rather low under pretty much
> the same circumstances.
>
> There are 6 application servers with 18 unicorns each, as well as 12
> beanstalk workers talking to the DB. I know the problem description is
> very vague, but so far we haven't consistently managed to reproduce
> the problem. Turning off the beanstalk workers usually leads to a great
> decrease in writes and system load, but during yesterday's debugging
> session they obviously ran fine (thanks, Murphy).
>
> Below you'll find our system information and Postgres config, maybe
> someone could be so kind as to point out any obvious flaws in our
> current configuration while I'm trying to get a better description of
> the underlying problem.
>
> Postgres version: 8.4.6
>
> Number of logical CPUs: 16 (4x Quadcore Xeon E5520  @ 2.27GHz)
>
> RAM: 16GB
>
>             total       used       free     shared    buffers     cached
> Mem:      16461012   16399520      61492          0      72392   12546112
> -/+ buffers/cache:    3781016   12679996
> Swap:       999992     195336     804656

You have swap in use; is there I/O on the swap partition?
Can you paste /proc/meminfo?
Also turn on log_checkpoints if it is not already, and check the
duration to write the data.
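
For example, just one line in postgresql.conf:

log_checkpoints = on

The log will then show how long the write and sync phases of each checkpoint take.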

You didn't say the DB size (or the size of the active part of it); it would help here.

>
> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
>
> Concurrent connections (according to our monitoring tool): 7 (min), 74
> (avg), 197 (max)
>
> Our config (all other settings at default value):
>
> max_connections = 200
> ssl = true
> shared_buffers = 4096MB
> work_mem = 256MB

It is too much with 200 connections. You may hit cases where you
try to use more than the available memory.
See http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
--> work_mem, maintenance_work_mem

> maintenance_work_mem = 512MB

128MB is usually enough

> synchronous_commit = off
> wal_buffers = 8MB

16MB should work well

> checkpoint_segments = 30
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
> random_page_cost = 2.0
> effective_cache_size = 8192MB

12-14GB looks better

> logging_collector = on
> log_directory = '/var/log/postgresql'
> log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
> log_min_duration_statement = 1000
> log_connections = on
> log_disconnections = on
> log_line_prefix = '%t '
> datestyle = 'iso, mdy'
> gin_fuzzy_search_limit = 10000

Do you use full-text search?

>
> The config options are a mix of the article "Configuring PostgreSQL
> for Pretty Good Performance" [1] and the talk "PostgreSQL as a secret
> weapon for high-performance Ruby on Rails applications" [2].

Do you monitor the locks? And the commits/rollbacks?
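
For example, two quick sketches against the standard statistics views:

-- lock waits happening right now
SELECT count(*) FROM pg_locks WHERE NOT granted;

-- commit/rollback counters per database
SELECT datname, xact_commit, xact_rollback FROM pg_stat_database;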

>
> Thanks,
> Michael
>
> [1] http://www.linux.com/learn/tutorials/394523-configuring-postgresql-for-pretty-good-performance
> [2] http://www.pgcon.org/2010/schedule/events/210.en.html
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: High load,

From
Michael Kohl
Date:
Cédric, thanks a lot for your answer so far!

On Thu, Jan 27, 2011 at 12:24 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

> you have swap used, IO on the swap partition ?

Memory-wise we are fine.

> can you paste the /proc/meminfo ?

Sure:

# cat /proc/meminfo
MemTotal:       16461012 kB
MemFree:          280440 kB
Buffers:           60984 kB
Cached:         13757080 kB
SwapCached:         6112 kB
Active:          7049744 kB
Inactive:        7716308 kB
Active(anon):    2743696 kB
Inactive(anon):  2498056 kB
Active(file):    4306048 kB
Inactive(file):  5218252 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        999992 kB
SwapFree:         989496 kB
Dirty:              3500 kB
Writeback:             0 kB
AnonPages:        943752 kB
Mapped:          4114916 kB
Shmem:           4293312 kB
Slab:             247036 kB
SReclaimable:     212788 kB
SUnreclaim:        34248 kB
KernelStack:        3144 kB
PageTables:       832768 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     9230496 kB
Committed_AS:    5651528 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       51060 kB
VmallocChunk:   34350787468 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7936 kB
DirectMap2M:    16760832 kB

> Also turn on log_checkpoints if it is not already and check the
> duration to write the data.

Will do, thanks!

> You didn't said the DB size (and size of active part of it), it would help here.

=> select pg_size_pretty(pg_database_size('xxx'));
 pg_size_pretty
----------------
 32 GB
(1 row)

> it is too much with 200 connections. you may experiment case where you
> try to use more than the memory available.

So far memory never really was a problem, but I'll keep these
suggestions in mind.

> 16MB should work well

We already thought of increasing that, will do so now.

>> effective_cache_size = 8192MB
>
> 12-14GB looks better

Thank you, I was rather unsure about this one.

> you use full_text_search ?

Not anymore, probably a leftover.

> do you monitor the 'locks' ? and the commit/rollbacks  ?

No, but I'll look into doing that.

Thanks a lot for the feedback again,
Michael

Re: High load,

From
Cédric Villemain
Date:
2011/1/27 Michael Kohl <michael.kohl@tupalo.com>:
> Cédric, thanks a lot for your answer so far!
>
> On Thu, Jan 27, 2011 at 12:24 PM, Cédric Villemain
> <cedric.villemain.debian@gmail.com> wrote:
>
>> you have swap used, IO on the swap partition ?
>
> Memory-wise we are fine.
>
>> can you paste the /proc/meminfo ?
>
> Sure:
>
> # cat /proc/meminfo
> MemTotal:       16461012 kB
> MemFree:          280440 kB
> Buffers:           60984 kB
> Cached:         13757080 kB
> SwapCached:         6112 kB
> Active:          7049744 kB
> Inactive:        7716308 kB
> Active(anon):    2743696 kB
> Inactive(anon):  2498056 kB
> Active(file):    4306048 kB
> Inactive(file):  5218252 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:        999992 kB
> SwapFree:         989496 kB
> Dirty:              3500 kB
> Writeback:             0 kB
> AnonPages:        943752 kB
> Mapped:          4114916 kB
> Shmem:           4293312 kB
> Slab:             247036 kB
> SReclaimable:     212788 kB
> SUnreclaim:        34248 kB
> KernelStack:        3144 kB
> PageTables:       832768 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     9230496 kB

The CommitLimit looks too low; that is because your swap partition is small.

You need to either enlarge the swap partition, or change
vm.overcommit_ratio, if you want to be able to use more of your
memory sanely.
(See kernel/Documentation/filesystems/proc.txt for the explanation of
the formula:
CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap)
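
For example (a sketch; 90 is only an illustration, not a recommendation):

# check the current settings
sysctl vm.overcommit_ratio vm.overcommit_memory
# CommitLimit then becomes 90% of RAM plus swap
sysctl -w vm.overcommit_ratio=90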

> Committed_AS:    5651528 kB

This is way under CommitLimit, so you are good. (It is rare to be
limited by that anyway, and your perf issues are not related to that.)

> VmallocTotal:   34359738367 kB
> VmallocUsed:       51060 kB
> VmallocChunk:   34350787468 kB
> HardwareCorrupted:     0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:        7936 kB
> DirectMap2M:    16760832 kB
>
>> Also turn on log_checkpoints if it is not already and check the
>> duration to write the data.
>
> Will do, thanks!
>
>> You didn't said the DB size (and size of active part of it), it would help here.
>
> => select pg_size_pretty(pg_database_size('xxx'));
>  pg_size_pretty
> ----------------
>  32 GB
> (1 row)
>
>> it is too much with 200 connections. you may experiment case where you
>> try to use more than the memory available.
>
> So far memory never really was a problem, but I'll keep these
> suggestions in mind.
>
>> 16MB should work well
>
> We already thought of increasing that, will do so now.
>
>>> effective_cache_size = 8192MB
>>
>> 12-14GB looks better
>
> Thank you, I was rather unsure on this on.
>
>> you use full_text_search ?
>
> Not anymore, probably a leftover.
>
>> do you monitor the 'locks' ? and the commit/rollbacks  ?
>
> No, but I'll look into doing that.

It may help you find what the issue is.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: High load,

From
Andres Freund
Date:
On Thursday, January 27, 2011 12:24:10 PM Cédric Villemain wrote:
> > maintenance_work_mem = 512MB
> 128MB is usually enough
Uhm, I don't want to be picky, but that's not really my experience. Sorts for
index creation are highly dependent on a high m_w_m. Quite regularly I find the
existing 1GB limit a problem here...

Andres

Re: High load,

From
Michael Kohl
Date:
On Thu, Jan 27, 2011 at 1:30 PM, Justin Pitts <justinpitts@gmail.com> wrote:
> That is a foot-gun waiting to go off.

Thanks, I had already changed this after Cedric's mail.

>> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
>> random_page_cost = 2.0
> I thought these drives were a lot better at random IO than this gives
> them credit for.

I'll look into that.

Thanks a lot,
Michael

Re: High load,

From
Cédric Villemain
Date:
2011/1/27 Andres Freund <andres@anarazel.de>:
> On Thursday, January 27, 2011 12:24:10 PM Cédric Villemain wrote:
>> > maintenance_work_mem = 512MB
>> 128MB is usually enough
> Uhm, I don't want to be picky, but that's not really my experience. Sorts for
> index creation are highly dependent on a high m_w_m. Quite regularly I find the
> existing 1GB limit a problem here...

That is right for index creation, but not for 'pure' maintenance
stuff. Once the database is running as usual, there is no real point
in giving auto-vacuum or auto-analyze much more (it depends on the
RAID card memory too...)

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: High load,

From
Andres Freund
Date:
On Thursday, January 27, 2011 02:23:48 PM Cédric Villemain wrote:
> 2011/1/27 Andres Freund <andres@anarazel.de>:
> > On Thursday, January 27, 2011 12:24:10 PM Cédric Villemain wrote:
> >> > maintenance_work_mem = 512MB
> >>
> >> 128MB is usually enough
> >
> > Uhm, I don't want to be picky, but that's not really my experience. Sorts
> > for index creation are highly dependent on a high m_w_m. Quite regularly
> > I find the existing 1GB limit a problem here...
>
> That is right for index creation, but not for 'pure' maintenance
> stuff. Once the database is running as usual, there is no really point
> to give auto-vacuum or auto-analyze much more (depend on the raid card
> memory too ...)
Even that I cannot agree with, sorry ;-). If you have a database with a lot
of churn, a high m_w_m helps avoid multiple scans during vacuum of the
database, because the number of dead tuples doesn't fit in m_w_m.

Andres

Re: High load,

From
Justin Pitts
Date:
> Number of logical CPUs: 16 (4x Quadcore Xeon E5520  @ 2.27GHz)
> RAM: 16GB
> Concurrent connections (according to our monitoring tool): 7 (min), 74
> (avg), 197 (max)

Your current issue may be IO wait, but a connection pool isn't far off
in your future either.

> max_connections = 200
> work_mem = 256MB

That is a foot-gun waiting to go off. If 32 queries simultaneously
each need 256MB to sort, your cache is blown out and the server is
out of RAM. If your application is like most, you need a huge
work_mem for maybe 1% of your queries. You can set it high on a
per-connection or per-query basis for the queries that need it, and
set the default to a low, safe figure.
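
Something along these lines, as a sketch:

-- postgresql.conf: work_mem = 16MB  (low, safe default)
BEGIN;
SET LOCAL work_mem = '256MB';  -- only this transaction gets the big setting
-- run the one expensive sort/hash query here
COMMIT;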

> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
> random_page_cost = 2.0
I thought these drives were a lot better at random IO than this gives
them credit for. They are certainly no better at sequential IO than
(good) conventional drives. You might have a lot of room to turn this
down even further.

Re: High load,

From
Andy Colson
Date:
On 1/27/2011 4:31 AM, Michael Kohl wrote:
> Hi all,
>
> we are running a fairly big Ruby on Rails application on Postgres 8.4.
> Our traffic grew quite a bit lately, and since then we are facing DB
> performance issues. System load occasionally explodes (around 170
> yesterday on a 16 core system), which seems to be caused by disk I/O
> (iowait in our Munin graphs goes up significantly during these
> periods). At other times the load stays rather low under pretty much
> the same circumstances.
>
> There are 6 application servers with 18 unicorns each, as well as 12
> beanstalk workers talking to the DB. I know the problem description is
> very vague, but so far we haven't consistently managed to reproduce
> the problem. Turning off the beanstalk workers usually leads to a great
> decrease in writes and system load, but during yesterday's debugging
> session they obviously ran fine (thanks, Murphy).
>
> Below you'll find our system information and Postgres config, maybe
> someone could be so kind as to point out any obvious flaws in our
> current configuration while I'm trying to get a better description of
> the underlying problem.
>
<SNIP>

If the suggestions below are not enough, you might have to check some of
your SQL statements and make sure they are all behaving.  You may not
notice a table scan when the user count is low, but you will when it
gets higher.

Have you run each of your queries through explain analyze lately?

Have you checked for bloat?

You are vacuuming/autovacuuming, correct?
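
For example, a quick sketch to see where dead rows are piling up and
when autovacuum last touched those tables:

SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;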

-Andy

Re: High load,

From
Michael Kohl
Date:
On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson <andy@squeakycode.net> wrote:
> Have you run each of your queries through explain analyze lately?

A code review including checking of queries is on our agenda.

> You are vacuuming/autovacuuming, correct?

Sure :-)

Thank you,
Michael

Re: High load,

From
Scott Marlowe
Date:
On Thu, Jan 27, 2011 at 8:09 AM, Michael Kohl <michael.kohl@tupalo.com> wrote:
> On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson <andy@squeakycode.net> wrote:
>> Have you run each of your queries through explain analyze lately?
>
> A code review including checking of queries is on our agenda.

A good way to start is to log long-running queries and then EXPLAIN
ANALYZE just those.

Re: High load,

From
Andy Colson
Date:
On 1/27/2011 9:09 AM, Michael Kohl wrote:
> On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson<andy@squeakycode.net>  wrote:
>> Have you run each of your queries through explain analyze lately?
>
> A code review including checking of queries is on our agenda.
>
>> You are vacuuming/autovacuuming, correct?
>
> Sure :-)
>
> Thank you,
> Michael
>

Oh, also, when the box is really busy, have you watched vmstat to see if
you start swapping?
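
For example, something like:

vmstat 5
# si/so consistently above zero means you are swapping;
# the wa column is the iowait percentage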

-Andy

Re: High load,

From
Stephen Frost
Date:
* Michael Kohl (michael.kohl@tupalo.com) wrote:
> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1

I'm amazed no one else has mentioned this yet, but you should look into
splitting your data and your WALs.  Obviously, having another set of
SSDs to put your WALs on would be ideal.

You should probably also be looking into adjustments to the background
writer.  It sounds like you're getting hit by large checkpoint i/o
(if you turn on logging of that, as someone else suggested, you'll be
able to correlate the times), which can be helped by increasing the
amount of writing done between checkpoints, so that the checkpoints
aren't as big and painful.  That can be done by making the background
writer more aggressive.
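
As a rough sketch of the knobs involved (the values are only illustrative,
not recommendations):

bgwriter_delay = 100ms           # wake the background writer more often (default 200ms)
bgwriter_lru_maxpages = 500      # allow more buffers written per round (default 100)
bgwriter_lru_multiplier = 4.0    # write further ahead of demand (default 2.0)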

    Thanks,

        Stephen


Re: High load,

From
Scott Marlowe
Date:
On Thu, Jan 27, 2011 at 10:54 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Kohl (michael.kohl@tupalo.com) wrote:
>> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
>
> I'm amazed no one else has mentioned this yet, but you should look into
> splitting your data and your WALs.  Obviously, having another set of
> SSDs to put your WALs on would be ideal.

Actually spinning media would be a better choice.  A pair of fast
15krpm drives in a mirror will almost always outrun an SSD for
sequential write speed.  Even meh-grade 7200RPM SATA drives will win.

> You should probably also be looking into adjustments to the background
> writer.  It sounds like you're getting hit by large checkpoint i/o
> (if you turn on logging of that, as someone else suggested, you'll be
> able to corrollate the times), which can be helped by increasing the
> amount of writing done between checkpoints, so that the checkpoints
> aren't as big and painful.  That can be done by making the background
> writer more aggressive.

This++.  Increasing checkpoint segments can make a huge difference.
We run 64 segments in production and it's a world of difference from
the stock setting.
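
In postgresql.conf terms that would be something like (up from the 30 in
the original config):

checkpoint_segments = 64
checkpoint_warning = 30s    # logs a hint if checkpoints still come too often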

Re: High load,

From
Scott Marlowe
Date:
On Thu, Jan 27, 2011 at 10:20 AM, Andy Colson <andy@squeakycode.net> wrote:
> On 1/27/2011 9:09 AM, Michael Kohl wrote:
>>
>> On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson<andy@squeakycode.net>  wrote:
>>>
>>> Have you run each of your queries through explain analyze lately?
>>
>> A code review including checking of queries is on our agenda.
>>
>>> You are vacuuming/autovacuuming, correct?
>>
>> Sure :-)
>>
>> Thank you,
>> Michael
>>
>
> Oh, also, when the box is really busy, have you watched vmstat to see if you
> start swapping?

Setting the sysstat service to run, so you can see what your disks
were doing over the last 7 days, is useful too.  It makes it much
easier to figure things out afterwards when you have a history of
what has been happening.
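
For example, on Debian-style systems, roughly (paths and file names vary
by distro):

# enable collection
sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
# later, look back at per-device disk activity for a given day
sar -d -f /var/log/sysstat/sa27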

Re: High load,

From
Andres Freund
Date:
On Thursday, January 27, 2011 07:13:17 PM Scott Marlowe wrote:
> On Thu, Jan 27, 2011 at 10:54 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Michael Kohl (michael.kohl@tupalo.com) wrote:
> >> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
> >
> > I'm amazed no one else has mentioned this yet, but you should look into
> > splitting your data and your WALs.  Obviously, having another set of
> > SSDs to put your WALs on would be ideal.
>
> Actually spinning media would be a better choice.  A pair of fast
> 15krpm drives in a mirror will almost always outrun an SSD for
> sequential write speed.  Even meh-grade 7200RPM SATA drives will win.
Unless he is bulk loading or running with synchronous_commit=off, sequential
speed won't be the limit for WAL. The number of syncs will be the limit.

Andres

Re: High load,

From
"Ing. Marcos Ortiz Valmaseda"
Date:
Another piece of advice is to look at Alexander Dymo's RailsConf 2009 presentation "Advanced Performance
Optimization of Rails Applications", available at
http://en.oreilly.com/rails2009/public/schedule/detail/8615
The talk is focused on Rails and PostgreSQL, based on the development of Acunote's project management platform:

http://blog.pluron.com


----- Original Message -----
From: "Andy Colson" <andy@squeakycode.net>
To: "Michael Kohl" <michael.kohl@tupalo.com>
CC: pgsql-performance@postgresql.org
Sent: Thursday, January 27, 2011 12:20:18 GMT -05:00 Eastern Time (US/Canada)
Subject: Re: [PERFORM] High load,

On 1/27/2011 9:09 AM, Michael Kohl wrote:
> On Thu, Jan 27, 2011 at 4:06 PM, Andy Colson<andy@squeakycode.net>  wrote:
>> Have you run each of your queries through explain analyze lately?
>
> A code review including checking of queries is on our agenda.
>
>> You are vacuuming/autovacuuming, correct?
>
> Sure :-)
>
> Thank you,
> Michael
>

Oh, also, when the box is really busy, have you watched vmstat to see if
you start swapping?

-Andy


--
Ing. Marcos Luís Ortíz Valmaseda
System Engineer -- Database Administrator

Centro de Tecnologías de Gestión de Datos (DATEC)
Universidad de las Ciencias Informáticas
http://postgresql.uci.cu


Re: High load,

From
Michael Kohl
Date:
On Thu, Jan 27, 2011 at 6:05 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> A good way to start is to log long-running queries and then EXPLAIN
> ANALYZE just those.

We are already doing the logging part, we are just a bit behind on the
"explain analyze" part of things. One day soon...

Thanks,
Michael

Re: High load,

From
Mladen Gogala
Date:
Michael Kohl wrote:
> We are already doing the logging part, we are just a bit behind on the
> "explain analyze" part of things. One day soon...
>
>
There is, of course, the auto_explain module which will do that for you.
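
A minimal sketch for 8.4 (custom_variable_classes is needed on that version):

shared_preload_libraries = 'auto_explain'
custom_variable_classes = 'auto_explain'
auto_explain.log_min_duration = '1s'   # plans for anything slower than 1s
auto_explain.log_analyze = true        # include actual timings (adds overhead)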

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: High load,

From
Ivan Voras
Date:
On 27/01/2011 11:31, Michael Kohl wrote:
> Hi all,
>
> we are running a fairly big Ruby on Rails application on Postgres 8.4.
> Our traffic grew quite a bit lately, and since then we are facing DB
> performance issues. System load occasionally explodes (around 170
> yesterday on a 16 core system), which seems to be caused by disk I/O
> (iowait in our Munin graphs goes up significantly during these
> periods). At other times the load stays rather low under pretty much
> the same circumstances.

Is there any way you can moderate the number of total active connections
to the database to approximately match the number of (logical) CPU cores
on your system? I.e. some kind of connection pool or connection
limiting? This should help you in more ways than one (limit PG lock
contention, limit parallel disk IO).
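
For instance, a minimal pgbouncer sketch (names and numbers purely illustrative):

[databases]
; hypothetical database name, adjust to yours
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction
; roughly one server connection per core
default_pool_size = 16
; the unicorns and beanstalk workers can still all connect
max_client_conn = 250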


Re: High load,

From
Robert Haas
Date:
On Thu, Jan 27, 2011 at 5:31 AM, Michael Kohl <michael.kohl@tupalo.com> wrote:
> we are running a fairly big Ruby on Rails application on Postgres 8.4.
> Our traffic grew quite a bit lately, and since then we are facing DB
> performance issues. System load occasionally explodes (around 170
> yesterday on a 16 core system), which seems to be caused by disk I/O
> (iowait in our Munin graphs goes up significantly during these
> periods). At other times the load stays rather low under pretty much
> the same circumstances.
[...]
> [1] http://www.linux.com/learn/tutorials/394523-configuring-postgresql-for-pretty-good-performance
> [2] http://www.pgcon.org/2010/schedule/events/210.en.html

At the risk of shameless self-promotion, you might also find this helpful:

http://rhaas.blogspot.com/2010/12/troubleshooting-database.html

It's fairly basic but it might at least get you pointed in the right
direction...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: High load,

From
Greg Smith
Date:
Michael Kohl wrote:
> HDD: 2x 120 GB OCZ Vertex 2 SSD; RAID 1
>

As a general warning here, as far as I know the regular Vertex 2 SSD
doesn't cache writes properly for database use.  It's possible to have a
crash that leaves the database corrupted, if the drive has writes queued
up in its cache.  The Vertex 2 Pro resolves this issue with a supercap;
you may have a model with concerns here.  See
http://wiki.postgresql.org/wiki/Reliable_Writes for more information.

In addition to the log_checkpoints suggestion already made, I'd also
recommend turning on log_lock_waits and log_temp_files on your server.
All three of those--checkpoints, locks, and unexpected temp file
use--can cause the sort of issue you're seeing.  Well, not locks so much
given you're seeing heavy disk I/O, but it's good to start logging those
issues before they get bad, too.
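
In postgresql.conf that would look something like:

log_checkpoints = on
log_lock_waits = on     # logs waits longer than deadlock_timeout (1s by default)
log_temp_files = 0      # logs every temp file created, with its size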

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: High load,

From
Jignesh Shah
Date:
On Thu, Jan 27, 2011 at 6:36 AM, Michael Kohl <michael.kohl@tupalo.com> wrote:
> Cédric, thanks a lot for your answer so far!
>
> On Thu, Jan 27, 2011 at 12:24 PM, Cédric Villemain
> <cedric.villemain.debian@gmail.com> wrote:
>
>> you have swap used, IO on the swap partition ?
>
> Memory-wise we are fine.
>
>> can you paste the /proc/meminfo ?
>
> Sure:
>
> # cat /proc/meminfo
> MemTotal:       16461012 kB
> MemFree:          280440 kB
> Buffers:           60984 kB
> Cached:         13757080 kB
> SwapCached:         6112 kB
> Active:          7049744 kB
> Inactive:        7716308 kB
> Active(anon):    2743696 kB
> Inactive(anon):  2498056 kB
> Active(file):    4306048 kB
> Inactive(file):  5218252 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:        999992 kB
> SwapFree:         989496 kB
> Dirty:              3500 kB
> Writeback:             0 kB
> AnonPages:        943752 kB
> Mapped:          4114916 kB
> Shmem:           4293312 kB
> Slab:             247036 kB
> SReclaimable:     212788 kB
> SUnreclaim:        34248 kB
> KernelStack:        3144 kB
> PageTables:       832768 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     9230496 kB
> Committed_AS:    5651528 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:       51060 kB
> VmallocChunk:   34350787468 kB
> HardwareCorrupted:     0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:        7936 kB
> DirectMap2M:    16760832 kB
>
>> Also turn on log_checkpoints if it is not already and check the
>> duration to write the data.
>
> Will do, thanks!
>
>> You didn't said the DB size (and size of active part of it), it would help here.
>
> => select pg_size_pretty(pg_database_size('xxx'));
>  pg_size_pretty
> ----------------
>  32 GB
> (1 row)
>


Here I am still a big fan of setting
shared_buffers = 8GB

For a DB size of 32GB that is a 25% buffer pool ratio;
effective_cache_size would then be more like 8GB.

The only time this will hurt is if you have more sequential access than
random, which won't be populated in shared_buffers; but the chances of
that being the problem are lower with your random_page_cost set to
2.0 or lower.

Also I am a big fan of separating WAL and data, which gives two
advantages: you can monitor the I/O for each separately, so you know
whether it is coming from WAL or data, and further tuning can then be
done according to what you see.
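
A rough sketch of that separation, assuming a Debian-style data directory
and a hypothetical second device mounted at /mnt/wal, with the server stopped:

pg_ctl -D /var/lib/postgresql/8.4/main stop
mv /var/lib/postgresql/8.4/main/pg_xlog /mnt/wal/pg_xlog
ln -s /mnt/wal/pg_xlog /var/lib/postgresql/8.4/main/pg_xlog
pg_ctl -D /var/lib/postgresql/8.4/main start

Then iostat/sar per device tells you whether WAL or data is generating the I/O.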

Also, SSDs sometimes have trouble with the varying sizes of WAL writes,
so response times for WAL writes can vary quite a bit.

-Jignesh


>> it is too much with 200 connections. you may experiment case where you
>> try to use more than the memory available.
>
> So far memory never really was a problem, but I'll keep these
> suggestions in mind.
>
>> 16MB should work well
>
> We already thought of increasing that, will do so now.
>
>>> effective_cache_size = 8192MB
>>
>> 12-14GB looks better
>
> Thank you, I was rather unsure on this on.
>
>> you use full_text_search ?
>
> Not anymore, probably a leftover.
>
>> do you monitor the 'locks' ? and the commit/rollbacks  ?
>
> No, but I'll look into doing that.
>
> Thanks a lot for the feedback again,
> Michael
>