Re: CPU spikes and transactions - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: CPU spikes and transactions
Date
Msg-id 525DB79B.7050406@fuzzy.cz
Whole thread Raw
In response to Re: CPU spikes and transactions  (Tony Kay <tony@teamunify.com>)
List pgsql-performance
On 15.10.2013 21:07, Tony Kay wrote:
>
> On Tue, Oct 15, 2013 at 10:26 AM, Julien Cigar <jcigar@ulb.ac.be
> <mailto:jcigar@ulb.ac.be>> wrote:
>
>
> for two reasons:
>
> - some of the overhead of bgwriter and checkpoints is more or less
> linear in the size of shared_buffers, for example it could be
> possible that a large quantity of data could be dirty when a
> checkpoint occurs).
>
> - the OS cache is also being used for reads and writes, the larger
> shared_buffers is, the more you risk double buffering (same blocks
> in the OS cache and in the database buffer cache).
>
>
> Excellent. Thank you for the information. My suspicion has always
> been that the shared_buffers are "level 1 cache", so it seems to me
> that you'd want that to be large enough to hold your entire database
> if you could. However, I'm now realizing that I was _also_ assuming
> the IPC shared memory was also being locked via mlock to prevent
> swapping it out, and I'm now getting the feeling that this isn't
> true, which means the double buffering could lead to swap space use
> on buffer cache pressure...which I do occasionally see in ways I had
> not expected.
>
> We do streaming replication and also store them for snapshot PITR
> backups, so I am intimately familiar with our write load, and I can
> say it is pretty low (we ship a 16MB WAL file about every 10-15
> minutes during our busiest times).
>
> That said, I can see how an import that is doing a bunch of writes
> could possibly spread those over a large area that would then
> consume a lot of CPU on the writer and checkpoint; however, I do not
> see how either of those would cause 40-60 different postgres
> backgroud processes (all running a normally "light query") to spin
> off into oblivion unless the write work load is somehow threaded into
> the background workers (which I'm sure it isn't). So, I think we're
> still dealing with a spinlock issue.
>
> We're going to upgrade to 9.1.10 (with debug symbols) Thursday night
> and add another 64GB of RAM. I'll tune shared_buffers down to 2GB at
> that time and bump effective_cache_size up at the same time. My
> large history of sar data will make it apparent pretty quickly if
> that is a win/lose/tie.

Don't be too aggressive, though. You haven't identified the bottleneck
yet and 2GB might be too low. For example we're running 9.1.x too, and
we're generally quite happy with 10GB shared buffers (on machines with
96GB of RAM). So although 22GB is definitely too much, but 2GB might be
too low, especially if you add more RAM into the machine.

What you may do is inspect the buffer cache with this contrib module:

  http://www.postgresql.org/docs/9.1/interactive/pgbuffercache.html

Doing something as simple as this:

  select (reldatabase is not null), count(*)
    from pg_buffercache group by 1;

  select usagecount, count(*)
    from pg_buffercache where reldatabase is not null group by 1;

  select isdirty, count(*)
    from pg_buffercache where reldatabase is not null group by 1;

should tell you some very basic metrics, i.e. what portion of buffers
you actually use, what is the LRU usage count histogram and what portion
of shared buffers is dirty.

Again, you'll have to run this repeatedly during the import job, to get
an idea of what's going on and size the shared buffers reasonably for
your workload.

Be careful - this needs to acquire some locks to get a consistent
result, so don't run that too frequently.

Tomas


pgsql-performance by date:

Previous
From: sparikh
Date:
Subject: Hot Standby performance issue
Next
From: Merlin Moncure
Date:
Subject: Re: CPU spikes and transactions