Thread: Bgwriter and pg_stat_bgwriter.buffers_clean aspects

Bgwriter and pg_stat_bgwriter.buffers_clean aspects

From
"Dmitry Koterov"
Date:
Hello.

(Sorry, I have sent this letter to pgsql-general@ first and only then - noticed that there is a special performance mailing list. So I post it here now.)

I am trying to tune PostgreSQL write parameters to make writing operation fast on a heavy-loaded database (a lot of inserts/updates).
After resetting the pg_stat_bgwriter statistics (I do it by deleting global/pgstat.stat file and restarting PostgreSQL) I monitor the following:

# select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
                 8 |               0 |              19092 |             0 |                0 |            2285 |         30148

So, you see that some time after resetting the statistics we have:
- a large value in buffers_backend;
- a zero buffers_clean.says that "buffers_backend ... [is a number of]  times a database backend (probably the client itself) had to write a page in order to make space for the new allocation", and "buffers_clean ... [means that] the background writer cleaned ... buffers (cleaned=wrote out dirty ones) during that time".

What I am trying to achieve is that all writing operation are performed asynchronously and mostly flushed to the disk before a CHECKPOINT occurred, so CHECKPOINT is cheap thanks to bgwiter work.

Could you please explain what happened and what large buffers_backend and zero buffers_clean mean?


Related parameters:

shared_buffers = 512MB
fsync = on             
synchronous_commit = off
wal_writer_delay = 2000ms
checkpoint_segments = 20         
checkpoint_timeout = 1min        
checkpoint_completion_target = 0.8
checkpoint_warning = 1min        
bgwriter_delay = 10ms      
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10


Re: Bgwriter and pg_stat_bgwriter.buffers_clean aspects

From
Greg Smith
Date:
On Fri, 26 Dec 2008, Dmitry Koterov wrote:

> checkpoint_timeout = 1min

Your system is having a checkpoint every minute.  You can't do that and
expect the background writer to do anything useful.  As shown in your
stats, all the dirty buffers are getting written out by those constant
checkpoints.

> What I am trying to achieve is that all writing operation are performed
> asynchronously and mostly flushed to the disk before a CHECKPOINT
> occurred, so CHECKPOINT is cheap thanks to bgwiter work.

The background writer only tries to write out things that haven't been
accessed recently, because the tests we did suggested the duplicated
writes from any other approach negated the benefits from writing them
earlier.  So it's not possible to get all the buffers clean before the
checkpoint starts, the ones that have been recently used can't get written
except during a checkpoint.

What does work instead is to spread the checkpoint writes over a long
period, such that they are an asynchronous trickle of smaller writes.
For that to work, you need to set checkpoint_timeout to a fairly long
period (at least the default of 5 minutes if not longer) and
checkpoint_segments to something fairly large.  You can know the segments
are large enough when most of the checkpoints show up in the
checkpoints_timed count.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Bgwriter and pg_stat_bgwriter.buffers_clean aspects

From
"Dmitry Koterov"
Date:
Thank you.

But why buffers_backend is so high? As I understood from your article, buffers_backend shows the number of writes immediately caused by any write operations, e.g. when an INSERT has to flush something on disk, because it has no space left for a new data in shared buffers. I suppose these flushes slow down operating greatly, and I realy see this: in my environment INSERT is usually performed in 1-2 ms, but sometimes it is executed in 5-6 seconds or even more (10 seconds), which touches statement_timeout barrier and fails the whole transaction.

The main purpose is to minimize INSERT/UPDATE time or, at least, make it more predictable.

Could you please give an advice how to achieve this?


On Mon, Dec 29, 2008 at 1:04 AM, Greg Smith <gsmith@gregsmith.com> wrote:
On Fri, 26 Dec 2008, Dmitry Koterov wrote:

checkpoint_timeout = 1min

Your system is having a checkpoint every minute.  You can't do that and expect the background writer to do anything useful.  As shown in your stats, all the dirty buffers are getting written out by those constant checkpoints.


What I am trying to achieve is that all writing operation are performed asynchronously and mostly flushed to the disk before a CHECKPOINT occurred, so CHECKPOINT is cheap thanks to bgwiter work.

The background writer only tries to write out things that haven't been accessed recently, because the tests we did suggested the duplicated writes from any other approach negated the benefits from writing them earlier.  So it's not possible to get all the buffers clean before the checkpoint starts, the ones that have been recently used can't get written except during a checkpoint.

What does work instead is to spread the checkpoint writes over a long period, such that they are an asynchronous trickle of smaller writes. For that to work, you need to set checkpoint_timeout to a fairly long period (at least the default of 5 minutes if not longer) and checkpoint_segments to something fairly large.  You can know the segments are large enough when most of the checkpoints show up in the checkpoints_timed count.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Bgwriter and pg_stat_bgwriter.buffers_clean aspects

From
Greg Smith
Date:
On Tue, 6 Jan 2009, Dmitry Koterov wrote:

> But why buffers_backend is so high? As I understood from your article,
> buffers_backend shows the number of writes immediately caused by any write
> operations, e.g. when an INSERT has to flush something on disk, because it
> has no space left for a new data in shared buffers. I suppose these flushes
> slow down operating greatly

In normal operation, those writes are cached by the operating system, such
that most backend writes will return very quickly.

> I realy see this: in my environment INSERT is usually performed in 1-2
> ms, but sometimes it is executed in 5-6 seconds or even more (10
> seconds)

When activity blocks like this, the most likely cause is because
everything is blocked waiting for the fsync at the end of a checkpoint
that forces all writes out to disk.  The only good way to make that go
away is to spread the checkpoint over a long period of time.  Your
configuration is forcing such a syncronization every minute, which makes
that sort of blocking more likely to happen, merely because there so many
chances for it.

Standard good practice here for 8.3 is to set checkpoint_timeout and
checkpoint_segments to as high as you can stand, where the downsides to
increasing them is that more disk space is wasted and recovery time goes
up.  I think you're chasing after the wrong cause here and ignoring the
obvious one.  Backend writes should not cause a long stall, and tuning up
the background writer to the extreme you have is counterproductive (all
your bgwriter_* parameters would be far better off at the default than the
extremely aggressive ones you've set them to).  Meanwhile, reducing
checkpoint_timeout can absolutely cause what you're seeing.

One other thing:  if this is a Linux system running a kernel before 2.6.22
and you have a lot of RAM, there's a known problem with that combination
that can cause writes to hang for a long time.  I've got a long article
about it http://www.westnet.com/~gsmith/content/linux-pdflush.htm and a
quicker run through identifying if you're running into issue and resolving
it at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD